So for this month's blog we are going to delve into adding a new EBS volume to an EC2 instance. Although this is a reasonably simple operation, we will drill down into some of the options you have and how these can impact performance long-term.
We will look at the following:
- GPT vs MBR
- I/O block size is not the same as NTFS allocation unit size
- Should you format your drive to 64KB?
- Adding a Volume
I will use the terms volume/drive/disk interchangeably throughout this blog.
OK, let's get into this, and firstly look at GPT vs MBR. GPT (GUID Partition Table) and MBR (Master Boot Record) are the two partitioning schemes used for hard drives. We'll keep this at a fairly high level and summarize that GPT is probably the way to go when you are adding an EBS volume. MBR has a size limit of 2TB, whereas GPT does not, and GPT also gives you some extra protection against corruption: it writes partition information into several areas of the drive and keeps a secondary backup table.
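As an aside, that 2TB MBR ceiling falls out of simple arithmetic: MBR records partition sizes as 32-bit sector counts, and with the classic 512-byte sector that tops out at 2TiB. A quick sanity check in Python:

```python
# MBR stores partition start/length as 32-bit LBA sector counts.
MAX_SECTORS = 2**32       # largest count a 32-bit field can hold
SECTOR_BYTES = 512        # classic disk sector size

max_mbr_bytes = MAX_SECTORS * SECTOR_BYTES
print(max_mbr_bytes // 2**40)  # size limit in TiB -> 2
```

GPT uses 64-bit sector addressing, which is why it doesn't hit this wall.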
One of the biggest myths in the data world is the misconception that I/O block size and allocation unit size are the same thing. This is just not true. You may often have read that SQL Server does I/O in 64KB extents (an extent being eight 8KB pages), and this is where the recommendation for a 64KB allocation unit size comes from. In reality, SQL Server I/O can be of varying sizes depending on what operation is being performed.
Let's take transaction log writes as an example. These need to be hardened to disk every time a commit is issued, so these writes might only be 512 or 4,096 bytes in size, depending on the underlying storage (transaction log writes are sector-aligned in size, more on this later). Log records first accumulate in the log buffer, which is flushed when it fills up or when the write-ahead logging (WAL) protocol demands it, such as on a commit, and a single log flush can typically be up to 60KB in size. So in short, transaction log writes can be anything from 512 bytes to 60KB.
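To make the sector-alignment point concrete, here is a minimal sketch (my own illustrative function, not anything from SQL Server itself) of how a log write gets rounded up to the sector size, with the roughly 60KB flush cap mentioned above:

```python
def log_flush_size(record_bytes, sector_bytes=512, max_flush=60 * 1024):
    """Illustrative only: round a log write up to the sector size,
    capping the flush at ~60KB as described above."""
    if record_bytes <= 0:
        raise ValueError("record_bytes must be positive")
    capped = min(record_bytes, max_flush)
    sectors = -(-capped // sector_bytes)   # ceiling division
    return sectors * sector_bytes

print(log_flush_size(100))                     # a tiny commit still writes 512 bytes
print(log_flush_size(100, sector_bytes=4096))  # on 4KB-sector storage, 4096 bytes
print(log_flush_size(70 * 1024))               # big flushes are capped near 60KB
```

The takeaway: even a one-row commit pays for a full sector, which is why the sector size of the underlying storage matters to log-heavy workloads.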
Other examples of SQL Server I/O operations that would not conform to a size of 64KB include:
- Bulk Loads
- Backup and restore operations (the I/O size will depend on where you are backing up to)
- Checkpoint
- ColumnStore
- File Initialization
OK then, is it worth formatting a drive that is to be used by SQL Server with a 64KB allocation unit size? What exactly is this doing for us?
A quick recap: disks are made up of a series of tracks. These are too large to be a single unit of storage, so they are broken down further into what are called sectors. Sectors are the smallest unit of storage on a drive, and they can vary in size from 512 bytes to 4KB, depending on the disk. Remember we said transaction log I/Os are aligned to the sector size of the disk.
A cluster, or allocation unit, is a group of contiguous sectors; for example, a default 4KB cluster on a 512-byte-sector disk is eight sectors.
So let's run over that again: a sector is the smallest unit of storage that can be written, and an allocation unit (cluster) is the smallest unit of storage a file can take up.
Windows drives on your laptop are typically formatted with a 4KB allocation unit size. This makes sense because you will likely have many small files, so you don't want loads of wasted space. However, on a drive that stores database files, which are generally very large, it makes sense to have a larger allocation unit size; for SQL Server the recommendation is 64KB.
So how does having a bigger Allocation Unit Size help then?
The bigger the allocation unit, the less metadata you need to store. Remember that to read a file you must first read in its metadata, which tells you where to find the physical location of its clusters on disk. If you have large files and a small allocation unit, there is a lot more metadata to read in, thereby increasing the amount of I/O required.
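To put some hypothetical numbers on that (the file sizes here are just examples), compare the cluster count the file system must track for a single large data file at 4KB versus 64KB allocation units, and the slack a small file wastes at each size:

```python
import math

def clusters_needed(file_bytes, au_bytes):
    """Number of allocation units a file of this size occupies."""
    return math.ceil(file_bytes / au_bytes)

def slack_bytes(file_bytes, au_bytes):
    """Space allocated but unused in the file's final cluster."""
    return clusters_needed(file_bytes, au_bytes) * au_bytes - file_bytes

data_file = 100 * 2**30                        # a 100GB data file
print(clusters_needed(data_file, 4 * 1024))    # 26,214,400 clusters to track
print(clusters_needed(data_file, 64 * 1024))   # 1,638,400 - 16x less metadata

small_file = 1024                              # a 1KB text file
print(slack_bytes(small_file, 4 * 1024))       # 3,072 bytes wasted
print(slack_bytes(small_file, 64 * 1024))      # 64,512 bytes wasted
```

Sixteen times fewer clusters to track for the same file, which is exactly the metadata saving described above. The trade-off is the extra slack per small file, which barely matters on a drive dedicated to database files.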
So really, once a file has grown, the work of the allocation unit is done; during a physical read it is the I/O block size that determines the impact on I/O.
A useful command for examining the details of your drive, such as the allocation unit size and sector size, is:
fsutil fsinfo ntfsinfo <drive>:
Right, that’s all great to know, now how do we actually attach an EBS volume to an EC2 instance?
- EC2 Console – Volumes – Create Volume – Select all your desired settings
- Volume State = ‘Available’ – this means it has not yet been attached to an EC2 instance
- Actions – Attach Volume – Select EC2 instance you want to attach to
- Volume State = ‘In-use’ – happy days, but we are not done yet; we still need to make the new drive available to Windows
- RDP to EC2 – Disk Management – Right Click on Disk and bring online – Right Click and Initialize Disk
- MBR or GPT – OK – Disk will now be online
So what's left? We must specify the size of the volume and decide whether we want to format it.
- Right Click – New Simple Volume – Specify disk size – Assign drive letter
- Next – Allocation Unit Size – change from the default (4KB) and select 64KB for a drive hosting data files – this may take a few minutes and then you'll be done 🙂
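If you'd rather script the console steps above, the same create and attach flow looks roughly like this with the AWS CLI (the availability zone, IDs and device name below are placeholders, so swap in your own):

```shell
# Create a 100GB gp3 volume in the same AZ as the target instance.
aws ec2 create-volume \
  --availability-zone eu-west-1a \
  --size 100 \
  --volume-type gp3

# Once the volume state is 'available', attach it to the instance.
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device xvdf
```

On the instance itself, the Disk Management steps can be scripted too: PowerShell's Initialize-Disk -PartitionStyle GPT, New-Partition -AssignDriveLetter -UseMaximumSize and Format-Volume -AllocationUnitSize 65536 cover the online, initialize and format flow, though I've stuck to the GUI route in this post.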
I hope this month’s blog post gives you a little better understanding of how the underlying storage volumes work and how the type and formatting used can impact IO performance on your SQL Server Instance.
As always, if you liked this blog please follow me on Twitter to be kept up-to-date with my latest blog releases and more fun stuff.
Until next time, Slán!
Disclaimer: Never just believe everything you read in a post, test this out for yourself, we all can make mistakes! Happy coding.
