MS SQL Server Page Storage

So MS SQL Server stores all data in rows that fit onto 8kb pages.
There are many different types of pages in MS SQL Server but for this blog we are mainly concerned with three, these are:
1. Data Pages
2. Text/Image Pages, sometimes referred to as LOB pages.
3. Index Pages

The names are fairly self-explanatory but for completeness sake, I will list out the most common data types stored by each.

  • Data Pages:
    • All Integer types
    • All money types
    • numeric/decimal
    • bit
    • float/real
    • All date and time types
    • char(n)/nchar(n)
    • varchar(n)/nvarchar(n)
    • binary
    • varbinary(n)
  • Text/Image Pages:
    • Text/nText
    • Image
    • Xml

Now this is where things get interesting, Varchar(max) and Nvarchar(max) where do you think these are stored?

  1. So by default Varchar(max) columns are stored inline which means they will be stored on data pages.
  2. But that is only up to a max of 8k bytes after which the data will be moved from data pages into text pages, the reason for the 8k limit, well because a page is 8k and no row can span multiple pages.
  3. If the row size exceeds 8k, then again SQL Server will look to move some columns to out of row pages and it will pick the biggest column to move first so again varchar(max) columns in this instance will have their data moved from a data page to
    a text page.
  4. In list item one, I said by default varchar(max) columns are stored inline, but if you want you can set these to be always stored out of row in LOB pages.
    The stored procedure sp_tableoption allows you to control the in-row behaviour of tables that have LOB data types.
E.g.
exec sp_tableoption 'dbo.TableA', 'large value types out of row',1;

But why would you ever want to store a column out of row? Lets read on to find out.


So what is the reason why SQL Server uses different pages?

Well it all comes down to storage efficiency. Data pages are optimized to store several rows compacted together where as LOB data pages are optimized for storing large data types.

MS SQL Server stores pages of the same type together to achieve optimal storage and to help it to organize the pages it uses “Allocation Units”.

An Allocation Unit is nothing more than a way for SQL Server to split different types of pages into their different groups.

There are 3 type of allocation units that MS SQL Server uses:

  1. IN_ROW_DATA – These contain all of Data and Index pages that are stored in-row.
  2. ROW_OVERFLOW_DATA – These are also made up of Data and Index pages but are used to store variable data types for columns on rows that have grown too big to fit on one page.
  3. LOB_DATA – These are the name suggests stored LOB Data pages.

Column values that are stored in the last two allocation units (ROW_OVERFLOW_DATA & LOB_DATA) are replaced in-row with a 16 byte pointer to the data pages in these allocation units that store the actual value.

Some of the advantages of splitting out the different types of pages and not storing them together are the following:
  1. By having all of the Data pages stored together, this can reduce the amount of fragmentation that might otherwise exist if combined with LOB pages. So the data pages can be all squeezed in together.
  2. By having all of the frequently accessed pages stored together, it allows SQL Server to take advantage of its page read-a-head technology meaning pages are brought into memory before they are explicitly requested, thereby giving faster response times for the user.
  3. It also means that for the most part memory can be taken up with the more frequently accessed data that would be found in Data and Index pages and not being wasted on LOB pages.
  4. Compression can be applied on the Data and Index pages, which are much more likely to contain values that will compress well.

An important point to note here is the following: MS SQL Server is designed to work efficiently with small data types and it assumes that these data types will be accessed more frequently. Now SQL Server is nothing if not obliging, so it will also allow you to store large data types but it does not want this to be at the cost of storage efficiency, and so these data types are stored off-row, i.e. rarely accessed SQL Server is hoping.


So this brings us back to that original question, why would you want to store varchar(max) out-of-row intentionally?

Lets look at an example, so lets say we had a table called “MyFirstTable” and it had the following schema:

create table dbo.MyFirstTable(col1 int not null primary key, col2 int, col3 int, somewidecol varchar(max));

Now say you ran the following query:

select top 2000 col1, col2 from dbo.MyFirstTable;

Lets assume that the the column somewidecol, is 7500 bytes in size, very large but still small enough to fit on one page and so is stored in-row.

So because SQL Server uses row-based storage then even though we only want two of the four columns in that table, we get all four brought into memory, well because the entire row is stored as one unit on a page and SQL Server reads pages from disk. So because we want to read the top 2000 rows, and each row takes up its own page at a minimum we need to have 2k data pages belonged to that table in memory. This is requires a lot of I/O reading the data in from disk and also is taking up valuable space in memory with data that we didn’t need.

Ok so yes SQL Server is row based, so we will always pay some price, i.e. there is nothing we can do with the integer column col3, but what if we stored the LOB column “somewidecol” out of row. Now our rows become tiny, i.e. each row is the size only of 3 integer values, plus the pointer to the LOB value, that’s a row size of approximately 28 bytes compared to the 7512 bytes that our row had previously been. So now instead of having to read in 2k pages, we only have to read in 8 pages. That is some saving.

The great Jeff Moden is the master supreme when it comes to Index Maintenance and he is a believer that in nearly all cases, all LOB data types should be stored off row and even goes a further step in that when creating columns of type LOB that are going to be stored off-row give them a default 16 byte space so that when the 16 byte pointer does get added it doesn’t cause a bad page split. E.g. You have 10 rows fitting snugly on a page, you update one of the rows with an LOB value, you think that this is not going to cause any page splits for your in-row data as its being stored out-of-row but wait, a bad page split occurs as you do not have room on the page for the 16 byte pointer. Bad!

So to summarize todays blog, the key take away points are the following:

  1. MS SQL Server stored LOB pages very differently to Data and Index Pages
  2. Use LOB data types sparingly in SQL Server
  3. In many cases it makes sense to force VARCHAR(MAX) & NVARCHAR(MAX) to be stored off-row.

So I hope you enjoyed todays blog, until next time, Slán!

Published by sqlrebel

Data Engineer.

2 thoughts on “MS SQL Server Page Storage

  1. I believe the mighty Jeff also suggests making sure your MAX cols are also set to NOT NULL and given a default (empty string) to avoid that nasty update causing a page split issue.

    Liked by 1 person

Leave a comment