How PostgreSQL handles out-of-row storage

The PostgreSQL storage engine uses a fixed page size of 8kb and does not allow tuples to span multiple pages. So how does it handle large values? Let's read on to find out.

As we mentioned, PostgreSQL does not allow a row to span multiple pages, so to handle large values it uses a technique known as TOAST (The Oversized-Attribute Storage Technique). The basic idea is that large values are compressed, stored off-row and broken up into multiple physical rows. We'll get into the details of when TOASTing kicks in, but first let's look at where exactly this off-row data is stored.

When you create a table that has any variable-length columns, a corresponding TOAST table is immediately created in the background and linked to the original table. It will be used to store any values that are too big to store in-row in the main table. This is an important point: even if the TOAST table is never needed, it is created anyway.
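
You can see this for yourself by creating a fresh table with a variable-length column and looking up its TOAST table in pg_class. This is just a quick sketch; the table name t_demo is made up for the example.

CREATE TABLE t_demo (id int, notes text);

--Even a brand new, empty table already has its TOAST table
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 't_demo';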

If a column of a tuple becomes TOASTed, the physical value (the rather large data) is moved to the TOAST table and replaced in-row with a pointer. The TOAST table itself is made up of a unique 'chunk_id' per value, a 'chunk_seq' to keep track of the order of all of the chunks that make up the value, and the actual chunk data.
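
You can peek at this chunk layout by querying the TOAST table directly. The TOAST table name below is hypothetical; substitute the name returned for your own table.

--Inspect the chunks backing the TOASTed values (table name is hypothetical)
SELECT chunk_id, chunk_seq, octet_length(chunk_data) AS chunk_bytes
FROM pg_toast.pg_toast_16384
ORDER BY chunk_id, chunk_seq
LIMIT 10;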

A little more on the pointer that is now stored in-row: this is an 18-byte pointer that contains the:

  • OID of the associated TOAST table
  • chunk_id of the value (itself an OID)
  • Info on the compression used, if any

To allow for faster access, a unique B-tree index is automatically created on ('chunk_id', 'chunk_seq'). It is important to keep this in mind later on when we calculate the cost of reading TOASTed data and all of the extra page reads that become necessary.
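
If you want to confirm this index exists, you can look it up through pg_index; again the TOAST table name is hypothetical.

--List the index that backs a TOAST table (table name is hypothetical)
SELECT c.relname AS toast_index
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'pg_toast.pg_toast_16384'::regclass;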

For each column of a table you can specify where the value should be stored (i.e. should it be in-line or in a TOAST table) with the 'SET STORAGE' clause of the ALTER TABLE command. There are 4 possible values (a short example follows the list):

  • PLAIN – the value is kept in-line and uncompressed. This is used for fixed-length columns.
  • MAIN – the value is kept in-line and compressed. It is only moved out-of-line as a last resort, when the row cannot otherwise be made to fit on a page.
  • EXTERNAL – the data is stored in the TOAST table if the tuple is wider than TOAST_TUPLE_THRESHOLD, and it is never compressed. This setting is recommended for bytea columns holding already-compressed data, which cannot be compressed any further, so attempting to do so is a waste of CPU resources. For TEXT and bytea columns on which substring operations will be performed it also improves performance, as only the specific chunks needed have to be read rather than de-TOASTing the entire value.
  • EXTENDED – (the default) the data is compressed and, if the tuple is still wider than TOAST_TUPLE_THRESHOLD, stored in the TOAST table.
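
Here is a minimal sketch of changing a column's storage setting and checking the result. The photos table is hypothetical; in pg_attribute the codes are 'x' for EXTENDED, 'e' for EXTERNAL, 'm' for MAIN and 'p' for PLAIN.

--Hypothetical example: already-compressed image bytes gain nothing from PGLZ
CREATE TABLE photos (id int, img bytea);
ALTER TABLE photos ALTER COLUMN img SET STORAGE EXTERNAL;

--Verify the storage setting of each column
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'photos'::regclass AND attnum > 0;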

Don't worry about exactly what size a tuple needs to be for TOASTing to kick in, we'll get back to that. For now, let's dig a little deeper into what happens during the TOASTing process itself.

When a tuple becomes eligible for TOASTing, Postgres performs what are known as passes, which are attempts to reduce the size of the tuple. Postgres will carry out 4 passes in total (a small demonstration follows the list).

  1. It looks at all attributes which have the 'Extended' property (i.e. variable-length columns) and compresses their values; if a value is still greater than 2kb it begins to TOAST it. It then looks at attributes which have the 'External' property and begins to TOAST them if they are > 2kb in size.
  2. If the tuple size is still > 2kb, any attributes with the 'External' or 'Extended' property are TOASTed regardless of size.
  3. If the tuple size is still > 2kb, any attributes with the 'MAIN' property are compressed.
  4. If the tuple size is still > 2kb, attributes with the 'MAIN' property are also moved to the TOAST table as a last resort.
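
Here is a small demonstration you can run to watch the process kick in. The table name is made up, and roughly 96kb of random hex text is deliberately hard to compress, so it comfortably exceeds the 2kb threshold and ends up chunked in the TOAST table.

CREATE TABLE toast_demo (id int, payload text);

--A short value stays in-row; the TOAST table remains empty
INSERT INTO toast_demo VALUES (1, 'short text');

--~96kb of random hex is hard to compress and gets moved off-row
INSERT INTO toast_demo
SELECT 2, string_agg(md5(random()::text), '')
FROM generate_series(1, 3000);

SELECT pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'toast_demo';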

Ok, all good so far, but didn't I mention something about compression? Let's look into that some more.

The compression used by Postgres is 'PGLZ', a member of the LZ family of compression algorithms. Compared to something like the 'GZIP' compression used in MS SQL Server, LZ gives you a much lower compression ratio, but it is also much quicker. In Postgres 14 there is a new compression option available called LZ4 (a lossless compression algorithm), which claims to provide even faster compression than 'PGLZ'.

  • If you are on Postgres 14 (lucky you), it's worth testing how much quicker the LZ4 compression algorithm is. Here are some useful commands to get you started.
SHOW default_toast_compression; --The default is pglz
SET default_toast_compression TO lz4; --Only available on PG14 and up
--Alternatively you can specify compression algorithm on an attribute
ALTER TABLE mytable ALTER COLUMN mycol SET COMPRESSION LZ4;

Ok, let's summarize what we know so far.

  • When you create a table with a variable-length column, Postgres will automatically create a corresponding TOAST table linked to it behind the scenes.
  • By default the storage value of each variable-length column in your table will be 'Extended', which basically means Postgres will try to store the data in-row, but if the tuple exceeds 2kb it will first compress the data, and if it is still too big it will look to store some columns out-of-row in the TOAST table, with the in-line value replaced by an 18-byte pointer.
  • The columns that have been TOASTed are stored as 2kb tuples in the TOAST table, whereby large values that exceed 2kb are split into multiple 2kb tuples (chunks) that can span multiple pages of the TOAST table. Hence the need for a chunk_seq to be associated with each chunk_id.
  • Oh, and the maximum size of a value that can be TOASTed, i.e. stored off-row in PostgreSQL, is 1GB.

So we have seen how TOASTing works, but it can have some serious performance implications.

  1. Let's think about the impact of reading a value once it has been TOASTed, taking a simple table dbo.MyTable(id int, somevalue text). Before TOASTing, reading the column somevalue for one particular tuple means reading one page. Post TOASTing, the data is stored off-row, so you still read the page in the main table that the tuple is on, but you then take the TOAST pointer and traverse a B-tree index to find the corresponding page in the TOAST table. The minimum a B-tree index can be is 2 levels, so that is at least two extra page reads, and that is just the index; you must still read the page or pages where the value now resides. So we have gone from 1 page read to 4. While this might not seem like much, imagine you are reading 1 million rows; then the difference between reading 1 page and 4 becomes huge. (A sketch for observing these extra reads follows this list.)
  2. I want to make a special mention here of the impact that TOASTing can have on JSONB. The JSONB data type in Postgres stores a binary representation of a JSON document, which is an object in its own right and not an atomic value. If you reference two different keys from the same jsonb (document) column, the entire jsonb column will be de-TOASTed twice, not once. E.g. with SELECT jsoncontent -> 'namekey', jsoncontent -> 'agekey' FROM dbo.someothertable, even though you are accessing the same column, jsoncontent, twice in the select list, it will still go through the de-TOASTing process twice.
  3. You also have the additional overhead of decompressing the data. This can also happen for in-row values, but again it is an additional cost.
  4. So is TOAST all bad for performance? Not if you don't reference the column that has been TOASTed; in that case the value is left untouched and is not de-TOASTed. For example, if a table had a TOASTed column that was not accessed frequently, this could actually improve your query performance: the original table will be much smaller, so more of its pages will fit into memory, and since we rarely access the column whose value is stored in the TOAST table we don't have to worry about paying the cost of de-TOASTing.
  5. Database bloat can become an issue; let me explain why. We know that Postgres uses MVCC, so when a row is updated it is not actually deleted, but rather a new version of the row with the updated value is created. If you have a table with a high update rate on columns that are TOASTed, you could see database bloat, as multiple versions of these large values can exist until the old rows become obsolete (no longer visible to any active transactions) and vacuum runs to mark the space as re-usable; even then, that space is never returned to the operating system. While this is not a problem with TOAST directly, it is a problem with doing a high number of updates on large values in Postgres.
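
To see the extra page reads from point 1 for yourself, EXPLAIN with the BUFFERS option reports how many pages a query touched, and those counts include the TOAST index and TOAST table pages. The table and column names here are the hypothetical ones from the example above.

--Buffer counts include the TOAST index and TOAST table pages
EXPLAIN (ANALYZE, BUFFERS)
SELECT somevalue
FROM dbo.MyTable
WHERE id = 1;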

Some other useful points also worth remembering:

  • In Aurora Postgres you cannot view the associated TOAST tables.
  • When the PGLZ compression algorithm is used, compression of an attribute is abandoned if the estimated size reduction is < 25%, as anything less is deemed a waste of CPU resources. If the LZ4 algorithm is used, the compressed value is kept as long as it is smaller than the original.
  • During an UPDATE operation, the values of unchanged fields are preserved as-is, so an UPDATE on a row that has out-of-row values will not incur the cost of rewriting the TOAST data (again, provided none of the TOASTed values were changed).

Three important values to remember here:

  • TOAST_TUPLE_THRESHOLD – the TOASTing process is triggered only when a row to be stored in a table is wider than this value in bytes (usually 2kb).
  • TOAST_TUPLE_TARGET – when the TOASTing process kicks in, because a row is wider than the TOAST_TUPLE_THRESHOLD we have just covered, values are compressed and moved off-row until the entire row is shorter than this 'TOAST_TUPLE_TARGET' (again, usually 2kb). This value can be adjusted per table with the command below:
    • ALTER TABLE <tablename> SET (toast_tuple_target = N)
    • The allowed values for this threshold are between 128 and 8160 bytes.
    • It can sometimes be a useful performance boost to reduce this threshold: doing so pushes more data into TOAST, making the main table much smaller so that less I/O is required to read it. But again, this is only useful if the values you are pushing into TOAST are not read often (see the sketch just after this list).
  • TOAST_MAX_CHUNK_SIZE – this is the size into which TOASTed values are split when stored out-of-row. The default is just under 2kb, allowing for 4 chunk rows per page.
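
As a concrete (hypothetical) example of lowering the target so that large values leave the main table sooner, reusing the toast_demo table from earlier:

--Hypothetical: push values off-row once a row exceeds ~256 bytes
ALTER TABLE toast_demo SET (toast_tuple_target = 256);

--Confirm the setting
SELECT reloptions FROM pg_class WHERE relname = 'toast_demo';
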
So to finish up today's blog, I hope you have a better understanding of how data in PostgreSQL is stored off-row. The performance impact of TOAST on jsonb is, for me, a major hurdle that PostgreSQL needs to overcome. JSONB is such a great data type, with all of the bells and whistles that come with it, such as indexing and pre-parsing, but the cost of accessing a jsonb value once it becomes TOASTed grows with the size of the document, regardless of the nesting level of the key you access. This is something the PostgreSQL community is very aware of and is working hard at finding possible solutions for. For some additional information on where TOAST and JSONB are heading in the future, check out this video (JSONB Performance) by the great Oleg Bartunov. That's all folks.

Oh, before we go, here are some useful queries for determining TOAST table and attribute sizes.

--Get size of table and associated TOAST table
SELECT
    pg_size_pretty(pg_relation_size(oid)) AS table_size,
    pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM
    pg_class
WHERE
    relname = 'mytable';

--Get the size of the column
SELECT
    pg_column_size(mylargecol) --No. of bytes as stored, possibly compressed
    ,octet_length(mylargecol)  --Total no. of uncompressed bytes
FROM
    dbo.mytable;

Until next time, Slán!

Disclaimer: Never just believe everything you read in a post; test this out for yourself, we all can make mistakes! Happy coding.
