CTE’s in PostgreSQL

So today’s topic is Common Table Expressions in PostgreSQL and how they can differ to CTE’s used in MS SQL Server depending on how you set these up.

So I’m a PostgreSQL newbie, coming over from the world of MS SQL Server, the first major version of PostgreSQL I worked on being V11. When coming to terms with the many difference’s between the two databases one that stuck out was CTE’s. In MS SQL Server creating a CTE is a logical construct, where possible the query within the CTE will be inlined in the outer query and so for all intents and purposes its treated as just a sub query allowing predicate pushdown and all that good stuff. What it does give you is a neater way to construct your query. In PostgreSQL v11 a CTE behaves very differently, the query within is ring-fenced off from the main query, in essence what is happening is the query within the CTE is executed 1st, results stored in memory and then execution of the outer query begins with the results of the CTE being a “black box” to the optimizer. i.e. So no query in-lining or predicate pushdown can happen. This is not necessarily a bad thing just need to be aware of how different it is to the behavior you see in MS SQL Server.

Anyways enough about the past, as from Postgres V12, the default behavior for a CTE is to be in-lined into the outer query.

So the current major version of PostgreSQL is V15, so why blog about this now. Well we have an interesting scenario during the week where by a function that was being executed against a read-replica instance was failing, when we investigated this issue it was due to a temp table being used in the function and of course on a read-replica no writes can be done against the instance and yes this does include temporary tables. So then why not use a Materialized CTE.

So with a MATERIALIZED CTE, the query is ring-fenced off and executed 1st, the results stored in memory and then a join is done when the CTE is referenced in the outer query.

with myfirstmaterialized_cte(somethingid) as MATERIALIZED (
   select somethingid
   from dbo.mysomething
   where mysomething.region = 1)
select 
   myotherthing.myotherthingid,
   myotherthing.myothername
from 
   dbo.myotherthing
   inner join myfirstmaterialized_cte as mycte
      on mycte.somethingid = myotherthing.somethingid;

Ok cool, so then what are the differences between a temporary table and a Materialized CTE in PostgreSQL?

  • You can add indexes to a temporary table, you cannot use indexes on materialized cte’s.
  • A temporary table exists for the life of the session where as a materialized cte will exist only until execution of the query that references it is on-going.
  • Temp tables can require vacuuming.
  • Temp table writes need to be logged

So if you are from a MS SQL Server background, you probably coming to the same conclusion that I did, are these not very similar to table variables in MS SQL Server. They are for sure many similarities around the optimizer basically looking upon them as black boxes, i.e. (no idea of statistics, no indexes, not logged and up until MS SQL Server 2019, no idea of the cardinality estimate). A big difference between the two is the scope of their lifetime, where a table variable resembles more a temporary table and lives for the scope of the session.

So to summarize todays blog post:

  • We learned that in PostgreSQL a CTE can be automatically in-lined into the outer query (provided the conditions are right), which is the default from V 12 onwards and in most cases this is the behavior that you would want.
  • If you want the CTE executed 1st and results stored in memory independent of the outer query, you can still achieve this in newer versions on PostgreSQL simply by attaching the keyword ‘MATERIALIZED’ in the CTE definition.
  • Think about choosing a Materialized CTE over a temporary table if you need to connect to a read-replica or if its going to contain a small result and only needs to be referenced in one place within your batch.

Until next time, Slán!

DisclaimerNever just believe everything you read in a post, test this out for yourself, we all can make mistakes! Happy coding.

Published by sqlrebel

Data Engineer.

Leave a comment