AWS DMS Replication Recovery Checkpoint

The Data Migration Service or DMS for short is one of AWS’s most useful service. It allows you to migrate data from one data source to another, provided at least one of the those data sources.

Before we get into the main topic of this article, ‘Explaining the Recovery Checkpoint’ of DMS, lets have a quick recap of what DMS offers.

It offers both an initial ‘re-Load’ and continuous replication feature. For the ‘re-Load’ a schema modification lock is taken out on all of the tables specified in the DMS task briefly at the beginning of the process and then a shared lock while the data is read from the table directly. This allows you to reload all of the data.

For continuous replication, either MS- Replication or MS-CDC is used to allow what ever changes are happening to to be replicated over from the source to the target. This is done by reading from the transaction log, the changes that are being written to it. It prevents unread changes from being truncated from the log by creating a dummy transaction in the transaction log. It has been found that DMS can lose track of this transaction and it can just run and run, thus preventing the transaction log from being truncated and also from any changes from being replicated over, best thing to do in this case is to stop and restart the DMS task. If a table has a Primary Key on it, MS-Replication will handle the changes and for changes without a PK, then CDC needs to be enabled on those tables.

Ok so, lets get back to explaining about the recovery checkpoint.

When you stop a DMS ‘Replication Only’ DMS task, (More on the Replication Only bit later), you can either ‘Resume’ that DMS task or ‘Restart’, so what’s the difference?

‘Resuming’ the DMS task will just continue the DMS replication from this point on, so any changes which were made to the database while the DMS task was stopped do not get replicated over, but any changes from this point onwards will.

‘Restarting’ a DMS task allows you to continue the replication from a specific point, that can either be the following:

  • Specific point in time
  • A log sequence number
  • A recovery checkpoint

Lets explore each option.

The ‘Specific Point in time’, allows you to specify down to the second from which time in the past you wish this DMS task to resume from. AWS DMS is able to convert this point in time into a native start point such as LSN for SQL Server. See official docs here.

The ‘Specified LSN’ allows you to enter a LSN in the case of SQL Server or SCN in the case of Oracle, this is a specific log record that was written out to the transaction log.

The ‘Recovery Checkpoint’ , is the last checkpoint that was recorded by the DMS Task, this can either be obtained from the public.awsdms_txn_state table in the target database, provided the ‘TaskRecoveryTableEnabled’ checkpoint is enabled in the DMS Task, then this table will be created or else the checkpoint can be obtained from the DMS Task UI.

This ability to ‘Restart a DMS Task from a specific point in time’ will only work for ‘Replication Only’ DMS tasks, as if you restart a DMS task that ‘Loads and Replicates’ it will attempt to do a full re-load.

Ok so that’ s super interesting but how is the DMS task able to do this?

So as we mentioned at the beginning of the article, DMS continuousreplication reads from the transaction log. Taking the example of SQL Server being the source, what it is essentially doing is calling an API that is reading from the function fn_dblog, this returns the changes that are currently in the transaction log. Ok great, but one small problem, don’t most databases have regular transaction log backups taken, say every 15 minutes which will truncate the transaction log, yes they do, so if DMS can’t find what it is looking for by calling the function fn_dblog, it can also call the function fn_dump_dblog(), which allows DMS to read from the transaction log backups, provided those backups still exist and have not been moved to some other location, then you have a problem!

Ok so this feature, can be super useful, you can stop a DMS task for a number of hours say and edit that task and tell it, when I restart you I want you to start from this point in time, or this Log Sequence Number or this checkpoint. This feature can be used to save costs where a replication instance might only run during daytime hours or even for DR, where a snapshot could be taken daily of the target database, this could then be restored to a particular location if needed and a replication-only DMS task created and told to re-start replication from this date and time onwards. Again, the only thing to make sure is that the transaction log backup is still accessible.

** If you like my monthly blogs, please do follow me on Twitter. @sqlrebel

Until next time Slán!

Disclaimer: Never just believe everything you read in a post, test this out for yourself, we all can make mistakes! Happy coding.

Published by sqlrebel

Data Engineer.

2 thoughts on “AWS DMS Replication Recovery Checkpoint

    1. It depends on your requirement, if you just want to get all of the data from where you left off then ‘Recovery Checkpoint’ is the way to go, however if for some reason you need data from a specific time or transaction, such as you are only interested in data after midnight on sunday then using and lsn or point in time would be the option to choose. Thanks.

      Like

Leave a reply to richardoriordanhotmailcom Cancel reply