AWS DMS Hidden Gotcha’s Part I

So everyday at work we use AWS DMS for replicating data from our core main databases which are SQL Server instances installed on EC2 machines to various data stores in the AWS cloud. And don’t get me wrong this is a great service that is super fast and also fairly durable but there are a few hidden gotcha’s that can crop up depending on how you initially setup the first DMS task to run on your source server.

When creating a DMS Task to pull from a source Database on MS SQL Server, if the user that DMS uses to connect to SQL Server is a sysadmin, then DMS will automatically handle all the MS-Replication setup for you. (One caveat being that Replication is enabled on the instance).

Wow, this is great, so it takes care of setting up the publications, the artifacts to be copied, the log reader agent job, so what’s the problem? Well there not problems more things that the user should be aware of.

First up, the user needs to be SYSADMIN, now I know if you do this properly then that user should only be used for DMS Replication and its password should be in the secret store, but it is still a sysadmin, and it doesn’t need to be, if you are prepared to setup the Replication yourself, rather than asking DMS to do it, the user does not need this elevated permission. Remember we should always go be the rule of ‘Minimum Privileges to do your job’, we are not following that rule in this case.
Because DMS has done the replication setup for you, its easy to gloss over what is actually going on under the hood. For example, DMS Tasks allow you the option to filter out data, i.e. but did you know that this filter is not applied on the source database but on the Replication instance. So if you had a table called items and you wanted to exclude items that were made in ‘Russia’ then you could add a filter on the country column on that row, but items made in ‘Russia’ would still be replicated to the Replication instance, they just don’t make it to the target. Think of all the extra I/O that is happening, not to mention the extra cost if you are sending data to a different AZ. If you had instead added a filter on the artifact itself in the source database this would not happen.
Again similar to above is that all columns on a table are replicated to the Replication instance. So again going back to our example of the ‘items’ table if this table was 70 columns wide but you only needed 10 of those columns, this doesn’t matter, DMS has setup the replication to replicate all the columns so all 70 are replicated to the DMS Replication instance and it is only at this point that the columns are filtered out before the data is sent to the target.
Stopping/Deleting a DMS task does not stop the replication on the source database. The Log Reader Agent will continue to read the transaction log and for all artifacts marked for replication send the changes to the Distribution database where they will live for a while before being deleted. But again think of the extra CPU that is being used here when its no longer needed or the extra I/O and memory. Its very easy to forget about the underlying replication when you allow DMS to do the setup.
My final point on this is a more of ‘Beware of this potentially happening’ one. We replicate a table at work that has a number of columns, one of them being of type ‘text’ (yes I know, this is bad!) and we have the DMS Task setup the do limited LOB of 35KB, all good so far. We also monitor blocking on that table and sometimes we receive blocking messages that our AWS DMS user is attempting to query this text column for a particular row in the table but it is being blocked by a session that is updating that row. But why would it be reading from the table? DMS uses transactional replication which reads from the transaction log. I hold my hands up and say I cannot reproduce this on our test environment but we do see this a few times a month, so does DMS need to query the table for text columns on certain occasions?

Again let me say, these are not bugs or flaws with AWS DMS it is correct to not apply filtering at the source database and to replicate all columns, but its just that a lot of the time the user is actually unaware of what is going on in the back-end.

As always, if you liked this blog please follow me on Twitter to be kept up-to-date with my latest blog releases and more fun stuff.

Until next time, Slán!

Disclaimer: Never just believe everything you read in a post, test this out for yourself, we all can make mistakes! Happy coding.

AWS DMS Hidden Gotcha’s Part I

Published by sqlrebel

One thought on “AWS DMS Hidden Gotcha’s Part I”

Leave a comment Cancel reply

Share this:

Related

Published by sqlrebel

One thought on “AWS DMS Hidden Gotcha’s Part I”

Leave a comment Cancel reply