
How to migrate a MySQL database to PostgreSQL

I’ve migrated a couple of old Django projects from MySQL to PostgreSQL lately and decided to document the process here to help make it go faster in the future. If your old database is not exotic in any way, the migration process is pretty fast and simple. The hard part is figuring out how, and how much, you should tweak the old database to get rid of warnings or errors, if there are any.

Prerequisites

You can go about this in many ways. This guide uses pgloader and local databases (for example, with Docker). Before you start:

  • Install pgloader (apt-get install pgloader on Linux or brew install pgloader on macOS)
  • Start local MySQL and import the old database (see the Docker sketch right after this list)
  • Start local PostgreSQL and create the new database + needed roles etc.
  • Optional: install and configure your favourite graphical MySQL management tool for quick small edits if needed
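
For the local databases, a minimal Docker setup could look something like this. The container names, image tags, and dump file name are just example values; the root password and database name match the connection strings used later in this guide. Wait for the containers to finish initializing before running the import:

docker run --name migration-mysql -e MYSQL_ROOT_PASSWORD=docker -p 3306:3306 -d mysql:8
docker run --name migration-postgres -e POSTGRES_HOST_AUTH_METHOD=trust -p 5432:5432 -d postgres:15
mysql -h 127.0.0.1 -u root -pdocker -e "CREATE DATABASE djuninen"
mysql -h 127.0.0.1 -u root -pdocker djuninen < old-database-dump.sql
createdb -h 127.0.0.1 -U postgres djuninen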

Note: you can obviously use remote databases for this as well, but the actual commands and networking options may vary. Working over the internet can also be quite a bit slower depending on your database size and internet connection.

The Migration

This command (just change the connection details) starts the migration and displays possible warnings plus a summary table after the migration is finished:

pgloader mysql://root:docker@127.0.0.1/djuninen postgres://postgres@127.0.0.1/djuninen
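
If you need more control than the one-liner gives, pgloader also accepts a command file. The sketch below uses the same connection details as above; the WITH and CAST lines are based on the examples in the pgloader documentation, and the exact option names may vary between pgloader versions:

LOAD DATABASE
     FROM mysql://root:docker@127.0.0.1/djuninen
     INTO postgresql://postgres@127.0.0.1/djuninen
 WITH include drop, create tables, create indexes, reset sequences
 CAST type datetime to timestamptz drop default drop not null using zero-dates-to-null;

Save it as, for example, djuninen.load and run it with pgloader djuninen.load.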

Cleaning Up The Data

In the real world the data in your old database might have some issues, and the summary report might show lots of errors or missing rows. Depending on the project and the data in question, you might want to make this process more strict and, for example, add tests that measure the number of objects.

I’m mostly working with non-sensitive data where it’s not a major issue if a blog post is missing a tag or if some individual rows get deleted in the migration, so my tactic was just to find the source of any errors or warnings and iterate the process as many times as needed.

The most typical cases for me were individual rows that had, for example, a NOT NULL date_modified column but no value in the data. I just went through these by hand and added a date to each of them. I also bumped into a few missing or invalid relations. Depending on the data I either fixed or removed these, whichever made more sense. Having a tool at hand for making quick tweaks to the data helped a lot.

I was also working with very small (<200 MB) databases, so there wasn’t that much work. For bigger databases or more important data I would spend whatever time is needed to make sure the data is correct before even trying to migrate it. The older and bigger your database is, the more work you’ll probably need to do to massage the data beforehand.
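
As a concrete example, the date_modified case above could be patched with something like this. The blog_post table name is hypothetical, and whether you need the zero-date comparison (in addition to NULL) depends on your data and MySQL sql_mode:

mysql -h 127.0.0.1 -u root -pdocker djuninen -e "UPDATE blog_post SET date_modified = NOW() WHERE date_modified IS NULL OR date_modified = '0000-00-00 00:00:00'"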

Testing

For critical production data (anything other than just marketing blog posts) I would write two kinds of tests: 1) simple tests that count rows, sums, or whatever important and easy metrics you come up with that can easily be verified before and after the migration, and 2) end-to-end tests that hit at the very least the most important pages on the site and make sure they are identical before and after the migration.
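
The row-count checks don’t need to be fancy. Something like the following, run against both databases and compared, already catches missing rows (blog_post is again just an example table name):

mysql -N -h 127.0.0.1 -u root -pdocker djuninen -e "SELECT COUNT(*) FROM blog_post"
psql -tA -h 127.0.0.1 -U postgres -d djuninen -c "SELECT COUNT(*) FROM blog_post"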

For personal projects with noncritical data I didn’t really see the point, as the normal e2e tests already cover that the db change itself didn’t screw anything up, and as long as there weren’t any visible issues with the pgloader script itself, everything should be fine.

Lastly, you should of course keep backups of both the before and after databases at hand for a while, in case something comes up after the migration.
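
With the example databases from above, taking those backups could be as simple as this (the output file names are just placeholders):

mysqldump -h 127.0.0.1 -u root -pdocker djuninen > djuninen-mysql-backup.sql
pg_dump -h 127.0.0.1 -U postgres djuninen > djuninen-postgres-backup.sql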

