Pretty much everyone has a junk drawer somewhere in their home
(surely I can’t be the only one?). You’ve got some elastic bands
in there, some matchbooks, some spare change, maybe a spare set of
keys.

The Junk Drawer Problem, stated succinctly, is: my junk drawer is full, so what do I do now?

Over time, databases can develop junk drawers as well. We were using
a MySQL table to store little bits of member-related information —
things that didn’t really belong in the member table
itself, like a given member’s list of pinned social networks and other
UI-related preferences.

The schema of this table was basically a memberid-key-value
triple, where key and value were character strings. It was becoming
clear that this arrangement was not going to scale very well in the
long term — we foresaw many situations where storing something in
this table would be the natural solution and the current arrangement
just wasn’t going to cut it. Kind of like when your junk drawer is
full and you’re finding it more and more difficult to cram stuff
in there.

One solution to the junk drawer problem is to get a larger and more
easily re-arranged junk drawer. So that’s what we did: we decided
to migrate this table from MySQL to Mongo, and also to do it without
any downtime.

The first and most straightforward step was to create the Mongo-backed model and duplicate the existing set and retrieve capabilities of the SQL model. Unit tests are very important here: they give confidence that existing code continues to work properly when it uses the new model. I ran all of our unit tests with the new model in place, not just the focused unit tests related to this model. It’s used by many parts of the system and I wanted to be very sure that my changes did not break anything.
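
As a rough sketch (not the actual HootSuite code), the Mongo-backed model boils down to a thin get/set wrapper around a collection keyed by member id. The class, database, and collection names below are illustrative, and I'm using the current mongodb/mongodb PHP library for the example; the original code would have used whatever PHP Mongo driver was current at the time.

&lt;?php
// Minimal sketch of a Mongo-backed member-data model.
// Class, database, and collection names are illustrative.
require 'vendor/autoload.php';

class MemberDataMongoModel
{
    private $collection;

    public function __construct(MongoDB\Client $client)
    {
        // One document per member, keyed by member id.
        $this->collection = $client->selectCollection('hootsuite', 'member_data');
    }

    public function set($memberId, $key, $value)
    {
        // Upsert so members without an existing document get one created.
        $this->collection->updateOne(
            ['_id' => (int) $memberId],
            ['$set' => [$key => $value]],
            ['upsert' => true]
        );
    }

    public function get($memberId, $key)
    {
        $doc = $this->collection->findOne(['_id' => (int) $memberId]);
        return isset($doc[$key]) ? $doc[$key] : null;
    }
}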

Next is the script to migrate the data from MySQL to MongoDB. It’s also pretty straightforward. You can probably imagine it: for each row in the MySQL table, create and store a MongoDB document. In this specific case it was made even easier by the relatively low number of rows in the source table: fewer than 500,000. They could all be loaded into memory and processed in one go.
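
A sketch of that migration script, again with illustrative table, column, and collection names (the real schema was just the member_id/key/value triple described above):

&lt;?php
// One-shot migration sketch: copy every (member_id, key, value) row from
// MySQL into a per-member MongoDB document. Names here are illustrative.
require 'vendor/autoload.php';

$pdo   = new PDO('mysql:host=localhost;dbname=hootsuite', 'user', 'pass');
$mongo = (new MongoDB\Client())->selectCollection('hootsuite', 'member_data');

// Small enough (fewer than 500,000 rows) to pull into memory in one query.
$rows = $pdo->query('SELECT member_id, `key`, `value` FROM member_data')
            ->fetchAll(PDO::FETCH_ASSOC);

// Fold the key/value rows into one document per member.
$docs = [];
foreach ($rows as $row) {
    $docs[$row['member_id']][$row['key']] = $row['value'];
}

// Upsert each member's document.
foreach ($docs as $memberId => $fields) {
    $mongo->updateOne(
        ['_id' => (int) $memberId],
        ['$set' => $fields],
        ['upsert' => true]
    );
}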

Okay, now assuming the data has been migrated, we need a way to
immediately switch from using MySQL to MongoDB for this data. A simple
Feature Flag (as described in this nice Flickr blog post: Flipping Out)
suffices, e.g. a flag called *JD_USE_MONGO*. When that flag is false, the
original MySQL model is used, and when the flag is set to true, the
new MongoDB model is used.

Simplified example:

function setSomeMemberData($memberId, $data)
{
    // The dark launch flag decides which backing store handles this write.
    if (FeatureEnabled('JD_USE_MONGO')) {
        MongoDBSetSomeMemberData($memberId, $data);
    } else {
        MySQLSetSomeMemberData($memberId, $data);
    }
}

The code that calls set and get functions of this kind (like an ajax action that pins/unpins a social network in the HootSuite compose window) does not need to change, but the underlying database can be swapped at any time by toggling the dark launch flag. In this specific case, it was easy because the get and set functions, with only one exception, were all written as wrappers around only three functions that directly interacted with the database. So only those three functions needed to look at the value of the *JD_USE_MONGO* flag.

Feeling very pleased with myself, I then set out to do a quick back
of the envelope calculation to estimate how long the migration
script would take. It turned out to be on the order of 10 minutes
(in the end it actually took a little over 12 minutes). During that
time the site would be up and running, and users would be free to pin
and unpin social networks and do other things that modify the MySQL
table.

This is definitely a problem. Any changes made to the MySQL data while
the migration is happening could be missing from the target Mongo collection, if the changes
apply to a MySQL row that has already been migrated.

I asked myself the same question I always do whenever I run into a problem: “Can I use a Dark Launch Flag to solve this?” In this case, fortunately, the answer was “yes.” I created a flag called *JD_LOG_CHANGES*. When it’s enabled, any changes to the MySQL table are also logged to a temporary SQL table. The clever part was that the changes were logged as MongoDB update commands, so that the contents of the log table could be played directly into the MongoDB collection to bring it up to date.

For example, if this MySQL update is made:

UPDATE `table` SET foo='bar' WHERE member_id=32709;

Record this to the temporary table:

db.table.update({'_id': 32709}, {'$set': {'foo': 'bar'}}, true);

Note that the log is kept in chronological order so it does not matter if the same property is changed multiple times — the last change made via MySQL will be the last change re-played into Mongo.
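The logging itself is just a few extra lines in the MySQL model's write path. A minimal sketch of what it might look like, with an illustrative log table name and collection name (not the real ones), and assuming the same FeatureEnabled() helper as above:

&lt;?php
// Sketch: when JD_LOG_CHANGES is on, mirror each MySQL write into a
// temporary log table as a ready-to-run MongoDB update command.
// Table, collection, and function names are illustrative.
function logChangeForMongo(PDO $pdo, $memberId, $key, $value)
{
    if (!FeatureEnabled('JD_LOG_CHANGES')) {
        return;
    }

    // Build the same upsert the Mongo model would have issued.
    // Keys are internal identifier-like strings, so they are used as-is.
    $cmd = sprintf(
        "db.member_data.update({'_id': %d}, {'\$set': {'%s': %s}}, true);",
        (int) $memberId,
        $key,
        json_encode($value)
    );

    // Append to the log; the auto-increment id preserves chronological order.
    $stmt = $pdo->prepare('INSERT INTO member_data_migration_log (mongo_cmd) VALUES (?)');
    $stmt->execute([$cmd]);
}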

So the steps to migrate the table while the site is running are:

  1. Enable *JD_LOG_CHANGES*. Note that it does not matter if this is enabled “early” (that is, before the migration script is started): any changes logged early will, when replayed, simply match the contents of the MySQL “snapshot” that the migration script is working from.
  2. Run the MySQL to Mongo migration script.
  3. When the migration script finishes, enable *JD_USE_MONGO*, and at the same time:
  4. Quickly dump the contents of the temporary log table into the MongoDB collection (a sketch of this replay step follows this list). This is the biggest weakness in the process: if this takes too long, you may end up overwriting changes that users are making while the temporary log is being processed. In practice this was not a problem: there were only around 300 changes recorded in the log during the 12 minutes it took the migration script to complete.
  5. Remove all code that references the *JD_USE_MONGO* and *JD_LOG_CHANGES* Dark Launch flags; remove code that references the original MySQL table; drop the MySQL table.
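
For the replay in step 4, the logged commands can simply be dumped, in insertion order, into a script for the (legacy) mongo shell. A minimal sketch, again with illustrative database, table, and file names:

&lt;?php
// Sketch: replay the logged MongoDB commands, in insertion order, by
// writing them to a .js file and handing it to the mongo shell.
// Database, table, and file names are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=hootsuite', 'user', 'pass');

$rows = $pdo->query('SELECT mongo_cmd FROM member_data_migration_log ORDER BY id')
            ->fetchAll(PDO::FETCH_COLUMN);

file_put_contents('/tmp/replay.js', implode("\n", $rows) . "\n");

// Equivalent to pasting the commands into the shell by hand:
//   mongo hootsuite /tmp/replay.js
passthru('mongo hootsuite /tmp/replay.js');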

This is an OK way to migrate a small (~500,000 records) table. I’d love to know how you’ve accomplished a similar task.

I believe a better way, especially for large tables, is described in this great set of slides from Etsy: Migrating from PostgreSQL to MySQL … without downtime. The Netflix Tech Blog also has a great detailed description of a large data migration, in this case from SimpleDB to Cassandra: Data migration for a high volume web application.