2016-02-10T15:21:43Z

Resolving Database Schema Conflicts

If you work on a project that uses database migrations with other developers, it is likely that you have experienced migration conflicts at some point. These occur when two or more developers are merging unrelated features to the master source control branch at around the same time, with each feature requiring different changes to the database.

In this article, I'm going to describe the problem and its solution in detail, using an actual example based on my Flask-Migrate extension. While I will be using commands that are specific to Flask-Migrate and Alembic, the solution to the problem that I present here can be adapted to other database migration frameworks.

The Problem

Let's say we have a team of two developers working on different features of an application. Mary, for example, needs to implement the user authentication subsystem, while David needs to add avatars for each user. The two developers start working on their features at about the same time, each on a freshly cloned copy of the team's git repository.

At the time Mary and David clone the project to start their work, the project has a User model that looks like this:

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128))

They are using Flask-SQLAlchemy as ORM, and Flask-Migrate to track migrations to the database. The database migration history at the time both developers begin working on their features includes a single migration:

$ python app.py db history
<base> -> 279ebc64991a (head), add user table

Now Mary and David get to work on their features. Mary needs to add password hashes to the User model, so one of the very first things she does is to edit the model as follows:

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128))
    password_hash = db.Column(db.String(64))

Immediately after editing the model, she generates and applies a database migration, to make this change on her development database:

$ python app.py db migrate -m "add password hashes to users"
INFO  [alembic.autogenerate.compare] Detected added column 'user.password_hash'
  Generating migrations/versions/d3868407e935_add_password_hashes_to_users.py ... done

$ python app.py db upgrade
INFO  [alembic.runtime.migration] Running upgrade 279ebc64991a -> d3868407e935, add password hashes to users

She then checks the database migration history to make sure the change was applied:

$ python app.py db history
279ebc64991a -> d3868407e935 (head), add password hashes to users
<base> -> 279ebc64991a, add user table

So Mary happily goes off to work on adding password hashes.

Meanwhile, David needs to add avatar URLs for all the users, so starting from the same User model as Mary, he makes the following change on his development environment:

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128))
    avatar = db.Column(db.String(256))

And he also creates a database migration for this change:

$ python app.py db migrate -m "add user avatars"
INFO  [alembic.autogenerate.compare] Detected added column 'user.avatar'
  Generating migrations/versions/678d339a120f_add_user_avatars.py ... done

$ python app.py db history
279ebc64991a -> 678d339a120f (head), add user avatars
<base> -> 279ebc64991a, add user table

Let's assume that David finishes his work first, and pushes the user avatar changes to the upstream repository. This commit triggers an automatic deployment to a staging server, where David goes and checks that user avatars are working fine. He finds no problems, so he goes to find other work.

When Mary completes her work on user authentication, she tries to push to master and gets an error that tells her that her source tree is out of date. So she does a git pull --rebase, and then tries to push again. This time the push succeeds, so then she anxiously waits for the staging server to update so that she can check her work before moving on. But something bad happened, the deployment to the staging server failed horribly.

Can you see why? When Mary goes to check the logs of the failed deployment, this is what she finds:

$ python app.py db upgrade
Traceback (most recent call last):
    ...
alembic.script.revision.MultipleHeads: Multiple heads are present for given argument 'head';
678d339a120f, d3868407e935

And sure enough, she did not realize this, but even her own source tree got into a weird state after she pulled in David's change:

$ python app.py db history
279ebc64991a -> 678d339a120f (head), add user avatars
279ebc64991a -> d3868407e935 (head), add password hashes to users
<base> -> 279ebc64991a (branchpoint), add user table

So what happened here? As you see above, the migration history is not linear anymore. Migration 279ebc64991a was the starting migration for both Mary and David, so both their migrations are based on this change, creating a branched migration history with two heads.

It's interesting to note that when Mary pulled David's change git did not complain. These schema migrations are stored each in its own file, so git did not see conflicts in Mary's change and allowed her change to be merged, causing the project to break. Not nice, right?

How to Detect Schema Conflicts Before They are Committed

Before I tell you how to untangle this schema mess, let's think about Mary's actions. Could she or her team have done anything different to prevent conflicts like this from ever appearing in the team's repository?

There are certainly ways to prevent this type of conflicts. A migration history test could be written to find this and other problems with migrations. This is a test that creates an empty database, and simply applies all the migrations in order, to ensure that they all run fine. The test can then downgrade the database all the way back to its initial state, to also test downgrades, something that very few people check. A migration history test can be included as part of the application's unit test suite, or as a source control pre-commit check.

How to Resolve a Schema Conflict with a Merge

While detecting these conflicts before they are pushed to the shared repository is important, let's not forget about Mary's situation. Because she applied her migration before pulling in David's changes, her database is aligned with one of the two branched heads in the migration history. David's migration is in the other branch, so it can't be applied through an upgrade.

There are a couple of ways to unlock Mary's database. Recent releases of Alembic and Flask-Migrate support the merge command, which creates yet another migration that joins these multiple heads, creating a diamond shape. To get everything back in order with a merge, you need to run this command:

$ python app.py db merge -m "merge migrations from mary and david" 678d339a120f d3868407e935
  Generating migrations/versions/66fd23fca675_merge_migrations_from_mary_and_david.py ... done

And then after that, doing an upgrade works fine:

$ python app.py db upgrade
INFO  [alembic.runtime.migration] Running upgrade 279ebc64991a -> 678d339a120f, add user avatars
INFO  [alembic.runtime.migration] Running upgrade 678d339a120f, d3868407e935 -> 66fd23fca675, merge migrations from mary and david

If you look at the updated migration history, you will notice that the branches remain in the history behind the new merge migration:

$ python app.py db history
678d339a120f, d3868407e935 -> 66fd23fca675 (head) (mergepoint), merge migrations from mary and david
279ebc64991a -> 678d339a120f, add user avatars
279ebc64991a -> d3868407e935, add password hashes to users
<base> -> 279ebc64991a (branchpoint), add user table

Committing this new migration and pushing it to the team's git repository will address the problem and allow the deployment script to apply the migrations correctly.

How to Resolve a Schema Conflict Without Alembic Merges

I honestly find these diamond shaped merges confusing, I prefer to always have a linear migration history, so my approach to fixing schema conflicts does not involve Alembic's merge command.

Let's look at the state of the migration history on Mary's environment when she found she had a conflict:

$ python app.py db history
279ebc64991a -> 678d339a120f (head), add user avatars
279ebc64991a -> d3868407e935 (head), add password hashes to users
<base> -> 279ebc64991a (branchpoint), add user table

Her database is synced to migration d3868407e935, which is one of the two heads:

$ python app.py db current
d3868407e935 (head)

To unlock the migration history, all that needs to be done is to alter the order of the migrations, so that these two migrations by David and Mary happen one after the other. Because David was first to commit his migration, the correct thing to do is to move Mary's migration after David's.

The first step to reorder the migrations is to move the database back one migration, so that it isn't inside Mary's branch anymore:

$ python app.py db downgrade
INFO  [alembic.runtime.migration] Running downgrade d3868407e935 -> 279ebc64991a, add password hashes to users

With this command, the changes that Mary made to her database are removed. The next step involves manual editing of Mary's migration script, which is in file migrations/versions/d3868407e935_add_password_hashes_to_users.py. The first few lines of this script are:

"""add password hashes to users

Revision ID: d3868407e935
Revises: 279ebc64991a
Create Date: 2016-02-09 22:13:25.135581

"""

# revision identifiers, used by Alembic.
revision = 'd3868407e935'
down_revision = '279ebc64991a'

# ...

The important part here is the revision that is set as the down_revision, which is the migration that comes right before it in the history. To move this migration script after David's, all that needs to be done is to replace revision 279ebc64991a with David's revision code, which is 678d339a120f. After you edit the two occurrences of the previous migration, the migration script should look like this:

"""add password hashes to users

Revision ID: d3868407e935
Revises: 678d339a120f
Create Date: 2016-02-09 22:13:25.135581

"""

# revision identifiers, used by Alembic.
revision = 'd3868407e935'
down_revision = '678d339a120f'

# ...

And with that change saved, the migration history has been reordered and is again nice and sequential:

$ python app.py db history
678d339a120f -> d3868407e935 (head), add password hashes to users
279ebc64991a -> 678d339a120f, add user avatars
<base> -> 279ebc64991a, add user table

And now an upgrade cleanly applies David's migration and then Mary's:

$ python app.py db upgrade
INFO  [alembic.runtime.migration] Running upgrade 279ebc64991a -> 678d339a120f, add user avatars
INFO  [alembic.runtime.migration] Running upgrade 678d339a120f -> d3868407e935, add password hashes to users

At this point, the schema migrations are back in order and can be pushed to the team's repository to address the breakage.

Final Words

How do you deal with schema migration conflicts? If you have a different technique and want to share it, please write below in the comments!

8 comments

  • #1 Igor said 2016-02-11T10:01:30Z

    To be honest, it's quite a rare case in my experience, when multiple developers commit same migrations, but I try to be as safe as possible when I do this sort of changes. Usually I have my db migrations committed separately and I'm trying to be well sure that I won't clash with someone on this update. However, in this particular case, I agree that it's much easier to just dig into migration and rewrite it by hand (I'd rather merge the migrations together and add the field changes from both db-migrations manually). You would most likely learn something new and it also helps to stop thinking of db-migrations as 'some black magic being managed by this super extension'. For my particular case, my "fear" of db-migrations went away (or, at least, got pushed back significantly) as soon as I went from django-orm to sql alchemy, since it's far more transparent in my opinion.

  • #2 Miguel Grinberg said 2016-02-11T17:19:45Z

    @Igor: A migration conflict would happen with two developers commit "any" kind of migrations, even on different tables. The conflict is in how the migrations should be sequenced, not on what is being migrated.

  • #3 Cecil said 2016-02-11T22:41:38Z

    I see what you did there ;-)

  • #4 Igor said 2016-02-12T09:11:22Z

    Well, yes, but still if both developers are careful when creating their migrations and they don't blindly push to the upstream, nothing like this would be necessary. Which is, of course, too idealistic for most teams.

  • #5 Jack Grahl said 2016-02-12T13:33:37Z

    If the migrations folder contained a file with a list of all migrations in order, one per line, the git rebase would have failed, and Mary would have been reminded to fix the migrations. Perhaps this would be a decent approach. Alternatively it might work well if there was a Git feature where a folder could be tagged as always requiring manual merging, or if a 'merge command' (such as running alembic merge) could be configured for conflicts in a specific directory or to specific files.

  • #6 Miguel Grinberg said 2016-02-14T20:01:40Z

    @Igor: really, this is more common than you think. Whenever two people are working on feature that require database migrations you will have one of them having to handle a schema conflict.

  • #7 Frank said 2016-04-05T00:36:31Z

    Any tips/ideas for handling this in CI?

  • #8 Miguel Grinberg said 2016-04-05T02:08:37Z

    Hi Frank! For CI, what I have implemented two unit tests. One creates an empty database and then runs the upgrade, forcing alembic to go through all migrations. If there is a conflict, alembic will fail, so the build fails as well. The second test runs after the database has been upgraded succesfully, and it runs "manage.py db migrate" to generate a new migration. The resulting migration needs to be empty for the test to pass. If the migration isn't empty, then the migrations and the models got out of sync. These two tests are atypical in that they have a lot of shell commands and very little Python in them. You can opt to implement them as bash scripts, or (probably more painful) as a Python unittest that with calls the shell commands.

Leave a Comment