This is a guest blog post by Pedro Belo. Pedro worked as a Ruby consultant in Brazil and in the US, until joining Heroku in 2008. When we first read his article from July 2011 we immediately knew it would be perfect to publish on our blog.
Lets understand the challenge of changing a Rails database without introducing any downtime with a simple, apparently harmless migration:
#!ruby class ApparentlyHarmlessMigration < ActiveRecord::Migration def self.up remove_column :users, :notes end end
I learned this kind of migration is not really harmless the hard way: during a production deploy. Since the column was not being referred anywhere in the code, I assumed it was a migration that I could just run at will.
But as soon as rake db:migrate was done, errors started to pop up at the logs:
#!ruby PGError: ERROR: column "notes" does not exist
Turns out that ActiveRecord caches table columns, and uses this cache to build INSERT statements. Even if the code is not touching that column, ActiveRecord will still attempt to set it to NULL when saving models.
I was just starting to realize how delicate database migrations are.
My first reaction was to call for a maintenance window whenever we had a migration to deploy. But that practice quickly became unfeasible as it blocked deploys, left users unhappy and just made it evident that we were doing things wrong.
It was time to understand the problem, and fix it for good.
Hot Compatibility
So here’s the basic principle that allows you to avoid downtime: any migration being deployed should be compatible with the code that is already running.
In order to do so, you’ll usually split your deploy process in two steps:
Make the code compatible with the migration you need to run
Run the migration, and remove any code written specifically for it
Going back to our example: if you want to remove a column, you’ll need to deploy a patch telling ActiveRecord to ignore it first. Only then you can deploy the migration, and clean up that patch.
With that in mind, lets understand what are the different patches that will make your code ready for a migration.
Patterns
Keep in mind some of the patterns I’ll introduce here are for Postgres only.
Read-only models
Most of the issues coming from migrations happen when you’re writing to the database. If you don’t need to save your model make it explicitly read-only:
#!ruby class Role < ActiveRecord::Base def readonly? true end end
Tables for read-only models can be modified without any concerns.
Removing columns
Tell ActiveRecord to ignore a column from its cache:
#!ruby class User def self.columns super.reject { |c| c.name == "notes" } end end
Once this patch is deployed you can safely remove the specified column.
Renaming columns
There’s no way to rename a column without downtime. You can get the same results, though, by adding a column, migrating the data and then dropping the previous one.
So the first step here is to just add a new column and make sure your code is writing to it. The catch is that at this point you need to read from both columns, adding a fallback to the accessor:
#!ruby def first_name super || attributes["fname"] end
This will make AR read from first_name
if available, and default to fname
otherwise. You can then populate the new column, and safely remove the old one (removing this patch as well).
This obviously doesn’t address SQL queries. If you’re currently using the column in any search expression, you’ll need to split the deploy in three steps:
Add a column with the desired name and change your model to write data to both (but don’t change any queries yet)
Populate the new column with data from the previous one, and update queries to refer to the new column name
Delete the old column
NOT NULL constraint
First make sure you’re writing to the column that will be receiving that constraint. For example:
#!ruby before_save :assign_defaults def assign_defaults self.admin ||= false end
Then update all existing records that have it set to null, and only then you’re safe to add the constraint.
Creating indexes
Creating indexes on a live system is surprisingly unsafe: ActiveRecord doesn’t create indexes concurrently, so your table will be locked against writes. If you’re writing a lot of data to the table, or if there’s a lot of data to be indexed, you probably want to create it concurrently instead.
The catch is that you can’t create concurrent indexes from a transaction, and all Rails migrations run within one. So you’ll need to resort to a hack and create your index using raw SQL:
#!ruby class IndexUsersEmails < ActiveRecord::Migration def ddl_transaction(&block) block.call # do not start a transaction end def self.up execute "CREATE INDEX CONCURRENTLY index_users_on_email ON users(email)" end end
The good news is that Rails will be able to dump that index to a Ruby schema normally (despite the raw SQL).
Cheat sheet
Adding columns - Safe for readonly models - Safe when there are no constraints on the column
Removing columns - Safe for readonly models - Tell AR to ignore the column first
Renaming columns - Not safe - First add a new column, then remove the old one - When the column is used on SQL queries you’ll need to split this in three steps
Creating tables - Safe
Removing tables - Safe
Creating indexes - Safe only for readonly models - Otherwise make sure you create indexes concurrently
Removing indexes - Safe
Future
Running migrations with no downtime takes a lot of planning, and work. But programmers are good exactly at abstracting work and turning repetitive tasks like this into something that can be reused. So it seems like we’ll naturally see a lot of the work described above abstracted on a level below the application.
ActiveRecord, for instance, could be more resilient to migrations. A naive approach to resolve the problem of dropping columns would be to rescue the database exception saying the column doesn’t exist, remove it from the cache and retry. It’s hard to do this in a reliable and clean way, but it seems possible.
Going further, as we move from monolithic applications running on a single server to distributed systems, the need for a database that can elegantly support migrations gets much higher. That’s certainly part of the motivation behind the NoSQL movement - but I’d expect change in relational databases too. Ideally they would adapt to this new ecosystem by providing tools to make our lives easier, like the ability to alias a column.
But enough speculating.
The reality today is that hot compatibility needs to be addressed on the application level, and that’s the best way to avoid maintenance windows or serving errors to your users.
Pedro talking about Zero Downtime Deploys at Rails Conf 2012
Pedro talked about Zero Downtime Deploys at Rails Conf 2012. You can check out his video here:
[youtube https://www.youtube.com/watch?v=R6bVTthtnZ0\]
We want to thank Pedro for letting us republish his original article. If you have any questions or tell us about your experience with Zero Downtime Deploys let us know in the comments! You can also get in touch with Pedro on twitter.
Further Information
We talked about Zero Downtime Deployment in our Workflow Series here. You can also check out our free eBook which has a dedicated chapter on Zero Downtime Deployment.