Having to take your application offline for updates can be a pain. You can mitigate this with consistent, scheduled downtime, but it's not something that brings delight to customers. What's more, some sites can lose thousands of dollars per minute they're down. There are many reasons an app can go down, but deploying or upgrading your application shouldn't be one of them! We have a tool we can use to ensure that our deployments create no downtime: blue/green deployments.
What It Is: A Few Wonderful Colors
I don't know who originally decided to use the colors "blue" and "green." But the gist is this: you have an instance of your application, a green version, in production. You also have a router that routes your user traffic to the app. You need to get a new version, the blue version, out so that your users can get some new goodies. But you want to ensure that if a user goes to look at one of your screens or presses a button, they can still do so, even while you're deploying blue. If you can quietly deploy blue while green handles all traffic in the meantime, then you can eventually swap out the connections so that everyone stops going to green and goes to blue instead. So you follow these steps:
1. Start with the green version in production.
2. Deploy the blue version to the same environment as the green version. Run any smoke tests, as necessary.
3. Connect router traffic to the new version (the blue version) alongside the old version (the green version).
4. Disconnect router traffic from the old version.
5. Decommission the old version from the environment, if necessary.
Seems pretty straightforward when it's broken down, but the devil is in the details. Every platform and language has different ways of approaching blue/green deployments, but most have the capability to do it.
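To make the five steps concrete, here is a minimal sketch in Python. It is a toy, in-memory model and not how any particular platform does it: the `Router` class, the `deployed` set, and the `blue_green_deploy` function are all hypothetical names I made up for illustration, and the smoke test is whatever check you pass in.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Router:
    """Hypothetical stand-in for a real router: the set of versions receiving traffic."""
    live: set = field(default_factory=set)

    def connect(self, version: str) -> None:
        self.live.add(version)

    def disconnect(self, version: str) -> None:
        self.live.discard(version)


def blue_green_deploy(
    router: Router,
    deployed: set,
    old: str,
    new: str,
    smoke_test: Callable[[str], bool],
    decommission_old: bool = False,
) -> bool:
    """Walk through the steps; return True if traffic ended up on `new`."""
    # Step 1 is the starting state: `old` is deployed and receiving traffic.
    # Step 2: deploy the new version next to the old one. It gets no traffic yet.
    deployed.add(new)
    if not smoke_test(new):
        # The new version never saw user traffic, so we just remove it.
        deployed.discard(new)
        return False
    # Step 3: connect traffic to the new version alongside the old one.
    router.connect(new)
    # Step 4: disconnect traffic from the old version.
    router.disconnect(old)
    # Step 5 (optional): decommission the old version.
    if decommission_old:
        deployed.discard(old)
    return True
```

Note that the old version stays deployed by default. Keeping it around is what makes a quick switch back possible if the new version misbehaves.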
How It Reduces Risk
As noted above, when we blue/green our deployments, we can deploy without creating application downtime. And when we deploy without downtime, we eliminate or reduce quite a few risks that directly affect our business and our development team. Here's what you can enjoy when you reduce your risk with blue/green deployment:
No Surprise Errors
Put yourself in the mind of your users for a moment. Let's say you want to order an item. You fill out your billing address and your street address, then you go on to enter your payment information. You agree to the shipping fee and uncheck the "receive spam mail" box. Finally, you press that blessed submit button only to get an error message: "Your order could not be submitted at this time. Please try again later." And all that precious time filling out your information is lost. If you're lucky, you get a specific error message like "Application is offline for maintenance." Most of the time, you get the error message equivalent of ¯\_(ツ)_/¯. When we blue/green our deployments, we never need this maintenance screen. From your user's viewpoint, there's a list of items upon one click, and upon the next click, they see that new menu you added. This will keep furious emails about error screens from flooding your inbox. Let's give users surprise features, not surprise errors!
Go Ahead, Test in Production!
Often, it's healthy to ensure your pre-production environments are as close to your production environment as possible. As much as we would like prod to be the same as our QA or staging environment, we don't always get our way. This can cause subtle bugs in our configurations to seep through. With blue/green, it's no problem; you can test the app while it's disconnected from main traffic. Your team can even load test it, if you so desire.
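As a rough illustration of testing the new version while it's still disconnected from main traffic, here is a small smoke-test sketch in Python. Everything specific in it is an assumption on my part: the function names, the idea that a plain HTTP 200 means "healthy," and the paths you choose to probe. Your own checks will likely be richer.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for `url`, or 0 if the server can't be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def smoke_test(base_url, paths, fetch=http_status):
    """Probe each path on the idle version; return a list of (path, status) failures."""
    failures = []
    for path in paths:
        status = fetch(base_url + path)
        if status != 200:
            failures.append((path, status))
    return failures
```

You would point this at the new version's private address, for example `smoke_test("http://blue.internal.example:8080", ["/health", "/cart"])` (a made-up host and made-up paths), and only connect the router if the returned list is empty. The `fetch` parameter exists so you can swap in a fake for unit tests.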
You Accommodate Customers Who Shop at Weird Hours
There's a constant struggle to find that sweet, sweet deployment window: that time when no one cares. This is tricky, as our customer bases are more global than ever. There's no longer an internationally good time to do a deployment, especially if you work in an enterprise where the business needs to be running around the clock. If you have a customer-facing application, this means a customer who can't place an order may place it on some other website. You just lost a sale. If you have an internal application, this means an employee can't do their job and is actively losing your company money. By blue/green deploying, you ensure your traffic never stops. That customer can place their order just fine without disruption, giving you that sale. That employee overseas can continue to do their job without interruption, saving your company money. The longer your current deploy downtime is, the more valuable this is.
You Get to Sleep Instead of Deploy
We just talked about customers who shop at weird hours. But what about you or your developers—the ones forced to put out fires at those weird hours? Finding the right deployment window can lead to devs doing deployments over the weekend. In extreme cases, it has to be done at four AM or some other absurd hour. I remember being on call and having to wake up because the weekend deployment failed. I was groggy and frustrated, and the furthest thing from my mind was ensuring all the quality checks were in place when I made any fixes. This encourages human error, especially in more manual deployments. If we apply blue/green, we can deploy whenever we want. More specifically, we can deploy during office hours, when we can bring our full team to bear on any issues that occur. We can deploy while the coffee in our veins is in full effect, giving us that mistake-avoiding brainpower.
Easy Recovery
As much as we like to think we've done everything right, sometimes we introduce bugs. We can either spend inordinate amounts of money ensuring deployments will always be defect-free—and still occasionally find them—or we can ensure that when we inevitably find them, we recover quickly and easily. By blue/greening our deployments, we have our older, more stable version of our application waiting to come back online at a moment's notice, evading the pain of trying to roll back a deployment. This is especially valuable if your deployments have many manual steps.
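Here is one way the "waiting to come back online" idea might look in code. This is a sketch under my own assumptions: `TrafficSwitch` and `roll_back_if_unhealthy` are hypothetical names, and using an error-rate threshold as the trigger is just one possible signal. You may prefer a human pressing a button.

```python
from dataclasses import dataclass


@dataclass
class TrafficSwitch:
    """Hypothetical router state: one version takes traffic, the other waits."""
    active: str
    standby: str

    def swap(self) -> None:
        # Recovery is the same cheap operation as the original cutover.
        self.active, self.standby = self.standby, self.active


def roll_back_if_unhealthy(
    switch: TrafficSwitch, error_rate: float, threshold: float = 0.05
) -> bool:
    """Send traffic back to the standby version if errors exceed the threshold."""
    if error_rate > threshold:
        switch.swap()
        return True
    return False
```

The point of the sketch is that rolling back involves no redeploy at all. The old version is still running, so recovery is a routing change.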
There Are No Silver Bullets
As great as it is using blue/green deployments to remove downtime, it doesn't come free. There's often a cost to supporting two versions of your application at the same time. The data model usually takes the biggest hit, since both versions typically have to work against the same data while they run side by side, but other areas can be affected as well. I would only suggest applying blue/green when some of the above risks apply to your application. If you find that none of them do, go ahead and enjoy those simple swap 'n' drop deploys.
The Death of Downtime
Blue/green can be an extremely powerful way to reduce pain and risk in your application lifecycle. If you're the manager of a development team, I encourage you to assess if any of these risks apply or may apply to your application. If you're a team member for an application but not the main decision maker, you can use these as selling points to convince your manager to institute zero-downtime deployments. Go ahead. Add a couple steps to your pipeline, and watch as your fears and pain melt away.