Heroku: Up and Running (2013)
Chapter 6. Deployment
Although deploying to Heroku is a simple process, people commonly forget to consider some things when deploying applications, not only to Heroku, but anywhere else as well. Some of these items are not necessarily technology driven, and form the basis of a good checklist that you can use to ensure that your deployments are of a much more organized nature. We’ll take a look at some of these issues now. When deploying, you want to ensure that your deployments are safe and timely, but you should also have a Get Out of Jail Free card handy should it not go as planned.
Let’s start with one of the most basic considerations, which can alleviate a lot of pain surrounding deployments, particularly to production environments: when to deploy.
It is common for client requirements to dictate that deployments happen at the end of the day or week, and that’s something that a lot of people will happily do—however, this is probably the worst possible time to deploy. Deploying at the end of the day or week leaves you open to all sorts of issues arising, and having nobody around to help resolve them. Therefore, always plan on deploying when you’ve got a large window of availability in order to resolve any problems that might crop up. Having your application go down while you’re happily driving home for the weekend does not make for a good start to your time off.
Another common thing that is overlooked in the sometimes frantic rush to deploy is being able to back out a deployment should things go wrong. Typically, a deploy will not only change code, but also data and potentially cache contents. Therefore, it is always a good idea to have a plan of how you might recover things should the deploy not go well.
Heroku makes release rollback simple via the releases commands in the command-line interface:
$ heroku rollback
But you should also consider how to recover other parts of your application. For instance, do you have a backup of your database in its current state that you can use should your new data migrations fail? Heroku Postgres Fork is superb for this. Do you have a way of clearing caches and letting them rebuild from the current state if your code isn’t using the old caches properly? A few minutes of forethought on backing out your releases can often be worth its weight in gold.
A third item that is often missed is testing your deployments. Developers are often great at building immense code coverage with unit testing, but commonly fail to think about testing that deployments work as they should and that no unforeseen problems occur. Finding these issues in a production environment does not often leave you with a smile on your face. So, how can you mitigate this risk?
Heroku makes this very easy by letting you create separate environments and forking applications. Furthermore, it’s relatively simple to create an environment exactly like the one you have in production, but with a different URL. Forking allows you to duplicate your production data, and using Git for source control lets you deploy exactly the same code to your new environment before testing your new deployment. For more information on forking, refer back to Forking an Application.
Setting up a staging environment may take a few minutes, but it can pay dividends in the long run if you uncover issues that you hadn’t thought of.
How to Deploy, and Deploy Well
So what do we need to consider when deploying an application? We could just push our code to Heroku and hope for the best, but things go wrong. Someone may have accidentally pushed code not ready for production, or we may have some potentially data-destroying bug in our migrations. How can we be sure that we have a way out if things go bad?
It is good practice to back up before you make any changes, either locally in development or on Heroku in your production environment. Losing data through corruption or overwrite is every business’s worst nightmare, so it is worth investing a few seconds to ensure that you are able to recover your data and code to a previous version should things go awry. For more information on how to back up data with the Heroku Postgres service, see PGBackups.
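As a rough sketch of this habit, the wrapper below captures a database backup and only then pushes the code. It assumes the present-day CLI command heroku pg:backups:capture (the PGBackups add-on used a different spelling), and the app name my-sample-app and the DRY_RUN switch are purely illustrative.

```shell
#!/bin/sh
# Sketch: take a database backup, then deploy. All names are illustrative.
APP="${APP:-my-sample-app}"   # hypothetical app name
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the commands (the default)

run() {
  # Print the command; execute it only when DRY_RUN is 0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

backup_then_deploy() {
  # "&&" stops at the first failing step, so we never deploy without a backup.
  run heroku pg:backups:capture --app "$APP" &&
  run git push heroku master
}

backup_then_deploy
```

With the defaults, the script only prints the two commands it would run; setting DRY_RUN=0 executes them for real.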
Whenever you deploy code, change a config var, or add or remove an add-on resource, Heroku creates a new release and restarts your app. You can list the history of releases, and use rollbacks to revert to prior releases for backing out of bad deploys or config changes.
Releases are named in the format vNN, where NN is an incrementing sequence number for each release. In this example, v10 is the release created by the deploy:
$ git push heroku master
-----> Compiled slug size is 8.3MB
-----> Launching... done, v10
http://severe-mountain-793.herokuapp.com deployed to Heroku
To see the releases for an application:
$ heroku releases
Rel   Change                  By                              When
----  ----------------------  ------------------------------  -------------------
v52   Config add AWS_S3_KEY   email@example.com               5 minutes ago
v51   Deploy de63889          firstname.lastname@example.org  7 minutes ago
v50   Deploy 7c35f77          email@example.com               3 hours ago
v49   Rollback to v46         firstname.lastname@example.org  2012-09-12 15:32:17
And to get detailed info on a release:
$ heroku releases:info v24
=== Release v24
Change: Deploy 575bfa8
When:   6 hours ago
Addons: deployhooks:email, releases:advanced
Config: MY_CONFIG_VAR => 42
        RACK_ENV      => production
Rolling back will create a new release, which is a copy of the state of code and config vars contained in the targeted release. The state of the database or external state held in add-ons (e.g., the contents of memcache) will not be affected and are up to you to reconcile with the rollback.
Running on rolled-back code is meant as a temporary fix to a bad deployment. If you are on rolled-back code and your slug is recompiled (for any reason other than a new deployment) your app will be moved back to running on the most recent release. Subscribing to a new add-on or adding/removing a config var will result in your slug being recompiled.
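To make the rollback step concrete, here is a minimal sketch of a helper that checks the release name before rolling back. The function names and the app name my-sample-app are hypothetical; the only Heroku command used is heroku releases:rollback.

```shell
#!/bin/sh
# Sketch: roll an app back to a named release. APP is a hypothetical app name.
APP="${APP:-my-sample-app}"

# Succeed only for names of the form vNN (a "v" followed by digits).
is_release() {
  case "$1" in
    v*) ;;
    *) return 1 ;;
  esac
  case "${1#v}" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

rollback_to() {
  if ! is_release "$1"; then
    echo "expected a release such as v51, got '$1'" >&2
    return 1
  fi
  # Creates a new release that copies the code and config vars of the target.
  # Database contents and add-on state are NOT changed by this command.
  heroku releases:rollback "$1" --app "$APP"
}

# Usage: rollback_to v51
```

Because the database is left as it was, a rollback like this is best paired with whatever backup you took before the deploy.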
Now we have our code, and we know it works locally for us and our test suite passes with no problems, but how do we know that our code will function under load? How do we know that, once we have a thousand users hitting our application, we’re able to perform in a way that is acceptable and doesn’t end up losing us business at the most critical times? The only way to be sure is performance testing.
Performance testing is a vast subject that is outside of the scope of this book, but you can take some very simple steps to check the basics. Load testing is the simplest form of performance testing. By exposing your application to an expected amount of load you can witness how your application will perform and where the potential bottlenecks are. Another common method is stress testing. Unlike load testing, which is testing your application under a given load, stress testing pushes your application as far as it can go before breaking. The aim of the stress test is to take your application to the point where it cannot go any further, thus exposing you to its upper limits. By a combination of load and stress testing, you are able to determine how your application will perform at peak times; it also gives you an idea of how much extra capacity you have in your pocket should the time arise.
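One simple way to try both approaches is ApacheBench (ab), which ships with the Apache HTTP server. The sketch below is only a starting point: the staging URL, the request counts, and the concurrency ceiling are all assumptions that you would replace with figures that match your own traffic.

```shell
#!/bin/sh
# Sketch: basic load and stress runs with ApacheBench (ab).
# The URL and every number below are illustrative assumptions.
URL="${URL:-http://my-sample-app-staging.herokuapp.com/}"

# Load test: a fixed, expected level of traffic.
load_test() {
  ab -n 1000 -c 50 "$URL"      # 1000 requests in total, 50 at a time
}

# Print a doubling series of concurrency levels up to a ceiling.
stress_levels() {
  c=10
  while [ "$c" -le "$1" ]; do
    echo "$c"
    c=$((c * 2))
  done
}

# Stress test: keep raising concurrency; stop if ab cannot complete a run.
stress_test() {
  for c in $(stress_levels "${1:-320}"); do
    echo "== concurrency $c =="
    ab -n $((c * 20)) -c "$c" "$URL" || break
  done
}

# Usage: load_test          # expected traffic
#        stress_test 320    # ramp from 10 up to 320 concurrent requests
```

In each ab report, the failed request count and the time per request are the figures to watch as the concurrency rises. Point this at a staging application rather than production.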
There is a wide variety of sources for more information on performance testing. A good place to start is The Art of Application Performance Testing by Ian Molyneaux (O’Reilly, 2009).
Trimming the Fat (Slug Management)
One aspect that can improve the management of your application significantly is the time that it takes to deploy. When deployed to Heroku, your application resides in a slug archive stored inside a massive file store. When scaling, this archive is then copied from this file store and deployed on a fresh dyno. Once on this dyno, it’s spun up and starts serving requests. For day-to-day deployment and scaling, the time this process takes can have a significant effect on managing your application. For instance, imagine your application’s slug was extremely large; this slug would take time to copy across the network, time to unzip, and time to spin up. This means that between hitting the scale up button, and the new dyno being available, you could be looking at a significant amount of time.
A simple way of improving this is via the .slugignore file. In essence, .slugignore is the same as a .gitignore file (Git) or an svn:ignore property (Subversion). This file tells Heroku which files it can ignore from your application’s source repository, and therefore which files to leave out of your application’s slug. One very common use case is that of the test suite. Test suites can sometimes be as large as your application itself, and can also contain large files such as test images or documents to test imports. Once you add your suite to .slugignore, it won’t form part of your slug when deployed to Heroku.
Luckily, .slugignore uses exactly the same format as .gitignore (barring the ! negation operator) so the file format should already be familiar to most:
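As a minimal sketch, a .slugignore that keeps the test suite and some design assets out of the slug might be created like this. The directory names and file patterns are assumptions about a typical project rather than anything Heroku requires.

```shell
#!/bin/sh
# Sketch: write a .slugignore in the current directory, which is assumed
# to be the root of the repository. The entries are illustrative only.
cat > .slugignore <<'EOF'
/test
/spec
*.psd
*.pdf
EOF

# Show what will be left out of the slug on the next deploy.
cat .slugignore
```

The file needs to be committed along with the rest of your code for Heroku to see it on the next push.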
When deploying to Heroku, your slug size will be displayed in the output, so it’s good to keep an eye on this and ensure that you’re staying within acceptable limits. Generally speaking, any slug under 15 MB is small and nimble; 50 MB is average; and 90 MB or above is weighty.
Source Code Management
While source code management is integral to developing and deploying with Heroku, it is far too vast a topic to cover in this book, and there are several books out there that cover it extremely well. However, some aspects of managing your source code are very helpful when working with Heroku and are wise to consider, regardless of what other best practices you may already be following.
Bear in mind that as Git is the de facto source control system on the Heroku platform, this is the system we’re describing here. When we’re talking about branching, we realize that within Git this is an incredibly inexpensive process (unlike with some other software configuration managers out there).
Branching for environments
Probably the most significant thing that you can do to make your life simpler is to create branches within your source control for each environment that you have running on Heroku. For instance, you may have a production environment, a couple of staging environments, and maybe even the odd QA or acceptance environment. By creating a branch in your source control for each of these environments and religiously deploying from the appropriate branch to the correct environment, you are able, at a glance, to see what code is in which environment. Additionally, you’re able to easily move code from one environment to another. Have a QA environment ready to go to staging? Merge the QA branch into the Staging branch, and deploy that branch to the staging environment. Need to test a bug in your production environment in a new QA branch? Create a new branch from your Production branch, deploy it to a new environment, and test away!
In addition to branching for each environment you own, it’s also wise to branch for each significant feature that you’re building in your application. Features are not always deployed in the order they are built, so if you’re developing features one after the other inside your staging branch, you then lose control over how these features make it out into production or similar. By maintaining these features in separate branches, you are able to merge them individually into your environment branches; this is indispensable, especially when your stakeholders are indecisive or external factors have an impact on feature releases.
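The promotion steps described above can be sketched with a small helper. The function name promote and the branch names in the usage notes are hypothetical; the Git commands themselves are standard.

```shell
#!/bin/sh
# Sketch: move code from one environment branch to another.

# Merge branch $1 into branch $2, always recording a merge commit so the
# promotion stays visible in the history.
promote() {
  src="$1"
  dst="$2"
  git checkout "$dst" &&
  git merge --no-ff -m "Promote $src into $dst" "$src"
}

# Usage, from inside your repository:
#   git checkout -b feature/invoices staging   # start a feature from staging
#   ...commit work on the feature...
#   promote feature/invoices staging           # feature is ready for staging
#   promote staging production                 # staging is signed off
```

Keeping each feature on its own branch means promote can be run for one feature at a time, in whatever order the features are approved.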
There are exhaustive references written on how to effectively use branching in source control with a team. We don’t have the time here, so instead we recommend researching Git Flow for dealing with feature branches, as well as its slightly modified cousin GitHub Flow. Understanding branches is critical to effectively working in software.
At this point, it should be clear that you don’t need to limit yourself to just one running instance of your application on Heroku. As you currently get 750 hours of free dyno time per application per month, you are able to spin up an instance of an application and run it on a single dyno for free 24 hours a day. What’s more, an application still exists even without any running dynos, so scaling everything down when it’s not needed means that you’re not using any hours at all. This means that the days of test and staging environments being limited in availability are long gone, and these environments now should be treated as freely available and disposable.
How can you leverage this ability to have your application running in multiple places at once? For starters, let’s revisit the basics of deployment and learn a little bit about Git remotes.
Consider the most basic command for deploying to Heroku:
$ git push heroku master
This default git command is telling Git to do something very specific. In a simple sense, it’s asking to push the master branch to the heroku remote, a remote simply being a foreign Git repository somewhere out there on the wire. You can see the definition of this remote by looking at your git configuration:
$ git remote -v
origin git@github.com:neilmiddleton/my-sample-app.git (fetch)
origin git@github.com:neilmiddleton/my-sample-app.git (push)
heroku git@heroku.com:morning-sunshine-42.git (fetch)
heroku git@heroku.com:morning-sunshine-42.git (push)
Here you are seeing two remotes. First, origin, which is a repository on GitHub, and second, a remote called heroku, which is automatically created when the application is created with heroku create.
The name of this heroku remote, however, is arbitrary; it’s just a string identifier for this remote. If you wanted, you could rename this remote to whatever you chose; for instance, production would be equally as useful.
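Renaming the remote is a single Git command. The sketch below wraps it in a hypothetical helper, rename_heroku_remote, that first checks that a remote called heroku actually exists.

```shell
#!/bin/sh
# Sketch: rename the default "heroku" remote to something more descriptive.
rename_heroku_remote() {
  new_name="${1:-production}"
  if git remote | grep -qx heroku; then
    git remote rename heroku "$new_name"
  else
    echo "no remote named 'heroku' in this repository" >&2
    return 1
  fi
}

# Usage, from inside your repository:
#   rename_heroku_remote production
#   git push production master
```

Only the local name changes; the application on Heroku and its URL stay exactly as they were.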
By adding new remotes to your application, you are able to push your code to other places as easily as our first deploy. Let’s say you want to create a new staging environment for this application. First, you can create another new application and give it a specific name on Heroku and ask for a specific remote name:
$ heroku create my-sample-app-staging -r staging
This creates the application my-sample-app-staging and adds it as a remote named staging to your local Git repository. Once this has completed, you can then do a simple deploy:
$ git push staging master
and the magic will happen.
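If you keep a branch per environment, the local branch you want to deploy is usually not called master. Heroku builds what arrives on the remote’s master branch, so the push needs a refspec that maps one onto the other. The sketch below assumes, purely for illustration, that each remote shares its name with the local branch deployed to it.

```shell
#!/bin/sh
# Sketch: deploy an environment branch to the Heroku remote of the same name.

# Build the refspec that pushes local branch $1 to the remote's master branch.
refspec_for() {
  echo "$1:master"
}

deploy_env() {
  git push "$1" "$(refspec_for "$1")"
}

# Usage: deploy_env staging      # runs: git push staging staging:master
#        deploy_env production   # runs: git push production production:master
```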
With this in mind, you can go as far as you want: you can create remotes for every environment you might ever need, and push to each of those remotes as much as you want. However, this only works for deploys. What about all the other Heroku commands?
Well, each and every command you can issue to Heroku via the command line takes an extra argument of the application in question:
$ heroku logs --app my-sample-app
$ heroku logs --app my-sample-app-staging
and so on.
By using Git remotes and targeting your commands to the individual instances of your application, you can create, in theory, as many copies of your applications as you desire and manage them all separately.
A common pitfall for people using this multi-instance approach, and one that we authors have fallen into many times, is not keeping all the instances as consistent as possible. For example, if you add an add-on to one application, ensure that it is also added to every other application that requires it.
Failure to do this can lead you down a path of finding difficult-to-diagnose bugs, or environments that simply don’t function in the same way, which can very easily become frustrating.
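One low-tech way to spot this kind of drift is to compare the names of the config vars on two applications, since many add-ons announce themselves by setting one. The sketch below assumes you have saved each app’s configuration with heroku config --shell; the file names and helper functions are hypothetical.

```shell
#!/bin/sh
# Sketch: report config var names that exist on only one of two apps.

# Read KEY=value lines on stdin and print the sorted, unique key names.
config_keys() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' | sort -u
}

# Compare two KEY=value files. Keys only in the first file are printed
# flush left; keys only in the second are indented by a tab.
config_drift() {
  a=$(mktemp)
  b=$(mktemp)
  config_keys < "$1" > "$a"
  config_keys < "$2" > "$b"
  comm -3 "$a" "$b"
  rm -f "$a" "$b"
}

# Usage:
#   heroku config --shell --app my-sample-app         > production.env
#   heroku config --shell --app my-sample-app-staging > staging.env
#   config_drift production.env staging.env
```

The saved files contain secrets, so delete them once you have finished comparing. Only the key names are compared, because the values are expected to differ between environments.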
Something that you’ll find yourself doing at one time or another when working with Heroku is collaborating with other people on an application in some way. There are a number of things to consider when doing this, and luckily the vast majority are common sense.
For instance, ensure you have a sound deployment workflow whereby code cannot make it from development to production without being checked at least by automated testing, or another pair of eyes. Communication can help here for ensuring that the wrong things aren’t deployed at the wrong times, but workflow can pretty much guarantee that bad things can’t happen unless you happen to have failed several steps of your workflow.
Branching strategies such as those described in this chapter will go a long way here, as will staging environments combined with some kind of stakeholder sign-off on the changes that are moving their way through the system. Ultimately, though, Heroku enables anyone in a team to be as empowered as any other, so it’s wise to consider what internal rules you might put into place before inviting the hordes to collaborate on your production application.
One of the hoops that every developer needs to jump through when putting an application into production is setting up DNS so that www.yourapp.com resolves to the Heroku application and serves pages as it should.
Normally this is a very straightforward task when setting up on a simple VPS. Your VPS has an IP address, and you create some A records on your domain to point your domain at that server.
However, this gets a little more complicated when you need to scale. Let’s say you have to add a second server to your stack—you’re now looking at some sort of round-robin DNS, or implementing a load balancer and moving all of your DNS traffic through that new device.
Now let’s say you want to scale to the level of something like Heroku (it’s no simple task). Let’s consider how the Heroku DNS works. At the front of the stack is a massive array of load balancers that are receiving traffic from the outside world and directing it at the routers. To the outside world, these load balancers have IP addresses, and can be pointed to with A records.
However, this is a very bad idea. There are literally millions of applications being routed at a rate of tens of thousands of requests per second through these load balancers, so Heroku constantly has to reconfigure this layer to ensure that the traffic being received is being served quickly and is stable. What’s more, when some script kiddie out there decides to have some fun, they might launch a denial of service attack against these load balancers. At that point the Heroku operations team will react and make the changes that need to be made to keep things alive and stable.
So, how do we configure our DNS so that we don’t have to worry about all of this? Well, every application has a *.herokuapp.com domain that is managed centrally within Heroku’s own DNS infrastructure. Heroku ensures that these applications always have a healthy point to which to send the user, and the DNS records are updated as needed. This means that these *.herokuapp.com domains are an ideal place for you to send your users.
Therefore, when setting up your own DNS, it is always a good idea to use CNAMEs to alias your domain to the herokuapp.com domain. CNAME records are essentially a simple alias: DNS-speak for saying “my config is the same as this other domain’s, so use that.” This means that you also benefit from any changes that Heroku makes, without needing to worry about anything beyond the basic setup.
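Once the record is in place, it is worth confirming that the alias really resolves the way you expect. The sketch below uses dig for the lookup; the helper names are hypothetical, and it assumes that the target is a *.herokuapp.com name as described above.

```shell
#!/bin/sh
# Sketch: check that a hostname is aliased to a herokuapp.com name.

# Succeed when a CNAME target, as printed by dig, ends in .herokuapp.com.
is_heroku_target() {
  case "${1%.}" in            # dig prints a trailing dot; strip it first
    *.herokuapp.com) return 0 ;;
    *) return 1 ;;
  esac
}

check_cname() {
  target=$(dig +short "$1" CNAME)
  if is_heroku_target "$target"; then
    echo "$1 -> $target"
  else
    echo "$1 is not aliased to a herokuapp.com name" >&2
    return 1
  fi
}

# Usage: check_cname www.yourapp.com
```

DNS changes can take a while to propagate, so a failure immediately after editing your records does not necessarily mean that the record is wrong.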
But what about root/apex domains (e.g., yourapp.com, not www.yourapp.com) that aren’t allowed to use CNAMEs? Here you have a few choices: either you can use regular A records and point those at the few IPs that Heroku publishes in its DevCenter documentation, or you can avoid apex domains altogether and steer your reluctant users toward a subdomain prefix such as www.
A more foolproof way is to use some of the more modern hosted DNS offerings out there. Some companies such as DNSimple now offer their own custom types of records in your DNS that allow you to alias apex domains in the same way as a CNAME, and they handle this DNS magic within their own infrastructure. Setup is simple: sign up for one of these services, read the documentation, point your domain name servers at theirs and set up your domain as required. What’s more, these services cost peanuts for the average user.
Setting up DNS in a way that’s fully compatible with the Heroku platform is relatively simple, but it can be a bit daunting at first. There’s plenty of documentation in the DevCenter on the topic, plus lots of documentation on the hosted DNS services such as DNSimple. The key thing to remember is that you’re setting up your DNS to be infinitely more resilient than pointing it to a single IP, and also doing it in a way that will let you scale as far as you could ever need.
Let’s go through a couple of common examples: one where you wish to host your website on the apex domain (heroku.com), and one on a subdomain (www.heroku.com). For both examples, we’ll use the tools available to us from DNSimple:
Hosting on apex
In these instances, we need to use the ALIAS record so that we can bind our apex domain to our Heroku application:
my_app.com ALIAS my_app.herokuapp.com
For any subdomains, we can do several things. We can either redirect the www subdomain to the apex via the URL record:
www.my_app.com URL my_app.com
This will redirect the user to the apex, or we can CNAME it:
www.my_app.com CNAME my_app.herokuapp.com
The problem with CNAME here is that you will have issues with canonical URLs, as the site will be available on both the apex and www subdomains.
Hosting on a subdomain
Ideally, you want to be hosting your website on a subdomain such as www. This makes the setup much simpler:
www.my_app.com CNAME my_app.herokuapp.com
To deal with the apex, we just need to redirect to the appropriate subdomain, although this isn’t strictly required:
my_app.com URL www.my_app.com
This doesn’t have any issues with canonical URLs, and also means that search engines will index the subdomain, something that will help you in the future should you need to rearrange your application.