Heroku: Up and Running (2013)
Chapter 8. Buildpacks
Buildpacks provide the magic and flexibility that make running your apps on Heroku so easy. When you push your code, the buildpack is the component that is responsible for setting up your environment so that your application can run. The buildpack can install dependencies, customize software, manipulate assets, and do anything else required to run your application. Heroku didn’t always have buildpacks; they’re a new component that came with the Cedar stack.
To better understand why the buildpack system is so useful, let’s take a look at a time before buildpacks. Let’s take a look at the original Aspen stack.
When Heroku first launched, its platform ran a stack called Aspen. The Aspen stack only ran Ruby code—version 1.8.6 to be exact. It had a read-only filesystem (no writes allowed). It had almost every publicly available Ruby dependency, known as gems, pre-installed, and it only supported Ruby on Rails applications.
Developers quickly fell in love with the Heroku workflow provided by Aspen and wanted to start using the platform for other things. The Ruby community saw Rack-based applications grow in popularity, especially Sinatra, and there was also an explosion in the number of community-built libraries being released. If you were a Ruby developer during this time you could be sure to find an acts_as library for whatever you wanted. While that was good for the community, keeping a copy of every gem on Aspen wasn’t maintainable or sane, especially when different gems started requiring specific versions of other gems to work correctly. Heroku needed a way to deploy different types of Ruby applications and a different way to manage external library dependencies.
Recognizing the need for a more flexible system, Heroku released their next stack, called Bamboo. It had very few gems installed by default and instead favored declaring dependencies in a .gems file that you placed in the root of your code repository (this was before Ruby’s now-common dependency management system, called bundler, or the Gemfile, used to declare dependencies, existed). The Bamboo stack had just about everything a Ruby developer could want, but it didn’t easily allow for custom binaries to be used, and it certainly didn’t allow non-Ruby developers to take advantage of the Heroku infrastructure and workflow. The need for flexibility and extensibility drove Heroku to release their third and most recent stack, called Cedar. This stack was the first to support the buildpack system.
Introducing the Buildpack
With the Cedar stack, the knowledge gleaned preparing innumerable Ruby apps to run on Aspen and Bamboo was abstracted out into a separate system called a buildpack. This system was kept separate from the rest of the platform so that it could be quickly iterated as the needs of individual languages grew. Buildpacks act as a clean interface between the runtimes, which is the system that actually run the apps, and your application code. Among others, Heroku now supports buildpacks for Ruby, Java, Python, Grails, Clojure, and NodeJS. The buildpack system is open source, so anyone can fork and customize an existing buildpack. Developers might want to do this to add extra functionality such as a native CouchDB driver, or to install a compiled library like wkhtmltopdf. With a forked buildpack, you can do just about anything you want on a Heroku instance.
Even if you’re not interested in building and maintaining a buildpack of your own, understanding how they work and how they are architected can be vital in understanding the deploy process on Heroku. Once you push your code up to Heroku, the system will either grab the buildpack you have listed under the config var BUILDPACK_URL or it will cycle through all of the officially supported buildpacks to find one that it detects can correctly build your code. Once the detect phase returns, the system then calls a compile method to get your code production ready, followed by arelease method to finalize the deployment.
Let’s take a look at each of these steps in turn.
When Heroku calls the detect command on a buildpack, it runs against the source code you push. The detect method is where a Ruby buildpack could check for existence of a Gemfile or a config.ru to check that it can actually run a given project. A Python buildpack might look for .pyand a Java buildpack might look for the existence of .mvn files. If the buildpack detects that it can build the project, it returns a 0 to Heroku; this is the UNIX way of communicating “everything ran as expected.” If the buildpack does not find the files it needs, it returns a nonzero status, usually 1. If a buildpack returns a nonzero result, Heroku cancels the deploy process.
Once a buildpack detect phase returns a 0 (which tells the system that it can be run), the compile step will be called.
In the compile stage, the configuration environment variables are not available. A well-architected app should compile the same regardless of configuration.
A cache directory is provided during the compile stage, and anything put in here will be available between deploys. The cache directory can be used to speed up future deploys (e.g., by storing downloaded items such as external dependencies, so they won’t need to be downloaded again, which can be time consuming or prone to occasional failure).
Binary files that have been prebuilt against the Heroku runtime can be downloaded in this stage. That is how the Java buildpack supports multiple versions of Java, and how the Ruby buildpack supports multiple versions of Ruby. Any code that needs to be compiled and run as a binary should be done in advance and made available on a publicly available source such as Amazon’s S3 service.
BINARY CRASH COURSE
On NIX-based systems, when you type in a command such as cat, cd, or ls, you are executing a binary program stored on the disk. But how does your operating system know where to find these programs?
You can find the location of commands by using which. For example, to get the location of the cat command, we could run:
$ which cat
Here we see that the binary is in the path /bin/cat. Instead of having the operating system look for our binary, we can run it from the full path if we desire:
$ /bin/cat /usr/share/dict/words
This will use the cat command, which can output one or more concatenated files. Many things you type on the command line are actually binaries that have been compiled to run on your operating system; from echo to ruby to python, they are all just compiled files that you can open from anywhere on your system. But you don’t need to type in the full path every time you execute a binary. How does the operating system know where to find them?
In the example, we executed cat by using the full path /bin/cat, but we don’t have to do that all the time; instead, we can just type in:
$ cat /usr/share/dict/words
How does our operating system know where to find the executable cat binary? It turns out that it searches all of the directories in the PATH environment variable in turn until it finds an executable file with the name you just typed in. You can see all of the paths that your operating system will search by running this command:
$ echo $PATH
Note that your PATH will likely look different than this. Here we have several paths such as /bin and /usr/local/bin in our PATH variable separated by colons. The dollar sign in the previous command simply tells our shell to evaluate the PATH variable and output that value.
Together, your PATH and the binary files on your computer make for a very flexible and useful system. You’re not stuck with the binaries that were put on your system for you; you can compile new ones, add them to your PATH, and use them anywhere on your system. This concept is core to the philosophy behind UNIX as well as Heroku and buildpacks. The systems start out as bare shells, and are filled with the components needed to run your web app in the compile stage. You can then modify your PATHs and make those binaries available across the whole application.
Once the compilation phase is complete, the release phase will be called. This pairs the read-built application with the configuration required to run it in production.
The release stage is when the compiled app gets paired with environment variables and executed. No changes to disk should be made here, only changes to environment variables. Heroku expects the return in YAML format with three keys: addons if there are any default add-ons;config_vars, which supplies a default set of configuration environment variables; and default_process_types, which will tell Heroku what command to run by default (i.e., web).
One of the most important values a buildpack may need to set is the PATH config var. The values passed in these three keys are all considered defaults (i.e., they will not overwrite any existing values in the application). Here is an example of YAML output a buildpack might output:
This YAML output will only set default environment variables; if you need to overwrite them, you need to use a profile script.
Once the release phase is through, your application is deployed and ready to run any processes declared in the Procfile.
The release phase of the build process allows you to set default environment variables, referred to by Heroku as config, on your application. While this functionality is useful it does not allow you to overwrite an existing variable in the build process. To accomplish this, you can use a profile.ddirectory, which can contain one or more scripts that can change environment variables.
For instance, if you had a custom directory inside of your application named foo/ that contained a binary that you wished to be executed in your application, you could prepend it to the front of your path by creating a foo.sh file in ./profile.d/ so it would reside in ./profile.d/foo.sh and could contain a path export like this:
If you’ve written this file correctly in the compile stage of your build process, then after you deploy, your foo will appear in your PATH.
You can get more information on these three steps and more through the buildpack documentation.
Now you can detect, compile, release, and even configure your environment with profile.d scripts. While most applications only need functionality provided in the default buildpacks, you can extend them without needing to fork and maintain your own custom buildpack. Instead, you can use multiple buildpacks with your application.
Leveraging Multiple Buildpacks
Buildpacks give you the ability to do just about anything you want on top of Heroku’s platform. Unfortunately, using a custom buildpack means that you’re giving up having someone else worry about your problems and you’re taking on the responsibility of making sure your application compiles and deploys correctly instead of having Heroku take care of it for you. Is there a way to use Heroku’s maintained buildpack, but to also add custom components? Of course there is: you can use a custom buildpack called Multi, to run multiple buildpacks in a row. Let’s take a look at one way you might use it.
In a traditional deployment setup, you would have to manually install binaries (see Binary Crash Course) like Ruby or Java just to be able to deploy. Luckily, Heroku’s default buildpacks will take care of most components our systems need, but it’s not unreasonable to imagine a scenario that would require a custom binary such as whtmltopdf, a command-line tool for converting HTML into PDFs. In these scenarios, how do we get the code we need on our Heroku applications?
You’ll need to compile the program you want so it can run on Heroku. You can find more information on how to do this in Heroku’s developer center. At the time of writing, the best way to compile binaries for Heroku systems is to use the Vulcan or Anvil library.
Once you’ve got the binary, you could fork a buildpack and add some custom code that installs the binary for you. Instead, we recommend creating a lightweight buildpack that only installs that binary. Once you’ve got this simple buildpack, you can leverage the existing Heroku-maintained buildpacks along with another community-maintained buildpack called “heroku-buildpack-multi.” This “Multi Buildpack” is a meta buildpack that runs an arbitrary number of buildpacks. To use it, first set the BUILDPACK_URL in your application:
$ heroku config:add BUILDPACK_URL=https://github.com/ddollar
Instead of deploying using someone else’s buildpack from GitHub, you should fork it and deploy using your copy. This prevents them from making breaking changes or deleting the code. A custom buildpack needs to be fetched on every deploy, so someone deleting his repository on GitHub could mean that you can’t deploy.
Once you’ve got the BUILDPACK_URL config set properly, make a new file called .buildpacks and add the URLs to your custom buildpack and the Heroku-maintained buildpack.
You can see the documentation on Multi Buildpacks for examples and more options.
Quick and Dirty Binaries in Your App
If making your own mini buildpack seems like too much work, you can compile a binary and include it in your application’s repository. You can then manually change the PATH config variable to include that directory, and you’re good to go. While this process is simpler, it has a few drawbacks. It increases the size of your repository, which means it is slower to move around on a network. It hardcodes the dependency into your codebase, which can litter your primary application source control with nonrelated binary commits. It requires manually setting a PATH on new apps, which must be done for each new app, and it makes the binary not easily reusable for multiple apps. With these limitations in mind, it’s up to you to pick the most appropriate solution.
We recommend using the multi buildpack approach when possible.
The Buildpack Recap
The buildpack is a low-level primitive on the Heroku platform, developed over years of running applications and three separate platform stack releases. It gives you the ability to have fine-grained control over how your application is compiled to be run. If you need to execute something that Heroku doesn’t support yet, a custom buildpack is a great place to start looking. Remember, though, that an application’s config variables are not available to the buildpack at compile time. It’s easy to forget, so don’t say you weren’t warned.
Most applications won’t ever need to use a custom buildpack, but understanding how the system works and having the background to utilize them if you need to is invaluable.
About the Authors
Neil Middleton has been developing web applications for over 15 years across a variety of industries and technologies. Now working for a popular agency in the south of England, Neil primarily spends his time writing Ruby applications and deploying to Heroku - the cloud platform provider. Neil is a massive fan of keeping things simple and straightforward.
Richard Schneeman has been writing Ruby on Rails apps since version 0.9. He works for Heroku on the Ruby Task Force and is responsible for the Ruby buildpack. He teaches Ruby at the University of Texas. Richard loves elegant solutions and beautiful code.
The animal on the cover of Heroku: Up and Running is a Japanese Waxwing (Bombycilla japonica), a fairly small passerine bird of the waxwing family found in Russia and northeast Asia.
The Japanese Waxwing is about 18 cm in length and its plumage is mostly pinkish-brown. It has a pointed crest, a black throat, a black stripe through the eye, a pale yellow center to the belly, and a black tail with a red tip. Unlike the other species of waxwing, it lacks the row of waxy red feathertips on the wing, which gives the birds its name.
The Japanese Waxwing’s call is a high-pitched trill, and it feeds mainly on fruit and berries and also eats insects during the summer. There is little available information about breeding and nesting behavior—its courtship displays are probably similar to those of the other waxwings, performed with raised crest and fluffed gray rump feathers.
Changes in its habitat, use of pesticides, and other control measures from commercial fruit-growers have caused declines in population. This species is currently considered “near threatened” due to loss and degradation of its forest habitat.
The cover image is from Wood’s Natural History. The cover fonts are URW Typewriter and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag’s Ubuntu Mono.