Reliably Deploying Rails Applications: Hassle free provisioning, reliable deployment (2014)

16.0 - Unicorn Configuration and Zero Downtime Deployment

Overview

In this section we’ll look briefly at what the Unicorn server is and how it fits into the process of serving a request. We’ll then look at the configuration provided in the sample code before moving onto how zero downtime works and how to troubleshoot it when there are problems.

Unicorn and the request flow

Unicorn is a Ruby HTTP server; specifically, it’s an HTTP server designed for Rack applications. Rack is what Rails uses for its HTTP handling.

If you look at the Unicorn documentation, you’ll see that it states it is designed “to only serve fast clients on low-latency, high-bandwidth connections”. This might initially seem unreasonably picky! Surely we have no control over how fast our clients are?!

In practice what is meant is that Unicorn is not designed to communicate with the end user directly. Unicorn expects that something like Nginx will be responsible for dealing with requests from the user; when required, Nginx will request a response from the Rails app via Unicorn and then deal with returning it to the user. This means that Unicorn only needs to be good at serving requests very quickly over very high bandwidth connections (either locally or on a LAN), rather than dealing with the complexities of slow requests or queuing up large numbers of requests. This fits with the Ruby philosophy we’re used to: each component of the system should do one thing, very well.

So when a request comes into your server, it first goes to Nginx. If the request is for a static asset (e.g. anything in the public folder), Nginx deals with the request directly. If it is for your Rails application, Nginx proxies the request back to Unicorn. This behaviour is not automatic or hidden; it’s defined in your Nginx virtualhost. More information on these is in 17.0.

Unicorn is what’s called a preforking server. This means that when we start the Unicorn server, it will create a master process and multiple child “worker” processes. The master process is responsible for receiving requests and then passing them to a worker; in simple terms, the master process maintains a queue of requests and passes them to worker processes as they become available (e.g. finish processing their previous request).

If you want to understand more about the benefits of a pre-forking server such as Unicorn, this post on why Github switched to Unicorn is well worth reading: https://github.com/blog/517-unicorn. For more about the process of creating and managing worker processes, this post (http://tomayko.com/writings/unicorn-is-unix) includes some great examples and extracts from the Unicorn source to make it clearer.

While there’s no need to understand the details of Unicorn’s internals, a basic understanding of two things will make troubleshooting issues around zero downtime deployment much easier. Unicorn:

· Uses multiple Unix processes, one for the master and one for each worker

· Makes use of Unix signals (http://en.wikipedia.org/wiki/Unix_signal) to control these processes

In general, we start the master process and this master process takes care of spawning and managing the worker processes.
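If you’d like to see the shape of this pattern in code, here is a minimal, hypothetical sketch of a pre-forking master in Ruby. This is not Unicorn’s actual implementation, just an illustration of a master forking workers and controlling them with signals:

# A minimal sketch of the pre-forking pattern (not Unicorn's actual
# source). The master forks workers, then supervises them; each
# worker exits gracefully when sent QUIT.
WORKER_COUNT = 3

worker_pids = WORKER_COUNT.times.map do
  fork do
    # A real worker would pull requests from a shared listening
    # socket here; we just sleep to stand in for serving requests.
    trap("QUIT") { exit }
    sleep
  end
end

sleep 0.5 # give the workers a moment to install their signal handlers

# Ask the workers to shut down gracefully, then reap them.
worker_pids.each { |pid| Process.kill("QUIT", pid) }
worker_pids.each { |pid| Process.wait(pid) }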

Basic Configuration

The basic Unicorn configuration is copied to the remote server when we run cap deploy:setup_config. It is stored in shared/unicorn.rb:

root = "<%= current_path %>"
working_directory root
pid "#{root}/tmp/pids/unicorn.pid"
stderr_path "#{root}/log/unicorn.log"
stdout_path "#{root}/log/unicorn.log"

listen "/tmp/unicorn.<%= fetch(:full_app_name) %>.sock"
worker_processes <%= fetch(:unicorn_worker_count) %>
timeout 40
...

We set the working directory for Unicorn to be the current path, e.g. /home/deploy/apps/APP_NAME/current.

We have a pidfile written to the tmp/pids sub-directory of our app root. Notice that the tmp/pids directory is included in linked_dirs in our deploy.rb file. This means that our pidfile is stored in the shared folder and so will persist across deploys. This is particularly important when setting up zero downtime deploys, as the contents of current will change but we will still need access to the existing pids.

We then set both errors and standard logging output to be stored in log/unicorn.log. If you prefer, you can set up separate logfiles for errors and logging output.

The listen command sets up the Unicorn master process to accept connections on a Unix socket stored in /tmp. A socket is a special type of Unix file used for inter-process communication. In our case it will be used to allow Nginx and the Unicorn master process to communicate. The socket will be named unicorn.OUR_APP_NAME.sock. The .sock extension is a convention to make it easy to identify the file as a socket, and the use of our app name prevents collisions if we decide to run multiple Rails apps on the same server.
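You can talk to this socket directly from a Ruby console, which is a handy way to check that Unicorn is up; this is essentially what Nginx does on our behalf. A quick sketch, assuming an app named my_app:

require 'socket'

# Open the Unicorn master's socket and send a bare HTTP request,
# just as Nginx would (the socket name assumes an app called my_app).
sock = UNIXSocket.new("/tmp/unicorn.my_app.sock")
sock.write("GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
puts sock.read
sock.close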

worker_processes sets the number of worker processes as per our stage files.

Finally, timeout sets the maximum length a request will be allowed to run for before being killed and a 500 error returned. In the sample configuration this is set to 40 seconds. This is generally too generous for a modern web application; typically a value of 15 or below is acceptable. In this case the long timeout is because it’s not unusual for apps being put into production for the first time to have some rough edges, with a few requests, often admin ones, taking a long time. A tight timeout value to start with can make getting set up frustrating. Once your app is up and running smoothly, I’d suggest decreasing this based on the longest you’d expect a request to your specific app to take, plus a margin of 25 - 50% for error.
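For example, if the slowest request you’d expect in normal operation takes around 10 seconds, a timeout in the region of 13 to 15 seconds (10 seconds plus a 25 - 50% margin) would be a sensible value.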

Unix Signals

We mentioned earlier that Unicorn uses Unix signals to allow for communication with both the master process and the individual worker processes. In general we will only ever communicate directly with the master process, which is then responsible for sending appropriate signals on to the worker processes.

The Unix command for sending signals to processes is:

kill -signal pid

Where signal is the signal to be sent and pid is the process id of the recipient process. This is often confusing because kill is generally associated with terminating processes, but we can see from its man page that it’s more versatile:

The kill utility sends a signal to the processes specified by the pid operand(s).

It’s worth getting familiar with the key types of signal which the Unicorn master process responds to, listed here: http://unicorn.bogomips.org/SIGNALS.html.

Something to be aware of is that Unicorn’s use of signals is not entirely standard. Specifically, it is standard for the QUIT signal to be used to tell a process to exit immediately and for TERM to trigger a graceful shutdown, e.g. to allow the processes to go through their normal shutdown process and clean up after themselves.

Unicorn however uses QUIT to trigger a graceful shutdown, which allows any workers to finish processing their current request before shutting down. TERM is used to kill all worker processes immediately and then stop the master process.
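So, for example, if we were signalling the master process directly from Ruby rather than via the init script (the pidfile path below is illustrative), a graceful and an immediate shutdown would look like this:

# Read the master's pid from its pidfile (path illustrative; ours
# lives in tmp/pids under the app root).
master_pid = File.read("/home/deploy/apps/my_app/current/tmp/pids/unicorn.pid").to_i

Process.kill("QUIT", master_pid)  # graceful: workers finish their current request
# Process.kill("TERM", master_pid) # immediate: workers are killed straight away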

In general this does not affect us, as we will use an init.d script for interacting with the master process. Understanding this difference does, however, make debugging problems with the init.d script easier, should they arise.

Init script

This is the primary script for starting, stopping and restarting the Unicorn workers.

In a perfect world, we have no direct interaction with this script at all; the deploy:restart Capistrano task takes care of calling it after deploys and, like any other Capistrano task, we can invoke it on its own if needed. As with all other processes, we leave Monit to take care of restarting it if there are problems.

In practice, however, there will undoubtedly be times when you’re SSH’d into a server and need to quickly start or restart the Unicorn workers, so it’s worth getting familiar with it directly.

The unicorn init script is created when we run cap deploy:setup_config. It is stored in shared/unicorn_init.sh and symlinked to /etc/init.d/unicorn_YOUR_APP_NAME.

Basic usage is simple:

/etc/init.d/unicorn_YOUR_APP_NAME COMMAND

Where COMMAND will generally be one of start, stop, restart or force-stop.

The restart command is covered in detail below in the Zero Downtime Deployment section, but first we’ll take a brief look at what the start, stop and force-stop commands are doing.

Each of these commands makes use of the sig function defined in the init script:

sig () {
  test -s "$PID" && kill -$1 `cat $PID`
}

This tests to see if the Pidfile exists and if so sends the supplied signal to the process specified in it.

Start

Defined in the init script as:

start)
  sig 0 && echo >&2 "Already running" && exit 0
  run "$CMD"
  ;;

Invoked with /etc/init.d/unicorn_YOUR_APP_NAME start.

This invokes the sig function but with a signal of 0. kill with a signal of 0 sends no signal but still performs its error checking, so it succeeds if the process named in the Pidfile exists. This means that if Unicorn is already running, the start command will output “Already running” and then exit.

If on the other hand no Pidfile exists, $CMD, which is the Unicorn start command, is executed and Unicorn is started.
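The same signal-0 trick works from Ruby if you ever need to check whether a process is alive in a script of your own; a small sketch:

# Process.kill with signal 0 sends nothing but still performs the
# error checking, so it tells us whether the pid exists.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH # no such process
  false
rescue Errno::EPERM # process exists but belongs to another user
  true
end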

Stop

Defined in the init script as:

stop)
  sig QUIT && exit 0
  echo >&2 "Not running"
  ;;

Invoked with /etc/init.d/unicorn_YOUR_APP_NAME stop.

This sends the QUIT signal to the Unicorn master process (if the Pidfile exists), which the Unicorn manual (http://unicorn.bogomips.org/SIGNALS.html) explains signals a:

graceful shutdown, waits for workers to finish their current request before finishing.

If the Pidfile does not exist then it outputs the text “Not running” and exits.

Force-stop

Defined in the init script as:

force-stop)
  sig TERM && exit 0
  echo >&2 "Not running"
  ;;

Invoked with /etc/init.d/unicorn_YOUR_APP_NAME force-stop.

This operates in the same way as stop except that it sends the TERM signal, which the Unicorn manual explains signals a:

quick shutdown, kills all workers immediately

Zero Downtime Deployment

After deploying a new version of your application code, the simplest way to reload the code is to issue the stop command followed by the start command. The downside of this is that there will be a period while your Rails app is starting up, during which your site is unavailable. For larger Rails applications this startup time can be significant, sometimes several minutes, which makes deploying regularly less attractive.

Zero downtime deployment with Unicorn allows us to do the following:

· Deploy New Code

· Start a new Unicorn master process (and associated workers) with the new code, without stopping the existing master

· Only once the new master (and its workers) has loaded, stop the old one and start sending requests to the new one

This means that we can deploy without our users experiencing any downtime at all.
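To make the sequence concrete, here is a sketch of what the same restart looks like if driven by hand from Ruby (the pidfile path is illustrative, and in our setup the final QUIT is sent automatically by the before_fork block covered below):

pid_file = "/home/deploy/apps/my_app/current/tmp/pids/unicorn.pid"

# 1. Ask the running master to re-execute itself with the new code.
Process.kill("USR2", File.read(pid_file).to_i)

# 2. Unicorn renames the old master's pidfile to unicorn.pid.oldbin
#    and the new master writes its own pid to unicorn.pid.
sleep 1 until File.exist?("#{pid_file}.oldbin") && File.exist?(pid_file)

# 3. Once the new master is verified up, gracefully stop the old one.
Process.kill("QUIT", File.read("#{pid_file}.oldbin").to_i)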

In our init script, this is taken care of by the restart task:

sig USR2 && echo reloaded OK && exit 0
echo >&2 "Couldn't reload, starting '$CMD' instead"
run "$CMD"
;;

Invoked with /etc/init.d/unicorn_YOUR_APP_NAME restart.

This sends the USR2 signal, about which the Unicorn manual states:

re-execute the running binary. A separate QUIT should be sent to the original process once the child is verified to be up and running.

This takes care of starting the new master process. Once the new master has started, both the old master and the new master will be running and processing requests. Note that because of the before_fork block below, we never actually have the scenario where some requests are being processed by workers running the old code and some by workers running the new.

As described in the above extract from the manual, we must take care of killing the original (old) master once the new one has started. This is handled by the before_fork block in our Unicorn config file (unicorn.rb on the server, unicorn.rb.erb in our config/deploy/shared directory locally):

before_fork do |server, worker|
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
  # Quit the old unicorn process
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exists?(old_pid) && server.pid != old_pid
    puts "We've got an old pid and server pid is not the old pid"
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
      puts "killing master process (good thing tm)"
    rescue Errno::ENOENT, Errno::ESRCH
      puts "unicorn master already killed"
    end
  end
end

The before_fork block is called when the master process has finished loading, just before it forks the worker processes.

The block defined above begins by gracefully closing any open ActiveRecord connections. It then checks to see if we have a Pidfile with the .oldbin extension; this is automatically created by Unicorn when handling a USR2 restart. If so, it sends the QUIT signal to the old master process to shut it down gracefully. Once this completes, our requests are being handled by just the new master process and its workers, running our updated application code, with no interruption for people using the site.

The final section of our Unicorn config file (unicorn.rb) defines an after_fork block which is run once by each worker, after the master process finishes forking it:

after_fork do |server, worker|
  port = 5000 + worker.nr
  child_pid = server.config[:pid].sub('.pid', ".#{port}.pid")
  system("echo #{Process.pid} > #{child_pid}")
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end

This creates a Pidfile for each forked worker process so that we can monitor them individually with Monit. It also establishes an ActiveRecord connection for the new worker process.
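For example, the first worker (worker.nr of 0) gets a port value of 5000 and so writes its pid to tmp/pids/unicorn.5000.pid, the second to unicorn.5001.pid, and so on.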

Gemfile Reloading

There are two very common problems reported when using the many example Unicorn zero downtime configurations available:

· New or updated gems aren’t loaded, so whenever the Gemfile is changed, the application has to be stopped and started again.

· Zero downtime fails every fifth or so deploy and the application has to be manually stopped and started. It then works again for five or so deploys and the cycle repeats.

Both of these are generally caused by not setting the BUNDLE_GEMFILE environment variable when a USR2 restart takes place.

If this is not specified, then the Gemfile path from when the master process was first started will be used. Initially it may seem like this is fine; our Gemfile path is always going to be /home/deploy/apps/APP_NAME/current/Gemfile, so surely this should work correctly?

In practice however this is not the case. When deploying with Capistrano, the code is stored in /home/deploy/apps/APP_NAME/releases/DATESTAMP, for example /home/deploy/apps/APP_NAME/releases/20140324162017, and the current directory is a symlink which points to one of these release directories.

The Gemfile path which will be used by the Unicorn process is the resolved symlink path, e.g. /home/deploy/apps/APP_NAME/releases/20140324162017, rather than the current directory. This means that if we don’t explicitly specify that the Gemfile in current should be used, Unicorn will always use the one from the release in which the master process was first started.
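You can verify this symlink resolution yourself from a Ruby console (paths illustrative):

# `current` resolves to whichever release directory it points at;
# Unicorn holds on to the resolved path, not the symlink.
puts File.realpath("/home/deploy/apps/my_app/current")
# => /home/deploy/apps/my_app/releases/20140324162017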

In deploy.rb we use set :keep_releases, 5 to have Capistrano delete all releases except the five most recent ones. This means that if we’re not setting the Gemfile path back to current on every deploy, we’ll eventually delete the release which contains the Gemfile Unicorn is referencing. This will prevent a new master process from starting and you’ll see a

Gemfile Not Found

exception in your unicorn.log file.

We avoid the above two problems with the following before_exec block in our Unicorn configuration file:

# Force unicorn to look at the Gemfile in the current_path
# otherwise once we've first started a master process, it
# will always point to the first one it started.
before_exec do |server|
  ENV['BUNDLE_GEMFILE'] = "<%= current_path %>/Gemfile"
end

before_exec is run before Unicorn starts the new master process, so setting the BUNDLE_GEMFILE environment variable here ensures it will always point to the Gemfile in the current directory, not one in a release directory.

Troubleshooting Process for Zero Downtime Deployment

Zero downtime deployment can be tricky to debug. The following process is a good starting point for debugging problems with code not reloading or the application appearing not to restart:

· Open one terminal window and ssh into the remote server as the deploy user

· Navigate to ~/apps/YOUR_APP_NAME/shared/log

· Execute tail -f unicorn.log to show the Unicorn log in real time

· In a second terminal window ssh into the remote server as the deploy user

· In the second terminal execute sudo /etc/init.d/unicorn_YOUR_APP_NAME restart (check the naming by looking in the /etc/init.d directory)

· Now watch the log in the first terminal; you should see output which includes Refreshing Gem List and eventually (this may take several minutes) killing master process (good thing tm)

If this fails with an exception, the exception should give good pointers as to the source of the problem.

If this works but restarts after deploys are still not working, then repeat the process but, instead of running the init script from the second terminal, execute cap STAGE deploy:restart from your local machine and watch the log to see if the behaviour is different.