Ansible: Up and Running (2015)
Chapter 9. Making Ansible Go Even Faster
In this chapter, we will discuss strategies for reducing the time it takes Ansible to execute playbooks.
SSH Multiplexing and ControlPersist
If you’ve made it this far in the book, you know that Ansible uses SSH as its primary transport mechanism for communicating with servers. In particular, Ansible will use the system SSH program by default.
Because the SSH protocol runs on top of the TCP protocol, when you make a connection to a remote machine with SSH, you need to make a new TCP connection. The client and server have to negotiate this connection before you can actually start doing useful work. The negotiation takes a small amount of time.
When Ansible runs a playbook, it will make many SSH connections, in order to do things such as copy over files and run commands. Each time Ansible makes a new SSH connection to a host, it has to pay this negotiation penalty.
OpenSSH is the most common implementation of SSH and is almost certainly the SSH client you have installed on your local machine if you are on Linux or Mac OS X. OpenSSH supports an optimization called SSH multiplexing, which is also referred to as ControlPersist. When you use SSH multiplexing, multiple SSH sessions to the same host share the same TCP connection, so the TCP connection negotiation happens only the first time.
When you enable multiplexing:
§ The first time you try to SSH to a host, OpenSSH starts a master connection.
§ OpenSSH creates a Unix domain socket (known as the control socket) that is associated with the remote host.
§ The next time you try to SSH to a host, OpenSSH will use the control socket to communicate with the host instead of making a new TCP connection.
The master connection stays open for a user-configurable amount of time, and then the SSH client will terminate the connection. Ansible uses a default of 60 seconds.
Manually Enabling SSH Multiplexing
Ansible automatically enables SSH multiplexing, but to give you a sense of what’s going on behind the scenes, let’s work through the steps of manually enabling SSH multiplexing and using it to SSH to a remote machine.
Example 9-1 shows an example of an entry in the ~/.ssh/config file for myserver.example.com, which is configured to use SSH multiplexing.
Example 9-1. ssh/config for enabling ssh multiplexing
Host myserver.example.com
ControlMaster auto
ControlPath /tmp/%r@%h:%p
ControlPersist 10m
The ControlMaster auto line enables SSH multiplexing, and it tells SSH to create the master connection and the control socket if they do not exist yet.
The ControlPath /tmp/%r@%h:%p line tells SSH where to put the control Unix domain socket file on the file system. %h is the target host name, %r is the remote login username, and %p is the port. If we SSH as the Ubuntu user:
$ ssh ubuntu@myserver.example.com
Then SSH will create the control socket at /tmp/ubuntu@myserver.example.com:22 the first time you SSH to the server.
The ControlPersist 10m line tells SSH to close the master connection if there have been no SSH connections for 10 minutes.
You can check if a master connection is open using the -O check flag:
$ ssh -O check ubuntu@myserver.example.com
It will return output like this if the control master is running:
Master running (pid=4388)
Here’s what the control master process looks like if you do ps 4388:
PID TT STAT TIME COMMAND
4388 ?? Ss 0:00.00 ssh: /tmp/ubuntu@myserver.example.com:22 [mux]
You can also terminate the master connection using the -O exit flag, like this:
$ ssh -O exit ubuntu@myserver.example.com
You can see more details about these settings on the ssh_config man page.
I tested out the speed of making an SSH connection like this:
$ time ssh ubuntu@myserver.example.com /bin/true
This will time how long it takes to initiate an SSH connection to the server and run the /bin/true program, which simply exits with a 0 return code.
The first time I ran it, the timing part of the output looked like this:1
0.01s user 0.01s system 2% cpu 0.913 total
The time we really care about is the total time: 0.913 total. This tells us it took 0.913 seconds to execute the whole command. (Total time is also sometimes called wall-clock time, since it’s how much time elapsed if we were measuring the time on the clock on the wall.)
The second time, the output looked like this:
0.00s user 0.00s system 8% cpu 0.063 total
The total time went down to 0.063s, for a savings of about 0.85s for each SSH connection after the first one. Recall that Ansible uses at least two SSH sessions to execute each task: one session to copy the module file to the host, and another session to execute the module.2 This means that SSH multiplexing should save you on the order of one or two seconds for each task that runs in your playbook.
SSH Multiplexing Options in Ansible
Ansible uses the options for SSH multiplexing shown in Table 9-1.
Table 9-1. Ansible’s SSH multiplexing options

Option          Value
ControlMaster   auto
ControlPath     $HOME/.ansible/cp/ansible-ssh-%h-%p-%r
ControlPersist  60s
I’ve never needed to change Ansible’s default ControlMaster or ControlPersist values. However, I have needed to change the value for the ControlPath option. That’s because the operating system sets a maximum length on the path of a Unix domain socket, and if the ControlPath string is too long, then multiplexing won’t work. Unfortunately, Ansible won’t tell you if the ControlPath string is too long; it will simply run without using SSH multiplexing.
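If you ever do want different ControlMaster or ControlPersist values, you can override Ansible’s SSH options in ansible.cfg. This is a sketch; note that setting ssh_args replaces Ansible’s entire default set of SSH options, so repeat every option you still want:

```ini
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=15m
```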
You can test it out on your control machine by manually trying to SSH using the same ControlPath that Ansible would use:
$ CP=~/.ansible/cp/ansible-ssh-%h-%p-%r
$ ssh -o ControlMaster=auto -o ControlPersist=60s \
-o ControlPath=$CP \
ubuntu@ec2-203-0-113-12.compute-1.amazonaws.com \
/bin/true
If the ControlPath is too long, you’ll see an error that looks like Example 9-2.
Example 9-2. ControlPath too long
ControlPath
"/Users/lorinhochstein/.ansible/cp/ansible-ssh-ec2-203-0-113-12.compute-1.amazonaws.
com-22-ubuntu.KIwEKEsRzCKFABch"
too long for Unix domain socket
This is a common occurrence when connecting to Amazon EC2 instances, because EC2 uses long hostnames.
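You can check a candidate path against the limit yourself. The exact ceiling is operating-system dependent, roughly 104 bytes on OS X and 108 on Linux, so the hostname from Example 9-2 overflows it:

```shell
# The expanded ControlPath from Example 9-2 (sample EC2 hostname)
path="/Users/lorinhochstein/.ansible/cp/ansible-ssh-ec2-203-0-113-12.compute-1.amazonaws.com-22-ubuntu.KIwEKEsRzCKFABch"
# Count the bytes in the path
len=$(printf '%s' "$path" | wc -c)
echo "length: $len"
# Compare against the (approximate) OS X limit of 104 bytes
if [ "$len" -gt 104 ]; then echo "too long for a Unix domain socket"; fi
```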
The workaround is to configure Ansible to use a shorter ControlPath. The official documentation recommends setting this option in your ansible.cfg file:
[ssh_connection]
control_path = %(directory)s/%%h-%%r
Ansible sets %(directory)s to $HOME/.ansible/cp, and the double percent signs (%%) are needed to escape the percent signs, which are special characters in .ini files.
WARNING
If you have SSH multiplexing enabled, and you change the configuration of your SSH connection, say by modifying the ssh_args configuration option, the change won’t take effect if the control socket is still open from a previous connection. Close the master connection or wait for it to expire so that the new configuration is used.
Pipelining
Recall how Ansible executes a task:
1. It generates a Python script based on the module being invoked.
2. Then it copies the Python script to the host.
3. Finally, it executes the Python script.
Ansible supports an optimization called pipelining, where it will execute the Python script by piping it to the SSH session instead of copying it. This saves time because it tells Ansible to use one SSH session instead of two.
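The idea is the same as piping a script into a local shell: the script travels over the session’s standard input instead of being written to a file first. You can see the principle with /bin/sh (a local illustration of the concept, not what Ansible literally runs):

```shell
# The shell executes the script it reads from stdin; nothing touches the disk
printf 'echo hello from a piped script\n' | /bin/sh
```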
Enabling Pipelining
Pipelining is off by default because it can require some configuration on your remote hosts, but I like to enable it because it speeds up execution. To enable it, modify your ansible.cfg file as shown in Example 9-3.
Example 9-3. ansible.cfg Enable pipelining
[defaults]
pipelining = True
Configuring Hosts for Pipelining
For pipelining to work, you need to make sure that the requiretty option is not enabled in the /etc/sudoers file on your hosts. Otherwise, you’ll get errors that look like Example 9-4 when you run your playbook.
Example 9-4. Error when requiretty is enabled
failed: [vagrant1] => {"failed": true, "parsed": false}
invalid output was: sudo: sorry, you must have a tty to run sudo
If sudo on your hosts is configured to read files from the /etc/sudoers.d directory, then the simplest way to resolve this is to add a sudoers config file that disables the requiretty restriction for the user you SSH as.
If the /etc/sudoers.d directory is present, then your hosts should support adding sudoers config files in that directory. You can use the ansible command-line tool to check if it’s there:
$ ansible vagrant -a "file /etc/sudoers.d"
If the directory is present, the output will look like this:
vagrant1 | success | rc=0 >>
/etc/sudoers.d: directory
vagrant3 | success | rc=0 >>
/etc/sudoers.d: directory
vagrant2 | success | rc=0 >>
/etc/sudoers.d: directory
If the directory is not present, the output will look like this:
vagrant3 | FAILED | rc=1 >>
/etc/sudoers.d: ERROR: cannot open `/etc/sudoers.d' (No such file or
directory)
vagrant2 | FAILED | rc=1 >>
/etc/sudoers.d: ERROR: cannot open `/etc/sudoers.d' (No such file or
directory)
vagrant1 | FAILED | rc=1 >>
/etc/sudoers.d: ERROR: cannot open `/etc/sudoers.d' (No such file or
directory)
If the directory is present, create a template file that looks like Example 9-5.
Example 9-5. templates/disable-requiretty.j2
Defaults:{{ ansible_ssh_user }} !requiretty
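For instance, if ansible_ssh_user is ubuntu, the rendered file contains a single line. This printf just mimics locally what the template module produces on the host:

```shell
# Stand-in for the ansible_ssh_user variable
user=ubuntu
# Render the same line the Jinja2 template produces
printf 'Defaults:%s !requiretty\n' "$user"
```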
Then run the playbook shown in Example 9-6, replacing myhosts with your hosts. Don’t forget to disable pipelining before you do this, or the playbook will fail with an error.
Example 9-6. disable-requiretty.yml
- name: do not require tty for ssh-ing user
hosts: myhosts
sudo: True
tasks:
- name: Set a sudoers file to disable tty
template: >
src=templates/disable-requiretty.j2
dest=/etc/sudoers.d/disable-requiretty
owner=root group=root mode=0440
validate="visudo -cf %s"
Note the use of validate="visudo -cf %s". See “Validating Files” for a discussion of why it’s a good idea to use validation when modifying sudoers files.
Fact Caching
If your play doesn’t reference any Ansible facts, you can turn off fact gathering for that play. Recall that you can disable fact gathering with the gather_facts clause in a play, for example:
- name: an example play that doesn't need facts
hosts: myhosts
gather_facts: False
tasks:
# tasks go here:
You can disable fact gathering by default by adding the following to your ansible.cfg file:
[defaults]
gathering = explicit
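With explicit gathering enabled, a play that does need facts has to opt back in with the gather_facts clause. A sketch, with myhosts as a placeholder:

```yaml
- name: a play that opts back in to fact gathering
  hosts: myhosts
  gather_facts: True
  tasks:
    # tasks that reference facts go here
```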
If you write plays that do reference facts, you can use fact caching so that Ansible gathers facts for a host only once, even if you rerun the playbook or run a different playbook that connects to the same host.
If fact caching is enabled, Ansible will store facts in a cache the first time it connects to hosts. For subsequent playbook runs, Ansible will look up the facts in the cache instead of fetching them from the remote host, until the cache expires.
Example 9-7 shows the lines you must add to your ansible.cfg file to enable fact caching. The fact_caching_timeout value is in seconds, and the example uses a 24-hour (86,400 second) timeout.
WARNING
As with all caching-based solutions, there’s always the danger of the cached data becoming stale. Some facts, such as the CPU architecture (stored in the ansible_architecture fact), are unlikely to change often. Others, such as the date and time reported by the machine (stored in the ansible_date_time fact), are guaranteed to change often.
If you decide to enable fact caching, make sure you know how quickly the facts used in your playbook are likely to change, and set an appropriate fact caching timeout value. If you want to clear the fact cache before running a playbook, pass the --flush-cache flag to ansible-playbook.
Example 9-7. ansible.cfg Enable fact caching
[defaults]
gathering = smart
# 24-hour timeout, adjust if needed
fact_caching_timeout = 86400
# You must specify a fact caching implementation
fact_caching = ...
Setting the gathering configuration option to “smart” in ansible.cfg tells Ansible to use smart gathering. This means that Ansible will only gather facts if they are not present in the cache or if the cache has expired.
NOTE
If you want to use fact caching, make sure your playbooks do not explicitly specify gather_facts: True or gather_facts: False. With smart gathering enabled in the configuration file, Ansible will gather facts only if they are not present in the cache.
You must explicitly specify a fact_caching implementation in ansible.cfg, or Ansible will not cache facts between playbook runs.
As of this writing, there are three fact-caching implementations:
§ JSON files
§ Redis
§ Memcached
JSON File Fact-Caching Backend
With the JSON file fact-caching backend, Ansible will write the facts it gathers to files on your control machine. If the files are present on your system, it will use those files instead of connecting to the host and gathering facts.
To enable the JSON fact-caching backend, add the settings in Example 9-8 to your ansible.cfg file.
Example 9-8. ansible.cfg with JSON fact caching
[defaults]
gathering = smart
# 24-hour timeout, adjust if needed
fact_caching_timeout = 86400
# JSON file implementation
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_fact_cache
Use the fact_caching_connection configuration option to specify a directory where Ansible should write the JSON files that contain the facts. If the directory does not exist, Ansible will create it.
Ansible uses the file modification time to determine whether the fact-caching timeout has occurred yet.
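You can peek at the cache directory to see how this works. The sketch below fakes a minimal cache entry (real entries contain the full fact dictionary for the host) and checks its age the same way Ansible does, by comparing the file’s modification time against the timeout:

```shell
# Create a fake cache entry for a hypothetical host
mkdir -p /tmp/ansible_fact_cache
printf '{"ansible_architecture": "x86_64"}' > /tmp/ansible_fact_cache/myhost.example.com
# Fresh if the file was modified within fact_caching_timeout (86400) seconds
python3 -c "
import os, time
path = '/tmp/ansible_fact_cache/myhost.example.com'
age = time.time() - os.path.getmtime(path)
print('fresh' if age < 86400 else 'expired')
"
```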
Redis Fact Caching Backend
Redis is a popular key-value data store that is often used as a cache. To enable fact caching using the Redis backend, you need to:
1. Install Redis on your control machine.
2. Ensure the Redis service is running on the control machine.
3. Install the Python Redis package.
4. Modify ansible.cfg to enable fact caching with Redis.
Example 9-9 shows how to configure ansible.cfg to use Redis as the cache backend.
Example 9-9. ansible.cfg with Redis fact caching
[defaults]
gathering = smart
# 24-hour timeout, adjust if needed
fact_caching_timeout = 86400
fact_caching = redis
Ansible needs the Python Redis package on the control machine, which you can install using pip:3
$ pip install redis
You must also install Redis and ensure that it is running on your control machine. If you are using OS X, you can install Redis using Homebrew. If you are using Linux, install Redis using your native package manager.
Memcached Fact Caching Backend
Memcached is another popular key-value data store that is often used as a cache. To enable fact caching using the Memcached backend, you need to:
1. Install Memcached on your control machine.
2. Ensure the Memcached service is running on the control machine.
3. Install the Python Memcached package.
4. Modify ansible.cfg to enable fact caching with Memcached.
Example 9-10 shows how to configure ansible.cfg to use Memcached as the cache backend.
Example 9-10. ansible.cfg with Memcached fact caching
[defaults]
gathering = smart
# 24-hour timeout, adjust if needed
fact_caching_timeout = 86400
fact_caching = memcached
Ansible needs the Python Memcached package on the control machine, which you can install using pip. You might need to sudo or activate a virtualenv, depending on how you installed Ansible on your control machine.
$ pip install python-memcached
You must also install Memcached and ensure that it is running on your control machine. If you are using OS X, you can install Memcached using Homebrew. If you are using Linux, install Memcached using your native package manager.
For more information on fact caching, check out the official documentation.
Parallelism
For each task, Ansible connects to the hosts in parallel to execute the task on them. But Ansible doesn’t necessarily connect to all of the hosts in parallel. Instead, the level of parallelism is controlled by a parameter, which defaults to 5. You can change this parameter in one of two ways.
You can set the ANSIBLE_FORKS environment variable, as shown in Example 9-11.
Example 9-11. Setting ANSIBLE_FORKS
$ export ANSIBLE_FORKS=20
$ ansible-playbook playbook.yml
You can modify the Ansible configuration file (ansible.cfg) by setting a forks option in the defaults section, as shown in Example 9-12.
Example 9-12. ansible.cfg Configuring number of forks
[defaults]
forks = 20
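A rough way to think about the effect: with the default of 5 forks and 100 hosts, each task has to run in ceil(100/5) = 20 sequential batches, while raising forks to 20 cuts that to 5. (This is a simplification; real scheduling involves per-host connection times, but it gives a feel for the knob.)

```shell
# ceil(hosts/forks) via integer arithmetic: (hosts + forks - 1) / forks
echo $(( (100 + 5 - 1) / 5 ))    # default forks=5: 20 batches per task
echo $(( (100 + 20 - 1) / 20 ))  # forks=20: 5 batches per task
```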
Accelerated Mode
Ansible supports a connection mode called accelerated mode. This feature is older than pipelining, and the official documentation recommends using pipelining instead of accelerated mode, unless your environment prevents you from enabling pipelining. For more details on accelerated mode, see the official documentation.
Fireball Mode
Fireball mode is a deprecated Ansible feature that was previously used to improve performance. It was replaced by accelerated mode.
You should now know how to configure SSH multiplexing, pipelining, fact caching, and parallelism in order to get your playbooks to run more quickly. Next, we’ll discuss writing your own Ansible modules.
1 The output format may look different depending on your shell and OS. I’m running zsh on Mac OS X.
2 One of these steps can be optimized away using pipelining, described later in this chapter.
3 You may need to sudo or activate a virtualenv, depending on how you installed Ansible on your control machine.