Inventory: Describing Your Servers - Ansible: Up and Running (2015)

Chapter 3. Inventory: Describing Your Servers

So far, we’ve been working with only one server (or host, as Ansible calls it). In reality, you’re going to be managing multiple hosts. The collection of hosts that Ansible knows about is called the inventory.

The Inventory File

The default way to describe your hosts in Ansible is to list them in text files, called inventory files. A very simple inventory file might just contain a list of hostnames, as shown in Example 3-1.

Example 3-1. A very simple inventory file

ontario.example.com

newhampshire.example.com

maryland.example.com

virginia.example.com

newyork.example.com

quebec.example.com

rhodeisland.example.com

NOTE

Ansible uses your local SSH client by default, which means that it will understand any aliases that you set up in your SSH config file. This does not hold true if you configure Ansible to use the Paramiko connection plug-in instead of the default SSH plug-in.

There is one host that Ansible automatically adds to the inventory by default: localhost. Ansible understands that localhost refers to your local machine, so it will interact with it directly rather than connecting by SSH.

WARNING

Although Ansible adds the localhost to your inventory automatically, you have to have at least one other host in your inventory file; otherwise, ansible-playbook will terminate with the error:

ERROR: provided hosts list is empty

In the case where you have no other hosts in your inventory file, you can explicitly add an entry for localhost like this:

localhost ansible_connection=local

Preliminaries: Multiple Vagrant Machines

To talk about inventory, we need to interact with multiple hosts. Let’s configure Vagrant to bring up three hosts. We’ll unimaginatively call them vagrant1, vagrant2, and vagrant3.

Before you modify your existing Vagrantfile, make sure you destroy your existing virtual machine by running:

$ vagrant destroy --force

If you don’t include the --force option, Vagrant will prompt you to confirm that you want to destroy the virtual machine.

Next, edit your Vagrantfile so it looks like Example 3-2.

Example 3-2. Vagrantfile with three servers

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # Use the same key for each machine
  config.ssh.insert_key = false

  config.vm.define "vagrant1" do |vagrant1|
    vagrant1.vm.box = "ubuntu/trusty64"
    vagrant1.vm.network "forwarded_port", guest: 80, host: 8080
    vagrant1.vm.network "forwarded_port", guest: 443, host: 8443
  end

  config.vm.define "vagrant2" do |vagrant2|
    vagrant2.vm.box = "ubuntu/trusty64"
    vagrant2.vm.network "forwarded_port", guest: 80, host: 8081
    vagrant2.vm.network "forwarded_port", guest: 443, host: 8444
  end

  config.vm.define "vagrant3" do |vagrant3|
    vagrant3.vm.box = "ubuntu/trusty64"
    vagrant3.vm.network "forwarded_port", guest: 80, host: 8082
    vagrant3.vm.network "forwarded_port", guest: 443, host: 8445
  end
end

Vagrant 1.7+ defaults to using a different SSH key for each host. Example 3-2 contains the line to revert to the earlier behavior of using the same SSH key for each host:

config.ssh.insert_key = false

Using the same key on each host simplifies our Ansible setup because we can specify a single SSH key in the ansible.cfg file. You’ll need to edit the host_key_checking value in your ansible.cfg. Your file should look like Example 3-3.

Example 3-3. ansible.cfg

[defaults]

hostfile = inventory

remote_user = vagrant

private_key_file = ~/.vagrant.d/insecure_private_key

host_key_checking = False

For now, we’ll assume each of these servers can potentially be a web server, so Example 3-2 maps ports 80 and 443 inside each Vagrant machine to a port on the local machine.

You should be able to bring up the virtual machines by running:

$ vagrant up

If all went well, the output should look something like this:

Bringing machine 'vagrant1' up with 'virtualbox' provider...

Bringing machine 'vagrant2' up with 'virtualbox' provider...

Bringing machine 'vagrant3' up with 'virtualbox' provider...

...

vagrant3: 80 => 8082 (adapter 1)

vagrant3: 443 => 8445 (adapter 1)

vagrant3: 22 => 2201 (adapter 1)

==> vagrant3: Booting VM...

==> vagrant3: Waiting for machine to boot. This may take a few minutes...

vagrant3: SSH address: 127.0.0.1:2201

vagrant3: SSH username: vagrant

vagrant3: SSH auth method: private key

vagrant3: Warning: Connection timeout. Retrying...

==> vagrant3: Machine booted and ready!

==> vagrant3: Checking for guest additions in VM...

==> vagrant3: Mounting shared folders...

vagrant3: /vagrant => /Users/lorinhochstein/dev/oreilly-ansible/playbooks

Let’s create an inventory file that contains these three machines.

First, we need to know what ports on the local machine map to the SSH port (22) inside of each VM. Recall we can get that information by running:

$ vagrant ssh-config

The output should look something like this:

Host vagrant1

HostName 127.0.0.1

User vagrant

Port 2222

UserKnownHostsFile /dev/null

StrictHostKeyChecking no

PasswordAuthentication no

IdentityFile /Users/lorinhochstein/.vagrant.d/insecure_private_key

IdentitiesOnly yes

LogLevel FATAL

Host vagrant2

HostName 127.0.0.1

User vagrant

Port 2200

UserKnownHostsFile /dev/null

StrictHostKeyChecking no

PasswordAuthentication no

IdentityFile /Users/lorinhochstein/.vagrant.d/insecure_private_key

IdentitiesOnly yes

LogLevel FATAL

Host vagrant3

HostName 127.0.0.1

User vagrant

Port 2201

UserKnownHostsFile /dev/null

StrictHostKeyChecking no

PasswordAuthentication no

IdentityFile /Users/lorinhochstein/.vagrant.d/insecure_private_key

IdentitiesOnly yes

LogLevel FATAL

We can see that vagrant1 uses port 2222, vagrant2 uses port 2200, and vagrant3 uses port 2201.

Modify your hosts file so it looks like this:

vagrant1 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2222

vagrant2 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2200

vagrant3 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2201
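Turning vagrant ssh-config output into inventory lines is mechanical, so it can be scripted. The following is a minimal sketch, assuming the output has the shape shown above. The function name inventory_lines is hypothetical, and the sample text is abbreviated to two hosts.

```python
def inventory_lines(ssh_config_text):
    """Turn `vagrant ssh-config` output into Ansible inventory lines."""
    lines = []
    alias = None
    settings = {}

    def flush():
        # Emit one inventory line for the host block collected so far.
        if alias is not None:
            lines.append("{} ansible_ssh_host={} ansible_ssh_port={}".format(
                alias, settings["HostName"], settings["Port"]))

    for raw in ssh_config_text.splitlines():
        parts = raw.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts[0], parts[1].strip()
        if key == "Host":
            flush()  # finish the previous host before starting a new one
            alias, settings = value, {}
        else:
            settings[key] = value
    flush()
    return lines


SAMPLE = """\
Host vagrant1
  HostName 127.0.0.1
  User vagrant
  Port 2222

Host vagrant2
  HostName 127.0.0.1
  User vagrant
  Port 2200
"""

for line in inventory_lines(SAMPLE):
    print(line)
```

The sketch only reads the HostName and Port settings; the other settings are collected but ignored.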

Now, make sure that you can access these machines. For example, to get information about the network interface for vagrant2, run:

$ ansible vagrant2 -a "ip addr show dev eth0"

On my machine, the output looks like this:

vagrant2 | success | rc=0 >>

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP

group default qlen 1000

link/ether 08:00:27:fe:1e:4d brd ff:ff:ff:ff:ff:ff

inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0

valid_lft forever preferred_lft forever

inet6 fe80::a00:27ff:fefe:1e4d/64 scope link

valid_lft forever preferred_lft forever

Behavioral Inventory Parameters

To describe our Vagrant machines in the Ansible inventory file, we had to explicitly specify the hostname (127.0.0.1) and port (2222, 2200, or 2201) that Ansible’s SSH client should connect to.

Ansible calls these variables behavioral inventory parameters, and there are several of them you can use when you need to override the Ansible defaults for a host (see Table 3-1).

Name                          Default          Description
ansible_ssh_host              name of host     Hostname or IP address to SSH to
ansible_ssh_port              22               Port to SSH to
ansible_ssh_user              root             User to SSH as
ansible_ssh_pass              none             Password to use for SSH authentication
ansible_connection            smart            How Ansible will connect to host (see below)
ansible_ssh_private_key_file  none             SSH private key to use for SSH authentication
ansible_shell_type            sh               Shell to use for commands (see below)
ansible_python_interpreter    /usr/bin/python  Python interpreter on host (see below)
ansible_*_interpreter         none             Like ansible_python_interpreter for other languages (see below)

Table 3-1. Behavioral inventory parameters

For some of these options, the meaning is obvious from the name, but others require additional explanation.

ansible_connection

Ansible supports multiple transports, which are mechanisms that Ansible uses to connect to the host. The default transport, smart, will check to see if the locally installed SSH client supports a feature called ControlPersist. If the SSH client supports ControlPersist, Ansible will use the local SSH client. If the SSH client doesn’t support ControlPersist, then the smart transport will fall back to using a Python-based SSH client library called Paramiko.

ansible_shell_type

Ansible works by making SSH connections to remote machines and then invoking scripts. By default, Ansible assumes that the remote shell is the Bourne shell located at /bin/sh, and will generate the appropriate command-line parameters that work with Bourne shell.

Ansible also accepts csh, fish, and (on Windows) powershell as valid values for this parameter. I’ve never encountered a need for changing the shell type.

ansible_python_interpreter

Because the modules that ship with Ansible are implemented in Python 2, Ansible needs to know the location of the Python interpreter on the remote machine. You might need to change this if your remote host does not have a Python 2 interpreter at /usr/bin/python. For example, if you are managing hosts that run Arch Linux, you will need to change this to /usr/bin/python2, because Arch Linux installs Python 3 at /usr/bin/python, and Ansible modules are not (yet) compatible with Python 3.

ansible_*_interpreter

If you are using a custom module that is not written in Python, you can use this parameter to specify the location of the interpreter (e.g., /usr/bin/ruby). We’ll cover this in Chapter 10.

Changing Behavioral Parameter Defaults

You can override some of the behavioral parameter default values in the [defaults] section of the ansible.cfg file (Table 3-2). Recall that we used this previously to change the default SSH user.

Behavioral inventory parameter  ansible.cfg option
ansible_ssh_port                remote_port
ansible_ssh_user                remote_user
ansible_ssh_private_key_file    private_key_file
ansible_shell_type              executable (see the following paragraph)

Table 3-2. Defaults that can be overridden in ansible.cfg

The ansible.cfg executable config option is not exactly the same as the ansible_shell_type behavioral inventory parameter. Instead, executable specifies the full path of the shell to use on the remote machine (e.g., /usr/local/bin/fish). Ansible takes the base name of this path (for /usr/local/bin/fish, the base name is fish) and uses it as the default value for ansible_shell_type.
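To make the base-name derivation concrete, here is a small sketch in plain Python. It only illustrates what "base name" means for a shell path; the function name shell_type_from_executable is hypothetical and this is not Ansible's own code.

```python
import posixpath


def shell_type_from_executable(executable):
    # The base name is the last component of the path, e.g. "fish".
    return posixpath.basename(executable)


print(shell_type_from_executable("/usr/local/bin/fish"))  # fish
```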

Groups and Groups and Groups

When performing configuration tasks, we typically want to perform actions on groups of hosts, rather than on an individual host.

Ansible automatically defines a group called all (or *), which includes all of the hosts in the inventory. For example, we can check if the clocks on the machines are roughly synchronized by running:

$ ansible all -a "date"

or

$ ansible '*' -a "date"

The output on my system looks like this:

vagrant3 | success | rc=0 >>

Sun Sep 7 02:56:46 UTC 2014

vagrant2 | success | rc=0 >>

Sun Sep 7 03:03:46 UTC 2014

vagrant1 | success | rc=0 >>

Sun Sep 7 02:56:47 UTC 2014

We can define our own groups in the inventory file. Ansible uses the .ini file format for inventory files. In the .ini format, configuration values are grouped together into sections.

Here’s how we would specify that our vagrant hosts are in a group called vagrant, along with the other example hosts we mentioned at the beginning of the chapter:

ontario.example.com

newhampshire.example.com

maryland.example.com

virginia.example.com

newyork.example.com

quebec.example.com

rhodeisland.example.com

[vagrant]

vagrant1 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2222

vagrant2 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2200

vagrant3 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2201

We could have also listed the vagrant hosts at the top, and then also in a group, like this:

maryland.example.com

newhampshire.example.com

newyork.example.com

ontario.example.com

quebec.example.com

rhodeisland.example.com

vagrant1 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2222

vagrant2 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2200

vagrant3 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2201

virginia.example.com

[vagrant]

vagrant1

vagrant2

vagrant3

Example: Deploying a Django App

Imagine you’re responsible for deploying a Django-based web application that processes long-running jobs. The app needs to support the following services:

§ The actual Django web app itself, run by a Gunicorn HTTP server.

§ An nginx web server, which will sit in front of Gunicorn and serve static assets.

§ A Celery task queue that will execute long-running jobs on behalf of the web app.

§ A RabbitMQ message queue that serves as the backend for Celery.

§ A Postgres database that serves as the persistent store.

TIP

In later chapters, we will work through a detailed example of deploying this kind of Django-based application, although our example won’t use Celery or RabbitMQ.

We need to deploy this application into different types of environments: production (the real thing), staging (for testing on hosts that our team has shared access to), and vagrant (for local testing).

When we deploy to production, we want the entire system to respond quickly and be reliable, so we:

§ Run the web application on multiple hosts for better performance and put a load balancer in front of them.

§ Run task queue servers on multiple hosts for better performance.

§ Put Gunicorn, Celery, RabbitMQ, and Postgres all on separate servers.

§ Use two Postgres hosts, a primary and a replica.

Assuming we have one load balancer, three web servers, three task queues, one RabbitMQ server, and two database servers, that’s 10 hosts we need to deal with.

For our staging environment, imagine that we want to use fewer hosts than we do in production in order to save costs, especially since the staging environment is going to see a lot less activity than production. Let’s say we decide to use only two hosts for staging; we’ll put the web server and task queue on one staging host, and RabbitMQ and Postgres on the other.

For our local vagrant environment, we decide to use three servers: one for the web app, one for a task queue, and one that will contain RabbitMQ and Postgres.

Example 3-4 shows a possible inventory file that groups our servers by environment (production, staging, vagrant) and by function (web server, task queue, etc.).

Example 3-4. Inventory file for deploying a Django app

[production]

delaware.example.com

georgia.example.com

maryland.example.com

newhampshire.example.com

newjersey.example.com

newyork.example.com

northcarolina.example.com

pennsylvania.example.com

rhodeisland.example.com

virginia.example.com

[staging]

ontario.example.com

quebec.example.com

[vagrant]

vagrant1 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2222

vagrant2 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2200

vagrant3 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2201

[lb]

delaware.example.com

[web]

georgia.example.com

newhampshire.example.com

newjersey.example.com

ontario.example.com

vagrant1

[task]

newyork.example.com

northcarolina.example.com

maryland.example.com

ontario.example.com

vagrant2

[rabbitmq]

pennsylvania.example.com

quebec.example.com

vagrant3

[db]

rhodeisland.example.com

virginia.example.com

quebec.example.com

vagrant3

We could have first listed all of the servers at the top of the inventory file, without specifying a group, but that isn’t necessary, and that would’ve made this file even longer.

Note that we only needed to specify the behavioral inventory parameters for the Vagrant instances once.

Aliases and Ports

We described our Vagrant hosts like this:

[vagrant]

vagrant1 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2222

vagrant2 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2200

vagrant3 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2201

The names vagrant1, vagrant2, and vagrant3 here are aliases. They are not the real hostnames, but instead are useful names for referring to these hosts.

Ansible supports the <hostname>:<port> syntax when specifying hosts, so we could replace the line that contains vagrant1 with 127.0.0.1:2222.

However, we can’t actually run what you see in Example 3-5.

Example 3-5. This doesn’t work

[vagrant]

127.0.0.1:2222

127.0.0.1:2200

127.0.0.1:2201

The reason is that Ansible’s inventory can associate only a single host with 127.0.0.1, so the vagrant group would contain only one host instead of three.
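A toy model shows why the entries collapse. This sketch assumes an inventory that keeps one record per distinct host name. The function load_hosts and the overwrite behavior belong to the toy model and are not taken from Ansible's implementation.

```python
def load_hosts(entries):
    """Toy inventory: one record per distinct host name."""
    hosts = {}
    for entry in entries:
        name, _, port = entry.partition(":")
        # In this toy model, a later entry with the same name
        # replaces the earlier one.
        hosts[name] = {"port": int(port) if port else 22}
    return hosts


by_address = load_hosts(["127.0.0.1:2222", "127.0.0.1:2200", "127.0.0.1:2201"])
print(len(by_address))  # 1
```

Giving each machine its own alias (vagrant1, vagrant2, vagrant3) avoids the collision, because the aliases are distinct names even though the address is shared.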

Groups of Groups

Ansible also allows you to define groups that are made up of other groups. For example, both the web servers and the task queue servers will need to have Django and its dependencies. We might find it useful to define a “django” group that contains both of these two groups. You would add this to the inventory file:

[django:children]

web

task

Note that the syntax changes when you are specifying a group of groups, as opposed to a group of hosts. That’s so Ansible knows to interpret web and task as groups and not as hosts.
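The idea of a group made of other groups can be sketched as a recursive lookup. This is an illustrative model only. The names hosts_in, groups, and children are hypothetical, and the host lists are shortened.

```python
def hosts_in(group, groups, children):
    """Return the hosts in `group`, including hosts of its child groups."""
    found = list(groups.get(group, []))
    for child in children.get(group, []):
        for host in hosts_in(child, groups, children):
            if host not in found:  # a host may belong to several groups
                found.append(host)
    return found


groups = {
    "web": ["georgia.example.com", "ontario.example.com", "vagrant1"],
    "task": ["newyork.example.com", "ontario.example.com", "vagrant2"],
}
children = {"django": ["web", "task"]}

print(hosts_in("django", groups, children))
```

Here ontario.example.com appears once in the result even though it is in both child groups.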

Numbered Hosts (Pets versus Cattle)

The inventory file shown in Example 3-4 looks complex. In reality, it describes only 15 different hosts, which doesn’t sound like a large number in this cloudy scale-out world. However, even dealing with 15 hosts in the inventory file can be cumbersome because each host has a completely different hostname.

Bill Baker of Microsoft came up with the distinction between treating servers as pets versus treating them like cattle.1 We give pets distinctive names, and we treat and care for them as individuals. On the other hand, when we discuss cattle, we refer to them by identification number.

The cattle approach is much more scalable, and Ansible supports it well by supporting numeric patterns. For example, if your 20 servers were named web1.example.com, web2.example.com, and so on, then you could specify them in the inventory file like this:

[web]

web[1:20].example.com

If you prefer to have a leading zero (e.g., web01.example.com), then specify a leading zero in the range, like this:

[web]

web[01:20].example.com

Ansible also supports using alphabetic characters to specify ranges. If you wanted to use the convention web-a.example.com, web-b.example.com, and so on, for your 20 servers, then you could do this:

[web]

web-[a:t].example.com
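To see what such patterns stand for, the following sketch expands a single range into host names. It assumes the colon-separated [start:end] form for both numeric and alphabetic ranges. The function expand is hypothetical and handles only one range per pattern.

```python
import re
import string

RANGE = re.compile(r"\[(\w+):(\w+)\]")


def expand(pattern):
    """Expand one [start:end] range in a host pattern into host names."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    if start.isdigit():
        width = len(start)  # a leading zero in `start` sets the padding
        values = [str(n).zfill(width)
                  for n in range(int(start), int(end) + 1)]
    else:
        letters = string.ascii_lowercase
        values = list(letters[letters.index(start):letters.index(end) + 1])
    return [pattern[:match.start()] + v + pattern[match.end():]
            for v in values]


print(expand("web[01:20].example.com")[:3])
print(len(expand("web-[a:t].example.com")))
```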

Host and Group Variables: Inside the Inventory

Recall how we specified behavioral inventory parameters for Vagrant hosts:

vagrant1 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2222

vagrant2 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2200

vagrant3 ansible_ssh_host=127.0.0.1 ansible_ssh_port=2201

Those parameters are variables that have special meaning to Ansible. We can also define arbitrary variable names and associated values on hosts. For example, we could define a variable named color and set it to a value for each server:

newhampshire.example.com color=red

maryland.example.com color=green

ontario.example.com color=blue

quebec.example.com color=purple

This variable can then be used in a playbook, just like any other variable.

Personally, I don’t often attach variables to specific hosts. On the other hand, I often associate variables with groups.

Circling back to our Django example, the web application and task queue service need to communicate with RabbitMQ and Postgres. We’ll assume that access to the Postgres database is secured both at the network layer (so only the web application and the task queue can reach the database) and by username and password, whereas RabbitMQ is secured only at the network layer.

To set everything up, we need to do the following:

§ Configure the web servers with the hostname, port, username, and password of the primary Postgres server, and the name of the database.

§ Configure the task queues with the hostname, port, username, and password of the primary Postgres server, and the name of the database.

§ Configure the web servers with the hostname and port of the RabbitMQ server.

§ Configure the task queues with the hostname and port of the RabbitMQ server.

§ Configure the primary Postgres server with the hostname, port, username, and password of the replica Postgres server (production only).

This configuration info varies by environment, so it makes sense to define these as group variables on the production, staging, and vagrant groups.

Example 3-6 shows one way we can specify this information as group variables in the inventory file.

Example 3-6. Specifying group variables in inventory

[all:vars]

ntp_server=ntp.ubuntu.com

[production:vars]

db_primary_host=rhodeisland.example.com

db_primary_port=5432

db_replica_host=virginia.example.com

db_name=widget_production

db_user=widgetuser

db_password=pFmMxcyD;Fc6)6

rabbitmq_host=pennsylvania.example.com

rabbitmq_port=5672

[staging:vars]

db_primary_host=quebec.example.com

db_name=widget_staging

db_user=widgetuser

db_password=L@4Ryz8cRUXedj

rabbitmq_host=quebec.example.com

rabbitmq_port=5672

[vagrant:vars]

db_primary_host=vagrant3

db_primary_port=5432

db_name=widget_vagrant

db_user=widgetuser

db_password=password

rabbitmq_host=vagrant3

rabbitmq_port=5672

Note how group variables are organized into sections named [<group name>:vars].

Also note how we took advantage of the all group that Ansible creates automatically to specify variables that don’t change across hosts.
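The way [<group name>:vars] sections combine with the all group can be sketched in a few lines. This is a simplified model with hypothetical function names (parse_group_vars, vars_for). It keeps every value as a string and is not a description of how Ansible parses inventory files.

```python
def parse_group_vars(text):
    """Collect key=value pairs from [<group name>:vars] sections."""
    group_vars = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            # Only sections ending in ":vars" hold variables.
            current = name[:-len(":vars")] if name.endswith(":vars") else None
            if current is not None:
                group_vars.setdefault(current, {})
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            group_vars[current][key] = value
    return group_vars


def vars_for(group, group_vars):
    """Start from the `all` variables, then layer the group's own on top."""
    merged = dict(group_vars.get("all", {}))
    merged.update(group_vars.get(group, {}))
    return merged


SAMPLE = """\
[all:vars]
ntp_server=ntp.ubuntu.com

[staging:vars]
db_primary_host=quebec.example.com
rabbitmq_port=5672
"""

print(vars_for("staging", parse_group_vars(SAMPLE)))
```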

Host and Group Variables: In Their Own Files

The inventory file is a reasonable place to put host and group variables if you don’t have too many hosts. But as your inventory gets larger, it gets more difficult to manage variables this way.

Additionally, though Ansible variables can hold Booleans, strings, lists, and dictionaries, in an inventory file, you can specify only Booleans and strings.

Ansible offers a more scalable approach to keep track of host and group variables: You can create a separate variable file for each host and each group. Ansible expects these variable files to be in YAML format.

Ansible looks for host variable files in a directory called host_vars and group variable files in a directory called group_vars. Ansible expects these directories to be either in the directory that contains your playbooks or in the directory adjacent to your inventory file. In our case, those two directories are the same.

For example, if I had a directory containing my playbooks at /home/lorin/playbooks/ with an inventory file at /home/lorin/playbooks/hosts, then I would put variables for the quebec.example.com host in the file /home/lorin/playbooks/host_vars/quebec.example.com, and I would put variables for the production group in the file /home/lorin/playbooks/group_vars/production.
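The naming convention can be summarized as a small path-building sketch. The function variable_files is hypothetical and only shows where the files would be expected under a given base directory.

```python
import posixpath


def variable_files(base_dir, host, groups):
    """Return the host_vars and group_vars paths for one host."""
    paths = [posixpath.join(base_dir, "host_vars", host)]
    for group in groups:
        paths.append(posixpath.join(base_dir, "group_vars", group))
    return paths


for path in variable_files("/home/lorin/playbooks",
                           "quebec.example.com", ["staging"]):
    print(path)
```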

Example 3-7 shows what the /home/lorin/playbooks/group_vars/production file would look like.

Example 3-7. group_vars/production

db_primary_host: rhodeisland.example.com

db_replica_host: virginia.example.com

db_name: widget_production

db_user: widgetuser

db_password: pFmMxcyD;Fc6)6

rabbitmq_host: pennsylvania.example.com

Note that we could also use YAML dictionaries to represent these values, as shown in Example 3-8.

Example 3-8. group_vars/production, with dictionaries

db:
    user: widgetuser
    password: pFmMxcyD;Fc6)6
    name: widget_production
    primary:
        host: rhodeisland.example.com
        port: 5432
    replica:
        host: virginia.example.com
        port: 5432

rabbitmq:
    host: pennsylvania.example.com
    port: 5672

If we choose YAML dictionaries, that changes the way we access the variables:

{{ db_primary_host }}

versus:

{{ db.primary.host }}
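As a plain-Python analogy for the two access styles, the sketch below resolves a flat name and a dotted name against dictionaries. The helper lookup is hypothetical and is not how the template engine itself is implemented.

```python
def lookup(variables, dotted_name):
    """Follow a dotted name such as "db.primary.host" through nested dicts."""
    value = variables
    for part in dotted_name.split("."):
        value = value[part]
    return value


flat = {"db_primary_host": "rhodeisland.example.com"}
nested = {"db": {"primary": {"host": "rhodeisland.example.com", "port": 5432}}}

print(lookup(flat, "db_primary_host"))
print(lookup(nested, "db.primary.host"))
```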

If you want to break things out even further, Ansible will allow you to define group_vars/production as a directory instead of a file, and let you place multiple YAML files that contain variable definitions.

For example, we could put the database-related variables in one file and the RabbitMQ-related variables in another file, as shown in Examples 3-9 and 3-10.

Example 3-9. group_vars/production/db

db:
    user: widgetuser
    password: pFmMxcyD;Fc6)6
    name: widget_production
    primary:
        host: rhodeisland.example.com
        port: 5432
    replica:
        host: virginia.example.com
        port: 5432

Example 3-10. group_vars/production/rabbitmq

rabbitmq:
    host: pennsylvania.example.com
    port: 5672

In general, I find it’s better to keep things simple rather than split variables out across too many files.

Dynamic Inventory

Up until this point, we’ve been explicitly specifying all of our hosts in our hosts inventory file. However, you might have a system external to Ansible that keeps track of your hosts. For example, if your hosts run on Amazon EC2, then EC2 tracks information about your hosts for you, and you can retrieve this information through EC2’s web interface, its Query API, or through command-line tools such as awscli. Other cloud providers have similar interfaces. Or, if you’re managing your own servers and are using an automated provisioning system such as Cobbler or Ubuntu MAAS, then your provisioning system is already keeping track of your servers. Or, maybe you have one of those fancy configuration management databases (CMDBs) where all of this information lives.

You don’t want to manually duplicate this information in your hosts file, because eventually that file will not jibe with your external system, which is the true source of information about your hosts. Ansible supports a feature called dynamic inventory that allows you to avoid this duplication.

If the inventory file is marked executable, Ansible will assume it is a dynamic inventory script and will execute the file instead of reading it.

NOTE

To mark a file as executable, use the chmod +x command. For example:

$ chmod +x dynamic.py
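The "marked executable" test can be approximated from Python. This is a sketch of the kind of check described here, under the assumption that a regular file with an execute bit counts as a script. The function name looks_like_dynamic_inventory is hypothetical, and the text does not show how Ansible performs the check internally.

```python
import os
import stat
import tempfile


def looks_like_dynamic_inventory(path):
    """True if `path` is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Create a scratch file, then set the user execute bit (like chmod u+x).
fd, scratch = tempfile.mkstemp()
os.close(fd)
print(looks_like_dynamic_inventory(scratch))
os.chmod(scratch, os.stat(scratch).st_mode | stat.S_IXUSR)
print(looks_like_dynamic_inventory(scratch))
os.remove(scratch)
```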

The Interface for a Dynamic Inventory Script

An Ansible dynamic inventory script must support two command-line flags:

§ --host=<hostname> for showing host details

§ --list for listing groups

Showing host details

To get the details of the individual host, Ansible will call the inventory script like this:

$ ./dynamic.py --host=vagrant2

The output should contain any host-specific variables, including behavioral parameters, like this:

{ "ansible_ssh_host": "127.0.0.1", "ansible_ssh_port": 2200,

"ansible_ssh_user": "vagrant"}

The output is a single JSON object where the names are variable names, and the values are the variable values.

Listing groups

Dynamic inventory scripts need to be able to list all of the groups, and details about the individual hosts. For example, if our script is called dynamic.py, Ansible will call it like this to get a list of all of the groups:

$ ./dynamic.py --list

The output should look something like this:

{"production": ["delaware.example.com", "georgia.example.com",

"maryland.example.com", "newhampshire.example.com",

"newjersey.example.com", "newyork.example.com",

"northcarolina.example.com", "pennsylvania.example.com",

"rhodeisland.example.com", "virginia.example.com"],

"staging": ["ontario.example.com", "quebec.example.com"],

"vagrant": ["vagrant1", "vagrant2", "vagrant3"],

"lb": ["delaware.example.com"],

"web": ["georgia.example.com", "newhampshire.example.com",

"newjersey.example.com", "ontario.example.com", "vagrant1"],

"task": ["newyork.example.com", "northcarolina.example.com",

"maryland.example.com", "ontario.example.com", "vagrant2"],

"rabbitmq": ["pennsylvania.example.com", "quebec.example.com", "vagrant3"],

"db": ["rhodeisland.example.com", "virginia.example.com",

"quebec.example.com", "vagrant3"]

}

The output is a single JSON object where the names are Ansible group names, and the values are arrays of host names.

As an optimization, the --list command can contain the values of the host variables for all of the hosts, which saves Ansible the trouble of making a separate --host invocation to retrieve the variables for the individual hosts.

To take advantage of this optimization, the --list command should return a key named _meta that contains the variables for each host, in this form:

"_meta" :

{ "hostvars" : {

"vagrant1" : { "ansible_ssh_host": "127.0.0.1", "ansible_ssh_port": 2222,

"ansible_ssh_user": "vagrant"},

"vagrant2": { "ansible_ssh_host": "127.0.0.1", "ansible_ssh_port": 2200,

"ansible_ssh_user": "vagrant"},

...

}

}
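Putting the two flags together, a dynamic inventory script can be very small. The sketch below serves hard-coded data, so the GROUPS and HOSTVARS tables and the render function are purely illustrative. A real script would query an external system instead.

```python
import argparse
import json

# Hypothetical static data standing in for an external source of truth.
GROUPS = {"vagrant": ["vagrant1", "vagrant2"]}
HOSTVARS = {
    "vagrant1": {"ansible_ssh_host": "127.0.0.1", "ansible_ssh_port": 2222},
    "vagrant2": {"ansible_ssh_host": "127.0.0.1", "ansible_ssh_port": 2200},
}


def render(argv):
    """Return the JSON text for the given command-line arguments."""
    parser = argparse.ArgumentParser()
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--list", action="store_true")
    mode.add_argument("--host")
    args = parser.parse_args(argv)
    if args.list:
        # Groups plus the optional _meta key carrying every host's variables.
        output = dict(GROUPS)
        output["_meta"] = {"hostvars": HOSTVARS}
    else:
        # In this sketch, unknown hosts get an empty object.
        output = HOSTVARS.get(args.host, {})
    return json.dumps(output, sort_keys=True)


print(render(["--list"]))
print(render(["--host=vagrant2"]))
```

To use this as an actual script, the result of render(sys.argv[1:]) would be printed to standard output and the file marked executable.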

Writing a Dynamic Inventory Script

One of the handy features of Vagrant is that you can see which machines are currently running using the vagrant status command. Assuming we had a Vagrantfile that looked like Example 3-2, if we ran vagrant status, the output would look like Example 3-11.

Example 3-11. Output of vagrant status

$ vagrant status

Current machine states:

vagrant1 running (virtualbox)

vagrant2 running (virtualbox)

vagrant3 running (virtualbox)

This environment represents multiple VMs. The VMs are all listed

above with their current state. For more information about a specific

VM, run `vagrant status NAME`.

Because Vagrant already keeps track of machines for us, there’s no need for us to write a list of the Vagrant machines in an Ansible inventory file. Instead, we can write a dynamic inventory script that queries Vagrant about which machines are currently running.

Once we’ve set up a dynamic inventory script for Vagrant, even if we alter our Vagrantfile to run different numbers of Vagrant machines, we won’t need to edit an Ansible inventory file.

Let’s work through an example of creating a dynamic inventory script that retrieves the details about hosts from Vagrant.2

Our dynamic inventory script is going to need to invoke the vagrant status command. The output shown in Example 3-11 is designed for humans to read, rather than for machines to parse. We can get a list of running hosts in a format that is easier to parse with the --machine-readable flag, like so:

$ vagrant status --machine-readable

The output looks like this:

1410577818,vagrant1,provider-name,virtualbox

1410577818,vagrant1,state,running

1410577818,vagrant1,state-human-short,running

1410577818,vagrant1,state-human-long,The VM is running. To stop this VM%!(VAGRANT

_COMMA) you can run `vagrant halt` to\nshut it down forcefully%!(VAGRANT_COMMA)

or you can run `vagrant suspend` to simply\nsuspend the virtual machine. In

either case%!(VAGRANT_COMMA) to restart it again%!(VAGRANT_COMMA)\nsimply run

`vagrant up`.

1410577818,vagrant2,provider-name,virtualbox

1410577818,vagrant2,state,running

1410577818,vagrant2,state-human-short,running

1410577818,vagrant2,state-human-long,The VM is running. To stop this VM%!(VAGRANT

_COMMA) you can run `vagrant halt` to\nshut it down forcefully%!(VAGRANT_COMMA)

or you can run `vagrant suspend` to simply\nsuspend the virtual machine. In

either case%!(VAGRANT_COMMA) to restart it again%!(VAGRANT_COMMA)\nsimply run

`vagrant up`.

1410577818,vagrant3,provider-name,virtualbox

1410577818,vagrant3,state,running

1410577818,vagrant3,state-human-short,running

1410577818,vagrant3,state-human-long,The VM is running. To stop this VM%!(VAGRANT

_COMMA) you can run `vagrant halt` to\nshut it down forcefully%!(VAGRANT_COMMA)

or you can run `vagrant suspend` to simply\nsuspend the virtual machine. In

either case%!(VAGRANT_COMMA) to restart it again%!(VAGRANT_COMMA)\nsimply

run `vagrant up`.
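Each record in this output is a comma-separated line, with literal commas and newlines inside a value escaped as %!(VAGRANT_COMMA) and \n. The following sketch splits records and undoes that escaping. It is based only on the sample output above; the function parse_machine_readable is hypothetical and the sample message is abbreviated.

```python
def parse_machine_readable(text):
    """Return (target, key, value) tuples from machine-readable output."""
    records = []
    for line in text.splitlines():
        # Split on the first three commas only: timestamp, target, key, value.
        fields = line.split(",", 3)
        if len(fields) < 4:
            continue
        _, target, key, value = fields
        # Undo the escaping of commas and newlines inside the value.
        value = value.replace("%!(VAGRANT_COMMA)", ",").replace("\\n", "\n")
        records.append((target, key, value))
    return records


SAMPLE = (
    "1410577818,vagrant1,state,running\n"
    "1410577818,vagrant1,state-human-long,To stop this VM%!(VAGRANT_COMMA) "
    "you can run `vagrant halt` to\\nshut it down\n"
)

for record in parse_machine_readable(SAMPLE):
    print(record)
```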

To get details about a particular Vagrant machine, say, vagrant2, we would run:

$ vagrant ssh-config vagrant2

The output looks like:

Host vagrant2

HostName 127.0.0.1

User vagrant

Port 2200

UserKnownHostsFile /dev/null

StrictHostKeyChecking no

PasswordAuthentication no

IdentityFile /Users/lorinhochstein/.vagrant.d/insecure_private_key

IdentitiesOnly yes

LogLevel FATAL

Our dynamic inventory script will need to call these commands, parse the outputs, and output the appropriate JSON. We can use the Paramiko library to parse the output of vagrant ssh-config. Here’s an interactive Python session that shows how to use the Paramiko library to do this:

>>> import subprocess

>>> import paramiko

>>> cmd = "vagrant ssh-config vagrant2"

>>> p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)

>>> config = paramiko.SSHConfig()

>>> config.parse(p.stdout)

>>> config.lookup("vagrant2")

{'identityfile': ['/Users/lorinhochstein/.vagrant.d/insecure_private_key'],

'loglevel': 'FATAL', 'hostname': '127.0.0.1', 'passwordauthentication': 'no',

'identitiesonly': 'yes', 'userknownhostsfile': '/dev/null', 'user': 'vagrant',

'stricthostkeychecking': 'no', 'port': '2200'}

NOTE

You will need to install the Python Paramiko library in order to use this script. You can do this with pip by running:

$ sudo pip install paramiko
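If installing Paramiko is not an option, the simple output that vagrant ssh-config prints for a single machine can also be parsed with the standard library alone. The following is only a rough sketch under that assumption: the helper names are made up for illustration, and it handles neither multiple Host blocks nor wildcards nor the other features of a real SSH config parser.

```python
def parse_ssh_config(text):
    """Parse single-host ssh-config text into a dict with lowercase keys."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each line is "Keyword value"; split on the first space only
        key, _, value = line.partition(" ")
        config[key.lower()] = value.strip().strip('"')
    return config


def to_ansible_vars(config):
    """Map the parsed settings onto the Ansible connection variables."""
    return {
        "ansible_ssh_host": config["hostname"],
        "ansible_ssh_port": config["port"],
        "ansible_ssh_user": config["user"],
        "ansible_ssh_private_key_file": config["identityfile"],
    }


# Sample text copied from the vagrant ssh-config output shown earlier
SSH_CONFIG = """\
Host vagrant2
  HostName 127.0.0.1
  User vagrant
  Port 2200
  IdentityFile /Users/lorinhochstein/.vagrant.d/insecure_private_key
"""

print(to_ansible_vars(parse_ssh_config(SSH_CONFIG)))
```

Unlike Paramiko's lookup, which returns identityfile as a list, this sketch keeps only the last IdentityFile value as a plain string.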

Example 3-12 shows our complete vagrant.py script.

Example 3-12. vagrant.py

#!/usr/bin/env python
# Adapted from Mark Mandel's implementation
# https://github.com/ansible/ansible/blob/devel/plugins/inventory/vagrant.py
# License: GNU General Public License, Version 3 <http://www.gnu.org/licenses/>

import argparse
import json
import paramiko
import subprocess
import sys


def parse_args():
    parser = argparse.ArgumentParser(description="Vagrant inventory script")
    # The script is invoked with exactly one of --list or --host <name>
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--list', action='store_true')
    group.add_argument('--host')
    return parser.parse_args()


def list_running_hosts():
    cmd = "vagrant status --machine-readable"
    # universal_newlines=True returns text rather than bytes on Python 3
    status = subprocess.check_output(cmd.split(),
                                     universal_newlines=True).rstrip()
    hosts = []
    for line in status.split('\n'):
        # Each line has the form: timestamp,machine,key,value
        (_, host, key, value) = line.split(',')
        if key == 'state' and value == 'running':
            hosts.append(host)
    return hosts


def get_host_details(host):
    cmd = "vagrant ssh-config {}".format(host)
    p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE,
                         universal_newlines=True)
    config = paramiko.SSHConfig()
    config.parse(p.stdout)
    c = config.lookup(host)
    # Paramiko returns identityfile as a list; use the first entry
    return {'ansible_ssh_host': c['hostname'],
            'ansible_ssh_port': c['port'],
            'ansible_ssh_user': c['user'],
            'ansible_ssh_private_key_file': c['identityfile'][0]}


def main():
    args = parse_args()
    if args.list:
        hosts = list_running_hosts()
        json.dump({'vagrant': hosts}, sys.stdout)
    else:
        details = get_host_details(args.host)
        json.dump(details, sys.stdout)


if __name__ == '__main__':
    main()

Pre-Existing Inventory Scripts

Ansible ships with several dynamic inventory scripts that you can use. I can never figure out where my package manager installs these files, so I just grab the ones I need directly off GitHub. You can grab these by going to the Ansible GitHub repo and browsing to the plugins/inventory directory.

Many of these inventory scripts have an accompanying configuration file. In Chapter 12, we’ll discuss the Amazon EC2 inventory script in more detail.

Breaking Out the Inventory into Multiple Files

If you want to have both a regular inventory file and a dynamic inventory script (or, really, any combination of static and dynamic inventory files), just put them all in the same directory and configure Ansible to use that directory as the inventory. You can do this either via the hostfile parameter in ansible.cfg or by using the -i flag on the command line. Ansible will process all of the files and merge the results into a single inventory.

For example, our directory structure could look like this: inventory/hosts and inventory/vagrant.py.

Our ansible.cfg file would contain these lines:

[defaults]

hostfile = inventory
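To picture what merging the results means, here is a purely conceptual sketch in Python. It is not Ansible's implementation, and the function name, group names, and sample data are assumptions for illustration: each inventory source contributes a mapping of group names to hosts, and the mappings are combined so that a group defined in more than one source ends up with the union of its hosts.

```python
def merge_inventories(*sources):
    """Combine several {group: [hosts]} mappings into a single mapping."""
    merged = {}
    for source in sources:
        for group, hosts in source.items():
            members = merged.setdefault(group, [])
            for host in hosts:
                if host not in members:  # skip hosts already in the group
                    members.append(host)
    return merged


# Hypothetical contents of inventory/hosts and inventory/vagrant.py --list
static_hosts = {"web": ["ontario.example.com"], "vagrant": ["vagrant1"]}
dynamic_hosts = {"vagrant": ["vagrant1", "vagrant2", "vagrant3"]}

print(merge_inventories(static_hosts, dynamic_hosts))
```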

Adding Entries at Runtime with add_host and group_by

Ansible will let you add hosts and groups to the inventory during the execution of a playbook.

add_host

The add_host module adds a host to the inventory. This module is useful if you’re using Ansible to provision new virtual machine instances inside of an infrastructure-as-a-service cloud.

WHY DO I NEED ADD_HOST IF I’M USING DYNAMIC INVENTORY?

Even if you’re using dynamic inventory scripts, the add_host module is useful for scenarios where you start up new virtual machine instances and configure those instances in the same playbook.

If a new host comes online while a playbook is executing, the dynamic inventory script will not pick up this new host. This is because the dynamic inventory script is executed at the beginning of the playbook, so if any new hosts are added while the playbook is executing, Ansible won’t see them.

We’ll cover a cloud computing example that uses the add_host module in Chapter 12.

Invoking the module looks like this:

add_host name=hostname groups=web,staging myvar=myval

Specifying the list of groups and additional variables is optional.

Here’s the add_host module in action, bringing up a new Vagrant machine and then configuring it:

- name: Provision a vagrant machine
  hosts: localhost
  vars:
    box: trusty64
  tasks:
    - name: create a Vagrantfile
      command: vagrant init {{ box }} creates=Vagrantfile

    - name: Bring up a vagrant server
      command: vagrant up

    - name: add the Vagrant hosts to the inventory
      add_host: >
        name=vagrant
        ansible_ssh_host=127.0.0.1
        ansible_ssh_port=2222
        ansible_ssh_user=vagrant
        ansible_ssh_private_key_file=/Users/lorinhochstein/.vagrant.d/insecure_private_key

- name: Do something to the vagrant machine
  hosts: vagrant
  sudo: yes
  tasks:
    # The list of tasks would go here
    - ...

NOTE

The add_host module adds the host only for the duration of the execution of the playbook. It does not modify your inventory file.

When I do provisioning inside of my playbooks, I like to split it up into two plays. The first play runs against localhost and provisions the hosts, and the second play configures the hosts.

Note that we made use of the creates=Vagrantfile parameter in this task:

- name: create a Vagrantfile
  command: vagrant init {{ box }} creates=Vagrantfile

This tells Ansible that if the Vagrantfile file is present, the host is already in the correct state, and there is no need to run the command again. It’s a way of achieving idempotence in a playbook that invokes the command module, by ensuring that the (potentially non-idempotent) command is run only once.
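The same guard can be expressed in a few lines of Python, which may make the semantics easier to see. This is a conceptual sketch only, not how the command module is implemented, and the helper name run_unless_exists is made up for illustration: the command runs only when the named file is absent, so a second invocation becomes a no-op.

```python
import os
import subprocess
import sys
import tempfile


def run_unless_exists(argv, creates, cwd=None):
    """Run argv only if the `creates` path is missing; return True if it ran."""
    path = os.path.join(cwd, creates) if cwd else creates
    if os.path.exists(path):
        return False  # already in the desired state, nothing to do
    subprocess.check_call(argv, cwd=cwd)
    return True


workdir = tempfile.mkdtemp()
# Stand-in for `vagrant init`: a tiny command that writes an empty Vagrantfile
make_file = [sys.executable, "-c", "open('Vagrantfile', 'w').close()"]

first = run_unless_exists(make_file, "Vagrantfile", cwd=workdir)
second = run_unless_exists(make_file, "Vagrantfile", cwd=workdir)
print(first, second)
```

The guard checks only that the file exists, not that its contents are correct, which is the same trade-off the creates parameter makes.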

group_by

Ansible also allows you to create new groups during execution of a playbook, using the group_by module. This lets you create a group based on the value of a variable that has been set on each host, which Ansible refers to as a fact.3

If Ansible fact gathering is enabled, then Ansible will associate a set of variables with a host. For example, the ansible_machine variable will be i386 for 32-bit x86 machines and x86_64 for 64-bit x86 machines. If Ansible is interacting with a mix of such hosts, we can create i386 and x86_64 groups with a group_by task that uses ansible_machine as its key.

Or, if we want to group our hosts by Linux distribution (e.g., Ubuntu, CentOS), we can use the ansible_distribution fact:

- name: create groups based on Linux distribution
  group_by: key={{ ansible_distribution }}

In Example 3-13, we use group_by to create separate groups for our Ubuntu hosts and our CentOS hosts, and then we use the apt module to install packages onto the Ubuntu hosts and the yum module to install packages onto the CentOS hosts.

Example 3-13. Creating ad-hoc groups based on Linux distribution

- name: group hosts by distribution
  hosts: myhosts
  gather_facts: True
  tasks:
    - name: create groups based on distro
      group_by: key={{ ansible_distribution }}

- name: do something to Ubuntu hosts
  hosts: Ubuntu
  tasks:
    - name: install htop
      apt: name=htop
    # ...

- name: do something else to CentOS hosts
  hosts: CentOS
  tasks:
    - name: install htop
      yum: name=htop
    # ...
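Conceptually, group_by partitions the hosts in a play by the value that each host has for the chosen key. The following Python sketch illustrates that idea only. It is not Ansible code, and the function name, host names, and fact values are made up for illustration.

```python
def group_by(facts_by_host, key):
    """Return {fact value: [hosts]} for every host that has the given fact."""
    groups = {}
    for host, facts in sorted(facts_by_host.items()):
        if key in facts:
            groups.setdefault(facts[key], []).append(host)
    return groups


# Hypothetical facts, in the shape gathered when gather_facts is enabled
facts = {
    "web1": {"ansible_distribution": "Ubuntu", "ansible_machine": "x86_64"},
    "web2": {"ansible_distribution": "CentOS", "ansible_machine": "x86_64"},
    "db1": {"ansible_distribution": "Ubuntu", "ansible_machine": "i386"},
}

print(group_by(facts, "ansible_distribution"))
print(group_by(facts, "ansible_machine"))
```

Later plays can then target the resulting group names (Ubuntu, CentOS, and so on) exactly as Example 3-13 does with its hosts lines.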

Although using group_by is one way to achieve conditional behavior in Ansible, I’ve never found much use for it. In Chapter 6, we’ll see an example of how to use the when task parameter to take different actions based on variables.

That about does it for Ansible’s inventory. In the next chapter, we’ll cover how to use variables. See Chapter 9 for more details about ControlPersist, also known as SSH multiplexing.

1 This term has been popularized by Randy Bias of Cloudscaling.

2 Yes, there’s a Vagrant dynamic inventory script included with Ansible already, but it’s helpful to go through the exercise.

3 We cover facts in more detail in Chapter 4.