Ansible for DevOps: Server and configuration management for humans (2015)

Chapter 7 - Inventories

Earlier in the book, a basic inventory file example was given (see Chapter 1’s basic inventory file example). For the simplest of purposes, an inventory file at the default location (/etc/ansible/hosts) will suffice to describe to Ansible how to reach the servers you want to manage.

Later, a slightly more involved inventory file was introduced (see Chapter 3’s inventory file for multiple servers), which allowed us to tell Ansible about multiple servers, and even group them into role-related groups, so we could run certain playbooks against certain groups.

Let’s jump back to a basic inventory file example and build from there:

# Inventory file at /etc/ansible/hosts

# Groups are defined using square brackets (e.g. [groupname]). Each server
# in the group is defined on its own line.
[myapp]
www.myapp.com

If you want to run an Ansible playbook on all the myapp servers in this inventory (so far, just one, www.myapp.com), you can set up the playbook like so:

---
- hosts: myapp
  tasks:
    [...]

If you want to run an ad-hoc command against all the myapp servers in the inventory, you can run a command like so:

# Use ansible to check memory usage on all the myapp servers.
$ ansible myapp -a "free -m"

A real-world web application server inventory

The example above might be adequate for single-server services and tiny apps or websites, but most real-world applications require many more servers, and usually separate servers per application concern (database, caching, application, queuing, etc.). Let’s take a look at a real-world inventory file for a small web application that monitors server uptime, Server Check.in.

 1 # Individual Server Check.in servers.
 2 [servercheck-web]
 3 www1.servercheck.in
 4 www2.servercheck.in
 5
 6 [servercheck-web:vars]
 7 ansible_ssh_user=servercheck_svc
 8
 9 [servercheck-db]
10 db1.servercheck.in
11
12 [servercheck-log]
13 log.servercheck.in
14
15 [servercheck-backup]
16 backup.servercheck.in
17
18 [servercheck-nodejs]
19 atl1.servercheck.in
20 atl2.servercheck.in
21 nyc1.servercheck.in
22 nyc2.servercheck.in
23 nyc3.servercheck.in
24 ned1.servercheck.in
25 ned2.servercheck.in
26
27 [servercheck-nodejs:vars]
28 ansible_ssh_user=servercheck_svc
29 foo=bar
30
31 # Server Check.in distribution-based groups.
32 [centos:children]
33 servercheck-web
34 servercheck-db
35 servercheck-nodejs
36 servercheck-backup
37
38 [ubuntu:children]
39 servercheck-log

This inventory may look a little overwhelming at first, but if you break it apart into simple groupings (web app servers, database servers, logging server, and node.js app servers), it describes a straightforward architecture.

Server Check.in Infrastructure.

Lines 1-29 describe a few groups of servers (some with only one server), so playbooks and ansible commands can refer to the group by name. Lines 6-7 and 27-29 set variables that will apply only to the servers in the group (e.g. variables below [servercheck-nodejs:vars] will only apply to the servers in the servercheck-nodejs group).

Lines 31-39 describe groups of groups (using groupname:children to describe ‘child’ groups) that allow for some helpful abstractions.

Describing infrastructure in such a way affords a lot of flexibility when using Ansible. Consider the task of patching a vulnerability on all your CentOS servers; instead of having to log into each of the servers, or even having to run an ansible command against all the groups, using the above structure allows you to easily run an ansible command or playbook against all centos servers.

As an example, when the Shellshock vulnerability was disclosed in 2014, patched bash packages were released for all the major distributions within hours. To update all the Server Check.in servers, all that was needed was:

$ ansible centos -m yum -a "name=bash state=latest"

You could even go further and create a small playbook that would patch the vulnerability, then run tests to make sure the vulnerability was no longer present, as illustrated in this playbook. This would also allow you to run the playbook in check mode or run it through a continuous integration system to verify the fix works in a non-prod environment.
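As a sketch of what such a playbook might look like (this is an illustration, not Server Check.in's actual playbook; the env-based check below is a commonly used Shellshock test, and the task names are assumptions):

```yaml
---
- hosts: centos
  sudo: true

  tasks:
    - name: Update bash to the latest available version.
      yum: name=bash state=latest

    - name: Verify bash is no longer vulnerable to Shellshock.
      # A patched bash prints only 'test'; a vulnerable one also
      # prints 'vulnerable'.
      shell: "env x='() { :;}; echo vulnerable' bash -c 'echo test'"
      register: shellshock_check
      changed_when: false
      failed_when: "'vulnerable' in shellshock_check.stdout"
```

Running this in check mode first (with --check) lets you preview the update before applying it to production hosts.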

This infrastructure inventory is also nice in that you could create a top-level playbook that runs certain roles or tasks against all your infrastructure, others against all servers of a certain Linux flavor, and another against all servers in your entire infrastructure.

Consider, for example, this example master playbook to completely configure all the servers:

---
# Set up basic, standardized components across all servers.
- hosts: all
  sudo: true
  roles:
    - security
    - logging
    - firewall

# Configure web application servers.
- hosts: servercheck-web
  roles:
    - nginx
    - php
    - servercheck-web

# Configure database servers.
- hosts: servercheck-db
  roles:
    - pgsql
    - db-tuning

# Configure logging server.
- hosts: servercheck-log
  roles:
    - java
    - elasticsearch
    - logstash
    - kibana

# Configure backup server.
- hosts: servercheck-backup
  roles:
    - backup

# Configure Node.js application servers.
- hosts: servercheck-nodejs
  roles:
    - servercheck-node

There are a number of different ways you can structure your infrastructure-management playbooks and roles, and we’ll explore some in later chapters, but for a simple infrastructure, something like this is adequate and maintainable.

Non-prod environments, separate inventory files

Using the above playbook and the globally-configured Ansible inventory file is great for your production infrastructure, but what happens when you want to configure a separate but similar infrastructure for, say, a development or user certification environment?

In this case, it’s easiest to use individual inventory files, rather than the central, locally-managed Ansible inventory file. For typical team-managed infrastructure, I would recommend including an inventory file for each environment in the same version-controlled repository as your Ansible playbooks, perhaps within an ‘inventories’ directory.

For example, I could take the entire contents of /etc/ansible/hosts above, and stash that inside an inventory file named inventory-prod, then duplicate it, changing server names where appropriate (e.g. the [servercheck-web] group would only have www-dev1.servercheck.in for the development environment), and naming the files for the environments:

servercheck/
  inventories/
    inventory-prod
    inventory-cert
    inventory-dev
  playbook.yml

Now, when running playbook.yml to configure the development infrastructure, I would pass in the path to the dev inventory (assuming my current working directory is servercheck/):

$ ansible-playbook playbook.yml -i inventory-dev

Using inventory variables (which will be explored further), and well-constructed roles and/or tasks that use the variables effectively, you could architect your entire infrastructure, with environment-specific configurations, by changing some things in your inventory files.
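For example (the app_environment variable below is hypothetical, used only for illustration), the same group could carry different variables in each environment's inventory file:

```
# inventory-prod
[servercheck-web]
www1.servercheck.in
www2.servercheck.in

[servercheck-web:vars]
app_environment=prod

# inventory-dev
[servercheck-web]
www-dev1.servercheck.in

[servercheck-web:vars]
app_environment=dev
```

Roles and tasks can then branch on app_environment (or use it in templates), so a single playbook configures every environment correctly.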

Inventory variables

Chapter 5 introduced basic methods of managing variables for individual hosts or groups of hosts through your inventory in the inventory variables section, but it’s worth exploring the different ways of defining and overriding variables through inventory here.

For extremely simple use cases—usually when you need to define one or two connection-related variables (like ansible_ssh_user or ansible_ssh_port)—you can place variables directly inside an inventory file.

Assuming we have a standalone inventory file for a basic web application, here are some examples of variable definition inside the file:

[www]
# You can define host-specific variables inline with the host.
www1.example.com ansible_ssh_user=johndoe
www2.example.com

[db]
db1.example.com
db2.example.com

# You can add a '[group:vars]' heading to create variables that will apply
# to an entire inventory group.
[db:vars]
ansible_ssh_port=5222
database_performance_mode=true

It’s usually better to avoid throwing too many variables inside static inventory files, because not only are these variables typically less visible, they are also mixed in with your architecture definition. Especially for host-specific vars (which appear on one long line per host), this is an unmaintainable, low-visibility approach to host and group-specific variables.

Fortunately, Ansible provides a more flexible way of declaring host and group variables.

host_vars

For Hosted Apache Solr, different servers in a solr group have different memory requirements. The simplest way to tell Ansible to override a default variable in our Ansible playbook (in this case, the tomcat_xmx variable) is to use a host_vars directory (which can be placed either in the same location as your inventory file, or in a playbook’s root directory), and place a YAML file named after the host which needs the overridden variable.

As an illustration of the use of host_vars, we’ll assume we have the following directory layout:

hostedapachesolr/
  host_vars/
    nyc1.hostedapachesolr.com
  inventory/
    hosts
  main.yml

The inventory/hosts file contains a simple definition of all the servers by group:

[solr]
nyc1.hostedapachesolr.com
nyc2.hostedapachesolr.com
jap1.hostedapachesolr.com
...

[log]
log.hostedapachesolr.com

Ansible will search for a file at either hostedapachesolr/host_vars/nyc1.hostedapachesolr.com or hostedapachesolr/inventory/host_vars/nyc1.hostedapachesolr.com, and if there are any variables defined in the file (in YAML format), those variables will override other inventory variables and gathered facts (per Ansible's variable precedence rules), for that single host only.

The nyc1.hostedapachesolr.com host_vars file looks like:

---
tomcat_xmx: "1024m"

The default for tomcat_xmx may normally be 640m, but when Ansible runs a playbook against nyc1.hostedapachesolr.com, the value of tomcat_xmx will be 1024m instead.

Overriding host variables with host_vars is much more maintainable than doing so directly in static inventory files, and also provides greater visibility into what hosts are getting what overrides.

group_vars

Much like host_vars, Ansible will automatically load any files named after inventory groups in a group_vars directory placed inside the playbook or inventory file’s location.

Using the same example as above, we’ll override one particular variable for an entire group of servers. First, we add a group_vars directory with a file named after the group needing the overridden variable:

hostedapachesolr/
  group_vars/
    solr
  host_vars/
    nyc1.hostedapachesolr.com
  inventory/
    hosts
  main.yml

Then, inside group_vars/solr, use YAML to define the variables that will be applied to servers in the solr group:

---
do_something_amazing: true
foo: bar

Typically, if your playbook is only being run on one group of hosts, it’s easier to define the variables in the playbook via an included vars file. However, in many cases you will be running a playbook or applying a set of roles to multiple inventory groups. In these situations, you may need to use group_vars to override specific variables for one or more groups of servers.

Ephemeral infrastructure: Dynamic inventory

In many circumstances, static inventories are adequate for describing your infrastructure. When working on small applications, low-traffic web applications, and individual workstations, it’s simple enough to manage an inventory file by hand.

However, in the age of cloud computing and highly scalable application architecture, it's often necessary to add dozens or hundreds of servers to an infrastructure in a short period of time—or to add and remove servers continuously, to scale as traffic grows and subsides. In this circumstance, it would be tedious (if not impossible) to manage a single inventory file by hand, especially if you're using auto-scaling infrastructure where new instances are provisioned and need to be configured in minutes or seconds.

Even in the case of container-based infrastructure, new instances need to be configured correctly, with the proper port mappings, application settings, and filesystem configuration.

For these situations, Ansible allows you to define inventory dynamically. If you’re using one of the larger cloud-based hosting providers, chances are there is already a dynamic inventory script (which Ansible uses to build an inventory) for you to use. Ansible core already includes scripts for Amazon Web Services, Cobbler, DigitalOcean, Linode, OpenStack, and other large providers, and later we’ll explore creating our own dynamic inventory script (if you aren’t using one of the major hosting providers or cloud management platforms).

Dynamic inventory with DigitalOcean

DigitalOcean is one of the world's top five hosting companies, and has grown rapidly since its founding in 2011. One of the reasons for this extremely rapid growth is the ease of provisioning new 'droplets' (cloud VPS servers), and the value provided; as of this writing, you could get a fairly speedy VPS with 512MB of RAM and a generous portion of fast SSD storage for $5 USD per month.

DigitalOcean's API and simple, developer-friendly philosophy have made it easy for Ansible to interact with DigitalOcean droplets; you can create, manage, and delete droplets with Ansible, as well as use droplets with your playbooks using dynamic inventory.

DigitalOcean account prerequisites

Before you can follow the rest of the examples in this section, you will need:

1. A DigitalOcean account (sign up at www.digitalocean.com).

2. dopy, a Python wrapper for DigitalOcean API interaction (you can install it with pip: sudo pip install dopy).

3. Your DigitalOcean account API key and client ID (Ansible currently supports the v1 API, so you need to go to the ‘Apps & API’ page in your profile, then click the ‘API v1.0 Page’ link to get a key and client ID).

4. An SSH key pair, which will be used to connect to your DigitalOcean servers. Follow this guide to create a key pair and add the public key to your DigitalOcean account.

Once you have these four things set up and ready to go, you should be able to communicate with your DigitalOcean account through Ansible.

Connecting to your DigitalOcean account

There are a few different ways you can specify your DigitalOcean client ID and API key (including command line arguments --client-id and --api-key, as values inside a digital_ocean.ini file, or as environment variables). For our example, we’ll use environment variables (since these are easy to configure, and work both with Ansible’s digital_ocean module and the dynamic inventory script). Open up a terminal session, and enter the following commands:

$ export DO_CLIENT_ID=YOUR_CLIENT_ID_HERE
$ export DO_API_KEY=YOUR_API_KEY_HERE

Before we can use a dynamic inventory script to discover our DigitalOcean droplets, let’s use Ansible to quickly provision a new droplet.

warning

Creating cloud instances (‘Droplets’, in DigitalOcean parlance) will incur minimal charges for the time you use them (currently less than $0.01/hour for the size in this example). For the purposes of this tutorial (and in general, for any testing), make sure you shut down and destroy your instances when you’re finished using them, or you will be charged through the next billing cycle! Even so, using low-priced instances (like a $5/month DigitalOcean droplet with hourly billing) means that, even in the worst case, you won’t have to pay much. If you create and destroy an instance in a few hours, you’ll be charged a few pennies.

Creating a droplet with Ansible

Create a new playbook named provision.yml, with the following contents:

---
- hosts: localhost
  connection: local
  gather_facts: false

  tasks:
    - name: Create new Droplet.
      digital_ocean:
        state: present
        command: droplet
        name: ansible-test
        private_networking: yes
        # 512mb
        size_id: 66
        # CentOS 7.0 x64
        image_id: 6713409
        # nyc2
        region_id: 4
        ssh_key_ids: 138954
        # Required for idempotence/only one droplet creation.
        unique_name: yes
      register: do

The digital_ocean module lets you create, manage, and delete droplets with ease. You can read the documentation for all the options, but the above is an overview of the main ones. name sets the hostname for the droplet, state can also be set to deleted if you want the droplet to be destroyed, and other options tell DigitalOcean where to set up the droplet, and with what OS and configuration.

tip

You can use DigitalOcean's API, along with your client_id and api_key, to get the IDs for size_id (the size of the Droplet), image_id (the system or distro image to use), region_id (the data center in which your droplet will be created), and ssh_key_ids (a comma-separated list of SSH keys to be included in the root account's authorized_keys file).

As an example, to get all the available images, use curl "https://api.digitalocean.com/images/?client_id=CLIENT_ID&api_key=API_KEY&filter=global" | python -m json.tool, substituting your own CLIENT_ID and API_KEY, and you’ll receive a JSON listing of all available values. Browse the DigitalOcean API for information on how to query SSH key information, size information, etc.

We used register as part of the digital_ocean task so we could immediately start using and configuring the new host if needed. Running the above playbook returns the following output (using debug: var=do in an additional task to dump the contents of our registered variable, do):

$ ansible-playbook provision.yml

PLAY [localhost] ***********************************************************

TASK: [Create new Droplet.] ************************************************
changed: [localhost]

TASK: [debug var=do] *******************************************************
ok: [localhost] => {
    "do": {
        "changed": true,
        "droplet": {
            "backups": [],
            "backups_active": false,
            "created_at": "2014-10-22T02:09:20Z",
            "event_id": 34915980,
            "id": 2940194,
            "image_id": 6918990,
            "ip_address": "162.243.20.29",
            "locked": false,
            "name": "ansible-test",
            "private_ip_address": null,
            "region_id": 4,
            "size_id": 66,
            "snapshots": [],
            "status": "active"
        },
        "invocation": {
            "module_args": "",
            "module_name": "digital_ocean"
        }
    }
}

PLAY RECAP *****************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=0

Since do contains the new droplet’s IP address (alongside other relevant information), you can place your freshly-created droplet in an existing inventory group using Ansible’s add_host module. Adding to the playbook we started above, you could set up your playbook to provision an instance and immediately configure it (after waiting for port 22 to become available) with something like:

    - name: Add new host to our inventory.
      add_host:
        name: "{{ do.droplet.ip_address }}"
        groups: do
      when: do.droplet is defined

- hosts: do
  remote_user: root
  gather_facts: false

  tasks:
    - name: Wait for port 22 to become available.
      local_action: "wait_for port=22 host={{ inventory_hostname }}"

    - name: Install tcpdump.
      yum: name=tcpdump state=installed

At this point, if you run the playbook ($ ansible-playbook provision.yml), it should create a new droplet (if it has not already been created), then add that droplet to the do inventory group, and finally, run a new play on all the do hosts (including the new droplet). Here are the results:

$ ansible-playbook provision.yml

PLAY [localhost] ***********************************************************

TASK: [Create new Droplet.] ************************************************
changed: [localhost]

TASK: [Add new host to our inventory.] *************************************
ok: [localhost]

PLAY [do] ******************************************************************

TASK: [Install tcpdump.] ***************************************************
changed: [162.243.20.29]

PLAY RECAP *****************************************************************
162.243.20.29 : ok=2 changed=1 unreachable=0 failed=0
localhost : ok=2 changed=1 unreachable=0 failed=0

If you run the same playbook again, it should report no changes—the entire playbook is idempotent! You might be starting to see just how powerful it is to have a tool as flexible as Ansible at your disposal; not only can you configure servers, you can create them (singly, or dozens at a time), and configure them at once. And even if a ham-fisted sysadmin jumps in and deletes an entire server, you can run the playbook again, and rest assured your server will be recreated and reconfigured exactly as it was when it was first set up.

tip

Note that you might need to disable strict host key checking to get provisioning and instant configuration to work correctly, otherwise you may run into an error stating that Ansible can’t connect to the new droplet during the second play. To do this, add the line host_key_checking=False under the [defaults] section in your ansible.cfg file (located in /etc/ansible by default).

You should normally leave host_key_checking enabled, but when rapidly building and destroying VMs for testing purposes, it is simplest to disable it temporarily.
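Assuming the default configuration location mentioned above, the relevant section of ansible.cfg would look like:

```
# /etc/ansible/ansible.cfg
[defaults]
host_key_checking = False
```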

DigitalOcean dynamic inventory with digital_ocean.py

Once you have some DigitalOcean droplets, you need a way for Ansible to dynamically build an inventory of your servers so you can build playbooks and use the servers in logical groupings (or run playbooks and ansible commands directly on all droplets).

There are a few steps to getting DigitalOcean’s official dynamic inventory script working:

1. Install dopy via pip (the DigitalOcean Python library): $ pip install dopy.

2. Download the DigitalOcean dynamic inventory script from Ansible on GitHub: $ curl -O https://raw.githubusercontent.com/ansible/ansible/devel/plugins/inventory/digital_ocean.py.

3. Make the inventory script executable: $ chmod +x digital_ocean.py.

4. Make sure you have the credentials configured in digital_ocean.ini (as explained earlier in this chapter). Alternatively, you can set DO_CLIENT_ID and DO_API_KEY in your environment, or pass the command line options --client-id and --api-key.

5. Make sure the script is working by running the script directly: $ ./digital_ocean.py --pretty. After a second or two, you should see all your droplets (likely just the one you created earlier) listed by IP address and dynamic group as JSON.

6. Ping all your DigitalOcean droplets: $ ansible all -m ping -i digital_ocean.py -u root.

Now that you have all your hosts being loaded through the dynamic inventory script, you can use add_host to build groups of the Droplets for use in your playbooks. Alternatively, if you want to fork the digital_ocean.py inventory script, you can modify it to suit your needs: exclude certain servers, build groups based on certain criteria, etc.

information

Ansible currently supports DigitalOcean’s v1 API, which makes working with DigitalOcean slightly more difficult. The v2 API allows you to use region names (e.g. “nyc2”) instead of numeric IDs, allows you to add metadata to your droplets, and much more. Ansible should soon support the v2 API. If you’re interested in the current status of this effort, or would like to help in migrating to the v2 API, visit the v2 API support issue on GitHub.

Dynamic inventory with AWS

Many of this book’s readers are familiar with Amazon Web Services (especially EC2, S3, ElastiCache, and Route53), and likely have managed or currently manage an infrastructure within Amazon’s cloud. Ansible has very strong support for managing AWS-based infrastructure, and includes a dynamic inventory script to help you run playbooks on your hosts in a variety of ways.

There are a few excellent guides to using Ansible with AWS, for example:

· Ansible - Amazon Web Services Guide

· Ansible for AWS

I won't be covering AWS dynamic inventory in detail in this chapter, but will mention that the ec2.py dynamic inventory script, along with Ansible's extensive support for AWS infrastructure through the ec2_* modules, makes Ansible the best and simplest tool for managing a broad AWS infrastructure.

In the next chapter, one of the examples will include a guide for provisioning infrastructure on AWS, along with a quick overview of dynamic inventory on AWS.

Inventory on-the-fly: add_host and group_by

Sometimes, especially when provisioning new servers, you will need to modify the in-memory inventory during the course of a playbook run. Ansible offers the add_host and group_by modules to help you manage inventory for these scenarios.

In the DigitalOcean example above, add_host was used to add the new droplet to the do group:

[...]

    - name: Add new host to our inventory.
      add_host:
        name: "{{ do.droplet.ip_address }}"
        groups: do
      when: do.droplet is defined

- hosts: do
  remote_user: root

  tasks:
    [...]

You could add multiple groups with add_host, and you can also add other variables for the host inline with add_host. As an example, let’s say you created a VM using an image that exposes SSH on port 2288 and requires an application-specific memory limit specific to this VM:

- name: Add new host to our inventory.
  add_host:
    name: "{{ do.droplet.ip_address }}"
    ansible_ssh_port: 2288
    myapp_memory_maximum: "1G"
  when: do.droplet is defined

The custom port will be used when Ansible connects to this host, and the myapp_memory_maximum will be passed into the playbooks just as any other inventory variable.
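As an illustration (the configuration file path and setting name below are hypothetical), a task in a later play could consume that variable like any other inventory variable:

```yaml
- name: Set the application's memory limit.
  lineinfile:
    dest: /etc/myapp/myapp.conf
    regexp: '^memory_maximum='
    line: "memory_maximum={{ myapp_memory_maximum }}"
```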

The group_by module is even simpler, and allows you to create dynamic groups during the course of a playbook run. Usage is extremely simple:

- hosts: all
  gather_facts: yes

  tasks:
    - name: Create an inventory group for each architecture.
      group_by: "key=architecture-{{ ansible_machine }}"

    - debug: var=groups

After running the above playbook, you'd see all your normal inventory groups, plus groups like architecture-x86_64 and architecture-i386 (depending on what kinds of server architectures you use).
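You could then target one of those generated groups in a follow-on play in the same playbook run; for example (a minimal sketch):

```yaml
- hosts: architecture-x86_64
  tasks:
    - debug: msg="This play only runs on 64-bit servers."
```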

Multiple inventory sources - mixing static and dynamic inventories

If you need to combine static and dynamic inventory, or even if you wish to use multiple dynamic inventories (for example, if you are managing servers hosted by two different cloud providers), you can pass a directory to ansible or ansible-playbook, and Ansible will combine the output of all the inventories (both static and dynamic) inside the directory:

$ ansible-playbook -i path/to/inventories main.yml

One caveat: Ansible ignores .ini and backup files in the directory, but will attempt to parse every text file and execute every executable file in the directory—don’t leave random files in mixed inventory folders!
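A mixed inventory directory might look like the following (the file names are illustrative):

```
inventories/
  inventory-static      # static hosts file
  digital_ocean.py      # executable dynamic inventory script
```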

Creating custom dynamic inventories

TODO:

· Dynamic Inventory

· Developing dynamic inventory sources

Summary

From the most basic infrastructure consisting of one server to a multi-tenant, dynamic infrastructure with thousands of servers, Ansible offers many options for describing your servers, and overriding playbook and role variables for specific hosts or groups. You should be able to describe all your servers, however they’re managed and wherever they’re hosted, with Ansible’s flexible inventory system.

 ___________________________________
/ A pint of sweat saves a gallon of \
\ blood. (General Patton)           /
 -----------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||