Learning Nagios 4 (2014)

Chapter 4. Using the Nagios Plugins

The previous chapter discussed the basic configuration of host and service checking. Nagios can be set up to check if your services are up and running. This chapter describes in detail how these checks work. It also introduces some of the Nagios plugins that are developed as a part of Nagios and as a part of the Nagios plugins project.

Nagios' strength comes from its ability to monitor servers and the services that they offer in a large number of ways. What's more interesting is that all of these ways of making sure that your services are functional are implemented as external plugins, which work in quite an easy way. Many of these plugins are even shipped with Nagios, as we mentioned in Chapter 2, Installing Nagios 4. Therefore, it is possible to either use existing plugins or write your own. In this chapter, we will learn about the checks that can be made using the Nagios plugins project, and we will cover the following:

· Using the standard checks for whether a host is alive, and basic TCP/UDP checks

· Checking e-mail services, such as POP3, IMAP, and SMTP

· Monitoring network services, such as FTP, DHCP, and websites, and checking Nagios itself

· Checking database systems

· Monitoring hard drives and network shares

· Checking system information

· Additional and third-party plugins

Understanding how checks work

Nagios performs checks by running an external command, and uses the return code along with output from the command as information on whether the check worked or not. It is the command's responsibility to verify if a host or service is working at the time the command is invoked.

Nagios itself handles all of the internals, such as scheduling the commands to be run, storing their results, and determining the status of each host and service.

Nagios requires that all plugins follow a specific, easy-to-follow behavior in order for them to work smoothly. These rules are common to both host checks and service checks. Each command must return one of the specific result codes shown in the following table:

Exit code	Status	Description
0	OK	Working correctly
1	WARNING	Working, but needs attention (for example, low resources)
2	CRITICAL	Not working correctly or requires attention
3	UNKNOWN	Plugin was unable to determine the status for the host or service

Nagios does not use the standard output of the command to determine the status of a check; the status is taken from the exit code. The output is usually formatted in the following way:

PLUGIN STATUS - status description

Usually, the status description contains human-readable information that is visible using the web interface. Some sample outputs from various plugins and states are as follows:

PING OK - Packet loss = 0%, RTA = 0.18 ms

DNS WARNING: 0.015 seconds response time

DISK CRITICAL - free space: /boot 18 MB (8% inode=99%)
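To make these conventions concrete, the following Python sketch shows how a custom plugin might pair an exit code with a status line in this format. It only illustrates the rules above and is not code from the Nagios plugins package. The plugin_result and finish helper names are hypothetical, and mapping unexpected values to UNKNOWN is a choice made for this sketch.

```python
import sys

# Exit codes and their meaning, as listed in the table of result codes
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def plugin_result(name, state, description):
    """Return (exit_code, output_line) in the 'PLUGIN STATUS - description' form."""
    if state not in STATES:
        state = 3  # this sketch maps any unexpected value to UNKNOWN
    return state, f"{name} {STATES[state]} - {description}"


def finish(name, state, description):
    """Print the status line for Nagios and exit with the matching code."""
    code, line = plugin_result(name, state, description)
    print(line)
    sys.exit(code)
```

For example, calling finish("PING", 0, "Packet loss = 0%, RTA = 0.18 ms") prints PING OK - Packet loss = 0%, RTA = 0.18 ms and exits with code 0, which Nagios reports as OK.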

Nagios plugins use options for their configuration. It is up to the plugin's author to parse these options. However, most commands that come as part of the Nagios plugins package use standard options and support the -h or --help options to provide a full description of all the arguments they accept. Standard Nagios plugins usually accept the following parameters:

Option	Description
-h, --help	This provides help
-V, --version	This prints the exact version of the plugin
-v, --verbose	This makes the plugin give more detailed information on what it is doing
-t, --timeout	This provides the timeout (in seconds); after this time, the plugin will report CRITICAL status
-w, --warning	This provides the plugin-specific limits for the WARNING status
-c, --critical	This provides the plugin-specific limits for the CRITICAL status
-H, --hostname	This provides the hostname, IP address, or Unix socket to communicate with
-4, --use-ipv4	This lets you use IPv4 for network connectivity
-6, --use-ipv6	This lets you use IPv6 for network connectivity
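If you write your own plugin, it is a good idea to accept the same standard options. The following is a minimal Python sketch that uses the standard argparse module. The check_example name and the 10-second default timeout are assumptions made for this illustration. One detail is worth noting: argparse exits with code 2 on a usage error, and Nagios would interpret that as CRITICAL. The sketch therefore overrides error() so that usage errors exit with UNKNOWN (3) instead.

```python
import argparse
import sys


class PluginArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that reports usage errors as UNKNOWN (exit code 3)."""

    def error(self, message):
        # argparse would exit with 2 here, which Nagios reads as CRITICAL
        self.print_usage(sys.stderr)
        print(f"UNKNOWN - {message}")
        sys.exit(3)


def build_parser():
    parser = PluginArgumentParser(prog="check_example")  # hypothetical plugin name
    # -h/--help is added by argparse automatically
    parser.add_argument("-V", "--version", action="version", version="check_example 0.1")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-t", "--timeout", type=int, default=10)  # assumed default
    parser.add_argument("-w", "--warning")
    parser.add_argument("-c", "--critical")
    parser.add_argument("-H", "--hostname", required=True)
    family = parser.add_mutually_exclusive_group()  # -4 and -6 cannot be combined
    family.add_argument("-4", "--use-ipv4", action="store_true")
    family.add_argument("-6", "--use-ipv6", action="store_true")
    return parser
```

The warning and critical values are kept as strings because their format is plugin-specific, as the table notes.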

Commands that verify various daemons also have a common set of options. Many of the networking-related plugins use the following options in addition to the preceding standard ones:

Option	Description
-p, --port	This provides the TCP or UDP port to connect to
-w, --warning	This provides the response time that will issue a WARNING status (seconds)
-c, --critical	This provides the response time that will issue a CRITICAL status (seconds)
-s, --send	This provides the string that will be sent to the server
-e, --expect	This provides the string that should be sent back from the server (option might be passed several times; see --all for details)
-q, --quit	This provides the string to send to the server to close the connection
-A, --all	In case multiple --expect parameters are passed, this option indicates that all responses need to be received; if this option is not present, at least one matching result indicates a success
-m, --maxbytes	This specifies the maximum number of bytes to read when expecting a string to be sent back from the server; after this number of bytes, a mismatch is assumed
-d, --delay	This provides the delay in seconds between sending a string to the server and expecting a response
-r, --refuse	This provides the status that should be indicated in case the connection is refused (ok, warn, crit; defaults to crit)
-M	This provides the status in case the expected answer is not returned by the server (ok, warn, crit; defaults to warn)
-j, --jail	Do not return output from the server in plugin output text
-D, --certificate	The number of days that the SSL certificate must still be valid; requires --ssl
-S, --ssl	Connect using the SSL encryption
-E, --escape	Allows using \n, \r, \t, or \\ in the send or quit string; must be passed before the --send or --quit option
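The send/expect options are easier to understand with an example. The following Python sketch models two of them: how -E turns escape sequences into real characters, and how the --expect, --all, and -M options combine into a result. It assumes that an expected string matches if it appears anywhere in the server's reply. The function names are hypothetical, and details such as -d and -m are not modeled.

```python
OK, WARNING, CRITICAL = 0, 1, 2

# Escape sequences that -E/--escape enables in --send and --quit strings
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def unescape(text):
    """Turn backslash sequences such as \\r and \\n into the characters they stand for."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])  # unknown sequences are copied unchanged
            i += 1
    return "".join(out)


def expect_status(response, expects, match_all=False, mismatch_state=WARNING):
    """Apply --expect/--all/-M: substring matches against the server reply."""
    if not expects:
        return OK
    hits = sum(1 for expected in expects if expected in response)
    matched = hits == len(expects) if match_all else hits >= 1
    return OK if matched else mismatch_state
```

With a single expected string, as in -e "220 VMware", --all makes no difference. It only matters when -e is passed several times.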

Tip

The option names are case sensitive. For many plugins, there are options whose short names differ only in case. For example, both -e and -E as well as -m and -M are valid options for most of the plugins. It is important to distinguish lowercase and uppercase options.

All of the commands support the --verbose option (or its short form, -v), which prints out useful information about the test. It is recommended that you add the -v option whenever you run into issues with getting a plugin to work.

This chapter describes the commands provided by the standard Nagios plugins distribution and is based on version 1.4.16. Before using specific options for a command, it is recommended that you use the --help option and familiarize yourself with the functionality available on your Nagios installation.

Plugins also have their own nonstandard options, which are described in more detail in this chapter. All commands described in this chapter also have a sample configuration for the Nagios check command. Even though some longer definitions might span multiple lines, please make sure that you put each one on a single line in your configuration. Some of the plugins already have their command counterparts configured in the sample Nagios configuration that is installed along with Nagios. Therefore, it is also worth checking whether your commands.cfg file already contains a definition for a particular command.

Monitoring using the standard network plugins

One of the basic roles of a plugin is to monitor local or remote hosts and verify if they are working correctly. There is a choice of generic plugins to accomplish this task.

Standard networking plugins allow hosts to be monitored using ICMP ECHO (refer to http://en.wikipedia.org/wiki/Ping). This is used to determine whether a computer is responding to IP requests. It is also used to measure the time that a machine takes to respond, and how many packets are lost during the communication. These plugins also try to connect to certain TCP/UDP ports in order to communicate with various network-based services and make sure that they are working properly and respond within a defined amount of time.

Testing the connection to a remote host

Checking if a host is alive is a basic test that should be performed for all remote machines. Nagios offers a command that is commonly used for checking if a host is alive and plugged into the network. The syntax of the plugin is as follows:

check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>%
    [-p packets] [-t timeout] [-4|-6]

This command accepts the standard options described previously, as well as the following nonstandard options:

Option	Description
-p, --packets	This provides the number of packets to send; defaults to 5
-w, --warning	This gives WARNING status limit in form of RTA and PKTLOSS percentage
-c, --critical	This gives CRITICAL status limit in form of RTA and PKTLOSS percentage

Round Trip Average (RTA) is the average time, in milliseconds, taken for a packet to return. Packet Loss (PKTLOSS) is the maximum percentage of packets that can be lost during communication. For example, a value of 100,20% means that a ping must return within 0.1 seconds on average and that at least 4 out of 5 packets have to come back.
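The threshold arithmetic can be sketched in Python as follows. This is a hypothetical helper, not part of check_ping. It treats a limit as breached only when the measured value exceeds it, which matches the reading above: losing 1 of 5 packets is 20 percent and therefore still within a 20% limit. Whether the plugin itself compares with > or >= at the exact boundary is an assumption you should verify against your plugin version.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def parse_threshold(spec):
    """Split a check_ping limit such as '100,20%' into (rta_ms, loss_percent)."""
    rta, loss = spec.split(",")
    return float(rta), float(loss.rstrip("%"))


def ping_status(rta_ms, loss_pct, warning, critical):
    """Classify a ping result; a limit counts as breached only when exceeded."""
    crit_rta, crit_loss = parse_threshold(critical)
    warn_rta, warn_loss = parse_threshold(warning)
    if rta_ms > crit_rta or loss_pct > crit_loss:
        return CRITICAL
    if rta_ms > warn_rta or loss_pct > warn_loss:
        return WARNING
    return OK
```

For the check-host-alive definition that follows, parse_threshold("3000.0,80%") yields a 3-second RTA limit and an 80 percent loss limit for WARNING.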

A sample command definition for checking if a host is alive is as follows:

define command {
    command_name check-host-alive
    command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}

Testing the connectivity using TCP and UDP

In many cases, Nagios is used to monitor services that work over the network. To check if a service is working properly, it is necessary to make sure that a certain TCP or UDP port is accessible over the network. In such cases, the tests are done by connecting to the service periodically using the plugin, and this may cause entries in the system log regarding connection attempts.

For example, Microsoft SQL Server listens on TCP port 1433. In many cases, it is enough to run generic plugins that check whether a service is available on a specified TCP or UDP port. However, it is recommended that you run specialized plugins for various services such as web or e-mail servers, as these commands also try basic communication with the server and/or measure response time.

The check_tcp plugin is mainly intended to test services that do not have a corresponding Nagios check command. Internally, this command also handles many other checks. Many other plugins are symbolic links to check_tcp, and the plugin behaves slightly differently based on the name it is invoked with, which is why their syntax is almost the same. The check_udp command is also a symbolic link to check_tcp; the only difference is that it communicates over UDP instead of TCP. Its syntax is as follows:

check_tcp|check_udp -H host -p port [-w <warning>] [-c <critical>]
    [-s <send string>] [-e <expect string>]
    [-q <quit string>] [-A] [-m <maximum bytes>]
    [-d <delay>] [-t <timeout>] [-r <refuse state>]
    [-M <mismatch state>] [-v] [-4|-6]
    [-j] [-D <days to cert expiry>] [-S] [-E]

These commands accept several nonstandard options as follows:

Option	Description
-p, --port	This provides the TCP or UDP port to connect to
-w, --warning	This provides the response time that will issue a WARNING status (in seconds)
-c, --critical	This provides the response time that will issue a CRITICAL status (in seconds)
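To illustrate what a basic check_tcp run measures, here is a Python sketch that opens a TCP connection, times it, and applies -w and -c style limits in seconds. It is a simplified, hypothetical stand-in. It does not implement the send/expect options, and, matching the plugin's default --refuse behavior, it reports a refused connection as CRITICAL.

```python
import socket
import time

OK, WARNING, CRITICAL = 0, 1, 2
LABELS = ("OK", "WARNING", "CRITICAL")


def classify_time(elapsed, warning=None, critical=None):
    """Map a response time in seconds onto a state using -w/-c style limits."""
    if critical is not None and elapsed > critical:
        return CRITICAL
    if warning is not None and elapsed > warning:
        return WARNING
    return OK


def tcp_check(host, port, warning=None, critical=None, timeout=10.0):
    """Open a TCP connection, time it, and classify the result."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except ConnectionRefusedError:
        return CRITICAL, "TCP CRITICAL - Connection refused"
    except OSError as exc:  # timeouts, unreachable hosts, name resolution errors
        return CRITICAL, f"TCP CRITICAL - {exc}"
    state = classify_time(elapsed, warning, critical)
    return state, f"TCP {LABELS[state]} - {elapsed:.3f} second response time on {host} port {port}"
```

For example, tcp_check("192.0.2.10", 1433, warning=1, critical=5) would perform the kind of plain port test suggested earlier for Microsoft SQL Server.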

An example command definition that verifies whether VMware Server (1.x and 2.x) is listening for connections is as follows:

define command {
    command_name check_vmware
    command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 902 -e "220 VMware"
}

For UDP, the following is an example command definition to verify if the OpenVPN server is listening on UDP port 1142:

define command {
    command_name check_openvpn
    command_line $USER1$/check_udp -H $HOSTADDRESS$ -p 1142
}

Monitoring the e-mail servers

Making sure that all e-mail-related services are working correctly is something that each hosting company and intranet administrator needs to perform on a daily basis. In order to do this, Nagios can watch these servers and make sure things are working as expected. This can be done by a remote machine to make sure that the services are accessible, or can be monitored by the same server that offers these services.

Nagios can make sure that the processes are running and waiting for connections. It is also easy to verify whether a predefined username and password pair is accepted, to make sure that a custom authorization system is working properly.

This section describes the commands that check e-mail servers using network connectivity. The plugins that verify specific processes on a server can be used to make sure a particular daemon is up and running as well.

Checking the POP3 and IMAP servers

POP3 is a very popular protocol used by e-mail client applications to retrieve e-mail messages. It uses TCP port 110 for unencrypted connections and port 995 for SSL-encrypted connections. Nagios can verify both unencrypted and encrypted POP3 connections. Even though POP3 is the most popular e-mail retrieval protocol, another protocol is also very common. IMAP is a protocol that is used to access e-mails on remote servers rather than download them to the user's computer. It uses TCP port 143 for standard connections and port 993 for encrypted connections over SSL. The following plugins are based on check_tcp (and are actually symbolic links to check_tcp). The syntax is identical to the original plugin:

check_pop|check_imap -H host [-p port] [-w <warning>]
    [-c <critical>] [-s <send string>]
    [-e <expect string>] [-q <quit string>] [-A]
    [-m <maximum bytes>] [-d <delay>] [-t <timeout seconds>] [-r <refuse state>] [-M <mismatch state>] [-v] [-4|-6] [-j]
    [-D <days to cert expiry>] [-S] [-E]

The only difference between these plugins and check_tcp is that the port parameter can be omitted, in which case a default value for the non-SSL or SSL variant is chosen. In order to enable connection over SSL, either pass the --ssl option, or invoke the command as check_spop instead of check_pop and check_simap instead of check_imap.

The following are sample command definitions that check for a daemon listening on a specified host and verify that a valid POP3 and IMAP welcome message can be retrieved:

define command {
    command_name check_pop
    command_line $USER1$/check_pop -H $HOSTADDRESS$
}

define command {
    command_name check_imap
    command_line $USER1$/check_imap -H $HOSTADDRESS$
}

However, it is more useful to verify the actual functionality of the server. It is therefore reasonable to also verify that a predefined username and password are accepted by our POP3 daemon. To do that, the following examples use -E so that escape sequences such as \r\n are converted into real newline characters, -s to send the commands that authenticate, and -e to verify that the user has actually been logged in. Additionally, the -d option is passed to indicate that the command should wait a few seconds (5 in these examples) before analyzing the output. If this option is not passed, the command will return after the first line. The following examples should work with any POP3/IMAP server, but it may be necessary to customize the expected response for your particular environment:

define command {
    command_name check_pop3login
    command_line $USER1$/check_pop -H $HOSTADDRESS$ -E -s "USER $ARG1$\r\nPASS $ARG2$\r\n" -d 5 -e "ogged in"
}

define command {
    command_name check_imaplogin
    command_line $USER1$/check_imap -H $HOSTADDRESS$ -E -s "pr01 LOGIN $ARG1$ $ARG2$\r\n" -d 5 -e "pr01 OK"
}

The value passed in the -s option is a string with two lines for POP3 and one line for IMAP4. Each line ends with the \r\n escape sequence (carriage return and line feed), which is sent as the actual characters because the -E option is used.

For POP3, these lines are standard protocol commands to log into an account. The POP3 server should then issue a response stating that the user is authenticated, and this is what the command expects to receive because of the -e option. In addition, $ARG1$ and $ARG2$ will be replaced with a username and a password that are supplied in a service check definition, which allows different usernames and passwords to be specified for different checks.

With IMAP4, there is only a slight difference in the protocol dialect. IMAP requires the sending of only a single LOGIN command in order to authenticate. As with POP3, $ARG1$ and $ARG2$ will be replaced with a username and password. In this way, it is possible to set up checks for different users and passwords with a single command definition. The pr01 string is a tag and can be replaced by any other text without spaces. The IMAP4 protocol requires such a tag so that the server's answers can be matched to the corresponding requests.
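The POP3 dialog that check_pop3login performs can also be reproduced with Python's standard poplib module. This is handy for testing credentials by hand before putting them into a service definition. The following sketch is only an illustration. The factory parameter is a hypothetical hook that lets you pass poplib.POP3_SSL for port 995, or a stub for testing.

```python
import poplib

OK, CRITICAL = 0, 2


def pop3_login_check(host, user, password, port=110, timeout=10.0, factory=poplib.POP3):
    """Log in with USER/PASS, like check_pop3login, and map the outcome to a state."""
    try:
        conn = factory(host, port, timeout=timeout)  # reads the server greeting
        try:
            conn.user(user)
            reply = conn.pass_(password)  # raises poplib.error_proto on -ERR
        finally:
            conn.quit()
    except (poplib.error_proto, OSError) as exc:
        return CRITICAL, f"POP CRITICAL - {exc}"
    return OK, f"POP OK - {reply.decode(errors='replace')}"
```

For example, pop3_login_check("mail.example.com", "nagios", "s3cret", port=995, factory=poplib.POP3_SSL) performs the same test over SSL.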

Testing the SMTP protocol

Simple Mail Transfer Protocol (SMTP) is a protocol for sending e-mails—both from a client application as well as between e-mail servers. Therefore, monitoring it is also very important from the point of view of availability.

The standard Nagios plugins offer a command to check whether an SMTP server is listening. Unlike the checks for POP3 and IMAP, this command is not based on check_tcp and works only with this particular protocol; therefore, its options are a bit different:

check_smtp -H host [-p port] [-C command] [-R response]
    [-e expect] [-f from addr] [-F hostname]
    [-A authtype -U authuser -P authpass]
    [-w <warning time>] [-c <critical time>]
    [-t timeout] [-S] [-D days] [-n] [-4|-6]

The plugin accepts most of the standard options; additional ones are as follows:

Option	Description
-C, --command	This provides the SMTP command to execute on the server (option might be repeated)
-R, --response	This provides the response to expect from the server (option might be repeated)
-f, --from	This provides the FROM address to include in the MAIL command
-F, --fqdn	This provides the fully-qualified domain name to send during SMTP greeting (defaults to the local hostname if not specified)
-S, --starttls	This lets you use STARTTLS to initialize connection over SMTP

The port can be omitted and defaults to 25. The -S option also behaves differently from the check_tcp-based plugins: it uses the STARTTLS function of SMTP servers instead of connecting directly over SSL. A basic SMTP check command definition looks like the following:

define command {
    command_name check_smtp
    command_line $USER1$/check_smtp -H $HOSTADDRESS$
}

Most of these options are similar to the standard send/expect parameters in the way they work. Therefore, it is quite easy to create a more complex definition that verifies the sending of e-mails to a specific address:

define command {
    command_name check_smtpsend
    command_line $USER1$/check_smtp -H $HOSTADDRESS$ -f "$ARG1$" -C "RCPT TO:<$ARG2$>" -R "250"
}

This check issues the SMTP commands for sending an e-mail from $ARG1$ to $ARG2$, both of which are passed from a service definition. No message body is actually sent. The check expects to receive reply code 250, which indicates that no error has occurred.
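The same transaction can be sketched with Python's standard smtplib module. Like the check_smtpsend definition, this hypothetical helper issues MAIL FROM and RCPT TO and never sends message data. Reporting a non-250 reply as CRITICAL is a choice made for this sketch; the state that check_smtp uses for an unexpected response may differ.

```python
import smtplib

OK, CRITICAL = 0, 2


def smtp_rcpt_check(host, sender, recipient, port=25, timeout=10.0, factory=smtplib.SMTP):
    """Run MAIL FROM / RCPT TO like check_smtpsend; no message data is sent."""
    try:
        with factory(host, port, timeout=timeout) as smtp:  # QUIT is sent on exit
            smtp.ehlo_or_helo_if_needed()
            smtp.mail(sender)
            code, reply = smtp.rcpt(recipient)
    except (smtplib.SMTPException, OSError) as exc:
        return CRITICAL, f"SMTP CRITICAL - {exc}"
    if code == 250:
        return OK, f"SMTP OK - RCPT TO accepted ({code})"
    return CRITICAL, f"SMTP CRITICAL - RCPT TO returned {code} {reply.decode(errors='replace')}"
```

The factory parameter exists only so that a stub object can stand in for a real server; in normal use, the default smtplib.SMTP is used.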

Monitoring network services

Nagios also offers plugins that monitor different network services. These include commands for checking FTP, the DHCP protocol, and WWW servers. It is also possible for Nagios to monitor itself.

Checking the FTP server

Nagios allows you to verify whether an FTP server is listening for connections by using the check_ftp command. This plugin is identical to check_tcp, with the difference that the port is optional. By default, a valid FTP welcome message is expected.

check_ftp -H host [-p port] [-w <warning time>]
    [-c <critical time>] [-s <send string>]
    [-e <expect string>] [-q <quit string>] [-A] [-m <maximum bytes>] [-d <delay>]
    [-t <timeout seconds>] [-r <refuse state>] [-M <mismatch state>] [-v] [-4|-6] [-j]
    [-D <days to cert expiry>] [-S] [-E]

The port argument can be omitted and defaults to 21, or 990 for SSL-based connections. A sample command definition for checking FTP accepting connections is as follows:

define command {
    command_name check_ftp
    command_line $USER1$/check_ftp -H $HOSTADDRESS$
}

By using the -s and -e flags, it is also possible to verify if a specified username and password is allowed to log in:

define command {
    command_name check_ftplogin
    command_line $USER1$/check_ftp -H $HOSTADDRESS$ -E -s "USER $ARG1$\r\nPASS $ARG2$\r\n" -d 5 -e "230"
}

This example is quite similar to the POP3 authentication check, as the commands are the same. The only difference is that the expected response is 230, as this is the code for a successful response to the PASS command.
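For comparison, the same login check can be sketched with Python's standard ftplib module. This module sends USER and PASS for you and returns the server's final reply. The helper name and its factory parameter are hypothetical. The factory exists only so that a stub can replace a real server during testing.

```python
import ftplib

OK, CRITICAL = 0, 2


def ftp_login_check(host, user, password, port=21, timeout=10.0, factory=ftplib.FTP):
    """Log in with USER/PASS like check_ftplogin and expect a 230 reply."""
    ftp = factory()
    try:
        ftp.connect(host, port, timeout=timeout)
        reply = ftp.login(user, password)  # raises ftplib.error_perm on 530
    except (ftplib.Error, OSError) as exc:
        return CRITICAL, f"FTP CRITICAL - {exc}"
    finally:
        ftp.close()
    if reply.startswith("230"):
        return OK, f"FTP OK - {reply}"
    return CRITICAL, f"FTP CRITICAL - unexpected reply {reply}"
```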

Verifying the DHCP protocol

If your network has a server or a router that provides the users with IP addresses via DHCP, it would be wise to make sure that this server is also working correctly. Nagios offers a plugin that attempts to request an IP address via the DHCP protocol, which can be used for this purpose. The syntax is a bit different from other plugins:

check_dhcp [-v] [-u] [-s serverip] [-r requestedip] [-t timeout]
    [-i interface] [-m mac]

This command accepts the options described in the following table:

Option	Description
-s, --serverip	This provides the IP of the server that needs to reply with an IP (option might be repeated)
-r, --requestedip	This indicates that at least one DHCP server needs to offer the specified IP address
-m, --mac	This provides the MAC address that should be used in the DHCP request
-i, --interface	This provides the name of the interface that is to be used for checking (for example eth0)
-u, --unicast	Unicast, for testing a DHCP relay request; this requires -s

Options for DHCP checking are very powerful. For example, they can be used to check whether any server is responding to DHCP requests:

define command {
    command_name check_dhcp
    command_line $USER1$/check_dhcp
}

This plugin can also be used, as shown in the following command snippet, to verify if specific servers work, if a specified MAC address will receive an IP address, if a specific IP address is returned, or a combination of these checks:

define command {
    command_name check_dhcp_mac
    command_line $USER1$/check_dhcp -s $HOSTADDRESS$ -m $ARG1$ -r $ARG2$
}

This check will ensure that a specific machine offers a specific IP address in response to a request from a specific MAC address. This allows checks to be created for specific DHCP rules. This is crucial for networks in which specific devices, which other services depend on, need to receive specific IP addresses.

It is also worth noting that such tests are safe from a network's perspective, as the IP address offered by the server is not acknowledged by the Nagios plugin. Therefore, a check for a specific MAC address can be done even if a network card with the same address is currently connected. DHCP works over broadcast IP requests; therefore, it is not recommended that you test this service too often, as it might cause excessive traffic on larger networks.
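To show what is sent on the wire, the following Python sketch builds a DHCPDISCOVER packet as described in RFC 2131 and RFC 2132. It is not taken from check_dhcp and only builds the bytes. Actually sending the packet requires a UDP socket bound to port 68 with broadcast enabled, which usually needs root privileges. A real check would then wait for DHCPOFFER replies without ever sending a DHCPREQUEST. Using option 50 to express a requested address is an assumption of this sketch.

```python
import socket
import struct

# Marks the start of the DHCP options area
DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"


def build_discover(mac, xid, requested_ip=None):
    """Build a DHCPDISCOVER packet for the given MAC address and transaction ID."""
    chaddr = bytes.fromhex(mac.replace(":", ""))
    zero_ip = bytes(4)
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                  # op: BOOTREQUEST
        1,                  # htype: Ethernet
        len(chaddr),        # hlen: 6 bytes for a MAC address
        0,                  # hops
        xid,                # transaction ID, echoed back in DHCPOFFER replies
        0,                  # secs
        0x8000,             # flags: ask servers to broadcast their reply
        zero_ip, zero_ip, zero_ip, zero_ip,  # ciaddr, yiaddr, siaddr, giaddr
        chaddr,             # client hardware address, zero-padded to 16 bytes
        b"", b"",           # sname and file fields, zero-filled
    )
    options = bytes([53, 1, 1])  # option 53 (message type) = 1 (DHCPDISCOVER)
    if requested_ip:
        # option 50: requested IP address, 4 bytes
        options += bytes([50, 4]) + socket.inet_aton(requested_ip)
    return header + DHCP_MAGIC_COOKIE + options + bytes([255])  # 255 = end option
```

The fixed BOOTP header is 236 bytes long, so the options always start at offset 240, right after the magic cookie.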

Monitoring the Nagios process

It is possible for Nagios to monitor whether it is running on the local machine. This works by checking the Nagios log file for recent entries, as well as reading the output from the ps system command to ensure that the Nagios daemon is currently running. This plugin is mainly used in combination with NRPE or SSH, which are described in more detail in Chapter 8, Monitoring Remote Hosts. However, it can also be deployed to check the same Nagios that is scheduling the command, mainly to make sure that the log files contain recent entries. The syntax and options are as follows:

check_nagios -F <status log file> -e <expire_minutes> -C <process_string>

Option	Description
-F, --filename	This provides the path to the log file that should be checked for recent entries
-e, --expires	This provides the number of minutes after which the log file is assumed to be stale
-C, --command	This provides the command or partial command to search for in the process list

All of the arguments listed previously are required. The check for the --expires option is done by comparing the date and time of the latest entry in the log with the current date and time. The log file is usually called nagios.log and is stored in the directory that was passed in the --localstatedir option during the Nagios compilation. For an installation performed according to the steps given in Chapter 2, Installing Nagios 4, the path will be /var/nagios/nagios.log. The Nagios process for such a setup would be /opt/nagios/bin/nagios. An example definition of a command receiving all of the information as arguments is as follows:

define command {
    command_name check_nagios
    command_line $USER1$/check_nagios -F $ARG1$ -C $ARG2$ -e $ARG3$
}

The first argument is the path to the log file, the second is the path to the Nagios daemon binary, and the last one is the maximum acceptable number of minutes since the last log update.
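The log freshness part of this check can be sketched in Python as follows. The sketch assumes that each nagios.log entry starts with a Unix timestamp in square brackets, such as [1400000000]. The process list check (-C) is not modeled. Reporting a stale log as WARNING is a choice made for this sketch rather than documented plugin behavior.

```python
import re
import time

OK, WARNING, UNKNOWN = 0, 1, 3

# Assumed entry format: "[<unix timestamp>] message"
ENTRY = re.compile(r"^\[(\d+)\]")


def latest_entry_time(lines):
    """Return the newest timestamp found at the start of the log lines, or None."""
    stamps = [int(m.group(1)) for m in map(ENTRY.match, lines) if m]
    return max(stamps) if stamps else None


def log_freshness(lines, expire_minutes, now=None):
    """Compare the newest entry with the current time, like --expires."""
    now = time.time() if now is None else now
    latest = latest_entry_time(lines)
    if latest is None:
        return UNKNOWN, "NAGIOS UNKNOWN - no timestamped entries found"
    age = int(now - latest)
    if age > expire_minutes * 60:
        return WARNING, f"NAGIOS WARNING - last log entry is {age} seconds old"
    return OK, f"NAGIOS OK - last log entry is {age} seconds old"
```

Because a file object yields its lines when iterated, you can call it directly on the log, for example: with open("/var/nagios/nagios.log") as log: print(log_freshness(log, 5)).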

Testing the websites

Making sure that websites are up and running 24/7 is vital to many large companies. Verifying that the returned pages contain correct data may be even more important for companies conducting e-commerce. Nagios offers the check_http plugin to verify that a web server works. It can also make sure that your SSL certificate is still valid and check the contents of specific pages to verify that they contain specific text. This command accepts various parameters, which are as follows:

check_http -H <vhost> | -I <IP-address> [-u <uri>]
    [-p <port>] [-w <warning time>]
    [-c <critical time>] [-t <timeout>] [-L] [-a auth] [-b proxy_auth]
    [-f <ok | warning | critical | follow>] [-e <expect>] [-s string] [-l] [-r <regex> | -R <regex>] [-j method]
    [-P string] [-m <min_pg_size>:<max_pg_size>] [-4|-6] [-N] [-M <age>] [-A string]
    [-k string] [-S] [--sni] [-C <age>] [-T <content-type>]

The following table lists the options that differ from their usual behavior, or are not common in other commands:

Option	Description
-H, --hostname	This provides the hostname that should be used for the Host HTTP header; a port might be appended so that it is also present in the HTTP header
-I, --IP-address	This provides the IP address to which to connect; if not specified, --hostname is used
-u, --url	This provides the URL to GET or POST (defaults to /)
-j, --method	This specifies the HTTP method to use, such as GET, HEAD, POST, PUT, or DELETE
-P, --post	This posts URL-encoded data via POST; the content is specified as the argument
-N, --no-body	Do not wait for the document, only parse the HTTP headers
-M, --max-age	Warn if the document is older than the number of seconds provided; this parameter can also be specified as 15m for minutes, 8h for hours, or 7d for days
-T, --content-type	Specify the HTTP Content-Type header
-e, --expect	The text to expect in the first line of the HTTP response; if specified, the plugin will not handle status code logic (that is, it won't warn about 404)
-s, --string	Search for the specified text in result HTML
-r, --ereg	Search for a specified regular expression in HTML (case sensitive)
-R, --eregi	Search for a specified regular expression in HTML (case insensitive)
-l, --linespan	Allow the regular expression to span across new lines
--invert-regex	Return a state of CRITICAL if the text is found, and OK if it is not found
-a, --authorization	Authorize the page using the basic authentication type; must be passed in the form of <username>:<password>
-b, --proxy-authorization	Authorization for the proxy server; must be passed in the form of <username>:<password>
-A, --useragent	Pass the specified value as the User-Agent HTTP header
-k, --header	Add other parameters to be sent in the HTTP header (might be repeated)
-f, --onredirect	How to handle redirects: ok, warning, critical, or follow
-m, --pagesize	The minimum and maximum HTML page sizes in bytes; as <min>:<max>
-C, --certificate	Specifies how long the certificate has to be valid in days; should be in form of critical_days or critical_days,warning_days
--sni	Server Name Indication (SNI) enables SSL/TLS hostname extension support; this allows verification of the SSL-enabled websites with multiple sites on a single IP address
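Two of these options take values with their own small syntax. The -M option accepts an age with an optional m, h, or d suffix, and -m accepts a <min>:<max> range in bytes. The following Python sketch shows how such values can be interpreted. It is a hypothetical illustration, not code from check_http, and it assumes that either side of the range may be left empty.

```python
# Multipliers for the unit suffixes accepted by -M/--max-age (plain numbers are seconds)
AGE_UNITS = {"m": 60, "h": 3600, "d": 86400}


def parse_max_age(value):
    """Convert a -M style value such as '300', '15m', '8h' or '7d' into seconds."""
    if value and value[-1] in AGE_UNITS:
        return int(value[:-1]) * AGE_UNITS[value[-1]]
    return int(value)


def page_size_ok(size, spec):
    """Check a body length in bytes against a -m style '<min>:<max>' range."""
    low, _, high = spec.partition(":")
    if low and size < int(low):
        return False
    if high and size > int(high):
        return False
    return True
```

For instance, the check_http_basic definition that follows passes $ARG1$:1000000, so any page larger than 1,000,000 bytes or smaller than $ARG1$ bytes would fail the -m test.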

For example, to verify if a main page has at least the specified number of bytes and is returned promptly, the following check can be done:

define command {
    command_name check_http_basic
    command_line $USER1$/check_http -H $HOSTADDRESS$ -f follow -m $ARG1$:1000000 -w $ARG2$ -c $ARG3$
}

More complex tests of the WWW infrastructure should be carried out frequently. For example, to verify if an SSL-enabled page works correctly and quickly, a more complex test might be required. The following command will verify the SSL certificate and the page size, and it will look for a specific string in the page body:

define command {
    command_name check_https
    command_line $USER1$/check_http -H $HOSTADDRESS$ -S -C 14 -u $ARG1$ -f follow -m $ARG2$:$ARG3$ -R $ARG4$
}

Checking web pages at a higher level, using custom-written plugins, is described in more detail in Chapter 11, Programming Nagios.

Monitoring the database systems

Databases allow the storage of information that is often used by entire departments or whole companies. Because most systems depend on one or more databases, a failure in these databases can cause all of the systems that depend on them to go down as well. Imagine a business-critical database failure that went unnoticed over a weekend, making both the company's website and e-mail unavailable. That would be a disaster! Scheduled reports that were supposed to be sent out would also fail to be generated because of this.

This is why making sure that databases are working correctly and have enough resources to operate might be essential for many companies. Many enterprise-class databases also have tablespace capacity management that should be monitored. Even though a valid user may be able to log in, this does not necessarily mean that a database is up and running correctly.

Checking MySQL

One of the most commonly used database types is MySQL. It is very often used to provide a basic database for PHP-based web applications. It is also commonly used as a database system for client-server applications. Nagios offers two plugins to verify whether MySQL is working properly. One of the plugins checks connectivity to the database and the master-slave replication status. The other one measures the time taken to execute an SQL query. The syntax of both commands and the definitions of their options are as follows:

check_mysql [-H host] [-d database] [-P port] [-s socket]
    [-u user] [-p password] [-S]

check_mysql_query -q SQL_query [-w <warn>] [-c <crit>]
    [-d database] [-H host] [-P port] [-s socket]
    [-u user] [-p password]

Option	Description
-s, --socket	This provides the Unix socket to use for connection, used if -H option was not specified; does not need to be customized in most cases
-P, --port	This provides the port to use for connections (defaults to 3306)
-d, --database	This provides the database to which an attempt to connect is to be made
-u, --username	This provides the username to log in
-p, --password	This provides the password to log in
-S, --check-slave	This verifies that the slave thread is running (check_mysql only); this is used for monitoring replicated databases
-w, --warning	This specifies the warning threshold, which is dependent on the plugin used
-c, --critical	This specifies the critical threshold, which is dependent on the plugin used
-q, --query	Query to perform (check_mysql_query only)

For the check_mysql_query command, the -w and -c options specify the limits for the execution time of the specified SQL query. This allows us to make sure that the database performance is within acceptable limits.

The definitions of the check commands for both a simple test and running an SQL query within a specified time are as follows:

define command {
    command_name check_mysql
    command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -d $ARG3$ -S -w 10 -c 30
}

define command {
    command_name check_mysql_query
    command_line $USER1$/check_mysql_query -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -d $ARG3$ -q $ARG4$ -w $ARG5$ -c $ARG6$
}

Both the examples need username, password, and database name as arguments. The second example also requires an SQL query, warning, and critical time limits.

If the -S option is specified, the plugin also checks whether the replication of MySQL databases is working correctly. This check should be run against MySQL slave servers to make sure that replication from the master server is in place. The number of seconds by which the slave server is behind the master server can be monitored using the -w and -c flags: if the slave server is more than the specified number of seconds behind the master server in the replication process, a warning or critical status is issued. More information about checking the replication status can be found in the MySQL documentation for the SHOW SLAVE STATUS command (visit http://dev.mysql.com/doc/refman/5.0/en/show-slave-status.html).
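The replication check described above can be sketched in Python to make the decision logic concrete. This is a simplified model under stated assumptions, not the check_mysql source: it assumes that the SHOW SLAVE STATUS row has already been fetched into a dictionary keyed by the column names MySQL reports (Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master), and the replication_state function is hypothetical.

```python
# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def replication_state(status, warn_seconds, crit_seconds):
    """Map one SHOW SLAVE STATUS row (as a dict) to a (state, message) pair.

    Hypothetical helper: a simplified model of what check_mysql -S verifies.
    """
    # Both replication threads must be running on the slave
    if status.get("Slave_IO_Running") != "Yes" or status.get("Slave_SQL_Running") != "Yes":
        return CRITICAL, "Slave threads are not running"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL when the lag cannot be determined
        return UNKNOWN, "Replication lag is unknown"
    lag = int(lag)
    message = f"Slave is {lag} seconds behind master"
    if lag > crit_seconds:
        return CRITICAL, message
    if lag > warn_seconds:
        return WARNING, message
    return OK, message


# Example row, checked with the -w 10 -c 30 limits used by the check_mysql command
row = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": "15"}
print(replication_state(row, 10, 30))
```

The thread check comes first because a lag value is meaningless when replication is not running at all.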

Checking PostgreSQL

PostgreSQL is another open source database that is commonly used by hosting companies. It is also very often used for client-server applications. The Nagios plugins package offers a command to check if the PostgreSQL database is working correctly. Its syntax is quite similar to that of the MySQL command:

check_pgsql [-H <host>] [-P <port>] [-w <warn>] [-c <crit>]
  [-t <timeout>] [-d <database>] [-l <logname>] [-p <password>]

The following table describes the options that this plugin accepts:

Option	Description
-P, --port	This provides the port to use for connections (defaults to 5432)
-d, --database	This provides the database to connect to
-l, --logname	This provides the username to log in
-p, --password	This provides the password to log in

A sample check command that expects username, password, and database name as arguments is as follows:

define command{
  command_name check_pgsql
  command_line $USER1$/check_pgsql -H $HOSTADDRESS$ -l $ARG1$ -p $ARG2$ -d $ARG3$
}

Checking Oracle

Oracle is a popular enterprise-level database server. It is mainly used by medium- and large-sized companies for business-critical applications. Therefore, a failure, or even a lack of disk space, for a single database might cause huge problems for a company. Fortunately, a plugin exists to verify various aspects of the Oracle database, and it even offers the ability to monitor tablespace storage and cache usage. The syntax is quite different from most Nagios plugins, as the first argument specifies the mode in which the check should be carried out and the remaining parameters depend on the first one. The syntax is as follows:

check_oracle --tns <SID>
             --db <SID>
             --oranames <Hostname>
             --login <SID>
             --cache <SID> <USER> <PASS> <CRITICAL> <WARNING>
             --tablespace <SID> <USER> <PASS> <TABLESPACE> <CRITICAL> <WARNING>

For all checks, the Oracle System Identifier (SID) can be specified in the form of <ip> or <ip>/<database>. Because the plugin automatically adds the username and password to the identifier, an SID in the form of <username>[/<password>]@<ip>[/<database>] should not be used; it will not work in many cases.

The --tns option checks whether a database is listening for connections, using the tnsping command. This can be used as a basic check of both local and remote databases. Verifying that a local database is running can be done using the --db option; in this case, the check looks for the running Oracle process of the specified database. Verifying a remote Oracle Names server can be done using the --oranames mode.

In order to verify that a database is working properly, the --login option can be used. It tries to log in using an invalid username and verifies whether the ORA-01017 error is received; if it is, the database is behaving correctly.

Verifying cache usage can be done using the --cache option; in which case, the cache hit ratio is checked. If it is lower than the specified warning or critical limits, the respective status is returned. This allows the monitoring of bottlenecks within the database caching mechanism.

Similarly, for tablespace checking, a --tablespace option is provided. A check is carried out against the available storage for the specified tablespace. If it is lower than the specified limits, a warning or critical status is returned (as appropriate).

This plugin requires various Oracle commands to be in the binary path (the PATH environment variable). Therefore, it is necessary to have either the entire Oracle installation or the Oracle client installation done on the machine that will perform the checks for the Oracle database. Sample definitions to check the login into the Oracle database and the database cache are as follows:

define command{
  command_name check_oracle_login
  command_line $USER1$/check_oracle --login $HOSTADDRESS$
}

define command{
  command_name check_oracle_cache
  command_line $USER1$/check_oracle --cache $HOSTADDRESS$/$ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$
}

The second example requires the passing of the database name, username, password, and critical/warning limits for the cache hit ratio. The critical value should be lower than the warning value.
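Because a lower hit ratio is worse, the comparison runs in the opposite direction to most thresholds. A minimal Python sketch of the rule described for the --cache mode follows; the cache_ratio_state function is hypothetical, and its argument order (critical before warning) simply mirrors the plugin's syntax.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def cache_ratio_state(hit_ratio, critical, warning):
    """Return a Nagios state for a cache hit ratio given in percent.

    Hypothetical helper, not check_oracle's own code. Lower ratios are
    worse, so the critical limit must be below the warning limit.
    """
    if critical >= warning:
        raise ValueError("critical limit must be lower than warning limit")
    if hit_ratio < critical:
        return CRITICAL
    if hit_ratio < warning:
        return WARNING
    return OK


# A 85% hit ratio with critical=80 and warning=90 yields a warning
print(cache_ratio_state(85, 80, 90))
```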

Checking other databases

Even though Nagios supports verification of some common databases, there are a lot of commonly-used databases for which the standard Nagios plugins package does not provide a plugin. For these databases, the first thing worth checking is the Nagios Exchange (visit http://exchange.nagios.org/) as this has a category for database check plugins, with commands for checking various types of databases (such as DB2, Ingres, Firebird, MS SQL, and Sybase).

In some cases, it might be sufficient to use the check_tcp plugin to verify whether a database server is up and running. In other cases, it might be possible to use a dynamic language (such as Python, Perl, or Tcl) to write a small script that connects to your database and performs basic tests. See Chapter 11, Programming Nagios, for more information on writing the Nagios check plugins.

Monitoring the storage space

Making sure that a system is not running out of space is very important. A lack of disk space for basic paths such as /var/spool or /tmp might cause unexpected results throughout the entire system, such as applications failing because they cannot write temporary files, or local e-mail not being delivered. Quotas that are not properly set up for home directories might also cause disk space to run out within a few minutes under certain circumstances.

Nagios can monitor storage space and warn administrators before such problems happen. It is also possible to monitor shares on remote machines without mounting them. This is useful for monitoring disk space on Windows machines without installing the dedicated Windows Nagios tools described in Chapter 10, Advanced Monitoring.

Checking the swap space

Making sure that a system is not running out of swap space is essential to its correct behavior. Many operating systems have mechanisms that kill the most resource-intensive processes when the system is running out of memory. This usually leads to many services not functioning properly, as vital processes are often not respawned in such cases. It is therefore a good idea to monitor swap space usage in order to be able to handle low memory issues on critical systems.

Nagios offers a plugin to monitor each swap device independently, as well as the ability to monitor cumulative values. The syntax and description of these options are as follows:

check_swap [-a] [-v] -w limit -c limit

Option	Description
-a, --all	This compares all swap partitions one by one; if not specified, only total swap sizes are checked.

Values for the -w and -c options can be supplied in the form of <value>%, in which case at least <value> percent of swap space must be free; otherwise, the corresponding warning or critical state is returned. They can also be supplied in the form <value><unit> (for example, 1000k, 100M, or 1G); in this case, the test fails if less than the specified amount of swap space is available. A sample definition of a check is as follows:

define command{
  command_name check_swap
  command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}
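The two forms of limit can be illustrated with a small Python sketch. It is only a model of the rules described above, not check_swap's source; the function names are hypothetical, and treating k, M, and G as binary multiples (1k = 1024 bytes) is an assumption.

```python
OK, WARNING, CRITICAL = 0, 1, 2

# Assumption: unit suffixes are binary multiples (1k = 1024 bytes)
UNITS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def is_below(limit, free_bytes, total_bytes):
    """True if free swap violates a limit such as '20%' or '100M'."""
    if limit.endswith("%"):
        # Percentage form: at least <value> percent must be free
        return free_bytes * 100 < float(limit[:-1]) * total_bytes
    value, unit = limit[:-1], limit[-1]
    if unit not in UNITS:
        raise ValueError(f"unsupported limit: {limit}")
    # Absolute form: at least <value><unit> must be free
    return free_bytes < float(value) * UNITS[unit]


def swap_state(free_bytes, total_bytes, warn, crit):
    # Check the critical limit first so that the more severe state wins
    if is_below(crit, free_bytes, total_bytes):
        return CRITICAL
    if is_below(warn, free_bytes, total_bytes):
        return WARNING
    return OK


# 1 GiB free out of 4 GiB is 25% free: below 30% but above 10%
print(swap_state(1024 ** 3, 4 * 1024 ** 3, "30%", "10%"))
```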

Monitoring the disk status using SMART

Nagios offers a standard plugin that uses Self-Monitoring, Analysis, and Reporting Technology (SMART) to monitor and report failures of disk operations. This plugin operates on top of the SMART mechanism and verifies the status of local hard drives, if this is supported by the underlying IDE or SCSI hardware. This allows the monitoring of hard disk failures. The syntax is as follows:

check_ide_smart [-d <device>] [-i] [-q] [-1] [-0] [-n]

The following table provides a description of the accepted options:

Option	Description
-d, --device	The device to verify; if this option is set, no other options are accepted
-i, --immediate	Perform offline tests immediately
-q, --quick-check	Return the number of failed tests
-1, --auto-on	Enable automatic offline tests
-0, --auto-off	Disable automatic offline tests
-n, --nagios	Return output suitable for Nagios

A sample definition of a command to monitor a particular device and report failed tests is as follows:

define command{
  command_name check_ide_smart
  command_line $USER1$/check_ide_smart -d $ARG1$ -1 -q -n
}

Checking the disk space

One of the most common checks is checking one or more mounted partitions for available space. Nagios offers a plugin for doing this. This plugin offers very powerful functionality and can be set up to monitor one, several, or all partitions mounted on a system. The syntax for the plugin is as follows:

check_disk -w limit -c limit [-W limit] [-K limit]
  {-p path | -x device} [-C] [-E] [-e]
  [-g group] [-k] [-l] [-M] [-m] [-R path]
  [-r path] [-t timeout] [-u unit] [-v] [-X type]

The most commonly used options for this plugin are described in the following table:

Option	Description
-w, --warning	This returns a warning status if less than the specified percentage of disk space is free
-c, --critical	Return a critical if less than the specified percentage of disk space is free
-W, --iwarning	Return a warning if less than the specified percentage of inodes are free
-K, --icritical	Return a critical if less than specified percentage of inodes are free
-p, --path	The path or partition to verify (option might be specified multiple times)
-M, --mountpoint	Display the mount point instead of the partition in the result
-l, --local	Check only local file systems
-A, --all	Verify all mount points
-r, --ereg-path	Regular expression to find paths/partitions (case sensitive)
-R, --eregi-path	Regular expression to find paths/partitions (case insensitive)

Values for the -w and -c options can be supplied in the form of <value>%, in which case at least <value> percent of the space must be free to avoid a warning or critical state. They can also be specified in the form of <value><unit> (for example, 800k, 50M, or 4G); in this case, the test fails if the available space is less than the specified amount. Limits for inode availability (options -W and -K) can only be specified as a percentage, in the form of <value>%.

It is possible to check a single partition or specify multiple -p, -r, or -R options, and it is also possible to check if all the matching mount points have sufficient disk space. It is sometimes better to define separate checks for each partition so that if the limits are exceeded on several of these, each one is tracked separately. The sample check commands for a single partition and for all partitions are shown in the following examples:

define command{
  command_name check_partition
  command_line $USER1$/check_disk -p $ARG1$ -w $ARG2$ -c $ARG3$
}

define command{
  command_name check_local_partitions
  command_line $USER1$/check_disk -A -l -w $ARG1$ -c $ARG2$
}

Both of these commands expect warning and critical levels, but the first example also requires a partition path or device as the first argument. It is possible to build more complex checks either by repeating the -p parameter or by using -r to include several mount points.
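Checking several paths and reporting the worst result, as a single check_disk invocation with repeated -p options does, can be sketched in Python using the standard shutil.disk_usage() function. This is a simplified model that supports only percentage limits; the helper names are hypothetical, and the real plugin offers many more options.

```python
import shutil

OK, WARNING, CRITICAL = 0, 1, 2


def partition_state(free, total, warn_pct, crit_pct):
    """State for one partition, given percent-free thresholds."""
    free_pct = 100.0 * free / total
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_paths(paths, warn_pct, crit_pct):
    """Check each path (like repeating -p) and return the worst state."""
    results = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        results[path] = partition_state(usage.free, usage.total, warn_pct, crit_pct)
    # Higher exit codes are more severe, so max() picks the worst state
    return max(results.values(), default=OK), results


# Warn below 20% free, critical below 10% free
print(check_paths(["/"], 20, 10))
```

Defining separate checks for each partition, as recommended above, keeps the per-path states that a single worst-state result hides.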

Testing the free space for remote shares

Nagios offers a plugin that allows the monitoring of remote file systems exported over the SMB/CIFS protocol, the standard file-sharing protocol used by Microsoft Windows. This allows you to check whether a specified user is able to log on to a particular file server, and to monitor the amount of free disk space on that server. The syntax of this command is as follows:

check_disk_smb -H <host> -s <share> -u <user> -p <password>
  -w <warn> -c <crit> [-W <workgroup>] [-P <port>]

Options specific to this plugin are described in the following table:

Option	Description
-s, --share	The SMB share that should be tested
-u, --user	The username to log in to the server (defaults to guest)
-p, --password	The password to use for logging in
-P, --port	The port to be used for connections; defaults to 139

Values for the -w and -c options can be specified in the form <value>%; in this case, they refer to the percentage of used space, and a warning or critical state is returned when the used space on the share reaches <value> percent. They can also be specified in the form of <value><unit> (for example, 800k, 50M, or 4G); in this case, the test fails if the available space is less than the specified amount.

This command uses the smbclient command to communicate over the SMB protocol. It is therefore necessary to have the Samba client package installed on the machine where the test will be run. Sample command definitions to check connectivity to a share without checking for disk space and also to verify disk space over SMB are as follows:

define command{
  command_name check_smb_connect
  command_line $USER1$/check_disk_smb -H $HOSTADDRESS$ -w 100% -c 100% -u $ARG1$ -p $ARG2$ -s $ARG3$
}

define command{
  command_name check_smb_space
  command_line $USER1$/check_disk_smb -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -s $ARG3$ -w $ARG4$ -c $ARG5$
}

Both of the commands require the passing of a username, password, and share name as arguments. The latter example also requires the passing of warning and critical value limits that should be checked. The first example will only issue a critical state if a partition has no space left. It is also worth noting that Samba 3.x servers report quota as disk space, if this is enabled for the specified user. Therefore, this might not always be an accurate way to measure disk space.

Monitoring the resources

To keep servers and workstations responsive and prevent them from being overloaded, it is also worth monitoring their resource usage using various additional measures. Nagios offers several plugins that monitor resource usage and report when the limits set for these checks are exceeded.

Checking the system load

The first thing that should always be monitored is the system load, which is calculated based on the number of processes running or waiting to run. This value reflects the number of processes and the amount of CPU capacity that they are utilizing. For example, if one process is using 50 percent of the CPU capacity, the value will be around 0.5, and if four processes try to utilize the maximum CPU capacity, the value will be around 4.0. The system load is measured as three values: the average load over the last minute, the last 5 minutes, and the last 15 minutes. The syntax of the command is as follows:

check_load [-r] -w wload1,wload5,wload15 -c cload1,cload5,cload15

Option	Description
-r, --percpu	Divide the load averages by the number of CPUs

Values for the -w and -c options should be in the form of three values separated by commas. If any of the load averages exceeds the corresponding limit, a warning or critical status is returned, respectively. Here is a sample command definition that uses warning and critical load limits as arguments:

define command{
  command_name check_load
  command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
}
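The comparison that check_load performs, including the effect of -r, can be sketched in Python. On a four-CPU machine, four CPU-bound processes give a load of about 4.0, which -r turns into 1.0 per CPU. The sketch uses the standard os.getloadavg() function (available on Unix-like systems only); the helper names are hypothetical.

```python
import os

OK, WARNING, CRITICAL = 0, 1, 2


def parse_triplet(text):
    """Parse 'l1,l5,l15' thresholds, as passed to check_load -w/-c."""
    values = [float(v) for v in text.split(",")]
    if len(values) != 3:
        raise ValueError("expected three comma-separated values")
    return values


def load_state(loads, warn, crit, per_cpu=False, cpus=None):
    """Compare the 1, 5, and 15 minute load averages with the thresholds."""
    if per_cpu:  # like -r: divide the load averages by the number of CPUs
        n = cpus or os.cpu_count() or 1
        loads = [avg / n for avg in loads]
    w, c = parse_triplet(warn), parse_triplet(crit)
    # Any single average above its limit is enough to change the state
    if any(avg > limit for avg, limit in zip(loads, c)):
        return CRITICAL
    if any(avg > limit for avg, limit in zip(loads, w)):
        return WARNING
    return OK


if hasattr(os, "getloadavg"):
    print(load_state(os.getloadavg(), "15,10,5", "30,25,20"))
```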

Checking the processes

Nagios also offers a way to monitor the number of processes. It can be configured to count all processes, only running ones, those consuming CPU, those consuming memory, or a combination of these criteria. The syntax and options are as follows:

check_procs -w <range> -c <range> [-m metric] [-s state]
  [-p ppid] [-u user] [-r rss] [-z vsz] [-P %cpu]
  [-a argument-array] [-C command] [-t timeout] [-v]

Option	Description
-m, --metric	Select the metric to check: PROCS (number of processes; the default), VSZ (virtual memory size of the matching process), RSS (resident set memory size of the matching process), CPU (percentage of CPU time of the matching process), or ELAPSED (time elapsed in seconds for the matching process)
-s, --state	Only check processes that have the specified status; this is the same as the status in the ps command
-p, --ppid	Check the children of the indicated process IDs
-z, --vsz	Check processes with a virtual memory size exceeding value
-r, --rss	Check processes with the resident set memory exceeding value
-P, --pcpu	Check processes with the CPU usage exceeding value
-u, --user	Check processes owned by a specified user
-a, --argument-array	Check processes whose arguments contain a specified value
-C, --command	Check processes with exact matches of the specified value as a command

Values for the -w and -c options can either be a single value or take the form of <min>:<max>. In the first case, a warning or critical state is returned if the value (the number of processes by default) exceeds the specified number. In the second case, the appropriate status is returned if the value is lower than <min> or higher than <max>. Sample commands to monitor the total number of processes and the number of specific processes are as follows. The second command, for example, can be used to check that a specific server is running and has not created too many processes. In this case, the warning and critical ranges should start at 1 (for example, 1:20), so that a state is also returned when no matching process is running.

define command{
  command_name check_procs_num
  command_line $USER1$/check_procs -m PROCS -w $ARG1$ -c $ARG2$
}

define command{
  command_name check_procs_cmd
  command_line $USER1$/check_procs -C $ARG1$ -w $ARG2$ -c $ARG3$
}
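The two threshold forms described above can be modeled in a few lines of Python. This sketch covers only the plain <max> and <min>:<max> forms from the text, not the full Nagios range syntax, and the function names are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def outside(value, spec):
    """True if value violates a check_procs-style limit ('10' or '1:10')."""
    if ":" in spec:
        low, high = (int(part) for part in spec.split(":"))
        return value < low or value > high
    # Single value: only exceeding it is a problem
    return value > int(spec)


def procs_state(count, warn, crit):
    if outside(count, crit):
        return CRITICAL
    if outside(count, warn):
        return WARNING
    return OK


# No matching process at all is outside 1:20, so the result is critical
print(procs_state(0, "1:10", "1:20"))
```

A count of zero falls outside a range such as 1:20, which is why ranges for a specific server should start at 1.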

Monitoring the logged-in users

It is also possible to use Nagios to monitor the number of users currently logged in to a particular machine. The syntax is very simple, and there are no options, except the warning and critical limits.

check_users -w limit -c limit

A command definition that uses the warning or critical limits specified in the arguments is as follows:

define command{
  command_name check_users
  command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}

Monitoring other operations

Nagios also offers plugins for many other operations that are common in daily system monitoring and administration; this section covers only a few of them. It is recommended that you look for the remaining commands both in the Nagios plugins package and on the Nagios Exchange website.

Checking for updates with APT

Many Linux distributions use Advanced Packaging Tool (APT) for handling package management (visit http://en.wikipedia.org/wiki/Advanced_Packaging_Tool). This tool is used by default on Debian and its derivatives such as Ubuntu. It allows the handling of upgrades and download of packages. It also allows the synchronization of package lists from one or more remote sources.

Nagios provides a plugin that allows you to monitor whether any upgrades are available and, optionally, to perform upgrades automatically. The syntax and options are as follows:

check_apt [-d|-u|-U [<opts>]] [-n] [-t timeout]
  [-i <regex>] [-e <regex>] [-c <regex>]

Option	Description
-u, --update	Perform an apt update operation prior to other operations
-U, --upgrade	Perform an apt upgrade operation
-d, --dist-upgrade	Perform an apt dist-upgrade operation
-n, --no-upgrade	Do not run upgrade or dist-upgrade; useful only with -u
-i, --include	Include only packages matching a regular expression
-c, --critical	If any packages match a regular expression, a critical state is returned.
-e, --exclude	Exclude packages matching a regular expression

If the -u option is specified, the command first attempts to update the apt package information. Otherwise, the package information currently in the cache is used. If the -U or -d option is specified, the specified operation is performed. If -n is specified, the upgrade operation is not actually run, so the system is not upgraded. The plugin can therefore be used either to perform daily updates/upgrades or only for monitoring (and not upgrading) activities.

The following are command definitions for a simple dist-upgrade, as well as for monitoring available packages and issuing a critical state if the Linux image packages are upgradeable (that is, if newer packages exist). The second command does not perform the actual upgrades.

define command{
  command_name check_apt_upgrade
  command_line $USER1$/check_apt -u -d
}

define command{
  command_name check_apt_upgrade2
  command_line $USER1$/check_apt -n -u -d -c "^linux-(image|restrict)"
}

Monitoring the UPS status

Another useful feature is that Nagios is able to monitor the UPS status over the network. This requires the machine with the UPS to have the Network UPS Tools package (visit http://www.networkupstools.org/) installed and running, so that it is possible to query the UPS parameters. It is also possible to monitor local resources using the same plugin. The syntax and options are as follows:

check_ups -H host -u ups [-p port] [-v variable] [-T]
  [-w <warn time>] [-c <crit time>] [-t <timeout>]

Option	Description
-u, --ups	The name of the UPS to check
-p, --port	The port to use for TCP/IP connection; defaults to 3493
-T, --temperature	Report the temperature in degrees Celsius
-v, --variable	Variable to output (LINE, TEMP, BATTPCT, or LOADPCT)

The name of the UPS is usually defined in the ups.conf file on the machine to which the command is connecting. The plugin returns an ok state if the UPS is calibrating or running on AC power. A warning state is returned if the UPS reports that it is running on batteries, and a critical state is returned in the case of a low battery or if the UPS is off. The following is a sample definition of a check command that takes the UPS name as an argument:

define command{
  command_name check_ups
  command_line $USER1$/check_ups -H $HOSTADDRESS$ -u $ARG1$
}
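The state mapping described above can be sketched in Python. It assumes that the UPS status has been read from the Network UPS Tools ups.status variable, which reports space-separated flags such as OL (online), OB (on battery), LB (low battery), CAL (calibrating), and OFF. The ups_state function is hypothetical and is not part of check_ups, and returning UNKNOWN for an unrecognized status is an assumption of the sketch.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def ups_state(status):
    """Map a NUT ups.status value (for example 'OL' or 'OB LB') to a state."""
    flags = set(status.split())
    # Most severe conditions first: low battery or UPS switched off
    if "LB" in flags or "OFF" in flags:
        return CRITICAL
    if "OB" in flags:  # running on batteries
        return WARNING
    if "OL" in flags or "CAL" in flags:  # on AC power or calibrating
        return OK
    return UNKNOWN  # assumption: anything else is treated as unknown


print(ups_state("OB DISCHRG"))
```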

Gathering information from the lm-sensors

This is a Linux-specific plugin that uses the lm-sensors package (visit http://www.lm-sensors.org/) to monitor hardware health. The command returns an unknown state if the underlying hardware does not support health monitoring or if the lm-sensors package is not installed. A warning status is returned if the sensors command exits with a non-zero code, and a critical status if the string ALARM is found in the output of the command. The plugin does not take any arguments and simply reports information based on the sensors command. The command definition is as follows:

define command{
  command_name check_sensors
  command_line $USER1$/check_sensors
}
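The rules above can be modeled in Python by running the sensors command with the standard subprocess module. This is a sketch, not the plugin's own code: treating a missing sensors binary as UNKNOWN follows the text, while the helper names are hypothetical.

```python
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def sensors_state(returncode, output):
    """Apply the described rules to one run of the sensors command."""
    if returncode != 0:
        return WARNING  # sensors itself reported an error
    if "ALARM" in output:
        return CRITICAL  # at least one sensor raised an alarm
    return OK


def run_sensors():
    try:
        proc = subprocess.run(["sensors"], capture_output=True, text=True)
    except FileNotFoundError:
        return UNKNOWN  # lm-sensors is not installed
    return sensors_state(proc.returncode, proc.stdout)


print(run_sensors())
```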

Using the dummy check plugin

Nagios also offers a dummy check plugin. It simply returns the exit code passed to it, optionally along with a status text. It is useful for testing dependencies between hosts and/or services and verifying notifications, and can also be used for a service that will be measured using passive checks only. The syntax of this plugin is as follows:

check_dummy <exitcode> [<result string>]

Sample commands that return an ok status, and a critical status with a status text supplied as an argument, are as follows:

define command{
  command_name check_dummy_ok
  command_line $USER1$/check_dummy 0
}

define command{
  command_name check_dummy_critical
  command_line $USER1$/check_dummy 2 "$ARG1$"
}

Manipulating other plugins' output

Nagios offers an excellent plugin that simply invokes other checks and converts their status accordingly. This might be useful when a failed check from a plugin is actually an indication that the service is working correctly. This can, for example, be used to make sure that non-authenticated users can't send e-mails while valid users can. The syntax and options are as follows:

negate [-t timeout] [-o|-w|-c|-u state] <actual command to run>

Option	Description
-o, --ok	The state to return when the actual command returns an ok state
-w, --warning	The state to return when the actual command returns a warning state
-c, --critical	The state to return when the actual command returns a critical state
-u, --unknown	The state to return when the actual command returns an unknown state

The state to return can be specified either as an exit code number or as a string. If no options are specified, only the ok and critical states are swapped. If at least one status change option is specified, only the specified states are mapped. Sample command definitions to check that an SMTP server is not listening and to verify that a user can't log in to a POP3 server are as follows:

define command{
  command_name check_nosmtp
  command_line $USER1$/negate $USER1$/check_smtp -H $HOSTADDRESS$
}

define command{
  command_name check_pop3loginfailure
  command_line $USER1$/negate -o critical -w ok -c critical $USER1$/check_pop -H $HOSTADDRESS$ -E -s "USER $ARG1$\r\nPASS $ARG2$\r\n" -d 5 -e "ogged in"
}

The first example does not use state mapping, so the default swap of the ok and critical states is done. The second example maps the states so that if a server is not listening, or if the user is actually able to log in, the service is considered to be in a critical state.
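The mapping rules of negate can be modeled in Python as a pure function on exit codes. This is an illustrative sketch, not the plugin's implementation; the negate function and the dictionary standing in for the -o, -w, -c, and -u options are hypothetical.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {"ok": OK, "warning": WARNING, "critical": CRITICAL, "unknown": UNKNOWN}


def to_code(state):
    """Accept either an exit code number or a state name."""
    if isinstance(state, int):
        return state
    return STATE_NAMES[state.lower()]


def negate(result, mapping=None):
    """Translate a plugin's exit code according to the described rules.

    mapping -- hypothetical stand-in for -o/-w/-c/-u, for example
               {"ok": "critical", "warning": "ok", "critical": "critical"}
    """
    if not mapping:
        # No options: only OK and CRITICAL are swapped
        return {OK: CRITICAL, CRITICAL: OK}.get(result, result)
    converted = {to_code(src): to_code(dst) for src, dst in mapping.items()}
    # Only the states given as options are mapped; others pass through
    return converted.get(result, result)


# The check_pop3loginfailure mapping: a successful login (OK) becomes CRITICAL
pop3_mapping = {"ok": "critical", "warning": "ok", "critical": "critical"}
print(negate(OK, pop3_mapping))
```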

Additional and third-party plugins

So far, we have used plugins that are part of the standard Nagios plugins package, which provides plugins for monitoring typical servers. However, an IT setup often consists of a large variety of hardware and software, with many devices and services that have to be monitored. In many cases, standard plugins are enough to properly monitor them, for example, using PING, SSH, or HTTP checks.

Some applications communicate over a custom protocol, but can still be checked using check_udp or check_tcp by specifying the handshake to perform and the expected response. However, many services require more sophisticated checks, such as verifying that an OpenVPN server performs a proper handshake, which cannot easily be done using check_udp or check_tcp. A check that the port is listening can be done, but a different service could simply be running on the same port.

Monitoring the network software

Monitoring IT resources often requires verifying that network services are working properly. This can be anything: a web, SSH, or FTP server, or a service using one of many other protocols. There are also a large number of custom protocols that require monitoring. Popular network services already have a working plugin that can simply be used. However, it is often up to us to create a check.

In many cases, it is sufficient to just use check_udp or check_tcp and check for a specific string. It is often enough to just check the result message on the VMware server. Other services may also require sending a specific command. For example, the following command definition allows monitoring the Redis service (visit http://redis.io/ for more details), which has a simple, line-based protocol:

define command{
  command_name check_redis
  command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 6379 -E -s "PING\r\n" -e "+PONG" -w 1 -c 2
}

Redis is a key-value store that is often used by server applications to store information and to communicate between instances. It is commonly used by large web applications as a cache or for temporary data storage.

The preceding example connects to the host on port 6379 (the port on which Redis listens by default) and sends a PING command followed by carriage return and newline characters, expecting +PONG as the response. The response time has to be below 1 second for an OK status; it is a WARNING if it is below 2 seconds, and a CRITICAL status if it is longer than that.
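What this check_tcp invocation does on the wire can be reproduced with Python's standard socket module, which may help when debugging such checks by hand. This is a sketch, not check_tcp's code: the check_redis_ping function is hypothetical, and returning WARNING for an unexpected reply assumes check_tcp's default handling of expect-string mismatches.

```python
import socket
import time

OK, WARNING, CRITICAL = 0, 1, 2


def check_redis_ping(host, port=6379, warn=1.0, crit=2.0):
    """Send PING to Redis and expect +PONG, like the check_redis command."""
    start = time.monotonic()
    try:
        # The critical limit doubles as the connection and read timeout
        with socket.create_connection((host, port), timeout=crit) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError as exc:  # refused connection, timeout, name lookup failure
        return CRITICAL, f"connection failed: {exc}"
    elapsed = time.monotonic() - start
    if not reply.startswith(b"+PONG"):
        return WARNING, f"unexpected reply {reply!r}"
    if elapsed > crit:
        return CRITICAL, f"response took {elapsed:.3f}s"
    if elapsed > warn:
        return WARNING, f"response took {elapsed:.3f}s"
    return OK, f"+PONG received in {elapsed:.3f}s"
```

For example, check_redis_ping("127.0.0.1") checks a Redis instance running on the local machine with the same 1 and 2 second limits as the command definition.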

This approach can also be used to send more complex commands, such as authenticating to Redis using the AUTH command:

define command{
  command_name check_redis_auth
  command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 6379 -E -s "AUTH $ARG1$\r\nSELECT 0\r\n" -e "+OK" -w 1 -c 2
}

This check will fail if authentication using the specified password does not work, as the SELECT command will not return +OK unless the authentication succeeds. Similarly, a check can be made for memcached (visit http://memcached.org/ for more details), which is a caching mechanism often used by web applications as well:

define command{
  command_name check_memcached
  command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 11211 -E -s "version\n" -e "VERSION" -w 1 -c 2
}

In this case, the only differences are the use of port 11211 (the default port for memcached) and the command that is sent. The standard check commands can be used for almost all protocols. However, this is a practical solution mainly for text-based protocols, since encoding binary data requires more work, and it is often easier to create a custom plugin for such protocols, especially if a library to communicate over the protocol is available. This approach is described in more detail in Chapter 11, Programming Nagios.

Using third-party plugins

There are also many cases where simple monitoring by sending protocol-specific messages is not enough. For instance, monitoring the MySQL replication status requires a dedicated plugin to report delays properly and to set a warning or critical status if they exceed the specified thresholds.

In such cases, it is best to use the existing plugins if they exist, or write new ones if they don't. The Nagios Exchange at http://exchange.nagios.org/ is the best place to start looking for plugins as it is historically the first and the largest directory of additions created by the Nagios community.

The website contains a dedicated category for plugins and at the time of writing this book, the directory contains over 3,000 plugins. They are grouped into categories based on the type of checks performed.

The category with the largest number of plugins is Network Protocols (visit http://exchange.nagios.org/directory/Plugins/Network-Protocols), which contains over 20 percent of all plugins available on the website. It contains ready-to-use plugins to perform various types of checks, such as e-mail, VoIP, file, and web protocol checks.

Nagios Exchange also has a section for databases with many plugins available. It provides ready-to-use code for monitoring many types of servers, such as MySQL, PostgreSQL, Oracle, DB2, and SQL Server, some of which do not have a dedicated check in the nagios-plugins project. For many databases, there are multiple plugins available, ranging from a basic service check to more advanced features, such as monitoring replication status, disk usage, and memory usage. All of the plugins can be found in the Databases section available at http://exchange.nagios.org/directory/Plugins/Databases.

The website provides a large number of plugins for monitoring web servers and web applications. This includes checks for common web servers, such as Apache and IIS, but there are also multiple choices for monitoring other web and application servers, such as Nginx, Tomcat, and JBoss. There are many plugins for monitoring specific solutions, such as Fast-CGI processes, PHP-FPM (a Fast-CGI based solution to run PHP applications with many web servers), and the Passenger module (used to serve Ruby on Rails and Python/Django applications on top of Apache and Nginx). There are also plugins for different aspects of monitoring web servers, such as the number of processes, memory and CPU usage, and many more. The web-related plugins for monitoring can be found on Nagios Exchange under the Web Servers category available at http://exchange.nagios.org/directory/Plugins/Web-Servers.

Nagios Exchange also provides a lot of ready-to-use plugins to monitor a wide range of devices and services. There are multiple plugins to monitor various operating systems (at http://exchange.nagios.org/directory/Plugins/Operating-Systems), network devices (at http://exchange.nagios.org/directory/Plugins/Hardware/Network-Gear), and network connectivity (at http://exchange.nagios.org/directory/Plugins/Network-Connections%2C-Stats-and-Bandwidth).

When using a third-party plugin, whether from Nagios Exchange or downloaded directly from another website, it is important to keep security and licensing issues in mind.

Because plugins run as the same user as Nagios itself, a malicious or faulty plugin may be able to remove Nagios data files or other important data. It is always best to use plugins that are in active development and, preferably, have their source code available, so that problems can be fixed locally or support can be obtained from the plugin's author.

Some plugins may have licenses that prevent them from being used in certain environments, or that require a separate license in such cases. A plugin may also depend on libraries or software that require a license for each server on which they are installed. For instance, a plugin that monitors a proprietary service may need a client library to connect, and that library may require an additional license.

It is also possible to create your own plugins. Because the plugin interface is very simple, this can be done in almost any language: all that is needed is to print the result to standard output and use the appropriate exit code to indicate the status. Writing your own plugins is described in more detail in Chapter 11, Programming Nagios.
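To illustrate how little the interface requires, here is a minimal sketch of a custom plugin in Python. It is a hypothetical example, not a plugin shipped with nagios-plugins: the name check_freshness, its PATH WARNING CRITICAL argument order, and the use of a file's modification age in seconds are all assumptions made for illustration. It prints a single status line, optionally followed by performance data after a | character, and maps the result to the standard exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN).

```python
#!/usr/bin/env python3
"""check_freshness: hypothetical Nagios plugin sketch (not part of nagios-plugins).

Reports how long ago a file or directory was last modified and maps that age
to a Nagios state using warning/critical thresholds given in seconds.
"""
import os
import time

# Exit codes defined by the Nagios plugin interface.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_freshness(path, warning, critical, now=None):
    """Return (exit_code, status_line) for PATH; `now` can be injected for testing."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError as exc:
        # A check that cannot gather data reports UNKNOWN, not CRITICAL.
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc.strerror}"
    age = int((time.time() if now is None else now) - mtime)
    if age >= critical:
        code = CRITICAL
    elif age >= warning:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data: label=value[unit];warn;crit
    perfdata = f"age={age}s;{warning};{critical}"
    return code, f"{STATE_NAMES[code]} - {path} modified {age} seconds ago|{perfdata}"


def main(argv):
    """Handle 'PATH WARNING CRITICAL', print one status line, return the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_freshness PATH WARNING_SECONDS CRITICAL_SECONDS")
        return UNKNOWN
    try:
        warning, critical = int(argv[1]), int(argv[2])
    except ValueError:
        print("UNKNOWN - thresholds must be whole numbers of seconds")
        return UNKNOWN
    code, line = check_freshness(argv[0], warning, critical)
    print(line)
    return code
```

Here main() returns the exit code instead of terminating the process, so the logic can be exercised on its own. Saved as an executable plugin script, it would end with import sys and if __name__ == "__main__": sys.exit(main(sys.argv[1:])), so that Nagios reads the status line from standard output and the state from the exit code. It could then be called from a command definition like any other plugin, for example as check_freshness /var/backups 3600 7200 (a hypothetical path and thresholds).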

Summary

The Nagios plugins package offers a large variety of checks that can be performed to monitor your infrastructure. Whether you are an administrator of an IT company managing a large network or just want to monitor a small server room, these plugins will allow you to check the majority of the services that you are currently using.

In this chapter, we have learned how the plugins report status to Nagios using standard output and exit codes. We have also learned about the Nagios plugins project and the standard options for all of the plugins within the package.

We have also covered the generic communication plugins for checking remote host connectivity using ping, as well as the generic TCP and UDP checking plugins. The chapter also described how to check standard networking protocols, such as e-mail, FTP, DHCP, and websites, as well as how to check Nagios process information.

We have also learned about checking various databases, and how these checks can also be used to monitor the propagation of data to slave databases. The chapter also covered monitoring disk and swap space, as well as monitoring system resources and processes.

We have also learned how to monitor additional items, such as APT package management status, UPS devices, and lm-sensors readings. We have also learned how to use third-party plugins in Nagios. The next chapter will cover how to create the Nagios configuration so that it can be used for monitoring both small and large infrastructures. It also covers advanced configuration options, such as dependencies, custom variables, inheritance, and flapping.