Creating a new network host
In this recipe, we'll start with the default Nagios Core configuration and set up a host definition for a server that responds to PING on our local network. The end result will be that Nagios Core will add our new host to its internal tables when it starts up and will automatically check it (probably using PING) on a regular basis. In this example, I'll use my Nagios Core monitoring server with the DNS name olympus.example.net and add a host definition for a web server with the DNS name sparta.example.net. This is all on an example network, 192.0.2.0/24.
Getting ready
You'll need a working Nagios Core 4.0 or greater installation with a web interface and all the Nagios Core plugins installed. If you have not yet installed Nagios Core, you should start with the quick start guide at http://nagios.sourceforge.net/docs/nagioscore/4/en/quickstart.html that is appropriate to your operating system.
We'll assume that the configuration file Nagios Core reads on startup is at /usr/local/nagios/etc/nagios.cfg, as is the case with the default installation. It shouldn't matter where you include this new host definition in the configuration, as long as Nagios Core is going to read the file at some point. However, it might be a good idea to give each host its own file in a separate objects directory, which we'll do here. You should have access to a shell on the server and be able to write text files using an editor of your choice; I'll use vi. You will need root privileges on the server via su or sudo.
You should know how to reload Nagios Core on the server so that the configuration you're going to add gets applied. It shouldn't be necessary to restart the whole machine to do this! A common location for the startup/shutdown script on Unix-like hosts is /etc/init.d/nagios, which I'll use here. On modern systemd-based GNU/Linux systems, it may be better practice to use systemctl reload nagios.
You should also have the hostname or IP address of the server you'd like to monitor ready. We'll use IP addresses rather than DNS hostnames here, which means that our checks will keep working even if DNS is unavailable. You may prefer to use hostnames if your addresses change regularly. You shouldn't need the subnet mask or anything like that; Nagios Core will only need whatever information the ping(8) tool would need for its own check_ping command.
Finally, you should test things first; confirm that you're able to reach the host from the Nagios Core server using ping(8) by checking directly from the shell, to make sure your network stack, routes, firewalls, and netmasks are all correct:
    user@olympus:~$ ping 192.0.2.21
    PING sparta.example.net (192.0.2.21) 56(84) bytes of data.
    64 bytes from sparta.example.net (192.0.2.21): icmp_req=1 ttl=64 time=0.149 ms
How to do it...
We can create the new host definition for sparta.example.net as follows:
- Change the directory to /usr/local/nagios/etc/objects and create a new file called sparta.example.net.cfg:
    # cd /usr/local/nagios/etc/objects
    # vi sparta.example.net.cfg
- Write the following code into the file, changing the values as appropriate for your own setup:
    define host {
        host_name              sparta.example.net
        alias                  sparta
        address                192.0.2.21
        max_check_attempts     3
        check_period           24x7
        check_command          check-host-alive
        contacts               nagiosadmin
        notification_interval  60
        notification_period    24x7
    }
- Change the directory to /usr/local/nagios/etc and edit the nagios.cfg file:
    # cd ..
    # vi nagios.cfg
  At the end of the file, add the following line:
    cfg_file=/usr/local/nagios/etc/objects/sparta.example.net.cfg
- Reload the configuration:
    # /etc/init.d/nagios reload
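Before reloading, it can be worth asking Nagios Core to verify the configuration, so that a typo in the new file does not stop the monitoring process. The following is a minimal sketch, assuming the binary lives at /usr/local/nagios/bin/nagios (the usual location for a source install, but an assumption here); safe_reload is a hypothetical helper name, not something Nagios Core provides.

```shell
#!/bin/sh
# Assumed default locations from a source install; adjust for your system.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical helper: reload only if the pre-flight check passes.
safe_reload() {
    if [ ! -x "$NAGIOS_BIN" ]; then
        echo "nagios binary not found at $NAGIOS_BIN" >&2
        return 2
    fi
    # -v makes Nagios Core verify the configuration and exit without starting.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        /etc/init.d/nagios reload
    else
        echo "Configuration errors found; not reloading." >&2
        return 1
    fi
}
```

After sourcing this as root, running safe_reload performs the verification and only then calls the same init script used in the step above.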
If the server reloaded successfully, the web interface should now show a brand new host in the hosts list, in a PENDING state as it waits to verify that the host is alive.
In the next few minutes, the host's background should change to green to show that the verification was complete, and the host status should change to UP, assuming that the checks succeeded.
If the test failed and Nagios Core was not able to get a PING response from the target machine after three tries, for whatever reason, the web interface would show the host in the DOWN state instead.
How it works...
The configuration we included in the preceding section adds a host to Nagios Core's list of hosts to check. Nagios Core will periodically send a PING request to 192.0.2.21, checking whether it receives a reply, and will update the status shown in the Nagios Core web interface accordingly. We have not yet defined any services to check for this host, nor have we specified what action Nagios Core should take if the host is down. However, the host itself will be automatically checked at regular intervals by Nagios Core and we can view its state in the web interface at any time.
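To see roughly what happens on each host check, the PING plugin can be run by hand from the shell. This is a sketch under assumptions: the plugin directory /usr/local/nagios/libexec and the threshold values are taken from what the sample command definitions typically use for check-host-alive, so compare them with your own commands.cfg. manual_host_check is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed default plugin directory for a source install.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Hypothetical helper: run the PING check by hand for one address.
# Plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
manual_host_check() {
    address="$1"
    if [ ! -x "$PLUGIN_DIR/check_ping" ]; then
        echo "check_ping not found in $PLUGIN_DIR" >&2
        return 3
    fi
    # -w/-c take "round-trip-ms,packet-loss%" thresholds; -p is the packet count.
    "$PLUGIN_DIR/check_ping" -H "$address" -w 3000.0,80% -c 5000.0,100% -p 5
}
```

Running manual_host_check 192.0.2.21 on the monitoring server prints the plugin's one-line status, and the exit code tells you how Nagios Core would interpret the result.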
The directives we defined in the preceding configuration are as follows:
- host_name: This defines the hostname of the machine that is used internally by Nagios Core to refer to this host. It will end up being used in other parts of the configuration.
- alias: This defines a more recognizable human-readable name for the host; this appears in the web interface. It could also be used for a full-text description of the host.
- address: This defines the IP address of the machine. This is the actual value that Nagios Core will use to contact the server; using an IP address rather than a DNS name is generally a best practice, so the checks continue to work even if DNS is not functioning. In Nagios 4.0 or newer, if you leave this field blank, the value of host_name will be used instead. In versions before Nagios 4.0, you must define it.
- max_check_attempts: This defines the number of times Nagios Core should try to run the check if the checks fail. Here, we've defined a value of 3, meaning that Nagios Core will make a total of three attempts to contact the host before flagging it as DOWN.
- check_period: This references the time period during which this host should be checked. The 24x7 time period is defined in the default configuration for Nagios Core. This is a sensible value for hosts, as it means the host will always be checked. This controls when Nagios Core will check the host, not when it will notify anyone.
- check_command: This references the command that will be used to check whether the host is UP, DOWN, or UNREACHABLE. In this case, a standard Nagios Core configuration defines check-host-alive as a PING check, which is a good test of basic network connectivity and a sensible default for most hosts. This directive is actually not required to make a valid host, but you will want to include it under most circumstances; without it, no checks will be run.
- contacts: This references the contact or contacts that will be told about state changes in the host. In this instance, we've used nagiosadmin, which is defined in the default Nagios Core configuration.
- notification_interval: This defines how regularly the host should repeat its notifications if it is having problems. Here, we've used a value of 60, which corresponds to 60 minutes, or 1 hour.
- notification_period: This references the time period during which Nagios Core should send out notifications if there are problems. Here, we're again using the 24x7 time period, but for other hosts, another time period such as workhours might be more appropriate.
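As a quick sanity check on a hand-written host file, the directives listed above can be looked for with a rough text search. This is only a sketch: missing_directives is a hypothetical helper, it does not parse the configuration, and it will report false alarms for hosts that inherit values from a template. Nagios Core's own configuration check remains the authoritative test.

```shell
#!/bin/sh
# The directives used in this recipe's host definition.
RECIPE_DIRECTIVES="host_name alias address max_check_attempts check_period check_command contacts notification_interval notification_period"

# Hypothetical helper: print each recipe directive that does not appear in a file.
missing_directives() {
    file="$1"
    for directive in $RECIPE_DIRECTIVES; do
        # Match the directive as the first word on a line, followed by whitespace.
        if ! grep -Eq "^[[:space:]]*${directive}[[:space:]]+" "$file"; then
            echo "$directive"
        fi
    done
}
```

For example, missing_directives /usr/local/nagios/etc/objects/sparta.example.net.cfg prints nothing if every directive from the recipe is present in the file.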
Note that we added the definition in its own file called sparta.example.net.cfg and then referred to it in the main configuration file nagios.cfg. This is simply a conventional way of laying out hosts; keeping definitions in their own files happens to be a tidy way to manage things.
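If you add hosts this way often, the cfg_file line can be appended from the shell rather than by hand. The sketch below assumes the default nagios.cfg location used in this recipe; register_cfg_file is a hypothetical helper that skips the append when the exact line is already present, so it is safe to run twice.

```shell
#!/bin/sh
# Hypothetical helper: add a cfg_file line to the main config unless it is already there.
register_cfg_file() {
    object_file="$1"                                   # path to the new host's .cfg file
    main_cfg="${2:-/usr/local/nagios/etc/nagios.cfg}"  # assumed default location
    line="cfg_file=$object_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if grep -qxF -- "$line" "$main_cfg"; then
        echo "already registered: $object_file"
    else
        printf '%s\n' "$line" >> "$main_cfg"
        echo "registered: $object_file"
    fi
}
```

Run as root, register_cfg_file /usr/local/nagios/etc/objects/sparta.example.net.cfg has the same effect as the manual edit in the recipe; a reload is still needed afterwards.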
There's more...
There are a lot of other useful parameters for hosts, but the ones we've used include everything that's required.
While this is a perfectly valid way of specifying a host, it's more typical to define a host based on a template, with definitions of how often the host should be checked, who should be contacted when its state changes and on what basis, and similar properties. Nagios Core defines a simple template host called generic-host, which a host definition can extend with the use directive:
    define host {
        use                 generic-host
        alias               sparta
        host_name           sparta.example.net
        address             192.0.2.21
        max_check_attempts  3
        contacts            nagiosadmin
        check_period        24x7
        check_command       check-host-alive
    }
This uses all the parameters defined for generic-host and then adds the details of the specific host that needs to be checked. If you're curious to see what's defined in generic-host, you'll find its definition in /usr/local/nagios/etc/objects/templates.cfg.
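Rather than scrolling through the whole templates file, the block for a single template can be pulled out with awk. This is a rough sketch, assuming the file uses the usual layout of one directive per line inside define host blocks; show_host_template is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed default location of the sample templates file.
TEMPLATES_CFG="${TEMPLATES_CFG:-/usr/local/nagios/etc/objects/templates.cfg}"

# Hypothetical helper: print the "define host" block whose name directive equals the first argument.
show_host_template() {
    awk -v wanted="$1" '
        # Start collecting at each "define host {" line (not hostgroup and so on).
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { inside = 1; block = ""; found = 0 }
        inside {
            block = block $0 "\n"
            # Field 1 is the directive and field 2 its value.
            if ($1 == "name" && $2 == wanted) found = 1
        }
        # A closing brace ends the block; print it only if the name matched.
        inside && /[}]/ {
            if (found) printf "%s", block
            inside = 0
        }
    ' "${2:-$TEMPLATES_CFG}"
}
```

Calling show_host_template generic-host prints just that template, which makes it easier to see which values a host inherits through the use directive.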
See also
- Specifying how frequently to check a host, Chapter 3, Working with Checks and States
- Using an alternative check command for hosts, Chapter 2, Working with Commands and Plugins
- Grouping configuration files in directories, Chapter 9, Managing Configuration
- Using inheritance to simplify configuration, Chapter 9, Managing Configuration