Nagios Installation and Configuration (Part 2)


Nagios Configuration Files

As already discussed, the sample configuration files are installed in the /usr/local/nagios/etc folder. The files listed below are the basic configuration files; if any of them is missing, you need to create it with the exact syntax.

We will explain each file, with its complete syntax, in the following sections.

Nagios depends on a number of important files. These range from the config files to the plugins, logs, command files etc.

The following are the files of importance in Nagios:

Note: The file path is assumed based on the default locations of the files.

Main Configuration File

/usr/local/nagios/etc/nagios.cfg

This is the configuration file which defines the various directives that Nagios uses. These directives include the path to various folders where Nagios needs to check in for the required files, the object config files, the command files etc and various other parameters which decide how Nagios operates.

Resource File

/usr/local/nagios/etc/resource.cfg

This file holds the user-defined macros and other sensitive configuration information, and the CGIs are denied access to it.

Commands Config File

/usr/local/nagios/etc/commands.cfg

CGI Config file

/usr/local/nagios/etc/cgi.cfg

Other object configuration files include, but are not limited to, the following:

/usr/local/nagios/etc/hosts.cfg

/usr/local/nagios/etc/hostgroups.cfg

/usr/local/nagios/etc/services.cfg

/usr/local/nagios/etc/servicegroups.cfg

/usr/local/nagios/etc/contacts.cfg

/usr/local/nagios/etc/contactgroups.cfg

/usr/local/nagios/etc/timeperiods.cfg

Nagios Command File

/usr/local/nagios/var/rw/nagios.cmd

Nagios checks this file for external commands to process. The command CGI writes commands to this file. Other third-party programs can write to this file if the proper file permissions have been granted. The external command file is implemented as a named pipe (FIFO), which is created when Nagios starts and removed when it shuts down. If the file exists when Nagios starts, the Nagios process will terminate with an error message.
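
As a rough illustration of how a third-party script might hand a command to Nagios, the sketch below formats one external command line and writes it to the pipe. The "[timestamp] COMMAND;arguments" layout and the SCHEDULE_FORCED_SVC_CHECK command belong to the Nagios external command interface; the function names and the host and service values are made up for this example, and the path is the default location assumed above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the path assumes a default install.
CMDFILE="${CMDFILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# format_forced_check HOST SERVICE EPOCH
# Prints one external command line asking for an immediate service check.
format_forced_check() {
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$3" "$1" "$2" "$3"
}

# submit_forced_check HOST SERVICE
# Writes the command to the pipe, but only if the pipe actually exists.
# Note: "date +%s" is widely supported but may be missing on older systems.
submit_forced_check() {
    now=$(date +%s)
    if [ -p "$CMDFILE" ]; then
        format_forced_check "$1" "$2" "$now" > "$CMDFILE"
    else
        echo "command file $CMDFILE is not a named pipe (is Nagios running?)" >&2
        return 1
    fi
}

# Example: submit_forced_check LON3 HTTP
```

Checking for the pipe first avoids creating a regular file at that path by accident, which would stop Nagios from starting.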

Nagios Log Files

Status Log

/usr/local/nagios/var/status.log

Downtime Log File

/usr/local/nagios/var/downtime.log

Comment log File

/usr/local/nagios/var/comment.log

Nagios Lock File

/tmp/nagios.lock

Nagios creates this file when it runs as a daemon. This file contains the process id (PID) number of the running Nagios process.

Nagios Temp File

/usr/local/nagios/var/nagios.tmp

State Retention File

/usr/local/nagios/var/status.sav

This is the file that Nagios will use for storing service and host state information before it shuts down. When Nagios is restarted it will use the information stored in this file for setting the initial states of services and hosts before it starts monitoring anything. This file is deleted after Nagios reads in initial state information when it (re)starts.

Configure Nagios Files

These are the object configuration files for Nagios. They are referenced from nagios.cfg, which is the main configuration file. If any of the following files is missing, just create it using the following command

#touch <filename>

and check the file permissions and ownership:

/usr/local/nagios/etc/contactgroups.cfg
/usr/local/nagios/etc/contacts.cfg
/usr/local/nagios/etc/services.cfg
/usr/local/nagios/etc/dependencies.cfg
/usr/local/nagios/etc/escalations.cfg
/usr/local/nagios/etc/hostgroups.cfg
/usr/local/nagios/etc/hosts.cfg
/usr/local/nagios/etc/servicegroups.cfg
/usr/local/nagios/etc/timeperiods.cfg
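
The touch step can be sketched as a small loop that creates only the files that are missing. The function name is made up for this example, and the ownership shown in the comment assumes Nagios runs as user and group "nagios", which may differ on your system.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical; file names are those listed above.
# create_missing_cfg DIR: create any object config file that does not exist yet.
create_missing_cfg() {
    dir="$1"
    for f in contactgroups contacts services dependencies escalations \
             hostgroups hosts servicegroups timeperiods; do
        if [ ! -e "$dir/$f.cfg" ]; then
            touch "$dir/$f.cfg" && echo "created $dir/$f.cfg"
        fi
    done
}

# Example (as root): create_missing_cfg /usr/local/nagios/etc
# Then fix ownership, assuming the Nagios user and group are both "nagios":
#   chown nagios:nagios /usr/local/nagios/etc/*.cfg
```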

You will first need to set the authentication option for the nagiosadmin user in $NAGIOSHOME/etc/cgi.cfg:-

use_authentication=1
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin

Of course, other users can be set up with different privileges. Remember to create them in $NAGIOSHOME/etc/htpasswd.users.

Also, you need to make sure that the relevant users have the correct permissions for nagios. Usually, you will want the admin user to be able to do everything. So, edit these lines in $NAGIOSHOME/etc/cgi.cfg as follows:-

authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin

Check through $NAGIOSHOME/etc/nagios.cfg to see which options suit you best, such as whether Nagios allows external commands to be executed through the web interface, how often to rotate log files etc.

If you decide to make external commands accessible to Nagios, then you must ensure that the directory $NAGIOSHOME/var/rw is readable and writeable by the web server user (usually 'www-data').

If you do want to allow external commands to be parsed and acted on by Nagios, you need to set the directive:

check_external_commands=1

in $NAGIOSHOME/etc/nagios.cfg. Then we need a new user group and relevant permissions on $NAGIOSHOME/var/rw and $NAGIOSHOME/var/rw/nagios.cmd accordingly:-

#groupadd nagiocmd
#usermod -a -G nagiocmd nagios
#usermod -a -G nagiocmd www-data

where "www-data" is the apache user. Now make the command directory (if it does not already exist).

#mkdir $NAGIOSHOME/var/rw

and set the permissions

#chown nagios:nagiocmd $NAGIOSHOME/var/rw
#chmod u+rwx $NAGIOSHOME/var/rw
#chmod g+rwx $NAGIOSHOME/var/rw
#chmod g+s $NAGIOSHOME/var/rw

You'll need to restart apache so that it can take advantage of being part of the nagiocmd group.

Templating Configuration Files

With all of the object configuration files, you can use templates to make the files smaller and save you time and effort when you need to make changes to them. Let's take the example of the services definitions (see later for more explanation):-

# Generic service definition template
define service{
name generic-service ; The 'name' of this service template, referenced in other service definitions
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
contact_groups $CONTACT_GROUP1
is_volatile 0
check_period $PERIOD
max_check_attempts #n
normal_check_interval #n
retry_check_interval #n
notification_interval #n
notification_period $PERIOD
notification_options w,u,c,r
check_command $COMMAND $ARGUMENTS
service_description $SERVICE

register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

# Service definition
define service{
use generic-service
host_name $HOST1,$HOST2,$HOST3...
}

# Service definition
define service{
use generic-service
host_name $HOST4,$HOST5...
contact_groups $CONTACT_GROUP1,$CONTACTGROUP2
}

Any directives that are common to most service checks can go into the template section at the top; the service definition sections then specify only the bits that differ for specific (groups of) hosts. You can also override templated settings in the specific service definition sections.

Configure time periods (timeperiods.cfg)

You need to think about which time periods you want to use to separate out the notifications and the checking of services, e.g.

# '24x7' timeperiod definition
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}

# 'workhours' timeperiod definition
define timeperiod{
timeperiod_name workhours
alias "Normal" Working Hours
monday 08:00-18:00
tuesday 08:00-18:00
wednesday 08:00-18:00
thursday 08:00-18:00
friday 08:00-18:00
}

# 'nonworkhours' timeperiod definition
define timeperiod{
timeperiod_name nonworkhours
alias Non-Work Hours
sunday 00:00-24:00
monday 00:00-09:00,17:00-24:00
tuesday 00:00-09:00,17:00-24:00
wednesday 00:00-09:00,17:00-24:00
thursday 00:00-09:00,17:00-24:00
friday 00:00-09:00,17:00-24:00
saturday 00:00-24:00
}

# 'none' timeperiod definition
define timeperiod{
timeperiod_name none
alias No Time Is A Good Time
}

Notice that time period definitions are allowed to overlap.

For most purposes, the existing configuration is pretty good, though you may just want to tweak the "workhours" definition (and thus "nonworkhours") from 9am-5pm to your local requirements. This edit can be made in $NAGIOSHOME/etc/timeperiods.cfg. If you plan to make no changes from the supplied timeperiods.cfg-sample file, then just copy it to timeperiods.cfg and you're done.

Configure contacts (contacts.cfg)

Obviously, the point of monitoring is that the relevant people know when something isn't right. So, one thing we need to do is to set up a list of people who will be notified in the event of problems. For example, let's say we have 6 servers, 2 in London (LON1 and LON2), 2 in New York (NY1 and NY2) and 2 in Hong Kong (HK1 and HK2). Each location has one machine that is a gateway and firewall (machine 1) and the other machine is mail and webcache (machine 2) and the webserver runs on LON1. There are people in the company responsible for various services and hardware and there are those who would need to know in the event of an outage, for escalation purposes.

You will need one section per person. Let's take two people: Fred Bloggs (login ID fbloggs, email address fbloggs@bigcorp.com), who is the operations manager and needs to know 24x7x365 about problems, and Joanna Smith (login ID jsmith, email address jsmith@bigcorp.com), who is a web architect and needs to know about critical problems with her web servers on weekdays, in working hours. Someone else covers at weekends, and warnings aren't of interest to her.

# 'fbloggs' contact definition
define contact{
contact_name fbloggs
alias Fred Bloggs
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email fbloggs@bigcorp.com
}

# 'jsmith' contact definition
define contact{
contact_name jsmith
alias Joanna Smith
service_notification_period workhours
host_notification_period workhours
service_notification_options u,c
host_notification_options d,u
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email jsmith@bigcorp.com
}

Configure contact groups (contactgroups.cfg)

In our hypothetical company, we have various functional groups responsible for technical issues:-

Mail admins - Fred
New York admins - Fred, Joanna

... etc. and we can define these groups in the $NAGIOSHOME/etc/contactgroups.cfg file:-

# 'mail-admins' contact group definition
define contactgroup{
contactgroup_name mail-admins
alias Mail Admins
members fbloggs
}

# 'ny-admins' contact group definition
define contactgroup{
contactgroup_name ny-admins
alias New York Admins
members fbloggs,jsmith
}
...and so on.

Configure host groups (hostgroups.cfg)

Host groups are useful to separate different physical locations, functions and services. Hosts can be members of one or more groups. We could group them as follows:-

Hong Kong Group: HK1,HK2
New York Group: NY1,NY2
London Group: LON1,LON2,LON3
Mail Servers: HK2,NY2,LON2
Gateways: HK1,NY1,LON1
Firewalls: HK1,NY1,LON1
Webcaches: HK1,NY1,LON1
Webservers: LON3

So, in the host group view, there is a logical layout by location and by function, making it easier to spot problems. We can specify the groups in $NAGIOSHOME/etc/hostgroups.cfg for this example like this:-

# 'hong-kong' host group definition
define hostgroup{
hostgroup_name hong-kong
alias Hong Kong Group
contact_groups hk-admins*
members HK1,HK2
}

# 'new-york' host group definition
define hostgroup{
hostgroup_name new-york
alias New York Group
contact_groups ny-admins*
members NY1,NY2
}

# 'london' host group definition
define hostgroup{
hostgroup_name london
alias London Group
contact_groups lon-admins*
members LON1,LON2,LON3
}

# 'mail' host group definition
define hostgroup{
hostgroup_name mail
alias Mail Servers
contact_groups mail-admins,hk-admins,ny-admins,lon-admins*
members HK2,NY2,LON2
}

# 'gateway' host group definition
define hostgroup{
hostgroup_name gateway
alias Gateway Servers
contact_groups infrastructure,hk-admins,ny-admins,lon-admins*
members HK1,NY1,LON1
}

# 'firewall' host group definition
define hostgroup{
hostgroup_name firewall
alias Firewalls
contact_groups security,hk-admins,ny-admins,lon-admins*
members HK1,NY1,LON1
}

# 'cache' host group definition
define hostgroup{
hostgroup_name cache
alias Webcaches
contact_groups infrastructure*
members HK1,NY1,LON1
}

# 'www' host group definition
define hostgroup{
hostgroup_name www
alias Web Servers
contact_groups infrastructure, webbies*
members LON3
}

* - host groups do not take contact_groups as a directive in Nagios 2.0.

Configure hosts (hosts.cfg)

This is the part where you tell Nagios which hosts you are interested in. In $NAGIOSHOME/etc/hosts.cfg you can specify each host by IP address, give it a label, set which check command to use for testing whether it is alive and, finally, set what time period you want to use for notifications. For example, for our company's webserver, LON3, we reference the generic host definition given at the top of the hosts.cfg-sample file (which we retain in hosts.cfg) and add the specifics:-

# 'LON3' host definition
define host{
use generic-host

host_name LON3
alias Solaris/Apache webserver
address 192.168.1.13
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}

Now, when it comes to the status map, where you will want to make the map look like the physical layout, you can use the "parents" parameter to specify which host is the parent to the one you are defining. For example, if you want the map to show LON1, LON2 and LON3 connected to a router "Route1" on the way to NY1 and NY2, you would specify that LON1, LON2, LON3, NY1 and NY2 have the parent "Route1" like this in the hosts.cfg:-

# 'LON3' host definition
define host{
use generic-host

host_name LON3
parents Route1
alias Solaris/Apache webserver
address 192.168.1.13
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}

# 'LON2' host definition
define host{
use generic-host

host_name LON2
parents Route1
alias Solaris/Mail server
address 192.168.1.14
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}

Status Map

Also in the status map, you would probably like to have pretty icons for each of the hosts. Download and unpack imagepak-base.tar.gz (http://prdownloads.sourceforge.net/nagios/imagepak-base.tar.gz) and copy the contents to $NAGIOSHOME/share/images/logos. Now, we need to tell Nagios which icons to use for each host. In $NAGIOSHOME/etc/cgi.cfg you need to point to an external template file which will contain the definitions:-

xedtemplate_config_file=$NAGIOSHOME/etc/hostextinfo.cfg

and create that file, with the definitions for the hosts:-

define hostextinfo{
host_name LON2
2d_coords 40,40
icon_image sun40.png
icon_image_alt Solaris/Mail server
vrml_image sun40.png
statusmap_image sun40.gd2
}

where the *_image files are appropriately selected from those in $NAGIOSHOME/share/images/logos, though you must use a .gd2 file for the statusmap_image. The 2d_coords are where the icon should appear on the status map if you are using a statusmap layout option (set in $NAGIOSHOME/etc/cgi.cfg) that allows for specifying the location. It is a good idea to start out using the default layout 5 (Circular, Marked Up), which does not require co-ordinates to be set. You can modify the setting later (or not), when you have a better idea of where you want them placed.

Configure commands (commands.cfg)

This part is quite complex, so I've made the details a separate guide. Basically, though, what you need to do is look in the $NAGIOSHOME/libexec directory to see what commands are there, check out the switches and flags (usually by running the command with a --help option) and configure the ones you want in $NAGIOSHOME/etc/checkcommands.cfg.

Here is a basic example for the command to check whether a secure apache is running on a host:-

# 'check_apache' command definition
define command{
command_name check_apache
command_line $USER1$/check_https -H $HOSTADDRESS$
}

$USER1$ refers to a macro defined in the $NAGIOSHOME/etc/resource.cfg file, which usually (and in the frame of this installation guide) holds the location of the executable checking commands/plugins. $HOSTADDRESS$ is the macro passed into the command denoting on which host that service should be checked.

Configure dependencies

Dependencies between services can be configured in $NAGIOSHOME/etc/dependencies.cfg. For the moment, this will not be covered by this set of guidelines.

Configure escalations

Notification escalations can be configured in $NAGIOSHOME/etc/escalations.cfg. For the moment, this will not be covered by this set of guidelines.

Configure resources

The $NAGIOSHOME/etc/resource.cfg file is where some common variables and macros are defined. You can define up to 32 $USERx$ macros, which can in turn be used in command definitions in your host config file(s). $USERx$ macros are useful for storing sensitive information such as usernames, passwords, etc. They are also handy for specifying the path to plugins and event handlers - if you decide to move the plugins or event handlers to a different directory in the future, you can just update one or two $USERx$ macros, instead of modifying a lot of command definitions.

Most importantly, the CGIs will not attempt to read the contents of resource files, so you can set restrictive permissions (600 or 660) on them.

After installing nagios, the default resource.cfg-sample file is generally good enough to be used as resource.cfg, unless you have some fancy stuff to configure in.
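
To see what a given $USERx$ macro currently points at, something like the following sketch can be used. It assumes resource.cfg holds plain "$USERn$=value" lines; the function name is made up for this example, and whoever runs it needs read access to the file, which the restrictive permissions above may prevent.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.
# user_macro FILE N: print the value assigned to $USERN$ in a resource file.
user_macro() {
    file="$1"; n="$2"
    # Split each line at the first "=": the macro name, then its value.
    while IFS='=' read -r key value; do
        if [ "$key" = "\$USER${n}\$" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$file"
    return 1   # macro not defined in this file
}

# Example: user_macro /usr/local/nagios/etc/resource.cfg 1
```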

NRPE Addon Configuration in Nagios

NRPE is a commonly used client application, or agent, that runs on the hosts to be monitored. It gathers local data which cannot be (or is less logically) retrieved directly from the Nagios host.

Download a copy of nrpe-<your version>.tar.gz and untar somewhere sensible. Now build it:-

#./configure
#make all
#cp ./src/nrpe /usr/local/nagios
#cp ./src/check_nrpe /usr/local/nagios
#cp nrpe.cfg /usr/local/nagios

Add nrpe to the network services. Edit /etc/services to add the following line:-

nrpe 5666/tcp # nrpe, nagios monitoring service

We have already installed the Nagios plugins packages.

Now configure the checks. Edit nrpe.cfg to configure it locally and to add any checks to run on that host:-

allowed_hosts=10.141.145.117
command[check_data1]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /data1
command[check_data2]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /data2
command[check_mysql_5]=/usr/local/nagios/libexec/check_mysql_5 -H database.domain.uk -u nagios -p nagios -P 3309
command[check_mysql_4]=/usr/local/nagios/libexec/check_mysql_4 -H database.domain.uk -u nagios -p nagios -P 3306

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_home]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /home
command[check_root]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /
command[check_var]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /var
command[check_usr]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /usr

command[check_u01]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u01
command[check_u02]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u02
command[check_u03]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u03
command[check_u04]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u04

(The above are example checks, obviously.) Check that nrpe responds from your main Nagios host:-

#/usr/local/nagios/libexec/check_nrpe -H machine.domain.uk -c check_root
#/home/nagios/libexec/check_nrpe -H machine.domain.uk -c check_root
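
The name passed to check_nrpe with -c has to match one of the command[...] entries in nrpe.cfg on the monitored host. As a convenience, a sketch like the one below lists those names; the function name is made up for this example, and it assumes each command[...] entry starts at the beginning of a line.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.
# list_nrpe_commands FILE: print the name of every command[...] entry in FILE.
list_nrpe_commands() {
    # Keep only lines that start with command[name]= and print the name part.
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Example: list_nrpe_commands /usr/local/nagios/nrpe.cfg
```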

And add services to your main Nagios host services.cfg:-

# Service definition
define service{
use nrpe-service
host_name dbdev2
service_description load
contact_groups engineers
check_command check_nrpe!check_load
}

# Service definition
define service{
use nrpe-service
host_name dbdev2
service_description /home
contact_groups engineers
check_command check_nrpe!check_home
}

...Then reload the nagios config on the Nagios host:-

#/etc/init.d/nagios reload

[* - if checking mysql, you might want to add a nagios user so you're not using real ones:-

grant select on test.* to nagios@'%' identified by 'nagios';
grant select on test.* to nagios@'dev8' identified by 'nagios';
grant select on test.* to nagios@'localhost' identified by 'nagios';]

Configure services (services.cfg)

This is quite a large part of the configuration. The basics are as follows.

In the file $NAGIOSHOME/etc/services.cfg, you need to specify which services are to be monitored for each host. This ranges from the basic ping to checking apache is running, SMTP is working etc. For each server, you must at least specify a ping service. The example I'll give is generic and based on the generic-service template which is supplied in the file services.cfg-sample (which must be included in services.cfg if you want to reference it).

# Service definition
define service{
use generic-service
host_name $HOST1,$HOST2,$HOST3...
service_description $SERVICE
is_volatile 0
check_period $PERIOD
max_check_attempts #n
normal_check_interval #n
retry_check_interval #n
contact_groups unix-admins
notification_interval #n
notification_period $PERIOD
notification_options w,u,c,r
check_command $COMMAND $ARGUMENTS
}

One thing to note: if you are probing the availability of machines/services which are not owned by you, it is probably best to set the normal_check_interval to a conservative time period, say 10 minutes. The interval_length is set in $NAGIOSHOME/etc/nagios.cfg and defaults to 60 (seconds). The normal_check_interval is expressed in multiples of interval_length, so for 10 minutes, leave interval_length at the default and set normal_check_interval to 10.
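
The arithmetic is simply normal_check_interval multiplied by interval_length. A tiny sketch, with a made-up function name, makes it easy to sanity-check a setting before committing it:

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.
# check_interval_seconds NORMAL_CHECK_INTERVAL [INTERVAL_LENGTH]
# Prints the real time between checks in seconds; INTERVAL_LENGTH defaults to 60.
check_interval_seconds() {
    echo $(( $1 * ${2:-60} ))
}

# Example: check_interval_seconds 10      (10 units of 60 seconds each)
# Example: check_interval_seconds 5 30    (5 units of 30 seconds each)
```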

Configure service groups (servicegroups.cfg, only for Nagios v2.0 or higher)

As with host groups, you can group services into logical clumps, specifying the host and service name for each service in the group:-

# 'Live Databases' service group definition
define servicegroup{
servicegroup_name live_db
alias Live Databases
members $HOST1,$SERVICE1,$HOST2,$SERVICE2,$HOST2,$SERVICE3,$HOST3,$SERVICE4,$HOST4,$SERVICE5
}

Service groups do not take contact_groups as a directive.

Configure mail alerts (misccommands.cfg)

This is specific to Solaris. The default setup of mail uses mail, which does not take -s under Solaris, so the subject lines of the alert emails will be blank. You need to use mailx. So, edit $NAGIOSHOME/etc/misccommands.cfg and find the lines:-

# 'notify-by-email' command definition
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios 1.0 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $DATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}

and change mail to mailx. Also in this section, you can configure what will appear on the subject line. Basically, just modify the section in quotes after mailx -s, using relevant variables for what you want to see.
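
If there are several notification commands to change, the substitution can be sketched with sed rather than done by hand. This assumes mailx lives at /usr/bin/mailx on your Solaris host and that every notification command calls /usr/bin/mail with -s; the function name is made up for this example.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.
# use_mailx FILE: print FILE with "/usr/bin/mail -s" changed to "/usr/bin/mailx -s".
# The original file is left untouched so the result can be reviewed first.
use_mailx() {
    sed 's#/usr/bin/mail -s#/usr/bin/mailx -s#g' "$1"
}

# Example:
#   use_mailx misccommands.cfg > misccommands.cfg.new
#   diff misccommands.cfg misccommands.cfg.new
```

Running it on an already converted file changes nothing, because the pattern only matches "mail" followed directly by " -s".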

Troubleshooting Nagios Configuration

If you have problems with the status map, histograms etc., then you do need to make sure that your libraries are linked as follows:-

crle -l /usr/lib:/usr/local/lib:/usr/local/ssl/lib:/opt/sfw/lib
(crle - configure runtime linking environment)

Remember, your system may be using libraries in other places in addition to these locations. Take care to include those if you need to.

Also, for problems with status map and histograms, check back to when you installed the GD, jpeg and png libraries. Did you install them in the correct order and did gd report jpeg and png support something like this:-

** Configuration summary for gd 2.0.33:

Support for PNG library: yes
Support for JPEG library: yes
Support for Freetype 2.x library: no
Support for Fontconfig library: no
Support for Xpm library: yes
Support for pthreads: yes

If not, you may need to re-visit your gd installation.
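
It is also worth checking the configuration files themselves before starting Nagios. The nagios binary accepts a -v option that verifies the main configuration file and everything it references. The wrapper below is only a sketch around that option: the function name is made up for this example and the paths assume the default install location.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical; paths assume a default install.
NAGIOSHOME="${NAGIOSHOME:-/usr/local/nagios}"

# preflight BIN CFG: run the Nagios pre-flight check (-v) and report the result.
# Returns 0 if the config verifies, 1 if it has errors, 2 if BIN is not executable.
preflight() {
    bin="$1"; cfg="$2"
    if [ ! -x "$bin" ]; then
        echo "nagios binary not found: $bin" >&2
        return 2
    fi
    if "$bin" -v "$cfg" >/dev/null 2>&1; then
        echo "config OK"
    else
        echo "config has errors; run '$bin -v $cfg' to see them" >&2
        return 1
    fi
}

# Example: preflight "$NAGIOSHOME/bin/nagios" "$NAGIOSHOME/etc/nagios.cfg"
```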

Start her up and see what happens

#/etc/init.d/nagios start

Then point your browser at: http://yourserver/nagios/ and attempt to log in.