2016-12-22

During a Nagios monitoring implementation, we often rely on NRPE plugins and custom commands to run monitoring tasks, such as load and disk usage checks, on remote servers. Most disk checks need only small tweaks to the existing commands. RAID disk health evaluation takes more work, because the architecture and the RAID controller differ from one RAID setup to the next.

This article walks through the steps to add a RAID check for servers using the `MegaCli` utility. I assume that you have already configured a Nagios server to monitor servers through the NRPE plugin and are familiar with how it works. The discussion here focuses only on configuring the RAID check.

Before delving into how to add the check, let's first look at what MegaCli is. MegaCLI is a command line interface (CLI) binary used to communicate with the full LSI family of RAID controllers.

For a complete reference, either run `MegaCli -h` or refer to the manual at: http://www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/3rd-party/lsi/mrsas/userguide/LSI_MR_SAS_SW_UG.pdf

Now let us move on to the step-by-step instructions for enabling RAID monitoring in Nagios.

Step 1:

Log in to the client server (the server to be monitored) as the root user.

Step 2:

Before moving forward, verify the path to the MegaCli binary on the client server.

As you probably know, binary paths vary between installations. If the binary is somewhere else, for example /usr/sbin/MegaCli, modify the script and commands below by replacing every instance of /sbin/MegaCli with the correct path to the binary.
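A minimal sketch of this lookup, assuming the candidate locations named in this article (/sbin/MegaCli, /usr/sbin/MegaCli and /opt/MegaRAID/MegaCli/MegaCli64). `find_megacli` is a hypothetical helper: it prints the first candidate that is an executable file, falls back to a PATH lookup, and returns non-zero if nothing is found.

```shell
# find_megacli (hypothetical helper): print the path of the first usable
# MegaCli binary among the paths given as arguments; if none of them is an
# executable file, fall back to looking the binary names up on PATH.
find_megacli() {
  local candidate
  for candidate in "$@"; do
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  for candidate in MegaCli MegaCli64; do
    if command -v "$candidate" >/dev/null 2>&1; then
      command -v "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage, with the paths mentioned in this article:
#   find_megacli /sbin/MegaCli /usr/sbin/MegaCli /opt/MegaRAID/MegaCli/MegaCli64 \
#     || echo "MegaCli not found"
```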

Read the following instructions only if MegaCli is not found. Otherwise, skip to Step 3.

On CentOS machines, the lookup may report that the command cannot be found.

This doesn't necessarily mean that MegaCli is absent: the path and name used to access the utility may simply be different. On CentOS machines, the binary is installed at /opt/MegaRAID/MegaCli/MegaCli64.

Try running the binary by its full path, for example `/opt/MegaRAID/MegaCli/MegaCli64 -v`, which prints its version details.

If you see the version details, the binary is present. The command is not found without its full path because the binary's directory is not included in the user's PATH variable.

PATH is an environment variable in Linux and other Unix-like operating systems that tells the shell which directories to search for executable files (i.e., ready-to-run programs) in response to commands issued by a user.
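To see this PATH behaviour without touching the real binary, here is a small throwaway demonstration. It creates a dummy executable named MegaCli64 in a temporary directory; the shell only finds it once that directory is added to PATH. It assumes no real MegaCli64 is already on your PATH, and the PATH change lasts only for the current shell.

```shell
# Create a dummy MegaCli64 in a temporary directory (not the real tool).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "dummy MegaCli64"\n' > "$demo_dir/MegaCli64"
chmod +x "$demo_dir/MegaCli64"

# Lookup fails (assuming no real MegaCli64 is installed), because the shell
# only searches the directories listed in PATH.
if command -v MegaCli64 >/dev/null 2>&1; then found_before=yes; else found_before=no; fi

# Append the directory to PATH for this shell only, then look it up again.
PATH="$PATH:$demo_dir"
if command -v MegaCli64 >/dev/null 2>&1; then found_after=yes; else found_after=no; fi

echo "found before: $found_before, found after: $found_after"
```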

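If the binary is present but not on your PATH, do the following.

For easy access, let's create an alias named megacli for the binary and add it to /root/.bashrc to make the change permanent.

Append the line `alias megacli='/opt/MegaRAID/MegaCli/MegaCli64'` to /root/.bashrc, then reload the file with `source /root/.bashrc`.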

The bash built-in command `source` executes the contents of /root/.bashrc in the current shell and loads its definitions, including the new alias, so you can continue working in your current session.
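A minimal sketch of the alias step, assuming the CentOS binary path given above and root's /root/.bashrc. `add_megacli_alias` is a hypothetical helper that appends the alias only once, so running it twice does not duplicate the line.

```shell
# add_megacli_alias (hypothetical helper): append an alias named "megacli"
# for the given binary to the given rc file, unless it is already there.
add_megacli_alias() {
  local rc_file="$1" binary="$2"
  local line="alias megacli='$binary'"
  # -x matches the whole line, -F treats the pattern as a fixed string
  if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
}

# Usage on a CentOS server (as root):
#   add_megacli_alias /root/.bashrc /opt/MegaRAID/MegaCli/MegaCli64
#   source /root/.bashrc
#   megacli -v
```

Aliases are only expanded in interactive shells, so scripts run by NRPE should keep calling the binary by its full path.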

Now verify the binary with `megacli -v`.

If you see the version details, proceed to the next step.

Step 3:

Create a new file named check_raid in /usr/local/nagios/libexec and add the RAID check script to it.
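A minimal sketch of what check_raid can contain, assuming that every logical drive reported by `MegaCli -LDInfo -Lall -aALL` should be in the Optimal state. It follows the usual Nagios plugin convention of printing one status line and exiting 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN), and prints a line starting with RAID OK when healthy. The `MEGACLI` variable, the function names and the status wording are illustrative choices, not part of MegaCli.

```shell
#!/bin/bash
# check_raid: minimal NRPE plugin sketch that reports MegaCli logical drive state.
# ASSUMPTION: binary path; on CentOS use /opt/MegaRAID/MegaCli/MegaCli64 instead.
MEGACLI="/sbin/MegaCli"

# evaluate_ld_states: read "MegaCli -LDInfo -Lall -aALL" output on stdin,
# print one Nagios status line and return 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN).
evaluate_ld_states() {
  awk -F':' '
    /^State[ \t]*:/ {
      state = $2
      gsub(/^[ \t]+|[ \t]+$/, "", state)   # trim the value
      total++
      if (state != "Optimal") {
        bad++
        list = list (list == "" ? "" : ", ") state
      }
    }
    END {
      if (total == 0) {
        print "RAID UNKNOWN - no logical drives found in MegaCli output"
        exit 3
      }
      if (bad == 0) {
        printf "RAID OK - %d logical drive(s) Optimal\n", total
        exit 0
      }
      printf "RAID CRITICAL - %d of %d logical drive(s) not Optimal: %s\n", bad, total, list
      exit 2
    }'
}

main() {
  if [ ! -x "$MEGACLI" ]; then
    echo "RAID UNKNOWN - $MEGACLI not found or not executable"
    exit 3
  fi
  "$MEGACLI" -LDInfo -Lall -aALL | evaluate_ld_states
  exit $?
}

# Run the check only when invoked as check_raid, so the functions can also be
# loaded into another shell (for example, to test them) without calling MegaCli.
if [ "${0##*/}" = "check_raid" ]; then
  main
fi
```

The script calls MegaCli by full path rather than through the megacli alias, because NRPE does not run an interactive shell.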

Change the binary location in the script to match your installation and OS. For example, on a CentOS server, replace /sbin/MegaCli with /opt/MegaRAID/MegaCli/MegaCli64, as that is the correct path to the binary on CentOS distributions.

Give the script execute permission with `chmod +x /usr/local/nagios/libexec/check_raid`.

Step 4:

Now we have to define an NRPE command for this check in /usr/local/nagios/etc/nrpe.cfg.

To do this, add a command definition that runs the check_raid script to the end of /usr/local/nagios/etc/nrpe.cfg. NRPE command definitions take the form `command[name]=command line`.

If you are not comfortable editing configuration files directly, you can append the line from the shell instead.

The definition is needed because the Nagios server calls the check by this command name through NRPE. When that happens, the client server executes the associated command and returns its output.
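A minimal sketch of appending such a definition from the shell, assuming the command is named check_raid and runs the plugin as root through `sudo /bin/bash`, which the sudoers entries in Step 6 allow. `add_nrpe_command` is a hypothetical helper that skips the append if a check_raid definition already exists.

```shell
# add_nrpe_command (hypothetical helper): append the check_raid definition
# to an nrpe.cfg file unless one is already present.
add_nrpe_command() {
  local cfg="$1"
  # ASSUMPTION: the plugin runs as root via "sudo /bin/bash", matching Step 6.
  local line='command[check_raid]=sudo /bin/bash /usr/local/nagios/libexec/check_raid'
  if ! grep -q '^command\[check_raid\]=' "$cfg" 2>/dev/null; then
    printf '%s\n' "$line" >> "$cfg"
  fi
}

# Usage on the client server (as root):
#   add_nrpe_command /usr/local/nagios/etc/nrpe.cfg
```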

Step 5:

Now test that the script runs correctly by executing it directly as root, for example `/usr/local/nagios/libexec/check_raid`.

Step 6:

Now open the file /etc/sudoers and add entries for the nagios user to the bottom of the file, depending on your distribution:

a) If the system is running Debian

Editing configuration files directly is always risky, so the safest way to make this change is with the visudo editor, which checks the syntax before saving.

Alternatively, you can append the entries from the shell to get the same result.

b) If it is a CentOS server, add the entries for CentOS, where the binary is /opt/MegaRAID/MegaCli/MegaCli64,

or append them from the shell in the same way.
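A minimal sketch of the shell route, assuming the NRPE command runs the plugin as `sudo /bin/bash /usr/local/nagios/libexec/check_raid` and the MegaCli paths used in this article. The rule is illustrative: it lets nagios run /bin/bash and the MegaCli binary as root without a password. `add_nagios_sudo_rule` is a hypothetical helper.

```shell
# add_nagios_sudo_rule (hypothetical helper): append a NOPASSWD rule for the
# nagios user to a sudoers file, unless an identical line already exists.
add_nagios_sudo_rule() {
  local sudoers="$1" megacli="$2"
  local rule="nagios ALL=(root) NOPASSWD: /bin/bash, $megacli"
  if ! grep -qxF "$rule" "$sudoers" 2>/dev/null; then
    printf '%s\n' "$rule" >> "$sudoers"
  fi
}

# Usage (as root), then check the syntax with visudo:
#   Debian:  add_nagios_sudo_rule /etc/sudoers /sbin/MegaCli
#   CentOS:  add_nagios_sudo_rule /etc/sudoers /opt/MegaRAID/MegaCli/MegaCli64
#   visudo -c
```

Allowing /bin/bash this way effectively gives the nagios user a root shell. A tighter variant names only /usr/local/nagios/libexec/check_raid in the rule and calls `sudo /usr/local/nagios/libexec/check_raid` in nrpe.cfg.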

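Also, if a `Defaults requiretty` line is present in /etc/sudoers, you must comment it out so that it reads `#Defaults requiretty`. With requiretty set, sudo refuses to run commands from sessions without a terminal, which is how NRPE runs its checks.

The easy way is to do this from the shell with a single sed command.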
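A minimal sketch using GNU sed's in-place editing (`-i`), as shipped with Debian and CentOS. `comment_out_requiretty` is a hypothetical helper; it only touches lines that start with an active `Defaults ... requiretty`, so running it twice is harmless.

```shell
# comment_out_requiretty (hypothetical helper): prefix any active
# "Defaults requiretty" line in the given sudoers file with "#".
# "&" in the replacement stands for the whole matched text.
comment_out_requiretty() {
  sed -i 's/^Defaults[[:space:]][[:space:]]*requiretty/#&/' "$1"
}

# Usage (as root), keeping a backup and validating afterwards:
#   cp /etc/sudoers /root/sudoers.bak
#   comment_out_requiretty /etc/sudoers
#   visudo -c
```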

As you know, NRPE runs commands as the nagios user. The test in Step 5 returned RAID OK only because it was executed as root.

Without the sudoers changes, the check returns output when run on the client server, but from the monitoring server it fails with an error such as 'NRPE: Unable to read output'. This happens because the command runs as nagios, and that user is not allowed to run it. The sudoers entries above give the nagios user access to /bin/bash and MegaCli. This is required because the nagios user is created with the shell /sbin/nologin, and by default only the root user can run MegaCli.

Step 7:

At this point, we have created a script to check the RAID status, configured an NRPE command that references it, and granted the nagios user the permissions it needs to run the script. Now restart the NRPE service so it picks up the new command.

Step 8:

Now log in to the monitoring server and run the check through check_nrpe to confirm that it works.

Be sure to replace the placeholder address aaa.bbb.ccc.ddd with the client's IP address.
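check_nrpe's `-H` option names the client host and `-c` names the NRPE command to run. A minimal sketch that runs the RAID check and translates the exit code into its Nagios state name, assuming check_nrpe is installed at /usr/local/nagios/libexec/check_nrpe and the NRPE command is named check_raid. `run_remote_raid_check` and `nagios_state_name` are hypothetical helpers.

```shell
# ASSUMPTION: default location of check_nrpe in a source install.
CHECK_NRPE="/usr/local/nagios/libexec/check_nrpe"

# nagios_state_name: map a Nagios plugin exit code to its state name.
nagios_state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "UNEXPECTED($1)" ;;
  esac
}

# run_remote_raid_check: run check_raid on the given client over NRPE,
# then print the exit code and the state it stands for.
run_remote_raid_check() {
  local host="$1" rc=0
  "$CHECK_NRPE" -H "$host" -c check_raid || rc=$?
  echo "exit code $rc ($(nagios_state_name "$rc"))"
  return "$rc"
}

# Usage on the monitoring server:
#   run_remote_raid_check aaa.bbb.ccc.ddd
```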

Step 9:

If the results are fine, add the check to the client's configuration file on the monitoring server. On our servers, each client's .cfg file is under /usr/local/nagios/etc/objects/clients/; add a service definition for the RAID check there.

Be sure to replace the host name and contact group with your own values if you paste a ready-made snippet. Alternatively, open the client's .cfg file, copy one of its existing service definitions, and change only service_description and check_command. In this setup, the remaining fields are the same for all service checks within a .cfg file.
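A minimal sketch of such a service definition, written out from the shell so it can be appended to the client's .cfg file. It assumes the `generic-service` template and a `check_nrpe` command that takes the remote command name as its first argument (`$ARG1$`), both common in Nagios setups. The helper name, the service description and the host name and contact group in the usage line are placeholders.

```shell
# append_raid_service (hypothetical helper): append a RAID service definition
# for the given host and contact group to a Nagios object .cfg file.
append_raid_service() {
  local cfg="$1" host="$2" contacts="$3"
  cat >> "$cfg" <<EOF

define service {
    use                     generic-service
    host_name               $host
    service_description     RAID Status
    check_command           check_nrpe!check_raid
    contact_groups          $contacts
}
EOF
}

# Usage on the monitoring server (placeholder host and contact group):
#   append_raid_service /usr/local/nagios/etc/objects/clients/client01.cfg client01 admins
```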

Step 10:

Now restart the Nagios server so the changes take effect.
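Before restarting, it is worth letting Nagios validate its configuration: `nagios -v` runs a pre-flight check of a main config file. A minimal sketch, assuming a source install with the binary at /usr/local/nagios/bin/nagios and the main config at /usr/local/nagios/etc/nagios.cfg. `verify_nagios_config` is a hypothetical helper, and the restart command in the usage comment depends on your init system.

```shell
# ASSUMPTION: default locations for a Nagios source install.
NAGIOS_BIN="/usr/local/nagios/bin/nagios"
NAGIOS_CFG="/usr/local/nagios/etc/nagios.cfg"

# verify_nagios_config (hypothetical helper): run Nagios' pre-flight check
# and report whether the configuration is safe to load.
verify_nagios_config() {
  if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
    echo "Nagios configuration OK"
  else
    echo "Nagios configuration has errors; run: $NAGIOS_BIN -v $NAGIOS_CFG"
    return 1
  fi
}

# Usage on the monitoring server (restart command varies by system):
#   verify_nagios_config && service nagios restart
```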

Now log in to the Nagios web interface and verify that the new check shows up correctly there.
