This is the second and last part of my article about building a distributed monitoring solution with Nagios; you can find part 1 here.
Now you know all you need to know to set up service checks on the slaves and send information from the slaves to the master.
A benefit of a master/slave configuration is the ability to centrally configure all the Nagios nodes, both master and slaves. There are many ways to do this.
One of my favorite ways to manage distributed Nagios configuration is to use a version control system (VCS) such as Subversion. In this setup you store all the configurations under the VCS (which is a good practice anyway, because it gives your configuration files a version number and a change history). The various Nagios sites each have their own directories where they can put their files; I suggest a setup like this:
/etc/nagios/conf.d/
    master/
    site1/
    site2/
    ...
In this way the people in charge of each site manage only their own files and commit them to the main repository once they're done. You can also add a hook to the commit operation that updates all the other sites.
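As a rough illustration of such a hook, here is a minimal Python sketch of a Subversion post-commit script. It assumes that every site keeps a working copy in /etc/nagios/conf.d and that the repository server can reach each site over SSH without a password. The host names and the build_update_commands helper are hypothetical, invented for this example.

```python
#!/usr/bin/env python3
"""Sketch of a Subversion post-commit hook that refreshes every Nagios site."""
import subprocess
import sys

# Hypothetical host names; replace with the real master and slave machines.
SITE_HOSTS = ["nagios-master.example.com", "nagios-site1.example.com"]
# Assumed working copy location, following the layout suggested above.
CONFIG_DIR = "/etc/nagios/conf.d"


def build_update_commands(hosts, config_dir=CONFIG_DIR):
    """Return one ssh argument list per host that runs 'svn update' remotely."""
    return [["ssh", host, "svn", "update", config_dir] for host in hosts]


def main(argv):
    # Subversion calls the hook with the repository path and the new revision.
    repos, rev = argv[1], argv[2]
    failed = []
    for cmd in build_update_commands(SITE_HOSTS):
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[1])  # cmd[1] is the host name
    if failed:
        print(f"r{rev} of {repos}: update failed on {', '.join(failed)}",
              file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```

The sketch only refreshes the files. Checking the new configuration and reloading Nagios on each site is a separate step that you would add according to your own policy.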
This configuration can work, but it is really hard to maintain if many people work on it, and you can run into problems with templates and names that overlap. To keep a solution like this working you need to enforce a strong configuration policy, such as requiring the use of the fully qualified domain name of the server as a prefix for every check name.
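A policy like this is easier to enforce if something checks it automatically, for example before a commit is accepted. The Python sketch below is one possible way to do that, under a few assumptions of mine: the "check name" is the service_description directive, the required prefix is the value of host_name, and the object definitions use the plain "define type { ... }" form with no "}" inside values. The function names are hypothetical.

```python
import re

# Matches "define <type> { ... }" blocks in Nagios object configuration.
# Simplification: assumes no '}' character appears inside a directive value.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return a list of (object_type, directives_dict) found in text."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        objects.append((obj_type, directives))
    return objects


def policy_violations(text):
    """List service descriptions that do not start with their host_name."""
    violations = []
    for obj_type, d in parse_objects(text):
        if obj_type != "service" or "service_description" not in d:
            continue
        host = d.get("host_name", "")
        if not host or not d["service_description"].startswith(host):
            violations.append(d["service_description"])
    return violations


def duplicate_names(texts_by_site):
    """Map each (type, template name) defined in several sites to those sites."""
    seen = {}
    for site, text in texts_by_site.items():
        for obj_type, d in parse_objects(text):
            if "name" in d:
                seen.setdefault((obj_type, d["name"]), set()).add(site)
    return {key: sorted(sites) for key, sites in seen.items() if len(sites) > 1}
```

You would feed policy_violations the contents of each site's files and reject the change if the returned list is not empty. duplicate_names takes a dictionary that maps a site name to its configuration text and reports template names defined by more than one site.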
Another approach is to use a configuration tool that can manage multiple Nagios installations, such as NagiosQL, opcfg, or Nconf. I've personally tested and used NagiosQL (full disclosure: I also help the project with the Italian translation). With it (or with the other projects) you can configure templates, checks, and services for all your Nagios installations from a single point. NagiosQL supports FTP and SCP to copy files remotely, and keeps all the configurations in a MySQL database.
Performance and Privacy
Now you’re in business, running slaves that report information back to your master. You might think, “I should display all the performance graphs on the master so I can check all the information there.” I had this idea too, but usually it will lead to a disaster. Your master is the machine that aggregates all the checks, so in most cases displaying the graphs there too will slow down the machine and make it useless for its real job of reporting and notification. Instead, I suggest you keep the performance tools on the slaves, because every peripheral Nagios will have less work than the master. This way you won’t have to worry about sending performance information to the master. But I know, you really want all the information there. You can do that, but to do it right you should think about things like disk optimization, using a RAM disk, and parallel processes – topics beyond the scope of this article.
As I mentioned earlier, having multiple Nagios installations can also help you with privacy. You can define contacts on the master who can see, in the web front end, only the information coming from particular slaves. Alternatively, you could define a contact on a slave who would be able to see only the services defined there.
Setting up a distributed monitoring solution with Nagios, and using tools and plugins to make your task easier, can give you many benefits, but you need careful planning and strict policy guidelines for the staff who manage the configurations. I suggest using a graphical configuration tool such as NagiosQL that supports templates; using templates makes it easier to keep things in order.