Over the past few days I have taken a closer look at collectd, an excellent tool for collecting statistics on various aspects of our Linux servers.
From Wikipedia: “collectd is a UNIX-daemon which collects, transfers and stores performance data of computers and network equipment. The acquired data is meant to help system administrators maintain an overview over available resources in order to detect existing or looming bottlenecks.
The first version of the daemon was written in 2005 by Florian Forster and has been further developed as free open-source project. Other developers have written improvements and extensions to the software that have been incorporated into the project. Most files of the source code are licensed under the terms of the GNU General Public License, version 2 (GPLv2), the remaining files are licensed under other open source licenses”
There are other free, open source projects similar to collectd; a few are listed at the end of the article. So why should you use collectd? It differs from them in some key ways. For one, it’s written in C for performance and portability, allowing it to run on systems without a scripting language or cron daemon, such as embedded systems. At the same time it includes optimizations and features to handle hundreds of thousands of data sets. It comes with over 90 plugins which range from standard cases to very specialized and advanced topics. It provides powerful networking features and is extensible in numerous ways. Last but not least, collectd is actively developed, supported, and well documented.
Everything in collectd is done in plugins, except parsing the config file. This means that the main daemon doesn’t have any external dependencies and should run on nearly anything that has heard of POSIX. The daemon has been reported as working on Linux, Solaris, Mac OS X, FreeBSD, NetBSD, and OpenBSD. It’s likely that other UNIX® flavors work to some extent, too.
collectd’s configuration is kept as simple as possible: apart from choosing which modules to load, you don’t need to configure anything, but you can customize the daemon to your liking if you want.
Built to scale
collectd is able to handle any number of hosts, from one to several hundred (or possibly thousands, but no one has reported that yet). This is achieved by using resources as efficiently as possible, e.g. by merging multiple RRD updates into one update operation, packing as many values as possible into each network packet, and so on. The multithreaded layout allows multiple plugins to be queried simultaneously without running into problems due to I/O latencies.
The Simple Network Management Protocol (SNMP) is in widespread use with various network equipment, for example switches, routers, rack monitoring systems, thermometers, UPSes, and so on. The SNMP plugin provides a generic interface to SNMP which you can use to query values and dispatch them over collectd’s mechanisms, e.g. transmit them to a server instance somewhere else.
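The batching idea can be illustrated with a small sketch. This is not collectd's actual code (the daemon is written in C); the `UpdateCache` class, its `flush_after` parameter, and the `write_batch` callback are hypothetical names chosen for illustration. The sketch queues values per file and hands them over in a single call once enough have accumulated, so several updates cost one write.

```python
from collections import defaultdict


class UpdateCache:
    """Queue values per target file and write them in batches (illustrative only)."""

    def __init__(self, write_batch, flush_after=3):
        # write_batch is a hypothetical callback: (filename, [(timestamp, value), ...])
        self.write_batch = write_batch
        # flush_after is a hypothetical knob: queued values that trigger a write
        self.flush_after = flush_after
        self.pending = defaultdict(list)

    def update(self, filename, timestamp, value):
        """Queue one value; write the whole queue once it is long enough."""
        queue = self.pending[filename]
        queue.append((timestamp, value))
        if len(queue) >= self.flush_after:
            self.flush(filename)

    def flush(self, filename):
        """Write everything queued for one file in a single operation."""
        batch = self.pending.pop(filename, [])
        if batch:
            self.write_batch(filename, batch)


# Record each write instead of touching a real RRD file.
writes = []
cache = UpdateCache(lambda name, batch: writes.append((name, batch)), flush_after=3)
for ts, value in [(10, 0.5), (20, 0.7), (30, 0.6)]:
    cache.update("load.rrd", ts, value)

print(len(writes), "write operation(s) for 3 values")
```

The trade-off is that queued values are not on disk until the batch is written, so a larger batch size saves I/O at the cost of freshness.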
Integration with monitoring solutions
Version 4.3 added the concept of notifications and thresholds to collectd. This allows you to send notifications through the daemon and allows for simple threshold checking. However, collectd is not a monitoring solution. The developers will probably add some features to make the notification system more usable, but at the moment collectd is no match for a sophisticated monitoring solution.
To make it possible to integrate collectd into the popular monitoring solution Nagios, a “check” called collectd-nagios has been written. It allows you to use Nagios to monitor whether certain values have been collected and whether they were in an appropriate range.
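What such a range check amounts to can be sketched in a few lines. This is a simplified illustration and not the collectd-nagios implementation: the function names and the `(low, high)` tuple format are assumptions made here, and only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # usual Nagios plugin exit codes


def in_range(value, bounds):
    """Return True if value lies within (low, high); None means unbounded."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


def check_value(value, warn_range, crit_range):
    """Map a collected value to a Nagios status (hypothetical helper)."""
    if value is None:               # nothing was collected
        return UNKNOWN
    if not in_range(value, crit_range):
        return CRITICAL
    if not in_range(value, warn_range):
        return WARNING
    return OK


# Example: a load average that should stay below 2.0, and must stay below 5.0.
for load in (0.5, 3.0, 7.0, None):
    print(load, "->", check_value(load, (None, 2.0), (None, 5.0)))
```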
The CPU plugin collects the amount of time spent by the CPU in various states, most notably executing user code, executing system code, waiting for IO-operations and being idle.
Since this has become a FAQ: the CPU plugin does not collect percentages. It collects “jiffies”, the units of scheduling. On many Linux systems there are about 100 jiffies in one second, but this does not mean you will end up with a percentage.
The Apache plugin queries the page generated by mod_status, the status module of the Apache web server, parses it and submits the number of bytes transferred, the number of requests received, and the number of processes in the various states of the scoreboard.
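Because the counters are cumulative, a percentage has to be derived from the difference between two readings. Here is a minimal sketch with made-up counter values and a hypothetical helper named `cpu_percentages`. Dividing each delta by the sum of all deltas avoids assuming a particular jiffy rate, but it only works if every CPU state is included in the snapshots.

```python
def cpu_percentages(previous, current):
    """Convert two snapshots of cumulative jiffy counters into percentages."""
    deltas = {state: current[state] - previous[state] for state in current}
    total = sum(deltas.values())
    if total == 0:  # no time elapsed between the two snapshots
        return {state: 0.0 for state in deltas}
    return {state: 100.0 * delta / total for state, delta in deltas.items()}


# Made-up counter readings taken some time apart.
before = {"user": 1000, "system": 500, "wait": 100, "idle": 8400}
after = {"user": 1030, "system": 510, "wait": 110, "idle": 8450}

print(cpu_percentages(before, after))
# {'user': 30.0, 'system': 10.0, 'wait': 10.0, 'idle': 50.0}
```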
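To give an idea of the parsing involved, here is a rough sketch that works on the machine-readable form of the mod_status page (the one served when `?auto` is appended to the status URL). The sample text is made up, the state labels in `SCOREBOARD_STATES` are names chosen here, and the real plugin is written in C, so treat this as an illustration only.

```python
from collections import Counter

# One character per worker slot in the mod_status scoreboard.
# The labels on the right are names chosen for this sketch.
SCOREBOARD_STATES = {
    "_": "waiting", "S": "starting", "R": "reading", "W": "sending",
    "K": "keepalive", "D": "dnslookup", "C": "closing", "L": "logging",
    "G": "finishing", "I": "idle_cleanup", ".": "open",
}


def parse_status(text):
    """Extract request count, bytes transferred and scoreboard counts."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value.strip()
    scoreboard = Counter(
        SCOREBOARD_STATES.get(ch, "unknown") for ch in fields.get("Scoreboard", "")
    )
    return {
        "requests": int(fields["Total Accesses"]),
        "bytes": int(fields["Total kBytes"]) * 1024,  # reported in kilobytes
        "scoreboard": dict(scoreboard),
    }


# Made-up sample of a status page in the "?auto" format.
SAMPLE = """Total Accesses: 1234
Total kBytes: 5678
BusyWorkers: 2
IdleWorkers: 3
Scoreboard: _W_K_.."""

stats = parse_status(SAMPLE)
print(stats["requests"], stats["bytes"], stats["scoreboard"])
```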
The Memory plugin collects physical memory utilization.
The values are broken down by how the operating system uses the memory. Under Linux, the categories include used, buffered, cached, and free.
The GenericJMX plugin reads Managed Beans (MBeans) from an MBeanServer using JMX. The plugin is written in Java and requires the Java plugin to function.
The Java Management Extensions (JMX) is a generic framework to provide and query various management information. The interface is used by the Java Virtual Machine (JVM) to provide information about the memory used, threads and so on. These basic performance values can therefore be collected for every Java process without any support in the Java process itself.
The MySQL plugin connects to a MySQL database and issues a SHOW STATUS command periodically. The command returns the server status variables, many of which are collected. The plugin has been tested successfully with MySQL versions 4 and 5.
Main site of Collectd
The following is a list of projects similar to collectd and a short note on how they differ from collectd. Projects that focus on monitoring and do some performance measurement on the side are not on this list.
Focus on compute clusters and basic system statistics.
Data is collected by forking / executing plugins (i.e. scripts).
- eLuna Graph System
Written in Perl; relies on cron; local system only.