Svmonitor is a simple helper that lets you run tests on your systems and set up actions to take when they fail.
For example, you could use it to check that some of your important programs are still running, or to regularly check that you haven't run out of disk space.
Svmonitor is available either as a Pacman package (x86_64 only) or in source form. For source or Pacman installation, please refer to the Generic download and install instructions.
Svmonitor runs tests. Tests are just commands that can fail or succeed. Upon failure or success, the program takes actions, like sending email or shutting down your server.
Tests and actions are set up in a global configuration file at /etc/svmonitor-config/svmonitor.conf. Each test and action is referred to by a simple name.
Suppose you have set up a test named check-disk-usage that checks if you have enough disk space. Then to run it you would just run:
$ runtest check-disk-usage
This will take care of running the actual commands and taking the configured actions in case of success or failure.
Actions can be configured to run every time the test fails or, for example, only once after the first failure.
Note that svmonitor does not schedule the tests: this can be done using another tool such as fcron.
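As an illustration, assuming fcron is installed and runtest is found on the PATH used by your fcrontab, an entry like the following would run the check-disk-usage test at the start of every hour. The schedule is arbitrary and only meant as a starting point; the line uses the classic five-field time format.

```
# min hour day-of-month month day-of-week  command
0 * * * * runtest check-disk-usage
```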
The configuration file is in /etc/svmonitor-config/svmonitor.conf. It describes two things: the tests and the actions mapped to those tests.
The file's syntax is very simple (INI-like). Sections, whose names are written between brackets, contain variables, one per line, in the varname = value format. Any line beginning with a # is considered a comment and isn't parsed.
Here is an example file:
[action_mail]
action_command = testaction-sendmail root@localhost

[test_fail]
test_command = false
failure_actions = mail

[test_fail-once]
test_command = false
failure_actions_once = mail

[test_fail-once-2]
test_command = cat /home/sebastien/tmp/svmonitor.test
failure_actions_once = mail

[test_fail-only-once]
test_command = false
failure_actions_once_only = mail

[test_success]
test_command = true
success_actions = mail
This file defines five tests, named fail, fail-once, fail-once-2, fail-only-once and success, and one action, named mail.
Formally, the file can describe any number of actions and tests, each in its own section.
Each action should be in its own section. If the action's name is $name, it should be in a section named action_$name.
An action can be seen as a command taking three arguments: the error code returned by the test, the test's name, and a path to a file providing more details on the test's result. This command is specified by the action_command variable.
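As a rough sketch of what a custom action could look like, the shell function below takes the three arguments described above and appends a line to a log file. The name log_action, the SVMON_LOG variable and the default log path are hypothetical and not part of svmonitor; the sketch also assumes the three arguments are passed in the order listed.

```shell
# Hypothetical custom action: record a test result in a log file.
# Arguments (assumed order): exit code, test name, path to the details file.
log_action() {
    code=$1
    name=$2
    details=$3
    log=${SVMON_LOG:-/tmp/svmonitor-actions.log}   # assumed location, adjust as needed

    # One summary line per invocation, prefixed with a timestamp.
    printf '%s test=%s exit=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$name" "$code" >> "$log"

    # Append the details file, indented, if it exists and is readable.
    if [ -r "$details" ]; then
        sed 's/^/    /' "$details" >> "$log"
    fi
}
```

To use something like this as a real action, the body would go into an executable script that ends with log_action "$@", and the script's path would be given as action_command.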
The testaction-sendmail command from the example is provided by the svmonitor package for convenience. Such actions are documented in Implemented actions.
Each test should be in its own section. If the test's name is $name, it should be in a section named test_$name.
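For instance, the check-disk-usage test mentioned earlier could be declared roughly as follows. This is only a sketch: it assumes the diskspace-check command described further down is used as the test command, with an arbitrary 5% threshold, and it reuses the mail action from the example file.

```ini
# Action "mail": send the result to root@localhost
[action_mail]
action_command = testaction-sendmail root@localhost

# Test "check-disk-usage": run the mail action when the check fails
[test_check-disk-usage]
test_command = diskspace-check 5
failure_actions = mail
```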
Each test is described by seven variables:
To run a test, simply call runtest $testname, where $testname is the name of your test. Two other modes of operation are possible:
svmonitor must keep track of some state, such as the previous results of the tests it ran. This state is stored in /var/lib/svmonitor. This directory should be owned by root:svmonitor, and its permission bits should be 1770. Users who want to use svmonitor must be members of the svmonitor group.
For convenience, some common action commands are implemented in the svmonitor package. They are documented below.
testaction-sendmail sends an email to the comma-separated addresses given as its first argument.
For convenience, some common test commands are implemented in the svmonitor package. They are documented below.
diskspace-check can be used to check that enough disk space is available. The command exits with 0 status if and only if all partitions have enough disk space.
The definition of enough disk space must be provided by the user. Three methods can be used:
This is the simplest one: diskspace-check exits with an error if the percentage of available disk space (with respect to the total space) is lower than some threshold. To use that method, just do:
$ diskspace-check $minpercent
For example, for a threshold of 5%, you would run diskspace-check 5.
This is less restrictive than the "pure" minimum percentage: the program exits with an error only if the percentage of available space is below the percentage threshold and the available space is also lower than a minimum size. This is useful when working with large drives. For example, if a 1TB drive is 95% full, there is still 50GB available, which is a lot.
To use that method, run:
$ diskspace-check $minpercent $minsize
The size is assumed to be in bytes by default, but you can use all the standard multipliers, e.g. 1KB is 1000 bytes, 1Kb is 1000 bits and 1KiB is 1024 bytes.
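The combined rule can be sketched as a small shell function. This is an illustration of the rule as described above, not the actual diskspace-check code; the name combined_check and its argument order are hypothetical, and all sizes are plain byte counts.

```shell
# Sketch of the combined rule: fail only when BOTH the percentage and the
# absolute amount of available space are below their thresholds.
# Arguments: available bytes, total bytes, minimum percent, minimum bytes.
# Prints "fail" or "ok".
combined_check() {
    avail=$1
    total=$2
    minpercent=$3
    minsize=$4
    percent=$((avail * 100 / total))
    if [ "$percent" -lt "$minpercent" ] && [ "$avail" -lt "$minsize" ]; then
        echo fail
    else
        echo ok
    fi
}

# 1TB drive, thresholds of 10% and 20GB:
combined_check 50000000000 1000000000000 10 20000000000   # 5% free, 50GB left: prints ok
combined_check 10000000000 1000000000000 10 20000000000   # 1% free, 10GB left: prints fail
```

With a pure 10% threshold the first case would already be reported as a failure; adding the size threshold lets it pass because 50GB is still available.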
You can also specify your own program to do the tests: the program should take as arguments the used space and the total space, in bits, and return with 0 status if and only if enough disk space is available. To use that method, run:
$ diskspace-check $progpath
where $progpath is the path to your program.
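A custom program could be as small as the following sketch. The policy (at least 10% of the total space must be free) and the function name enough_space are hypothetical; the only parts taken from the description above are the two arguments and the meaning of the exit status.

```shell
# Hypothetical policy: require at least 10% of the total space to be free.
# Arguments: used space, total space (both in the same unit).
# Returns 0 if enough space is available, non-zero otherwise.
enough_space() {
    used=$1
    total=$2
    free=$((total - used))
    # free/total >= 10% is equivalent to free * 10 >= total (integers only).
    [ $((free * 10)) -ge "$total" ]
}
```

To turn this into a program usable as $progpath, the function would be placed in an executable script whose last line is enough_space "$1" "$2", so that the script's exit status is the function's.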
Some partitions can of course be excluded from the tests. There are two useful options to do that:
This can be used to check that a given webpage is responding correctly, and hasn't changed. The script uses wget for downloading the page. It takes two arguments:
In addition, one option, --wget-opts, can be used to specify additional space-separated options to pass to wget.
This is used to check that some programs are running on your system. The script takes as arguments the names of the programs that should be running, and compares each of them to the last column of a ps -ef output. If the two begin the same way, the program is considered as running, otherwise the test exits with an error.
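The "begin the same way" comparison can be illustrated with a short shell function. This is a sketch of the matching rule only, not the script shipped with svmonitor; the name begins_with_any is hypothetical, and it reads command lines from standard input rather than calling ps itself.

```shell
# Succeed if any line read from standard input begins with the given name.
# Intended input: the command column of a process listing, one process per line.
begins_with_any() {
    name=$1
    while IFS= read -r cmd; do
        case $cmd in
            "$name"*) return 0 ;;   # prefix match: the line starts with $name
        esac
    done
    return 1
}
```

It could be fed with ps -e -o args= | begins_with_any fcron, where ps -e -o args= is used here in place of ps -ef to print only the command column. Under this rule a process listed as /usr/sbin/sshd -D does not match the name sshd, so the name would have to be given the way it appears in the listing.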