You can check the exit status of a program or a script. This test may only be used within a program service entry in the Monit control file.
check program myscript with path "/usr/local/bin/myscript.sh"
    if status != 0 then alert
Monit will execute the program periodically and if the exit status of the program does not match the expected result, Monit can perform an action. In the example above, Monit will raise an alert if the exit value of myscript is different from 0. By convention, 0 means the program exited normally.
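As a rough illustration of the exit-status contract described above, here is a minimal sketch of what a script such as myscript.sh might contain. The check itself (whether a file can be created in a given directory) and the name check_writable are hypothetical and not taken from the Monit documentation; the point is only that the script exits 0 on success, exits non-zero on failure, and writes its diagnostic to stderr.

```shell
#!/bin/sh
# Hypothetical check: succeed only if a scratch file can be created
# in the directory passed as the first argument.

check_writable() {
    cw_dir=$1
    # Try to create a uniquely named file inside the directory.
    cw_probe=$(mktemp "$cw_dir/probe.XXXXXX" 2>/dev/null) || {
        # Diagnostics go to stderr so they stay separate from normal output.
        echo "cannot create a file in $cw_dir" >&2
        return 1
    }
    rm -f "$cw_probe"
    return 0
}

# Run the check only when a directory is given on the command line,
# so the file can also be sourced without side effects.
if [ "$#" -ge 1 ]; then
    check_writable "$1"
    exit $?
fi
```

Run as `myscript.sh /var/tmp`, the script exits with status 0 if the directory is writable and 1 otherwise, which is the value a test like "if status != 0" would see.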
Program checks are asynchronous: Monit does not wait for the program to exit. Instead, it starts the program in the background and immediately continues with the next service entry in monitrc. At the next cycle, Monit checks whether the program has finished and, if so, collects the program's exit status. If the status indicates a failure, Monit raises an alert message containing the program's error (stderr) output, if any. If the program has not exited after the first cycle, Monit waits another cycle, and so on. If the program is still running after 5 minutes, Monit kills it and generates a program timeout event. The default timeout can be overridden (see the syntax below).
The asynchronous nature of the program check allows for non-blocking behavior in the current Monit design, but it comes with a side effect: when the program has finished executing and is waiting for Monit to collect the result, it becomes a so-called "zombie" process. A zombie process consumes practically no system resources (only its process table entry, and therefore its PID, remains in use) and it is under Monit's control; the zombie process is removed from the system as soon as Monit collects the exit status. This means that every "check program" will be associated with either a running process or a temporary zombie. This unwanted zombie side effect will be removed in a later release of Monit.
The syntax of the program status statement is:
IF STATUS operator value [TIMEOUT <N> SECONDS] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
operator is one of "<", ">", "!=", "==" in C notation; "gt", "lt", "eq", "ne" in shell (sh) notation; or "greater", "less", "equal", "notequal" in human-readable form. If no operator is specified, the default is EQUAL.
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
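To make the equivalence of the three operator notations listed above concrete, the following sketch maps each spelling onto the corresponding sh integer comparison. It is not Monit's implementation; the function name status_matches and its return codes are assumptions made for illustration only.

```shell
#!/bin/sh
# status_matches STATUS OPERATOR VALUE   (hypothetical helper)
# Returns 0 when "STATUS OPERATOR VALUE" holds, 1 when it does not,
# and 2 when the operator is not recognised.
status_matches() {
    sm_status=$1
    # Accept either case, since the text above writes both "equal" and EQUAL.
    sm_op=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    sm_value=$3
    case $sm_op in
        '<'|lt|less)       [ "$sm_status" -lt "$sm_value" ] ;;
        '>'|gt|greater)    [ "$sm_status" -gt "$sm_value" ] ;;
        '!='|ne|notequal)  [ "$sm_status" -ne "$sm_value" ] ;;
        # An empty operator falls back to equality, the documented default.
        '=='|eq|equal|'')  [ "$sm_status" -eq "$sm_value" ] ;;
        *) echo "unknown operator: $2" >&2; return 2 ;;
    esac
}
```

For example, `status_matches 1 '!=' 0`, `status_matches 1 ne 0` and `status_matches 1 notequal 0` all succeed, mirroring the condition in "if status != 0 then alert".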
I've turned off the old xinetd-based management and replaced it with monit running directly from /etc/inittab and set to manage Git using this init.d script. If a connection to port 9418 is refused the daemon will be restarted. Checks happen every 30 seconds.
I'm reopening this one. It still happens from time to time. Monitoring on port 9418 is not enough; I think I'll need to use a script that attempts to perform a successful clone of a repo, and if that fails, be prepared to kill -9 the dead git-daemon process which is holding onto the port but not serving traffic.
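A first sketch of the clone-based check described above might look like the following. Everything in it is an assumption rather than the deployed script: the function name check_clone, the GIT_BIN override, and the choice of a shallow clone into a throwaway directory. It only reports success or failure through its exit status and stderr; killing the stuck git-daemon is deliberately left out.

```shell
#!/bin/sh
# Hypothetical clone check for git-daemon; names are illustrative.

# GIT_BIN lets the caller point at a specific git binary (assumed override).
GIT_BIN=${GIT_BIN:-git}

# check_clone URL
# Returns 0 if a shallow clone of URL succeeds, 1 otherwise.
check_clone() {
    cc_url=$1
    cc_tmp=$(mktemp -d) || { echo "cannot create temp dir" >&2; return 1; }
    # Clone into a scratch directory and keep git's error output for the report.
    if "$GIT_BIN" clone --quiet --depth 1 "$cc_url" "$cc_tmp/clone" 2>"$cc_tmp/err"; then
        cc_rc=0
    else
        cc_rc=1
        echo "clone of $cc_url failed:" >&2
        cat "$cc_tmp/err" >&2
    fi
    # Always remove the scratch directory, whatever the outcome.
    rm -rf "$cc_tmp"
    return "$cc_rc"
}

# Run the check only when a URL is given on the command line,
# so the file can also be sourced without side effects.
if [ "$#" -ge 1 ]; then
    check_clone "$1"
    exit $?
fi
```

Hooked up as a "check program" entry like the myscript example, a non-zero exit would trigger whatever action is configured. If git-daemon accepts the connection but never answers, the clone may hang rather than fail, in which case the program timeout described in the quoted documentation is what would end the check. How the repository URL is passed (as an argument, or hard-coded in a small wrapper) depends on the Monit version in use.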