monit - Best process management tool

There are several ways to manage processes on a Linux system. Most important processes run as daemons. However, a daemon is simply a process whose parent is init (pid = 1), so if it terminates abnormally, the service it provides stops as well. For this reason, many system administrators monitor critical processes. Monitoring tools such as Zabbix are sometimes used for this. Zabbix can monitor networks, servers, virtual machines, and cloud services, and it can automatically raise an alarm when a value exceeds the limits set by the administrator.

While managing these large systems is important, monitoring critical processes on smaller servers and automatically restarting them after an unexpected shutdown is equally important. monit is a great tool for this. It is easy to install and very lightweight compared to packages such as Zabbix. Note that monit only monitors the server on which it is installed.


Install monit

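On Red Hat-based systems, you can install it with the yum command. On RHEL and CentOS the package is usually provided by the EPEL repository, so that repository may need to be enabled first.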

yum install monit


On Ubuntu or Debian, install with the apt or apt-get command.

apt-get install monit

Both commands require root privileges, so run them as root or prefix them with sudo.


Setting up monit

The file that configures the monit service is monitrc. It is located in /etc on Red Hat (/etc/monitrc) and in /etc/monit on Debian and Ubuntu (/etc/monit/monitrc).

The important settings in this file are as follows.

  • set daemon 120 : monit checks the monitored services every 120 seconds by default. If there are not many monitored processes, reduce this interval. I prefer 30 seconds.
  • set log /var/log/monit.log : Log file location. Unless you have a special reason, leave this value as it is. If you want to log to syslog, use "set log syslog" instead.
  • set pidfile /var/run/monit.pid : Uncomment this option to have monit create its own pid file at this location.
  • set httpd : Enables monit's built-in HTTP interface (port 2812 by default). It is recommended to uncomment this block, because commands such as monit status and monit summary talk to the daemon through this interface.
  • include : Used to include additional configuration files.

<my CentOS configuration>

<my Ubuntu configuration>


The details of each process to be managed are saved in files with the .conf extension in the directory specified by include.

There are three conf files in the conf.d directory of my Ubuntu server.

root@ubuntusrv:/etc/monit/conf.d# ls -al
total 20
drwxr-xr-x 2 root root 4096 Jan  6  2021 .
drwxr-xr-x 6 root root 4096 Sep  1 16:10 ..
-rw-r--r-- 1 root root  242 Dec 30  2020 blueivr.monit.conf
-rw-r--r-- 1 root root  127 Jan  3  2021 node-red.monit.conf
-rw-r--r-- 1 root root  184 Jan  6  2021 unimrcp.monit.conf


Process Management

The following shows how to manage individual processes directly with monit, without using systemd's service management.

Let's look at unimrcp.monit.conf, one of the three conf files above.

root@ubuntusrv:/etc/monit/conf.d# cat unimrcp.monit.conf
check process unimrcpserver
        matching "unimrcpserver"
        start program = "/usr/local/unimrcp/bin/unimrcpserver -d -r /usr/local/unimrcp"
        stop program = "/usr/bin/killall unimrcpserver"

The service name to check is unimrcpserver (the name after "check process"). The process itself is found by matching the pattern "unimrcpserver" against the process list. The pattern must therefore appear in the output of the ps command, like this.

root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep unimrcpserver
root         898       1  0 Aug31 ?        00:05:30 /usr/local/unimrcp/bin/unimrcpserver -d -r /usr/local/unimrcp
root      274843  265618  0 16:22 pts/0    00:00:00 grep --color=auto unimrcpserver
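
The "matching" rule can be pictured as a regular-expression search over each process's command line. The Python sketch below is only an illustration of that idea, not monit's own implementation; the function name match_processes is hypothetical, and the sample lines are copied from the ps output above. On a real system, the monit procmatch <pattern> command tests a pattern against the live process list.

```python
import re

# Sample lines copied from the `ps -ef` output shown in this article.
PS_LINES = [
    "root         898       1  0 Aug31 ?        00:05:30 /usr/local/unimrcp/bin/unimrcpserver -d -r /usr/local/unimrcp",
    "root      274843  265618  0 16:22 pts/0    00:00:00 grep --color=auto unimrcpserver",
]


def match_processes(ps_lines, pattern):
    """Return (pid, command) pairs whose command line matches the regex.

    Hypothetical helper: it assumes the `ps -ef` column layout
    UID PID PPID C STIME TTY TIME CMD.
    """
    regex = re.compile(pattern)
    matches = []
    for line in ps_lines:
        fields = line.split(None, 7)  # the 8th field keeps the full command
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header line or malformed rows
        pid, command = int(fields[1]), fields[7]
        if regex.search(command):
            matches.append((pid, command))
    return matches


if __name__ == "__main__":
    # A loose pattern also matches the grep command itself.
    print(match_processes(PS_LINES, "unimrcpserver"))
    # Anchoring the pattern to the executable path narrows the match.
    print(match_processes(PS_LINES, r"^/usr/local/unimrcp/bin/unimrcpserver"))
```

The point of the second call is that a pattern should be specific enough to identify exactly one process; a short word can also appear in the command line of an unrelated program.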

If the process does not exist, monit starts it with the command "/usr/local/unimrcp/bin/unimrcpserver -d -r /usr/local/unimrcp". To stop the process, monit runs the /usr/bin/killall unimrcpserver command. Running this command in the shell should also terminate the process. Once the configuration is in place, monit will start the process again at its next check cycle.

Confirm that this command terminates the unimrcpserver process as follows.

root@ubuntusrv:/etc/monit/conf.d# /usr/bin/killall unimrcpserver
root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep unimrcpserver
root      278419  265618  0 16:32 pts/0    00:00:00 grep --color=auto unimrcpserver

Tip: The stop program does not have to use the killall command. The next example shows how to kill a process with the kill -9 pid command.


Now let's look at another conf file.

root@ubuntusrv:/etc/monit/conf.d# cat blueivr.monit.conf
check process blueivr
        with pidfile /usr/local/freeswitch/run/freeswitch.pid
        start program = "/usr/bin/blueivr.sh"
        stop program = "/bin/bash -c 'kill -9 `cat /usr/local/freeswitch/run/freeswitch.pid`'"

This time, the "with pidfile" keyword is used instead of "matching" to determine whether the process is running.

In this case, the managed process creates the pid file itself. Here, process id 913 is monitored.

root@ubuntusrv:/etc/monit/conf.d# cat /usr/local/freeswitch/run/freeswitch.pid
913

If process id 913 does not exist or the /usr/local/freeswitch/run/freeswitch.pid file cannot be found, monit runs the script "/usr/bin/blueivr.sh" specified in "start program". The script must have execute permission. Stopping the process also uses the pid value; in this case, the stop program is equivalent to "kill -9 913".
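
The pid file check described above can be sketched in a few lines of Python. This is a minimal illustration of the idea (read the pid, then ask the kernel whether that process exists), not monit's actual code; the function names pid_from_file and is_running are hypothetical.

```python
import os


def pid_from_file(pidfile):
    """Return the pid stored in pidfile, or None if it is missing or invalid."""
    try:
        with open(pidfile, encoding="ascii") as handle:
            pid = int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pidfile):
    """Return True if the process recorded in pidfile currently exists."""
    pid = pid_from_file(pidfile)
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
    except ProcessLookupError:
        return False  # no process with this pid
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Path taken from the blueivr example in this article.
    print(is_running("/usr/local/freeswitch/run/freeswitch.pid"))
```

Both failure cases in the text map onto a False result here: a missing pid file, and a pid file that names a process which no longer exists.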


Manage Python process

I often write daemon processes in Python, so I need to manage Python processes with monit.

It is easier to manage them through a bash script like the following.

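#!/bin/bash
# Wrapper script so that monit can start and stop the Python program.

PIDFILE=/var/run/myprocess.pid

case "$1" in
    start)
        # The Python program is expected to write ${PIDFILE} itself.
        /usr/bin/python3 /usr/local/src/myprocess.py
    ;;
    stop)
        # Kill the process recorded in the pid file, then remove the file.
        if [ -f "${PIDFILE}" ]; then
            kill -9 "$(cat "${PIDFILE}")"
            rm -f "${PIDFILE}"
        fi
    ;;
    *)
        echo "usage : myprocess (start|stop)"
        exit 1
    ;;
esac
exit 0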

The bash script above is an example of managing a Python program that leaves a pid file. Assume that the full path to this script is /usr/bin/myprocess.sh. Execute permission must be granted to this script file.

Now you can create a conf file like this:

check process myprocess
    with pidfile "/var/run/myprocess.pid"
    start program = "/usr/bin/myprocess.sh start"
    stop program = "/usr/bin/myprocess.sh stop"


Create pid file in Python

You can manage pid files simply by using python-pidfile. Another advantage of using a pid file is that it prevents duplicate execution of processes.

pip3 install python-pidfile

And at the beginning of the Python code, add the following code:

import pidfile
import time, sys

print('Starting process')
try:
    with pidfile.PIDFile("/var/run/myprocess.pid"):
        print('Process started')
        # Your code goes here. Keep it inside the "with" block,
        # because the pid file only exists while the block is running.
        time.sleep(60)
except pidfile.AlreadyRunningError:
    print('Already running.')
    sys.exit(1)

print('Exiting')

There is one thing to note when using python-pidfile: always use the "with" statement, as in the example above. The pid file exists only while the block is executing, so the program's main work belongs inside it.
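
To see why the "with" statement matters, here is a simplified, standard-library-only sketch of what a pid file context manager does. It is not python-pidfile's implementation; the names simple_pidfile and AlreadyRunning are hypothetical, and unlike a real library this sketch does not check whether a leftover pid file belongs to a process that has already died.

```python
import os
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Raised when the pid file already exists (hypothetical name)."""


@contextmanager
def simple_pidfile(path):
    """Create path containing our pid; remove it when the block exits."""
    try:
        # O_EXCL makes creation fail if the file is already there,
        # which is what prevents a second copy from starting.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        raise AlreadyRunning(path)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(f"{os.getpid()}\n")
        yield path
    finally:
        os.remove(path)  # runs on normal exit and on exceptions


if __name__ == "__main__":
    import tempfile

    demo_path = os.path.join(tempfile.mkdtemp(), "demo.pid")
    with simple_pidfile(demo_path):
        print("pid file exists:", os.path.exists(demo_path))
    print("pid file exists after the block:", os.path.exists(demo_path))
```

Because the file is removed as soon as the block exits, any work done after the block runs without a pid file, and a monitor that relies on the file would treat the process as gone.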


Finding Process ID

Suppose three Python programs are running as follows, and you want to know the pid of python3 hello.py.

root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep python
root         815       1  0 Aug31 ?        00:00:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root      289107  288956  0 16:59 pts/1    00:00:00 python3
root      290299  289403  0 17:02 pts/2    00:00:00 python3 hello.py
root      290332  265618  0 17:02 pts/0    00:00:00 grep --color=auto python

You can print only the pid by chaining grep and the awk command as follows.

root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep python3|grep hello
root      290299  289403  0 17:02 pts/2    00:00:00 python3 hello.py
root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep python3|grep hello |awk '{print $2}'
290299

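Now, if you apply the same idea to "stop program", monit can terminate the Python program. Nesting the awk quotes inside bash -c '...' breaks the quoting, so it is simpler to let pkill match against the full command line with the -f option.

stop program = "/usr/bin/pkill -9 -f 'python3 hello.py'"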


File Monitoring

Recently, I wrote a program that monitors packets using a pcap library on a Linux system running in a VM.

However, the program hung very often. monit could not detect this through the process ID, because the process had not terminated and the PID lookup still succeeded.

It was as if the program was not receiving any CPU time from the operating system.

The exact cause is unknown, but I had to restart the process somehow.

After much thought, my idea was to create a thread that writes a simple log entry every 5 seconds. If the time stamp of the log file then differs from the current time by much more than 5 seconds, the process can be considered abnormal.

monit can manage the process by checking the time stamp of the file like this.

check file stt_alive_ens224.log with path /LOG/dcc/khive/stt_alive_ens224.log
   if timestamp > 1 minutes then alert
  # if timestamp > 1 minutes then exec "/usr/bin/monit restart myprocess" repeat every 5 cycles

With this rule, an alert is generated if the time stamp of the /LOG/dcc/khive/stt_alive_ens224.log file is more than a minute older than the current time.

If you want to restart the process instead, refer to the commented-out line in the rule above.
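
The heartbeat idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the original program: the function names write_heartbeat, start_heartbeat and is_stale are hypothetical, and is_stale only mirrors the monit time stamp rule so that the logic can be tried without monit.

```python
import os
import threading
import time


def write_heartbeat(path):
    """Append one line; appending also updates the file's modification time."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(time.strftime("%Y-%m-%d %H:%M:%S") + " alive\n")


def start_heartbeat(path, interval=5.0):
    """Start a daemon thread that writes a heartbeat every `interval` seconds.

    Returns an Event; call .set() on it to stop the thread.
    """
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            write_heartbeat(path)
            stop_event.wait(interval)  # sleeps, but wakes early on stop

    threading.Thread(target=loop, name="heartbeat", daemon=True).start()
    return stop_event


def is_stale(path, max_age=60.0, now=None):
    """Return True if the file is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > max_age
    except FileNotFoundError:
        return True


if __name__ == "__main__":
    import tempfile

    log_path = os.path.join(tempfile.mkdtemp(), "alive.log")
    stop = start_heartbeat(log_path, interval=1.0)
    time.sleep(2.5)
    print("stale:", is_stale(log_path))
    stop.set()
```

Since the file is only appended to, it grows without bound; in practice it would need to be rotated or truncated from time to time.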


Important monit commands

You can check the usage easily with the --help option.

root@ubuntusrv:/etc/monit/conf.d# monit --help
Usage: monit [options]+ [command]
Options are as follows:
 -c file       Use this control file
 -d n          Run as a daemon once per n seconds
 -g name       Set group name for monit commands
 -l logfile    Print log information to this file
 -p pidfile    Use this lock file in daemon mode
 -s statefile  Set the file monit should write state information to
 -I            Do not run in background (needed when run from init)
 --id          Print Monit's unique ID
 --resetid     Reset Monit's unique ID. Use with caution
 -B            Batch command line mode (do not output tables or colors)
 -t            Run syntax check for the control file
 -v            Verbose mode, work noisy (diagnostic output)
 -vv           Very verbose mode, same as -v plus log stacktrace on error
 -H [filename] Print SHA1 and MD5 hashes of the file or of stdin if the
               filename is omited; monit will exit afterwards
 -V            Print version number and patchlevel
 -h            Print this text
Optional commands are as follows:
 start all             - Start all services
 start <name>          - Only start the named service
 stop all              - Stop all services
 stop <name>           - Stop the named service
 restart all           - Stop and start all services
 restart <name>        - Only restart the named service
 monitor all           - Enable monitoring of all services
 monitor <name>        - Only enable monitoring of the named service
 unmonitor all         - Disable monitoring of all services
 unmonitor <name>      - Only disable monitoring of the named service
 reload                - Reinitialize monit
 status [name]         - Print full status information for service(s)
 summary [name]        - Print short status information for service(s)
 report [up|down|..]   - Report state of services. See manual for options
 quit                  - Kill the monit daemon process
 validate              - Check all services and start if not running
 procmatch <pattern>   - Test process matching pattern

The commands I use most often are described below.


monit status

Shows detailed information about the services registered in the conf files.

root@ubuntusrv:/etc/monit/conf.d# monit status
Monit 5.26.0 uptime: 21h 50m

Process 'unimrcpserver'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          278465
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       45m
  threads                      9
  children                     0
  cpu                          0.2%
  cpu total                    0.2%
  memory                       0.2% [3.8 MB]
  memory total                 0.2% [3.8 MB]
  security attribute           unconfined
  disk read                    0 B/s [184 kB total]
  data collected               Wed, 01 Sep 2021 17:17:55

Process 'nodered'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          60325
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       10h 25m
  threads                      11
  children                     0
  cpu                          0.0%
  cpu total                    0.0%
  memory                       4.0% [78.3 MB]
  memory total                 4.0% [78.3 MB]
  security attribute           unconfined
  disk read                    0 B/s [488 kB total]
  data collected               Wed, 01 Sep 2021 17:17:55

Process 'blueivr'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          913
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       21h 50m
  threads                      26
  children                     0
  cpu                          0.4%
  cpu total                    0.4%
  memory                       1.6% [31.1 MB]
  memory total                 1.6% [31.1 MB]
  security attribute           unconfined
  disk read                    0 B/s [18.2 MB total]
  disk write                   0 B/s [1.7 MB total]
  data collected               Wed, 01 Sep 2021 17:17:55

System 'ubuntusrv'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  load average                 [0.33] [0.32] [0.29]
  cpu                          0.0%us 0.4%sy 0.0%wa
  memory usage                 765.3 MB [39.1%]
  swap usage                   3.2 MB [0.1%]
  uptime                       108d 18h 22m
  boot time                    Sat, 15 May 2021 22:55:31
  data collected               Wed, 01 Sep 2021 17:17:55
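
If you want to check this output from a script, the text is regular enough to parse. The Python sketch below is only an illustration based on the layout shown above; the function name service_states is hypothetical, and the exact output format may differ between monit versions.

```python
import re

# A section starts with a line such as: Process 'unimrcpserver'
HEADER = re.compile(r"^(\w+) '(.+)'\s*$")
# The plain "status" line; "monitoring status" does not match this.
STATUS = re.compile(r"^\s+status\s+(.+?)\s*$")

# Shortened sample copied from the output in this article.
SAMPLE = """Monit 5.26.0 uptime: 21h 50m

Process 'unimrcpserver'
  status                       OK
  monitoring status            Monitored

System 'ubuntusrv'
  status                       OK
  monitoring status            Monitored
"""


def service_states(status_text):
    """Map each service name to the value of its 'status' line."""
    states = {}
    current = None
    for line in status_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(2)
            continue
        status = STATUS.match(line)
        if status and current is not None:
            states[current] = status.group(1)
    return states


if __name__ == "__main__":
    for name, state in service_states(SAMPLE).items():
        print(name, state)
```

In real use, the text would come from running monit status (for example through subprocess.run) instead of the SAMPLE string.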


monit summary

This is useful when simply checking whether a process is running or not. 



monit reload

If a conf file is changed, this command makes monit read the configuration again. After the reload, it is recommended to restart the service whose conf file was changed.

monit restart <name>


Wrapping up

In addition to the above, monit has a number of other functions, including the ability to monitor CPU and memory and send alerts when thresholds are crossed. Today, we covered a few of its process management features. The website https://mmonit.com/monit/ introduces many more features of monit. If you want to do more, please refer to the homepage.




