monit - Best process management tool
There are several ways to manage processes on a Linux system. Most important processes run as daemons. However, a daemon is just a process whose parent is init (pid = 1), so if it terminates abnormally, the service it provides stops with it. For this reason, many system administrators monitor critical processes. Management tools such as Zabbix are sometimes used for this. Zabbix can monitor networks, servers, virtual machines and cloud services, and it can automatically raise an alarm when a value exceeds a limit set by the administrator.
Monitoring such large systems is important, but it is equally important to monitor critical processes on smaller servers and restart them automatically when they stop unexpectedly. monit is a great tool for this. It is easy to install and very light compared to packages such as Zabbix. Note that monit only monitors the server on which it is installed.
Install monit
You can install it with the yum command on Red Hat.
yum install monit
On Ubuntu or Debian, install with the apt or apt-get command.
apt-get install monit
These commands need root privileges, so run them with sudo unless you are already root.
Setting up monit
The file that configures the monit service is monitrc, and this file is in the /etc directory for Red Hat and /etc/monit for Debian.
Important settings in this file are as follows.
- set daemon 120 : The process is checked every 120 seconds by default. If there are not many monitored processes, reduce this interval. I prefer 30 seconds.
- set log /var/log/monit.log : Log file location. Unless there is a special case, this value is used as it is. If you want to use syslog, use "set syslog" instead.
- set pidfile /var/run/monit.pid : Uncomment this option to create a pid file.
- set httpd : Uncommenting this block is recommended. It enables monit's built-in HTTP interface, which commands such as monit status use to talk to the monit daemon.
- include : Used to include additional configuration files.
The settings for each managed process are saved in files with the .conf extension in the directory named by include.
There are 3 conf files in the conf.d directory of my Ubuntu server.
root@ubuntusrv:/etc/monit/conf.d# ls -al
total 20
drwxr-xr-x 2 root root 4096 Jan  6  2021 .
drwxr-xr-x 6 root root 4096 Sep  1 16:10 ..
-rw-r--r-- 1 root root  242 Dec 30  2020 blueivr.monit.conf
-rw-r--r-- 1 root root  127 Jan  3  2021 node-red.monit.conf
-rw-r--r-- 1 root root  184 Jan  6  2021 unimrcp.monit.conf
Process Management
The following is how to manage individual processes directly using monit without using systemd's service management function.
Let's take a look at the unimrcp.monit.conf file among the above three conf files.
root@ubuntusrv:/etc/monit/conf.d# cat unimrcp.monit.conf
check process unimrcpserver matching "unimrcpserver"
    start program = "/usr/local/unimrcp/bin/unimrcpserver -d -r /usr/local/unimrcp"
    stop program = "/usr/bin/killall unimrcpserver"
The service name to check is unimrcpserver, and monit finds the process by matching the pattern "unimrcpserver" against the running processes. The pattern must therefore appear in the ps output, like this.
root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep unimrcpserver
root 898 1 0 Aug31 ? 00:05:30 /usr/local/unimrcp/bin/unimrcpserver -d -r /usr/local/unimrcp
root 274843 265618 0 16:22 pts/0 00:00:00 grep --color=auto unimrcpserver
If the process does not exist, monit starts it with the command "/usr/local/unimrcp/bin/unimrcpserver -d -r /usr/local/unimrcp". To stop the process, monit runs the /usr/bin/killall unimrcpserver command. Running this command in the shell should terminate the process. Once the configuration is in place, monit will start the process again at its next check.
Check that this command terminates the unimrcpserver process, as follows.
root@ubuntusrv:/etc/monit/conf.d# /usr/bin/killall unimrcpserver
root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep unimrcpserver
root 278419 265618 0 16:32 pts/0 00:00:00 grep --color=auto unimrcpserver
Tip: The stop program does not have to use the killall command. The next example kills a process with the kill -9 pid command.
Now let's look at another conf file.
root@ubuntusrv:/etc/monit/conf.d# cat blueivr.monit.conf
check process blueivr with pidfile /usr/local/freeswitch/run/freeswitch.pid
    start program = "/usr/bin/blueivr.sh"
    stop program = "/bin/bash -c 'kill -9 `cat /usr/local/freeswitch/run/freeswitch.pid`'"
This time, "with pidfile" is used instead of "matching" keyword to determine whether the process is running or not.
In this case, the managed process automatically creates the pid file. Process id 913 is monitored.
root@ubuntusrv:/etc/monit/conf.d# cat /usr/local/freeswitch/run/freeswitch.pid
913
And if there is no process id 913 or the /usr/local/freeswitch/run/freeswitch.pid file cannot be found, the script file "/usr/bin/blueivr.sh" specified in "start program" is executed. Of course, the "/usr/bin/blueivr.sh" script must have execute permission. And process termination also uses the pid value. In this case, it is equivalent to "kill -9 913".
Manage python process
I often create daemon processes using python. Therefore, it is necessary to manage the Python process using monit.
It is easier to manage by creating a bash script as follows.
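#!/bin/bash
# Start or stop myprocess.py. The Python program writes the pid file itself.
PIDFILE=/var/run/myprocess.pid

case "$1" in
  start)
    /usr/bin/python3 /usr/local/src/myprocess.py
    ;;
  stop)
    # Kill the pid recorded in the pid file, then remove the leftover file.
    kill -9 "$(cat "${PIDFILE}")"
    rm -f "${PIDFILE}"
    ;;
  *)
    echo "usage : myprocess (start|stop)"
    ;;
esac
exit 0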
The bash script above is an example of managing a Python program that leaves a pid file. Assume that the full path to this script is /usr/bin/myprocess.sh. Execute permission must be granted to this script file.
Now you can create a conf file like this:
check process myprocess with pidfile "/var/run/myprocess.pid"
    start program = "/usr/bin/myprocess.sh start"
    stop program = "/usr/bin/myprocess.sh stop"
create pid file in python
You can manage pid files simply by using python-pidfile. Another advantage of using a pid file is that it prevents duplicate execution of processes.
pip3 install python-pidfile
And at the beginning of the Python code, add the following code:
import pidfile
import time, sys

print('Starting process')
try:
    with pidfile.PIDFile("/var/run/myprocess.pid"):
        print('Process started')
        # your code here
except pidfile.AlreadyRunningError:
    print('Already running.')
    sys.exit(1)

time.sleep(60)
print('Exiting')
There is one thing to note when using python-pidfile. As in the example above, you should always use the "with" syntax.
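To show why the "with" block matters, here is a rough standard-library sketch of what a pid file guard does. It is not python-pidfile's actual implementation: the names SimplePidFile, AlreadyRunning and pid_is_alive are hypothetical, and the sketch assumes Linux, where signal 0 only probes for a process.

```python
import os


class AlreadyRunning(Exception):
    """Raised when the pid file points at a live process (hypothetical name)."""


def pid_is_alive(pid):
    """Return True if a process with this pid exists (signal 0 only probes, Linux)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


class SimplePidFile:
    """Write our pid on entry and remove the file on exit.

    Simplified: the existence check and the write are not atomic.
    """

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                text = f.read().strip()
            if text.isdigit() and pid_is_alive(int(text)):
                raise AlreadyRunning(self.path)
        with open(self.path, "w") as f:
            f.write(str(os.getpid()))
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the body raised, so the pid file does not outlive us.
        if os.path.exists(self.path):
            os.remove(self.path)
        return False
```

Because the file is removed in the exit step, leaving the "with" block is what tells monit's "with pidfile" check that the process is gone.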
Finding Process ID
Suppose three Python programs are running as follows, and you want to know the pid of python3 hello.py.
root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep python
root 815 1 0 Aug31 ? 00:00:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 289107 288956 0 16:59 pts/1 00:00:00 python3
root 290299 289403 0 17:02 pts/2 00:00:00 python3 hello.py
root 290332 265618 0 17:02 pts/0 00:00:00 grep --color=auto python
You can print only the pid by chaining grep and the awk command as follows.
root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep python3|grep hello
root 290299 289403 0 17:02 pts/2 00:00:00 python3 hello.py
root@ubuntusrv:/etc/monit/conf.d# ps -ef|grep python3|grep hello |awk '{print $2}'
290299
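The same filtering can be sketched in Python, which may be convenient if the lookup lives inside a helper script. The function name find_pids is hypothetical, and SAMPLE is simply the ps -ef listing shown earlier. The sketch assumes the eight-column ps -ef layout (UID PID PPID C STIME TTY TIME CMD).

```python
def find_pids(ps_output, *keywords):
    """Return the pids (2nd column of `ps -ef`) whose CMD contains every keyword."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something unexpected
        cmd = fields[7]
        if cmd.startswith("grep "):
            continue  # skip the grep processes themselves
        if all(keyword in cmd for keyword in keywords):
            pids.append(int(fields[1]))
    return pids


SAMPLE = """\
root 815 1 0 Aug31 ? 00:00:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 289107 288956 0 16:59 pts/1 00:00:00 python3
root 290299 289403 0 17:02 pts/2 00:00:00 python3 hello.py
root 290332 265618 0 17:02 pts/0 00:00:00 grep --color=auto python
"""

print(find_pids(SAMPLE, "python3", "hello"))  # prints [290299]
```

On a live system the text could come from subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout instead of SAMPLE.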
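To terminate the Python program from monit, the same lookup has to go into "stop program". The pipeline above cannot be pasted in unchanged, because the single quotes around the awk program would end the single-quoted argument of bash -c. One way around this is pkill -f, which matches against the full command line and sends the signal in one step.
stop program = "/usr/bin/pkill -9 -f hello.py"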
File Monitoring
Recently, I wrote a program that monitors packets using a pcap library on a Linux system running in a VM.
However, the program hung very often. monit could not detect this through the process ID, because the process had not terminated and its PID was still found normally.
It was as if the program were not receiving any CPU time from the operating system.
The exact cause is unknown, but I had to restart the process somehow.
After much thought, my idea was to create a thread that writes a simple log line every 5 seconds. If the time stamp of the log file then falls more than 5 seconds behind the current time, the process can be considered abnormal.
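A minimal sketch of such a heartbeat thread might look like the following. The function name start_heartbeat and the line format are assumptions for illustration. Only the 5-second interval comes from the description above.

```python
import threading
import time


def start_heartbeat(path, interval=5.0):
    """Start a daemon thread that appends one line to `path` every `interval` seconds.

    Returns a threading.Event; call .set() on it to stop the thread.
    """
    stop = threading.Event()

    def beat():
        while not stop.is_set():
            # Appending to the file updates its modification time.
            with open(path, "a") as f:
                f.write(time.strftime("%Y-%m-%d %H:%M:%S") + " alive\n")
            stop.wait(interval)  # sleeps, but wakes up early when stopped

    threading.Thread(target=beat, daemon=True).start()
    return stop
```

The program would call start_heartbeat once at startup with the path of its alive log. The log grows without bound in this sketch, so a real program would probably rotate or truncate it.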
Monit can manage the process by checking the time of the file like this.
check file stt_alive_ens224.log with path /LOG/dcc/khive/stt_alive_ens224.log
    if timestamp > 1 minutes then alert
    # if timestamp > 1 minutes then exec "/usr/bin/monit restart myprocess" repeat every 5 cycles
With this configuration, monit raises an alert if the time stamp of the /LOG/dcc/khive/stt_alive_ens224.log file is more than a minute behind the current time.
If you want to restart the process instead, refer to the commented-out line above.
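The idea behind the timestamp test can be sketched in a few lines of Python. This only illustrates the comparison and uses the file's modification time as a stand-in. Which time stamp monit itself compares is described in its manual, and the name is_stale is hypothetical.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True if the file was last modified more than max_age_seconds ago."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) > max_age_seconds
```

For example, is_stale("/LOG/dcc/khive/stt_alive_ens224.log", 60) corresponds roughly to the "timestamp > 1 minutes" condition above.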
Important monit commands
You can check the usage easily with the --help option.
root@ubuntusrv:/etc/monit/conf.d# monit --help
Usage: monit [options]+ [command]
Options are as follows:
 -c file       Use this control file
 -d n          Run as a daemon once per n seconds
 -g name       Set group name for monit commands
 -l logfile    Print log information to this file
 -p pidfile    Use this lock file in daemon mode
 -s statefile  Set the file monit should write state information to
 -I            Do not run in background (needed when run from init)
 --id          Print Monit's unique ID
 --resetid     Reset Monit's unique ID. Use with caution
 -B            Batch command line mode (do not output tables or colors)
 -t            Run syntax check for the control file
 -v            Verbose mode, work noisy (diagnostic output)
 -vv           Very verbose mode, same as -v plus log stacktrace on error
 -H [filename] Print SHA1 and MD5 hashes of the file or of stdin if the
               filename is omited; monit will exit afterwards
 -V            Print version number and patchlevel
 -h            Print this text
Optional commands are as follows:
 start all             - Start all services
 start <name>          - Only start the named service
 stop all              - Stop all services
 stop <name>           - Stop the named service
 restart all           - Stop and start all services
 restart <name>        - Only restart the named service
 monitor all           - Enable monitoring of all services
 monitor <name>        - Only enable monitoring of the named service
 unmonitor all         - Disable monitoring of all services
 unmonitor <name>      - Only disable monitoring of the named service
 reload                - Reinitialize monit
 status [name]         - Print full status information for service(s)
 summary [name]        - Print short status information for service(s)
 report [up|down|..]   - Report state of services. See manual for options
 quit                  - Kill the monit daemon process
 validate              - Check all services and start if not running
 procmatch <pattern>   - Test process matching pattern
The commands I use often are described below.
monit status
Shows detailed information about service items registered in conf file.
root@ubuntusrv:/etc/monit/conf.d# monit status
Monit 5.26.0 uptime: 21h 50m

Process 'unimrcpserver'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          278465
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       45m
  threads                      9
  children                     0
  cpu                          0.2%
  cpu total                    0.2%
  memory                       0.2% [3.8 MB]
  memory total                 0.2% [3.8 MB]
  security attribute           unconfined
  disk read                    0 B/s [184 kB total]
  data collected               Wed, 01 Sep 2021 17:17:55

Process 'nodered'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          60325
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       10h 25m
  threads                      11
  children                     0
  cpu                          0.0%
  cpu total                    0.0%
  memory                       4.0% [78.3 MB]
  memory total                 4.0% [78.3 MB]
  security attribute           unconfined
  disk read                    0 B/s [488 kB total]
  data collected               Wed, 01 Sep 2021 17:17:55

Process 'blueivr'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          913
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       21h 50m
  threads                      26
  children                     0
  cpu                          0.4%
  cpu total                    0.4%
  memory                       1.6% [31.1 MB]
  memory total                 1.6% [31.1 MB]
  security attribute           unconfined
  disk read                    0 B/s [18.2 MB total]
  disk write                   0 B/s [1.7 MB total]
  data collected               Wed, 01 Sep 2021 17:17:55

System 'ubuntusrv'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  load average                 [0.33] [0.32] [0.29]
  cpu                          0.0%us 0.4%sy 0.0%wa
  memory usage                 765.3 MB [39.1%]
  swap usage                   3.2 MB [0.1%]
  uptime                       108d 18h 22m
  boot time                    Sat, 15 May 2021 22:55:31
  data collected               Wed, 01 Sep 2021 17:17:55
monit summary
This is useful when simply checking whether a process is running or not.
monit reload
If a conf file is changed, reload makes monit read the configuration again. After the reload, it is recommended to restart the service whose conf file was changed.
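Since the help output lists a -t option for a syntax check, it may be worth running that check before every reload. The following Python sketch shows one way to chain the two. The function name check_and_reload is hypothetical, and the sketch assumes that monit -t exits with a non-zero status when the control file has an error.

```python
import subprocess


def check_and_reload(run=subprocess.run):
    """Reload monit only if `monit -t` accepts the control file.

    `run` can be replaced, so the logic can be tried without monit installed.
    """
    check = run(["monit", "-t"])
    if check.returncode != 0:
        return False  # syntax check failed: leave the running configuration alone
    reload_result = run(["monit", "reload"])
    return reload_result.returncode == 0
```

Called without arguments as root, it runs the real commands and returns True only when both succeed.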
monit restart <name>
Wrapping up
In addition to the above, monit has a number of other functions, including the ability to monitor cpu and memory and send alerts when thresholds are crossed. Today, we learned a few things for process management. The website https://mmonit.com/monit/ introduces many features of monit. If you want to implement more functions, please refer to the contents of the homepage.