Linux HA using keepalived
HA Configuration
Keepalived is a package that provides high availability (HA) for two servers using a virtual IP. With Keepalived you can easily make a web server, an application server, etc. redundant.
Web servers and application servers are inbound services: they respond to requests from external users.
Articles on HA configurations for these inbound services are easy to find. A simple configuration is described below.
Keepalived Install
From now on, I'll use "192.168.126.91" for the first node, "192.168.126.92" for the second node, and "192.168.126.90" for the virtual IP. Both computers (192.168.126.91 and 192.168.126.92) must run the same OS (I'm using CentOS 7).
First, install the keepalived package and enable the service on both computers.
shell>yum install keepalived
shell>systemctl enable keepalived
Basic Keepalived Configuration
Next, configure keepalived.conf (/etc/keepalived/keepalived.conf) on both computers.
Node 1 (192.168.126.91)
global_defs {
    router_id FREESW
}

vrrp_instance VI_FREESW {
    state BACKUP
    interface ens33
    virtual_router_id 51
    # higher is preferred for master
    priority 110
    advert_int 1
    # nopreempt: keep the VIP where it is when this node comes back (sticky failover)
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.126.90 dev ens33
    }
}
The important parameters are as follows.
- interface : ens33 is the network interface name. Use your machine's interface name instead.
- priority : a higher number means a higher priority. So Node 1's priority (110) is higher than Node 2's (105).
- virtual_router_id : a number that must be the same on both computers. It is used as the VRID (Virtual Router ID) of VRRP (Virtual Router Redundancy Protocol), in the range 1 to 255.
- state : MASTER or BACKUP. Even if you set BACKUP, the computer becomes the MASTER if there is no MASTER when the keepalived service starts.
- advert_int : the interval between VRRP advertisements, in seconds. Fractional values can also be specified. If the backup node receives no VRRP advertisement for longer than about three times this interval, it takes the master state and assigns the virtual IP to itself.
- authentication : all routers participating in VRRP (here, the 2 computers) must have the same authentication settings. Keepalived 1.2 or later (VRRP version 2 or 3) can also work without this option.
- virtual_ipaddress : the VIP (virtual IP address) that the MASTER holds.
The router_id value in global_defs is also used in the vrrp_instance name (VI_XXXXXX). This is the naming convention used in this article; keepalived itself does not require the two to match.
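To make the advert_int timing more concrete, here is a small Python sketch of the VRRPv2 timer formulas from RFC 3768. The formulas are Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, where Skew_Time = (256 - Priority) / 256 seconds. It models the protocol, not keepalived's own code, so actual failover times may differ slightly.

```python
def skew_time(priority: int) -> float:
    """Skew_Time = (256 - Priority) / 256 seconds (RFC 3768, VRRPv2)."""
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be between 1 and 255")
    return (256 - priority) / 256


def master_down_interval(advert_int: float, priority: int) -> float:
    """Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time (seconds)."""
    return 3 * advert_int + skew_time(priority)


if __name__ == "__main__":
    # advert_int 1 and the two priorities used in this article
    for name, prio in (("Node 1", 110), ("Node 2", 105)):
        print("%s: ~%.3f s without advertisements before taking over"
              % (name, master_down_interval(1, prio)))
```

With advert_int 1, Node 1 (priority 110) waits about 3.57 s and Node 2 (priority 105) about 3.59 s. When both nodes are in the BACKUP state, the higher-priority node therefore claims the VIP first.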
Node 2 (192.168.126.92)
global_defs {
    router_id FREESW
}

vrrp_instance VI_FREESW {
    state BACKUP
    interface ens33
    virtual_router_id 51
    # higher is preferred for master
    priority 105
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.126.90 dev ens33
    }
}
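On the two nodes, virtual_router_id, the authentication settings and the VIP must be identical, while the priorities differ. A quick consistency check can therefore catch copy-paste mistakes. The Python sketch below uses simple regular expressions and assumes the straightforward layout used in this article. It is not a real keepalived.conf parser.

```python
import re

# Settings that must be identical on Node 1 and Node 2
SHARED_KEYS = ("virtual_router_id", "auth_type", "auth_pass")


def get_value(conf, key):
    """Return the first token after `key` (e.g. "51" for virtual_router_id), or None."""
    m = re.search(r"\b%s\s+(\S+)" % re.escape(key), conf)
    return m.group(1) if m else None


def get_vips(conf):
    """Return the IPv4 addresses listed in the virtual_ipaddress { ... } block."""
    m = re.search(r"virtual_ipaddress\s*\{([^}]*)\}", conf)
    return re.findall(r"\d+\.\d+\.\d+\.\d+", m.group(1)) if m else []


def check_pair(conf1, conf2):
    """Return a list of problems found when comparing the two nodes' configs."""
    problems = []
    for key in SHARED_KEYS:
        v1, v2 = get_value(conf1, key), get_value(conf2, key)
        if v1 != v2:
            problems.append("%s differs: %s vs %s" % (key, v1, v2))
    if get_vips(conf1) != get_vips(conf2):
        problems.append("virtual_ipaddress differs")
    if get_value(conf1, "priority") == get_value(conf2, "priority"):
        problems.append("priority is the same on both nodes")
    return problems
```

For example, check_pair(open("node1.conf").read(), open("node2.conf").read()) returns an empty list when the pair looks consistent. The file names here are placeholders.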
Then restart the keepalived service on both computers.
[root@localhost ~]# service keepalived restart
Now test the virtual IP.
I restarted the keepalived service on Node 2 first, so Node 2 took the virtual IP (192.168.126.90) even though Node 1 has the higher priority. Node 1's nopreempt setting keeps it from taking the VIP back. Service requests to 192.168.126.90 now go to Node 2.
Then I brought down Node 2's ens33 interface with the ifdown command, taking Node 2 off the network. Checking Node 1's IP addresses showed that Node 1 had taken over the virtual IP.
From now on, service requests to 192.168.126.90 go to Node 1.
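During this test you can script the check instead of reading the ip addr output on each node by eye. The helper below is hypothetical. It assumes the iproute2 `ip` command and the interface and VIP used in this article. It treats a node as the VIP owner if the address appears on an inet line for that interface.

```python
import subprocess

VIP = "192.168.126.90"   # virtual IP used in this article
IFACE = "ens33"          # interface name from this article; change it for your machine


def holds_vip(ip_addr_output, vip):
    """True if `ip -4 addr show` output lists `vip` as an inet address."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # Address lines look like: "inet 192.168.126.90/32 scope global ens33"
        if len(fields) >= 2 and fields[0] == "inet" and fields[1].split("/")[0] == vip:
            return True
    return False


def current_ip_output(iface=IFACE):
    """Return the output of `ip -4 addr show dev <iface>` on this machine."""
    return subprocess.run(["ip", "-4", "addr", "show", "dev", iface],
                          stdout=subprocess.PIPE, universal_newlines=True,
                          check=True).stdout
```

Run holds_vip(current_ip_output(), VIP) on both nodes. Exactly one of them should return True.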
Advanced Keepalived Configuration
In addition to handing over the virtual IP because of network problems, you may sometimes need to deliberately pass the virtual IP to the BACKUP for other reasons. For example, if you find that the MASTER has run out of disk space, or that certain processes are behaving abnormally, you may want to pass the virtual IP to the other computer.
Check Script
This content is from https://tobru.ch/keepalived-check-and-notify-scripts/. A check script is a script, written in the language of your choice, that is executed regularly. It must return an exit code: 0 for "everything is fine", 1 (or any other non-zero value) for "something went wrong".
This value is used by Keepalived to take action. Scripts are defined like this:
vrrp_script chk_myscript {
    script "/usr/local/bin/mycheckscript.py"
    interval 2   # check every 2 seconds
    fall 2       # require 2 failures for KO
    rise 2       # require 2 successes for OK
}
As you can see in the example, you can specify the interval in seconds and how many consecutive times the script must succeed or fail before any action is taken.
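The following Python class models how consecutive results could be counted with fall 2 and rise 2, which makes interval/fall/rise more concrete. It is a simplified illustration under my own assumptions (for example, that the script starts in the OK state). It is not keepalived's implementation.

```python
class TrackedScript:
    """Simplified model of a vrrp_script's fall/rise counting (illustration only)."""

    def __init__(self, fall=2, rise=2):
        self.fall = fall
        self.rise = rise
        self.ok = True    # assumption: the script starts in the OK state
        self.streak = 0   # consecutive results that disagree with the current state

    def record(self, exit_code):
        """Feed one script exit code; return True while the script counts as OK."""
        success = exit_code == 0
        if success == self.ok:
            self.streak = 0                      # result agrees: reset the counter
        else:
            self.streak += 1
            needed = self.fall if self.ok else self.rise
            if self.streak >= needed:            # enough in a row: flip OK <-> KO
                self.ok = success
                self.streak = 0
        return self.ok


if __name__ == "__main__":
    tracked = TrackedScript(fall=2, rise=2)
    for code in (0, 1, 0, 1, 1, 0, 0):
        print(code, "->", "OK" if tracked.record(code) else "KO")
```

Feeding the exit codes 0, 1, 0, 1, 1, 0, 0 produces OK, OK, OK, OK, KO, KO, OK. A single failure is ignored, and the state only flips after two failures (or two successes) in a row.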
The script can check anything you want. Here are some ideas:
- Is daemon X running?
- Is daemon X running normally?
- Is there enough disk space available to run my application?
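Here is a sketch of a check script for the first and third ideas. It reports failure when daemon X is not running or when free disk space drops below a threshold. PID_FILE, DATA_DIR and MIN_FREE_RATIO are hypothetical values, so adjust them to your application.

```python
import os
import shutil

PID_FILE = "/var/run/mydaemon.pid"   # hypothetical PID file of "daemon X"
DATA_DIR = "/"                       # filesystem your application writes to
MIN_FREE_RATIO = 0.10                # assumed threshold: keep at least 10% free


def daemon_running(pid_file):
    """Is daemon X running? Read its PID and probe it with signal 0."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False      # missing/unreadable PID file or garbage in it
    if pid <= 0:
        return False      # 0 or negative would address process groups, not one PID
    try:
        os.kill(pid, 0)   # signal 0 sends nothing; it only checks the PID exists
    except PermissionError:
        return True       # process exists but belongs to another user
    except OSError:
        return False      # no such process
    return True


def enough_disk(path, min_free_ratio):
    """Is there enough free disk space on the filesystem holding `path`?"""
    usage = shutil.disk_usage(path)
    return usage.total > 0 and usage.free / usage.total >= min_free_ratio


def main():
    """Return the exit code keepalived expects: 0 = healthy, 1 = problem."""
    healthy = daemon_running(PID_FILE) and enough_disk(DATA_DIR, MIN_FREE_RATIO)
    return 0 if healthy else 1
```

To use this as the vrrp_script, add a Python shebang line, end the file with `import sys` and `sys.exit(main())`, and make it executable.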
This script definition can now be used in a vrrp_instance through track_script. As soon as the script returns a code other than 0 two times in a row, the VRRP instance changes its state to FAULT, removes the IP 192.168.126.90 from ens33, and stops sending VRRP multicast packets.
Notify Scripts
A notify script can be used to take other actions besides removing or adding an IP on an interface. For example, it can start or stop a daemon depending on the VRRP state. This is how it is defined in the Keepalived configuration:
vrrp_instance MyVRRPInstance {
    [...]
    notify /usr/local/bin/keepalivednotify.py
}
The script is called after any state change with the following parameters:
- $1 = "GROUP" or "INSTANCE"
- $2 = name of group or instance
- $3 = target state of transition ("MASTER", "BACKUP", "FAULT")
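A notify script can use these parameters to start or stop a daemon depending on the new state. The sketch below assumes a hypothetical systemd service called myapp that should run only on the MASTER. Only $3 drives the decision, and any extra trailing arguments are ignored.

```python
import subprocess

SERVICE = "myapp"   # hypothetical daemon that should run only on the MASTER

# Assumed policy: run the service on MASTER, stop it in every other known state
ACTIONS = {"MASTER": "start", "BACKUP": "stop", "FAULT": "stop"}


def action_for(argv):
    """Map notify arguments ($1 type, $2 name, $3 state) to a systemctl verb.

    Returns None when arguments are missing or the state is unknown.
    """
    if len(argv) < 4:
        return None
    _kind, _name, state = argv[1:4]
    return ACTIONS.get(state)


def run_action(argv):
    """Run `systemctl <verb> SERVICE` for the new state; return its exit code."""
    verb = action_for(argv)
    if verb is None:
        return 0
    return subprocess.call(["systemctl", verb, SERVICE])
```

Inside a notify script, `sys.exit(run_action(sys.argv))` would apply it.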
Next, configure keepalived.conf on both computers to use the script files.
Node 1 (192.168.126.91)
global_defs {
    router_id FREESW
}

vrrp_script chk_myscript {
    script "/usr/local/bin/mycheckscript.py"
    interval 2   # check every 2 seconds
    fall 2       # require 2 failures for KO
    rise 2       # require 2 successes for OK
}

vrrp_instance VI_FREESW {
    state BACKUP
    interface ens33
    virtual_router_id 51
    # higher is preferred for master
    priority 110
    advert_int 1
    # nopreempt: keep the VIP where it is when this node comes back (sticky failover)
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    notify "/usr/local/bin/keepalivedscript.py"
    virtual_ipaddress {
        192.168.126.90 dev ens33
    }
    track_script {
        chk_myscript
    }
}
Node 2 (192.168.126.92)
global_defs {
    router_id FREESW
}

vrrp_script chk_myscript {
    script "/usr/local/bin/mycheckscript.py"
    interval 2   # check every 2 seconds
    fall 2       # require 2 failures for KO
    rise 2       # require 2 successes for OK
}

vrrp_instance VI_FREESW {
    state BACKUP
    interface ens33
    virtual_router_id 51
    # higher is preferred for master
    priority 105
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    notify "/usr/local/bin/keepalivedscript.py"
    virtual_ipaddress {
        192.168.126.90 dev ens33
    }
    track_script {
        chk_myscript
    }
}
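The brace structure of keepalived.conf is easy to break when editing by hand. For example, a stray { before track_script leaves vrrp_instance unclosed. This small Python sketch counts braces line by line. It assumes that # and ! start comments and that no brace appears inside a quoted string.

```python
def check_braces(conf):
    """Return None if { and } balance, else a message describing the problem."""
    depth = 0
    for lineno, line in enumerate(conf.splitlines(), start=1):
        for ch in line:
            if ch in "#!":
                break                      # rest of the line is a comment
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:
                    return f"line {lineno}: unexpected '}}'"
    if depth > 0:
        return f"{depth} unclosed '{{' at end of file"
    return None
```

check_braces(open("/etc/keepalived/keepalived.conf").read()) returns None when the braces balance. Otherwise it returns a message pointing at the problem.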
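Python sample script files
You can use any scripting language you are familiar with (sh, pl, py, ...).
Create these files in /usr/local/bin/ (the path referenced in keepalived.conf above) and make them executable.
chmod 755 /usr/local/bin/mycheckscript.py /usr/local/bin/keepalivedscript.py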
mycheckscript.py
This simple Python script checks memory usage and returns 1 when the usage rate is above 90%. So if memory usage exceeds 90%, the MASTER gives up the virtual IP.
#!/usr/bin/env python3
import os
import sys

# The last line of "free -t -m" is "Total: <total> <used> <free>" in MB (RAM + swap)
tot_m, used_m, free_m = map(int, os.popen('free -t -m').readlines()[-1].split()[1:4])
print("Total Mem:%d, Used:%d, Free:%d" % (tot_m, used_m, free_m))

ret = 0
if tot_m == 0:
    ret = 1  # Error: no total to compute a usage rate from
else:
    usage = float(used_m) / float(tot_m)
    print(usage)
    if usage > 0.9:
        ret = 1  # Error: usage above 90%
sys.exit(ret)
keepalivedscript.py
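#!/usr/bin/env python3
import sys

# keepalived calls: keepalivedscript.py <"GROUP"|"INSTANCE"> <name> <state>
# (some keepalived versions may append more arguments, e.g. the priority)
length = len(sys.argv)  # always >= 1: sys.argv[0] is the script path
print('sys.argv length : %d' % length)
# for x in range(length):
#     print('sys.argv[%d] : %s' % (x, sys.argv[x]))

if length < 4:
    sys.exit(0)

if sys.argv[3] == "MASTER":
    # Do your new master job
    pass
elif sys.argv[3] == "BACKUP":
    # Do your new backup job
    pass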
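To try keepalivedscript.py without forcing a real failover, you can call it yourself with the same arguments keepalived passes ($1 to $3). The helper below is a hypothetical testing aid. The instance name VI_FREESW comes from this article's configuration.

```python
import subprocess

VALID_STATES = ("MASTER", "BACKUP", "FAULT")


def notify_argv(script, state, name="VI_FREESW", kind="INSTANCE"):
    """Build the argument list a notify script receives: $1 kind, $2 name, $3 state."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown VRRP state: {state}")
    if kind not in ("INSTANCE", "GROUP"):
        raise ValueError(f"kind must be INSTANCE or GROUP, not {kind}")
    return [script, kind, name, state]


def fake_transition(script, state):
    """Run the notify script by hand as if the instance had just entered `state`."""
    return subprocess.call(notify_argv(script, state))
```

For example, fake_transition("/usr/local/bin/keepalivedscript.py", "MASTER") runs the script's MASTER branch once.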
Troubleshooting
You may run into the unpleasant situation where both servers own the same VIP at once. This is mostly related to firewall settings. The keepalived service uses VRRP (Virtual Router Redundancy Protocol) so that two or more servers can verify each other's existence. If the firewall blocks VRRP, neither server receives the other's advertisements, so each one concludes there is no MASTER and takes the VIP itself.
Therefore, if your server uses the iptables service, allow the VRRP protocol as follows (the second command saves the rules to /etc/sysconfig/iptables so they survive a reboot):
sudo iptables -I INPUT -p vrrp -j ACCEPT
sudo service iptables save
If you are using firewalld or Ubuntu's ufw, refer to its documentation and apply the equivalent rule: allow IP protocol 112 (VRRP).
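To see whether VRRP advertisements from the other node actually reach this host, you can listen for IP protocol 112 with a raw socket. This is a diagnostic sketch that assumes Linux, root privileges and the standard IPv4/VRRP header layout. Run it while keepalived is running, so the interface should already be in the VRRP multicast group. If the peer's address never shows up, its advertisements are probably being dropped by the network or a firewall.

```python
import socket
import struct

VRRP_PROTO = 112  # IP protocol number assigned to VRRP


def parse_vrrp(packet):
    """Parse an IPv4 packet carrying VRRP; return (src_ip, version, vrid, priority).

    A raw IPv4 socket on Linux delivers the IP header too; its length is the low
    nibble of the first byte, counted in 32-bit words.
    """
    ihl = (packet[0] & 0x0F) * 4
    src_ip = socket.inet_ntoa(packet[12:16])
    ver_type, vrid, priority = struct.unpack("!BBB", packet[ihl:ihl + 3])
    return src_ip, ver_type >> 4, vrid, priority


def listen(count=5):
    """Print the next `count` VRRP advertisements seen by this host (needs root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, VRRP_PROTO)
    try:
        for _ in range(count):
            packet, _addr = sock.recvfrom(65535)
            src, version, vrid, prio = parse_vrrp(packet)
            print(f"from {src}: VRRPv{version} vrid={vrid} priority={prio}")
    finally:
        sock.close()
```

With this article's settings, you would expect lines like "from 192.168.126.91: VRRPv2 vrid=51 priority=110" on Node 2 while Node 1 is MASTER.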
Wrapping up
I built an HA system that takes over the virtual IP when the other node's interface goes down or its memory usage is too high. You can define your own conditions using your favorite scripting language. The notify script is also very useful when you need a start-up process on the new MASTER or a clean-up process.