Linux HA using keepalived


HA Configuration

keepalived is a package that provides high availability (HA) for two servers by sharing a virtual IP address between them. It makes it easy to run a redundant pair of web servers, application servers, and similar services.
Web servers and application servers are inbound services: they respond to requests from external users.
Articles on HA configurations for such inbound services are easy to find. A simple service configuration is as follows.






Keepalived Install

From now on, I'll use the IP "192.168.126.91" for the first node and "192.168.126.92" for the second node. "192.168.126.90" will be the virtual IP.

Both computers (192.168.126.91 and 192.168.126.92) should run the same OS. (I'm using CentOS 7.)
First, install the keepalived package and enable the service on both computers.


shell>yum install keepalived
shell>systemctl enable keepalived 


Basic Keepalived Configuration

Next, configure keepalived.conf (/etc/keepalived/keepalived.conf) on both computers.


Node 1 (192.168.126.91)



global_defs {
    router_id FREESW
}

vrrp_instance VI_FREESW {
    state BACKUP
    interface ens33
    virtual_router_id 51
    # higher is preferred for master
    # disable to have failover be sticky
    priority 110
    advert_int 1
    nopreempt
    authentication { 
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.126.90 dev ens33
    }
}


The important parameters are as follows.
  • interface : ens33 is the interface name on my machine. Use your machine's interface name instead.
  • priority : a higher number means a higher priority, so Node 1's priority (110) is higher than Node 2's (105).
  • virtual_router_id : the VRID used by VRRP (Virtual Router Redundancy Protocol), a number between 1 and 255. You must use the same value on both computers.
  • state : MASTER or BACKUP. Even if you set BACKUP, a node becomes MASTER when the keepalived service starts and no MASTER exists.
  • advert_int : the interval between VRRP advertisements, in seconds. A decimal value can also be specified. If the backup node receives no VRRP advertisements for a little over three times the advertisement interval, it takes the master state and assigns the virtual IP to itself.
  • authentication : all routers participating in VRRP (the 2 computers here) must have the same authentication settings. Keepalived 1.2 or above (VRRP version 2 or 3) can work without this option.
  • virtual_ipaddress : the VIP (virtual IP address) that the MASTER holds.
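
The failover delay implied by advert_int can be worked out by hand. Here is a rough sketch, assuming the VRRPv2 timers from RFC 3768 (the master-down interval is 3 * advert_int plus a skew time of (256 - priority) / 256 seconds). The function names are made up for illustration and are not part of keepalived.

```python
def skew_time(priority):
    """Skew time in seconds, using the VRRPv2 formula (256 - priority) / 256."""
    return (256 - priority) / 256.0


def master_down_interval(advert_int, priority):
    """Seconds a BACKUP waits without advertisements before taking over."""
    return 3 * advert_int + skew_time(priority)


if __name__ == "__main__":
    # Node 2 in this article: advert_int 1, priority 105
    print(master_down_interval(1, 105))
```

With the values used here, Node 2 would wait roughly 3.6 seconds before claiming the virtual IP. A node with a higher priority has a smaller skew time and therefore reacts slightly sooner.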

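In this example the vrrp_instance name (VI_FREESW) reuses the router_id value from global_defs (VI_XXXXXX). As far as I can tell keepalived does not enforce this, but it keeps the two names easy to match.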



Node2 (192.168.126.92)



global_defs {
    router_id FREESW
}

vrrp_instance VI_FREESW {
    state BACKUP
    interface ens33
    virtual_router_id 51
    # higher is preferred for master
    # disable to have failover be sticky
    priority 105
    advert_int 1
    authentication { 
        auth_type PASS
        auth_pass 1111
    }

    virtual_ipaddress {
        192.168.126.90 dev ens33
    }
}



Then restart the keepalived service on both computers.


[root@localhost ~]# service keepalived restart

Now test the virtual IP.


I restarted the keepalived service on Node 2 first, so Node 2 got the virtual IP (192.168.126.90). Service requests to 192.168.126.90 now go to Node 2.
Then I brought down Node 2's ens33 interface with the ifdown command, which takes Node 2 off the network. Checking Node 1's IP addresses afterwards showed that Node 1 had taken over the virtual IP.
From now on, service requests to 192.168.126.90 go to Node 1.
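
The "check the IP address" step can also be scripted. The following is a minimal sketch, assuming the iproute2 `ip` command is available and using its one-line output mode (`ip -o -4 addr show dev ens33`). The helper names has_address and node_holds_vip are hypothetical.

```python
import subprocess


def has_address(ip_output, address):
    """Return True if `address` is listed as an inet address in `ip -o -4 addr` output."""
    for line in ip_output.splitlines():
        fields = line.split()
        # In one-line mode each record looks like:
        # "2: ens33    inet 192.168.126.90/32 scope global ens33 ..."
        if "inet" in fields:
            cidr = fields[fields.index("inet") + 1]
            if cidr.split("/")[0] == address:
                return True
    return False


def node_holds_vip(interface="ens33", vip="192.168.126.90"):
    """Run `ip` on the local node and report whether it currently owns the VIP."""
    output = subprocess.check_output(
        ["ip", "-o", "-4", "addr", "show", "dev", interface]
    ).decode()
    return has_address(output, vip)
```

Calling node_holds_vip() on each node before and after the ifdown test should show the virtual IP moving from Node 2 to Node 1.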



Advanced Keepalived Configuration

Besides handing over the virtual IP because of network problems, you may need to pass the virtual IP to the BACKUP deliberately for other reasons.
For example, if the MASTER has run out of disk space or certain processes are behaving abnormally, you may want to pass the virtual IP to the other computer.

Check Script

This content is from https://tobru.ch/keepalived-check-and-notify-scripts/.
A check script is a script written in the language of your choice which is executed regularly. This script needs to have a return value: 0 for "everything is fine", 1 (or other than 0) for "something went wrong".
This value is used by Keepalived to take action. Scripts are defined like this:


vrrp_script chk_myscript {
  script       "/usr/local/bin/mycheckscript.py"
  interval 2   # check every 2 seconds
  fall 2       # require 2 failures for KO
  rise 2       # require 2 successes for OK
}

As you can see in the example it's possible to specify the interval in seconds and also how many times the script needs to succeed or fail until any action is taken.
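
To make the effect of fall and rise concrete, here is a simplified model of the counting. It is only a sketch of the idea, not keepalived's actual implementation, and it assumes the script starts out in the OK state. The function name track_state is hypothetical.

```python
def track_state(results, fall=2, rise=2):
    """Replay check results (True = exit code 0) and return OK/KO after each run.

    Simplified model: the state flips to KO after `fall` consecutive failures
    and back to OK after `rise` consecutive successes.
    """
    state = "OK"
    streak = 0  # consecutive results that contradict the current state
    states = []
    for ok in results:
        contradicts = (state == "OK" and not ok) or (state == "KO" and ok)
        streak = streak + 1 if contradicts else 0
        threshold = fall if state == "OK" else rise
        if streak >= threshold:
            state = "KO" if state == "OK" else "OK"
            streak = 0
        states.append(state)
    return states
```

In this model a single failed run between two successful ones does not change anything. With interval 2 and fall 2, a real problem is acted on after about four seconds.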

The script can check anything you want. Here are some ideas:
  • Is the daemon X running?
  • Is the daemon X behaving normally?
  • Is there enough disk space available to run my application?
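
As an illustration of the last idea, a disk-space check might look like the sketch below. It assumes Python 3.3 or later (for shutil.disk_usage). The 10% threshold, the checked path, and the function names are placeholders I chose, not values from keepalived.

```python
#!/usr/bin/env python3
import shutil
import sys

# Hypothetical settings; adjust them to your application.
MIN_FREE_RATIO = 0.10
PATH_TO_CHECK = "/"


def enough_disk_space(path=PATH_TO_CHECK, min_free_ratio=MIN_FREE_RATIO):
    """Return True if the filesystem holding `path` has enough free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return False
    return usage.free / usage.total >= min_free_ratio


def exit_code(healthy):
    """Map a health flag to the exit status keepalived expects (0 = fine)."""
    return 0 if healthy else 1


def main():
    return exit_code(enough_disk_space())
```

To use it as a check script, end the file with `sys.exit(main())` so that the process exit status carries the result.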

The vrrp_script definition can now be referenced from a vrrp_instance through a track_script block. As soon as the tracked script returns a code other than 0 two times in a row, the VRRP instance changes its state to FAULT, removes the IP 192.168.126.90 from ens33, and stops sending multicast VRRP packets.


Notify Scripts

A notify script can be used to take other actions, not only adding or removing an IP on an interface.
For example, it can start or stop a daemon depending on the VRRP state. This is how it's defined in the Keepalived configuration:


vrrp_instance MyVRRPInstance {
 [...]
 notify /usr/local/bin/keepalivednotify.py
}

The script is called after any state change with the following parameters:

  • $1 = "GROUP" or "INSTANCE"
  • $2 = name of group or instance
  • $3 = target state of transition ("MASTER", "BACKUP", "FAULT")
The third parameter is the most important one. The Python notify script is shown later in this article.



Next, configure keepalived.conf on both computers to use the script files.

Node 1 (192.168.126.91)



global_defs {
    router_id FREESW
}

vrrp_script chk_myscript {
  script       "/usr/local/bin/mycheckscript.py"
  interval 2   # check every 2 seconds
  fall 2       # require 2 failures for KO
  rise 2       # require 2 successes for OK
}

vrrp_instance VI_FREESW {
    state BACKUP
    interface ens33
    virtual_router_id 51
    # higher is preferred for master
    # disable to have failover be sticky
    priority 110
    advert_int 1
    nopreempt
    authentication { 
        auth_type PASS
        auth_pass 1111
    }
    notify       "/usr/local/bin/keepalivedscript.py" 
    virtual_ipaddress {
        192.168.126.90 dev ens33
    }
    track_script {
        chk_myscript
    }
}


Node 2 (192.168.126.92)



global_defs {
    router_id FREESW
}
vrrp_script chk_myscript {
  script       "/usr/local/bin/mycheckscript.py"
  interval 2   # check every 2 seconds
  fall 2       # require 2 failures for KO
  rise 2       # require 2 successes for OK
}
vrrp_instance VI_FREESW {
    state BACKUP
    interface ens33
    virtual_router_id 51
    # higher is preferred for master
    # disable to have failover be sticky
    priority 105
    advert_int 1
    authentication { 
        auth_type PASS
        auth_pass 1111
    }
    notify       "/usr/local/bin/keepalivedscript.py"
    virtual_ipaddress {
        192.168.126.90 dev ens33
    }
    track_script {
        chk_myscript
    }
}



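Python sample script files


You can use any scripting language (sh, pl, py, ...) you are familiar with.
Create these files in /usr/local/bin/, the path used in keepalived.conf above, and make them executable.


chmod 755 /usr/local/bin/mycheckscript.py /usr/local/bin/keepalivedscript.py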

mycheckscript.py 

This simple Python script checks memory usage and returns 1 when the usage rate is larger than 90%. So if memory usage exceeds 90%, the MASTER gives up the virtual IP.


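#!/usr/bin/env python
# Adjust the interpreter line above to the Python installed on your nodes.
import os
import sys

# The last line of `free -t -m` is "Total: <total> <used> <free>" in MB.
tot_m, used_m, free_m = map(int, os.popen('free -t -m').readlines()[-1].split()[1:])
print("Total Mem:%d, Used:%d, Free:%d" % (tot_m, used_m, free_m))

if tot_m == 0:
    sys.exit(1)    # Error: cannot compute a usage rate

usage = float(used_m) / float(tot_m)
print(usage)
# A non-zero exit status tells keepalived that the check failed.
sys.exit(1 if usage > 0.9 else 0)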

keepalivedscript.py



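#!/usr/bin/env python
# Adjust the interpreter line above to the Python installed on your nodes.
import sys

# sys.argv[0] is the script path, so the length is always >= 1
length = len(sys.argv)
print('sys.argv length : %d' % length)

# Uncomment to dump every argument while debugging:
# for x in range(length):
#     print('sys.argv[%d] :%s ' % (x, sys.argv[x]))

# keepalived passes at least: type, name, target state.
# Some versions may append further arguments (such as the priority),
# so only reject calls that have too few arguments.
if length < 4:
    sys.exit(0)

state = sys.argv[3]
if state == "MASTER":
    # Do your new master job
    pass
elif state == "BACKUP":
    # Do your new backup job
    pass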
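
If the per-state jobs grow, a dispatch table keeps the notify script readable. This is a sketch only: the handler functions and the strings they return are placeholders for whatever your master, backup, and fault jobs are, and handle_notify is a hypothetical helper that takes the same list as sys.argv.

```python
def on_master(name):
    # Placeholder: e.g. start the service that must run only on the active node
    return "start services for %s" % name


def on_backup(name):
    # Placeholder: e.g. stop that service again
    return "stop services for %s" % name


def on_fault(name):
    # Placeholder: e.g. send an alert
    return "alert operator about %s" % name


HANDLERS = {"MASTER": on_master, "BACKUP": on_backup, "FAULT": on_fault}


def handle_notify(argv):
    """argv mirrors sys.argv: [script, type, name, state, ...]."""
    if len(argv) < 4:
        return None
    name, state = argv[2], argv[3]
    handler = HANDLERS.get(state)
    return handler(name) if handler else None
```

In the real script you would call `handle_notify(sys.argv)` at the bottom of the file. Unknown states are ignored, so the script keeps working if keepalived reports a state that is not in the table.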


Troubleshooting

You might run into the nasty situation where both servers hold the same VIP at once. This is mostly related to firewall settings. The keepalived service uses VRRP (Virtual Router Redundancy Protocol) so that two or more servers can verify each other's existence. If the firewall blocks VRRP traffic, a node cannot tell whether its peer is alive, so each node assumes it should be MASTER.
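
When debugging this, it helps to confirm that advertisements from the peer actually arrive. The sketch below decodes a captured advertisement, assuming VRRP version 2 and the header layout from RFC 3768, and assuming `payload` holds the bytes that follow the IP header. The function name parse_vrrp_v2 is hypothetical and the checksum is not verified.

```python
import socket
import struct


def parse_vrrp_v2(payload):
    """Decode a VRRPv2 advertisement (the bytes that follow the IP header)."""
    if len(payload) < 8:
        raise ValueError("a VRRPv2 header needs at least 8 bytes")
    ver_type, vrid, priority, count, auth_type, advert_int, checksum = (
        struct.unpack("!BBBBBBH", payload[:8])
    )
    if len(payload) < 8 + 4 * count:
        raise ValueError("packet is shorter than its address count implies")
    addresses = [
        socket.inet_ntoa(payload[8 + 4 * i:12 + 4 * i]) for i in range(count)
    ]
    return {
        "version": ver_type >> 4,   # high 4 bits of the first byte
        "type": ver_type & 0x0F,    # 1 = ADVERTISEMENT
        "vrid": vrid,
        "priority": priority,
        "auth_type": auth_type,
        "advert_int": advert_int,
        "checksum": checksum,
        "addresses": addresses,
    }
```

VRRP is IP protocol 112, so a capture filter such as `ip proto 112` on ens33 should show these packets. If a node never sees its peer's advertisements for VRID 51, the firewall is the first thing to check.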

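Therefore, if your server uses the iptables service, allow the VRRP protocol as follows.

sudo iptables -I INPUT -p vrrp -j ACCEPT
sudo iptables-save

Note that iptables-save only prints the current rules. To keep the rule across reboots, write that output to the rules file your distribution loads at boot (/etc/sysconfig/iptables when the iptables-services package is used on CentOS 7).

If you are using firewalld or Ubuntu's ufw, consult its manual and apply the equivalent of the iptables rule above.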


Wrapping up

I built an HA system that takes over the virtual IP when the other node's interface is down or its memory usage is too high. You can define your own conditions using your favorite scripting language.
The notify script is also very useful when you need a start-up process on the new master or a clean-up process.








