Keepalived implements high availability mainly through the VRRP protocol. VRRP stands for Virtual Router Redundancy Protocol. It was designed to remove the single point of failure of static routing, so that the network keeps running without interruption when an individual node goes down.
Keepalived features
1. Managing LVS
2. High availability based on VRRP
3. Health checks and failover
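How it works
Understanding how Keepalived works requires some networking background.
Keepalived operates at the IP layer, the TCP layer, and the application layer of the TCP/IP stack, that is, at Layer 3/4/5:
Layer 3: Keepalived periodically sends ICMP packets to the servers in the server group. If a server's IP address does not respond, Keepalived reports that server as failed and removes it from the group. At this layer, reachability of the server's IP address is the criterion for deciding whether the server is alive.
Layer 4: the state of a TCP port is the main criterion for deciding whether the server is working properly.
Layer 5: whether a user-defined service is running properly is the main criterion for deciding whether the server is alive.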
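Keepalived nodes communicate through VRRP to provide high availability. VRRP uses an election to decide which node is the master and which is the backup, and the master has a higher priority than the backup. In normal operation the master serves traffic and the backup waits; only when the master fails does the backup take over and serve traffic.
Within a group of Keepalived servers, only the master keeps sending VRRP advertisements (multicast by default) to tell the backups that it is still alive, and the backup does not preempt the master. Only when the master becomes unavailable, that is, when the backup stops receiving the master's VRRP advertisements, does the backup start the relevant services and take over from the master, so that the service stays available.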
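The election described above can be illustrated with a toy model in shell. This is only a sketch of the "highest priority wins" rule, not how Keepalived implements it: the function name elect_master and the node names lb01 and lb02 are hypothetical, each argument stands for a node that is currently alive, and ties simply go to the first node listed.

```shell
# Toy model of the VRRP election: the node with the highest priority wins.
# Arguments are "name:priority" pairs for the nodes that are currently alive.
elect_master() {
    best_name=""
    best_prio=-1
    for node in "$@"; do
        name=${node%%:*}
        prio=${node##*:}
        if [ "$prio" -gt "$best_prio" ]; then
            best_name=$name
            best_prio=$prio
        fi
    done
    echo "$best_name"
}

elect_master lb01:150 lb02:140   # prints lb01
elect_master lb02:140            # lb01 is down, prints lb02
```

Removing a node from the argument list models the backup no longer receiving its advertisements.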
Installation and usage
Install command
yum install keepalived -y
List the files installed by the package
rpm -ql keepalived
/etc/keepalived
/etc/keepalived/keepalived.conf    # main configuration file
/etc/sysconfig/keepalived
/usr/bin/genhash
/usr/lib/systemd/system/keepalived.service
/usr/libexec/keepalived
/usr/sbin/keepalived
Key parameters in the configuration file /etc/keepalived/keepalived.conf
global_defs {
notification_email { # notification recipients
acassen@firewall.loc
failover@firewall.loc
sysadmin@firewall.loc
}
notification_email_from Alexandre.Cassen@firewall.loc # sender address
smtp_server 192.168.200.1
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
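vrrp_instance VI_1 { # define a VRRP instance
state MASTER # initial state, MASTER or BACKUP; descriptive only
interface eth0 # network interface that carries the virtual IP
virtual_router_id 51 # cluster ID; must be identical on all nodes of the same group
priority 100 # priority; a larger value means a higher priority
advert_int 1 # advertisement interval between master and backup, in seconds
authentication { # authentication settings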
auth_type PASS
auth_pass 1111
}
virtual_ipaddress { # virtual IP addresses
192.168.200.16
192.168.200.17
192.168.200.18
}
}
Applying it to a master/slave architecture
The main differences between the master and backup configuration files are:
router_id differs
state differs
priority differs
virtual_router_id 51 must be identical on both nodes
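Since virtual_router_id has to match on both nodes, a quick comparison of the two files can catch a mismatch before the services are started. The following is a minimal sketch: the helper names get_vrid and same_vrid are hypothetical, it assumes one vrrp_instance per file, and the two temporary files only stand in for the real master and backup configurations.

```shell
# Print the virtual_router_id value found in a keepalived configuration file.
get_vrid() {
    awk '$1 == "virtual_router_id" {print $2}' "$1"
}

# Succeed only when both files declare the same, non-empty virtual_router_id.
same_vrid() {
    a=$(get_vrid "$1")
    b=$(get_vrid "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Demo with two throwaway files standing in for the master and backup configs.
tmp=$(mktemp -d)
printf 'vrrp_instance VI_1 {\n    state MASTER\n    virtual_router_id 51\n}\n' > "$tmp/master.conf"
printf 'vrrp_instance VI_1 {\n    state BACKUP\n    virtual_router_id 51\n}\n' > "$tmp/backup.conf"
if same_vrid "$tmp/master.conf" "$tmp/backup.conf"; then
    echo "virtual_router_id matches"
else
    echo "virtual_router_id differs"
fi
rm -rf "$tmp"
```

In practice the backup's file would first be copied to the same host, or the output of get_vrid compared by hand on each node.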
1. Master configuration
! Configuration File for keepalived
global_defs {
router_id lb01
}
vrrp_instance VI_1 {
state MASTER
interface ens33
virtual_router_id 51
priority 150
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.169.200
}
}
2. Slave configuration
! Configuration File for keepalived
global_defs {
router_id lb02
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 140
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.169.200
}
}
3. Start the service
systemctl start keepalived
4. Check whether the VIP is up
[root@localhost keepalived]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:24:d1:b5 brd ff:ff:ff:ff:ff:ff
inet 192.168.169.131/24 brd 192.168.169.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.169.200/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::fd8:8531:b2e2:c6bb/64 scope link noprefixroute
valid_lft forever preferred_lft forever
5. Test: stop keepalived on the master and check whether the VIP moves to the slave.
# stop keepalived on the master
systemctl stop keepalived
# check the IP information on the backup
[root@localhost keepalived]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:d9:69:08 brd ff:ff:ff:ff:ff:ff
inet 192.168.169.130/24 brd 192.168.169.255 scope global dynamic ens33
valid_lft 1464sec preferred_lft 1464sec
inet 192.168.169.200/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::51f1:7ad1:f554:65cc/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:88:bb:94:f6 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
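Rather than reading the ip a output by eye, the check for the VIP can be scripted. The sketch below is one possible way to do it: the function name has_vip is hypothetical, and it reads the one-line-per-address format printed by ip -o -4 addr show from standard input, so the demo uses a sample line instead of querying a real interface.

```shell
# Succeed when the address list on stdin (the output of `ip -o -4 addr show`)
# contains the given virtual IP as an exact match.
has_vip() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "inet") {
                    split($(i + 1), a, "/")
                    if (a[1] == vip) found = 1
                }
            }
        }
        END { exit (found ? 0 : 1) }'
}

# Sample line in the format printed by `ip -o -4 addr show dev ens33`.
sample='2: ens33    inet 192.168.169.200/32 scope global ens33'
if printf '%s\n' "$sample" | has_vip 192.168.169.200; then
    echo "VIP present"
else
    echo "VIP absent"
fi
```

On a live node the input would come from the command itself, for example ip -o -4 addr show dev ens33 | has_vip 192.168.169.200, run on the master before the test and on the backup after it.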
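Preventing split brain
What is split brain
In a high-availability system, the two nodes normally act as one coordinated HA system. If the heartbeat link between them is broken, they split into two independent nodes. Because each side has lost contact with the other, each assumes that the other has failed.
The two nodes then behave like a "split-brain" patient: they compete for shared resources and for the application service, which causes serious problems:
The shared resources are divided between the two sides, and the service cannot start on either side;
The service starts on both sides, both serve traffic and both read and write the shared storage at the same time, which leads to inconsistent or even corrupted data.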
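Causes of split brain
Split brain generally has one of the following causes:
The heartbeat link between the HA servers fails, so they cannot communicate;
A firewall is enabled on the HA servers and blocks the heartbeat messages;
The heartbeat network interface on an HA server is misconfigured, so heartbeat messages cannot be sent;
Other configuration problems, such as mismatched heartbeat methods, heartbeat broadcast conflicts, or software bugs;
In the Keepalived configuration, different virtual_router_id values on the two ends of the same VRRP instance also cause split brain.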
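Common solutions
In a real environment, split brain can be prevented in the following ways:
1. Connect the nodes with both a serial line and an Ethernet cable, so that there are two heartbeat links. If one link fails, the other still carries the heartbeat;
2. When split brain is detected, forcibly shut down one node (this requires special equipment such as STONITH or fence devices). When the backup node stops receiving heartbeat messages, it sends a power-off command over a separate link to cut the power of the master node;
3. Set up monitoring and alerting for split brain (with Zabbix, for example), so that someone can step in and arbitrate as soon as the problem occurs and limit the damage.
4. Enable disk locking. The node that is serving traffic locks the shared disk, so that the other node cannot take the shared disk when split brain occurs. Disk locking has a significant drawback: if the node holding the shared disk does not release the lock, the other node can never obtain the disk. In practice, a node that suddenly hangs or crashes cannot run the unlock command, so the backup node cannot take over the shared resources and the application service. For this reason some HA designs use a "smart" lock: the serving node enables the disk lock only when it finds that all heartbeat links are down (it can no longer detect its peer), and does not lock the disk otherwise;
5. Add an arbitration mechanism. For example, configure a gateway IP. When split brain occurs, both nodes ping this gateway IP. If the ping fails, the break is on the local side: not only the heartbeat but also the local link that serves external traffic is down, so starting (or continuing to run) the application service is pointless. That node gives up the competition and lets the node that can reach the gateway IP run the service. To be safer, the node that cannot reach the gateway IP can simply reboot itself, which fully releases any shared resources it may still hold.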
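The gateway arbitration in item 5 could be sketched in shell as follows. This is an illustration under stated assumptions rather than a complete fencing solution: the address 192.168.169.2 in GATEWAY_IP is a hypothetical gateway, the function names gateway_reachable and arbitrate are made up for this example, and the demo calls use the stand-in probes true and false so that nothing is pinged or stopped.

```shell
# Arbitration address; 192.168.169.2 is a hypothetical gateway IP.
GATEWAY_IP=${GATEWAY_IP:-192.168.169.2}

# Probe: succeed when the given address answers at least one of three pings.
gateway_reachable() {
    ping -c 3 -W 1 "$1" > /dev/null 2>&1
}

# Run the probe command passed as arguments and print the decision:
# "keep" when the probe succeeds, "yield" when it fails.
arbitrate() {
    if "$@"; then
        echo keep
    else
        echo yield
    fi
}

# Dry run with stand-in probes.
arbitrate true    # prints keep
arbitrate false   # prints yield
```

On a real node the probe would be gateway_reachable "$GATEWAY_IP", and a result of yield would be followed by systemctl stop keepalived, or by a reboot as the text suggests.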
Using Keepalived to monitor nginx and prevent split brain
Create the check script
vim check_keepalived.sh
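#!/bin/bash
# Path to the nginx binary and the port nginx should listen on.
NGINX_SBIN=$(command -v nginx)
NGINX_PORT=80
# Record the port state reported by nmap and the number of nginx processes.
function check_nginx(){
    NGINX_STATUS=$(nmap localhost -p "${NGINX_PORT}" | awk -v port="${NGINX_PORT}/tcp" '$1 == port {print $2}')
    NGINX_PROCESS=$(ps -ef | grep nginx | grep -v grep | wc -l)
}
check_nginx
if [ "$NGINX_STATUS" != "open" ] || [ "$NGINX_PROCESS" -lt 2 ]
then
    # nginx looks unhealthy: restart it and check once more.
    "${NGINX_SBIN}" -s stop
    "${NGINX_SBIN}"
    sleep 3
    check_nginx
    if [ "$NGINX_STATUS" != "open" ] || [ "$NGINX_PROCESS" -lt 2 ]; then
        # Still unhealthy: stop keepalived so that the VIP moves to the other node,
        # and report the failure through the exit status.
        systemctl stop keepalived
        exit 1
    fi
fi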
Add execute permission
chmod +x check_keepalived.sh
Configure keepalived
Master
! Configuration File for keepalived
global_defs {
router_id lb01
}
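# define the check script
vrrp_script check_ng {
script "/etc/keepalived/check_keepalived.sh" # path to the script
interval 2 # interval between runs, in seconds
weight -5 # priority adjustment: when the check fails (the script exits non-zero) the priority drops by 5
fall 3 # 3 consecutive failures are required before the check counts as failed; weight then lowers the priority (priority range 1-255)
rise 2 # 2 consecutive successes mark the check as healthy again; the configured priority is restored
}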
}
vrrp_instance VI_1 {
state MASTER
interface ens33
virtual_router_id 51
priority 150
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.169.200
}
# run the check script for this instance
track_script {
check_ng
}
}
slave:
! Configuration File for keepalived
global_defs {
router_id lb02
}
vrrp_script check_ng {
script "/etc/keepalived/check_keepalived.sh"
interval 2
weight -5
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 147
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.169.200
}
track_script {
check_ng
}
}
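The priorities in the two configurations above are chosen so that a failed check on the master drops it below the backup. The arithmetic can be sketched as follows; the function name effective_priority is hypothetical, only negative weights are modelled, and the sketch assumes that the check script reports failure through a non-zero exit status.

```shell
# Effective priority of a node: the base priority while the tracked script
# passes, and base + weight while it is failing (negative weights only).
effective_priority() {
    base=$1
    weight=$2
    script_ok=$3   # 1 = check passing, 0 = check failing
    if [ "$script_ok" -eq 1 ]; then
        echo "$base"
    else
        echo $((base + weight))
    fi
}

effective_priority 150 -5 1   # master, nginx healthy: 150
effective_priority 150 -5 0   # master, nginx check failing: 145
effective_priority 147 -5 1   # backup, nginx healthy: 147
```

With the check failing on the master, 145 is lower than the backup's 147, so the backup wins the election. A backup priority of 145 or lower would leave the VIP on the master.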
Configuring Keepalived so that the master does not reclaim the VIP after it recovers
Master configuration
! Configuration File for keepalived
global_defs {
router_id lb01
}
vrrp_script check_ng {
script "/etc/keepalived/check_keepalived.sh"
interval 2
weight -5
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP # the master is also configured as BACKUP
interface ens33
virtual_router_id 51
priority 150
advert_int 1
nopreempt # do not preempt the VIP
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.169.200
}
track_script {
check_ng
}
}
Slave configuration
! Configuration File for keepalived
global_defs {
router_id lb02
}
vrrp_script check_ng {
script "/etc/keepalived/check_keepalived.sh"
interval 2
weight -5
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 149
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.169.200
}
track_script {
check_ng
}
}
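The effect of nopreempt can be summarised with a small decision function. This is a toy model of the behaviour described above, not Keepalived's actual state machine; the function name takes_over and its four arguments are hypothetical.

```shell
# Toy model: should a BACKUP node that has just come (back) up take the VIP?
# Arguments: own priority, current master's priority,
#            master alive (1/0), nopreempt set (1/0).
takes_over() {
    my_prio=$1
    master_prio=$2
    master_alive=$3
    nopreempt=$4
    if [ "$master_alive" -eq 0 ]; then
        echo yes    # no advertisements are seen, so this node takes over
    elif [ "$nopreempt" -eq 0 ] && [ "$my_prio" -gt "$master_prio" ]; then
        echo yes    # preemption is allowed and this node outranks the master
    else
        echo no
    fi
}

takes_over 150 149 1 0   # prints yes: without nopreempt, lb01 reclaims the VIP
takes_over 150 149 1 1   # prints no: with nopreempt, lb02 keeps the VIP
takes_over 150 149 0 1   # prints yes: lb02 is down, so lb01 takes over
```

The second call corresponds to the configuration above: lb01 recovers with priority 150 while lb02 (149) holds the VIP, and nopreempt keeps lb01 in the BACKUP state.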