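I. Introduction
During project acceptance, the single-point-of-failure risk of our standalone Redis was raised. After consulting several blog posts (listed at the end of this article), we settled on keepalived to make Redis highly available. This post records the setup. The general idea is as follows:
- The front end of this project uses the PHP package Predis, and Redis mainly stores the cache and sessions. In practice, Predis raised an error on the del operation against a Redis cluster, so we decided to build a two-node hot-standby Redis environment instead, using keepalived for master/backup failover and VIP floating. The environment is: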
Initial backup1: 192.168.203.129
Initial backup2: 192.168.203.130
VIP: 192.168.203.240
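Readers unfamiliar with keepalived and VRRP can refer to these posts:
http://outofmemory.cn/wiki/keepalived-configuration
http://hugnew.com/?p=745
After keepalived is installed, both machines are configured with an initial state of BACKUP and a priority of 100. Start Redis and keepalived on each machine. At first both are in the BACKUP state. The machine whose keepalived starts first is alone in the virtual router group, so it elects itself master. The machine whose keepalived starts later has the same priority as the first, so it stays a backup.
The keepalived priority is limited to the range 0-255 because VRRP carries it in a single 8-bit field; a value above 255 is converted to 100.
Before starting keepalived, start Redis on both machines and configure master/slave replication. For the Redis installation, see the previous post, http://www.reibang.com/p/acd3281d9074. To configure replication: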
# On backup2, make Redis a slave of the Redis on backup1
./bin/redis-cli SLAVEOF 192.168.203.129 6379
- backup1 must start keepalived first so that it becomes the master and claims the VIP (check the interface addresses with ip a).
II. Install keepalived
yum -y install keepalived
- Configure keepalived; both the master node and the slave node use state BACKUP
vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id redis
}
# keepalived monitoring script
vrrp_script chk_redis {
    # Health-check script that keepalived runs
    script /usr/local/redis/keepalived/scripts/redis-check.sh
    # Run the health check every 2 seconds
    interval 2
    # Priority adjustment when the check fails: priority += weight
    # A non-zero exit status counts as unhealthy, so the priority becomes priority + (-10)
    weight -10
    # Number of consecutive failures before Redis is considered down
    fall 3
}
# Instance
vrrp_instance redis {
    # Both master and slave start as BACKUP
    state BACKUP
    # Network interface to bind to
    interface eth0
    # Must be identical on both nodes
    virtual_router_id 51
    # Priority: 100 on both backup1 and backup2. Important, because it determines how the VIP floats
    priority 100
    # Interval in seconds between VRRP advertisements, which carry the priority being compared
    advert_int 1
    virtual_ipaddress {
        # The virtual IP address
        192.168.203.240
    }
    # Health-check script
    track_script {
        chk_redis
    }
    # Local IP address used for keepalived's own communication
    unicast_src_ip 192.168.203.129
    unicast_peer {
        # Address of the other keepalived node; if omitted, both nodes may bring up 192.168.203.240
        192.168.203.130
    }
    # Script run when keepalived is elected master
    notify_master /usr/local/redis/keepalived/scripts/redis-master.sh
    # Script run when keepalived is demoted to backup
    notify_backup /usr/local/redis/keepalived/scripts/redis-backup.sh
    # Script run when keepalived enters the fault state
    notify_fault /usr/local/redis/keepalived/scripts/redis-fault.sh
    # Script run when the keepalived service stops
    notify_stop /usr/local/redis/keepalived/scripts/redis-stop.sh
}
- The keepalived configuration on backup2 differs from backup1 only in the following:
vrrp_instance redis {
    unicast_src_ip 192.168.203.130
    unicast_peer {
        192.168.203.129
    }
}
III. Create the monitoring scripts
Apart from redis-backup.sh, all scripts are identical on backup1 and backup2.
- Create the script directory and the log directory
mkdir -p /usr/local/redis/keepalived/scripts
mkdir -p /usr/local/redis/keepalived/logs
- redis-check.sh: checks the health of the Redis service
vim /usr/local/redis/keepalived/scripts/redis-check.sh
#!/bin/bash
# Log file location
logFile=/usr/local/redis/keepalived/logs/check.log
# PING the local Redis service
pingRS=$(/usr/local/redis/bin/redis-cli PING)
# Exit 0 if the reply is PONG; otherwise log the failure and exit 1
if [ "$pingRS" = "PONG" ]; then
    exit 0
else
    echo "[$(date)] ping is error !" >> "$logFile"
    exit 1
fi
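The check above can also be written as a small function that takes the PING command as its arguments, which makes it possible to exercise the logic without a running Redis. This is only a sketch: `check_redis` is a hypothetical helper name, not something provided by keepalived or Redis.

```shell
#!/bin/bash
# check_redis (hypothetical helper): run the given command and succeed
# only if its output is exactly PONG.
check_redis() {
    local reply
    # A command that cannot run or exits non-zero is treated as an empty reply.
    reply=$("$@" 2>/dev/null) || reply=""
    # Strip a trailing carriage return, if present.
    reply=${reply%$'\r'}
    [ "$reply" = "PONG" ]
}

# Intended use on a node (redis-cli path taken from the script above):
#   check_redis /usr/local/redis/bin/redis-cli PING || exit 1
# Wrapping the call in coreutils `timeout` guards against a hung client:
#   check_redis timeout 1 /usr/local/redis/bin/redis-cli PING || exit 1
```

Passing the command as arguments keeps the pass/fail rule in one place, so a stub such as `echo PONG` can stand in for redis-cli when trying the script out.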
- redis-master.sh: runs when keepalived is elected master
vim /usr/local/redis/keepalived/scripts/redis-master.sh
#!/bin/bash
# Absolute path of redis-cli
cliCmd=/usr/local/redis/bin/redis-cli
# Log file location
logFile=/usr/local/redis/keepalived/logs/master.log
echo "[$(date)] master" >> "$logFile"
# Once this node is the keepalived master, Redis must stop replicating and become a master too
$cliCmd SLAVEOF NO ONE &>> "$logFile"
- redis-backup.sh: runs when keepalived is demoted to backup
The line $cliCmd SLAVEOF 192.168.203.130 6379 differs between backup1 and backup2 and must be adjusted on each node.
vim /usr/local/redis/keepalived/scripts/redis-backup.sh
#!/bin/bash
# Log file
logFile=/usr/local/redis/keepalived/logs/backup.log
# Absolute path of redis-cli
cliCmd=/usr/local/redis/bin/redis-cli
echo "[$(date)] begin to slave ..." >> "$logFile"
# Before becoming a slave, make sure Redis is running; start it if it is not
service redis-server start
# Sleep for 0.5 s. This value matters: it must be neither too long nor too short.
# Too short: the SLAVEOF in the next step fails (cause not yet identified)
# Too long, e.g. 5 s: the node can switch from backup to master during the wait, SLAVEOF then runs 5 s late, and Redis ends up in the wrong role
sleep 0.5
# Set up replication
# Once keepalived is a backup, Redis must become a slave of the peer node
# On backup2 this line is: $cliCmd SLAVEOF 192.168.203.129 6379
$cliCmd SLAVEOF 192.168.203.130 6379
echo "[$(date)] slave done !" >> "$logFile"
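One possible alternative to the fixed 0.5 s sleep is to poll until Redis answers and only then issue SLAVEOF. This rests on an assumption the article does not confirm: that SLAVEOF fails after a short sleep because Redis has not finished starting. `wait_until` is a hypothetical helper, and the sketch treats an exit status of 0 from the probe command as "ready".

```shell
#!/bin/bash
# wait_until (hypothetical helper): retry a command until it exits 0
# or the attempt budget is used up.
#   $1 = maximum number of attempts
#   $2 = delay between attempts, in seconds
#   remaining arguments = command to run
wait_until() {
    local tries=$1 delay=$2
    shift 2
    local i
    for ((i = 0; i < tries; i++)); do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Intended use in redis-backup.sh (path and peer address from the article):
#   cliCmd=/usr/local/redis/bin/redis-cli
#   if wait_until 10 0.1 "$cliCmd" PING; then
#       "$cliCmd" SLAVEOF 192.168.203.130 6379
#   fi
```

With 10 attempts at 0.1 s the worst-case wait is about one second, which stays well under the 5 s delay the script comments warn about.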
- redis-fault.sh: runs when keepalived enters the fault state
vim /usr/local/redis/keepalived/scripts/redis-fault.sh
#!/bin/bash
# Desc: script run when keepalived hits an error
# Log file location
logFile=/usr/local/redis/keepalived/logs/fault.log
# Write the error message to the log
echo "[$(date)] ***** redis fault ***" >> "$logFile"
- redis-stop.sh: runs when the keepalived service stops
vim /usr/local/redis/keepalived/scripts/redis-stop.sh
#!/bin/bash
# Desc: script run when keepalived stops
# Log file
logFile=/usr/local/redis/keepalived/logs/stop.log
# Write the log message
echo "[$(date)] stop ..." >> "$logFile"
- Important: make the scripts executable
cd /usr/local/redis/keepalived/scripts
chmod u+x *
- Add the services to /etc/rc.local so they start at boot
vim /etc/rc.local
service redis-server start
service keepalived start
- Start Redis on backup1 (master node) and backup2 (slave node), and set up replication with SLAVEOF
- Start keepalived on the master node so that it claims the VIP
- Start keepalived on the slave node; it becomes the backup
- Run ip a on each machine to check whether the VIP is bound
ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:f5:b3:41 brd ff:ff:ff:ff:ff:ff
inet 192.168.203.130/24 brd 192.168.203.255 scope global eth0
# VIP bound successfully
inet 192.168.203.240/32 scope global eth0
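The visual check of the ip a output can be scripted as well. The sketch below reads ip a style output on standard input and reports whether a given address is bound; `holds_vip` is a hypothetical helper name.

```shell
#!/bin/bash
# holds_vip (hypothetical helper): succeed if the address in $1 appears
# on an "inet" line of the `ip a` output supplied on stdin.
holds_vip() {
    local vip=$1
    # -F matches the text literally, so dots are not treated as wildcards.
    # The trailing slash stops 192.168.203.24 from matching 192.168.203.240.
    grep -qF "inet ${vip}/"
}

# Intended use on a node:
#   ip a | holds_vip 192.168.203.240 && echo "VIP is bound on this machine"
```

Running this on both nodes after a failover shows which machine currently holds the VIP.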
- From another machine, connect to Redis through the virtual IP and run info to check its current state
./bin/redis-cli -h 192.168.203.240
192.168.203.240:6379> info
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.203.130,port=6379,state=online,offset=12071,lag=1
master_repl_offset:12071
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11904
repl_backlog_histlen:168
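To check only the replication role rather than reading the full output, the role field can be extracted with a short filter. This is a sketch; `redis_role` is a hypothetical helper that reads INFO output on standard input.

```shell
#!/bin/bash
# redis_role (hypothetical helper): print the value of the "role" field
# from `INFO replication` output supplied on stdin.
redis_role() {
    # INFO lines end in CRLF, so drop carriage returns before matching.
    tr -d '\r' | sed -n 's/^role://p'
}

# Intended use, through the VIP from the article:
#   /usr/local/redis/bin/redis-cli -h 192.168.203.240 INFO replication | redis_role
```

Querying through the VIP should report master, while querying the backup node's own address should report slave.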
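IV. How the high availability works
1. Redis on the master node goes down
- When Redis on the master node dies, the health-check script's PING fails and the script exits with 1. After three consecutive failures (fall 3), the weight of -10 lowers the node's priority, and it advertises 100 - 10 = 90 in its VRRP advertisements.
- keepalived on the slave node sees an advertised priority lower than its own 100, so it announces its own priority and is elected master. This triggers redis-master.sh on the slave machine: Redis stops replicating and becomes a master. The node then sends ARP packets to the other machines on the LAN announcing that it now owns 192.168.203.240, and they cache the mapping between backup2's MAC address and 192.168.203.240. Check the addresses with ip a.
- The former master is demoted to backup, which triggers redis-backup.sh. The script tries to start Redis again and enables replication from backup2, so Redis becomes a slave. The VIP has now floated to backup2.
- With these steps, failover after a Redis failure on the master node is complete.
- You can watch the state changes in the logs
tail -f /usr/local/redis/keepalived/logs/backup.log
tail -f /usr/local/redis/keepalived/logs/master.log
- The whole process needs no manual intervention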
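The priority arithmetic in this scenario can be written out as a small function. It is a simplified model for a negative weight only and ignores the fall counter; `effective_priority` is a hypothetical name, not a keepalived command.

```shell
#!/bin/bash
# effective_priority (hypothetical helper): model how a negative
# track_script weight changes the advertised VRRP priority.
#   $1 = configured priority
#   $2 = weight (negative in this article's configuration)
#   $3 = exit status of the check script (0 = healthy)
effective_priority() {
    local base=$1 weight=$2 check_status=$3
    if [ "$check_status" -ne 0 ] && [ "$weight" -lt 0 ]; then
        # Failing check with a negative weight: priority += weight
        echo $((base + weight))
    else
        echo "$base"
    fi
}

effective_priority 100 -10 1   # master whose Redis is down: prints 90
effective_priority 100 -10 0   # healthy backup: prints 100
```

Because 100 is greater than 90, the healthy backup wins the election, which is why both nodes can share the same configured priority.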
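2. keepalived on the master node goes down
- When keepalived on the master node dies, the VIP floats straight to the backup node, which becomes master and runs redis-master.sh. Its Redis becomes a master, and applications reach it directly through the VIP, completing the failover.
- On the failed machine, start keepalived again by hand. Because the priorities are equal, it becomes a backup and does not take the VIP back. This triggers the backup script, and Redis starts replicating.
3. The master machine goes down
- Same as the case where keepalived on the master node goes down
4. Redis on the slave node goes down
- Restart keepalived on that node; the backup script then starts Redis and re-enables replication, and the node does not take over the master role
5. keepalived on the slave node going down is handled the same way as the slave machine going down: just start it again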
V. Summary
With the steps above, Redis high availability and failover are in place; the setup has been verified on physical machines.
References:
Building a Redis + Keepalived master/slave cluster with failover: https://blog.csdn.net/ECHO_FOLLOW_HEART/article/details/51595228
Redis Chinese site: http://www.redis.cn/documentation.html
https://blog.csdn.net/ws891033655/article/details/39834457
http://blog.51cto.com/hao360/1435297
keepalived:
Keepalived principles and practice, the VRRP protocol: https://blog.csdn.net/wngua/article/details/54668794
http://outofmemory.cn/wiki/keepalived-configuration
http://hugnew.com/?p=745
http://fengchj.com/?p=2156#respond