1宵膨、MHA高可用
1.1 高級(jí)應(yīng)用架構(gòu)演變
1、基本結(jié)構(gòu)
(1)一主一從
(2)一主多從
(3)多級(jí)主從
(4)雙主
(5)循環(huán)復(fù)制
2炸宵、高級(jí)應(yīng)用架構(gòu)演變
高性能架構(gòu)
(1)讀寫(xiě)分離中間件——MySQL proxy(atlas辟躏,mysql router,proxySQL土全,maxscale)捎琐、amoeba、xx-dbproxy等裹匙。
(2)分布式架構(gòu):分庫(kù)分表——cobar瑞凑、Mycat,自主研發(fā)等概页。高可用架構(gòu)
(3)單活:MMM架構(gòu)——mysql-mmm(google)
(4)單活:MHA架構(gòu)——mysql-master-ha(日本DeNa)
(5)多活:MGR(未來(lái)趨勢(shì))——5.7 新特性MySQL Group replication
(6)多活:PXC籽御、MySQL Cluster架構(gòu)
2.2 MHA介紹
1、MHA工作原理
manager節(jié)點(diǎn)通過(guò)mha專(zhuān)用用戶惰匙,監(jiān)控所有主從節(jié)點(diǎn)
系統(tǒng),網(wǎng)絡(luò),SSH連接性
主從狀態(tài),重點(diǎn)是主庫(kù)根據(jù)配置策略選主
(1) 如果判斷從庫(kù)(position或者GTID),數(shù)據(jù)有差異,最接近于Master的slave,成為備選主
(2) 如果判斷從庫(kù)(position或者GTID),數(shù)據(jù)一致,按照配置文件順序,選主.
(3) 如果設(shè)定有權(quán)重技掏,按照權(quán)重強(qiáng)制指定備選主庫(kù)candidate_master=1
check_repl_delay=0數(shù)據(jù)補(bǔ)償
(1) 當(dāng)SSH能連接原主庫(kù),從庫(kù)對(duì)比主庫(kù)GTID 或者position號(hào)项鬼,找出缺失部分的事務(wù)哑梳,立即將二進(jìn)制日志保存至各個(gè)從節(jié)點(diǎn)/tmp目錄下(save_binary_logs ),并且所有從庫(kù)都立即補(bǔ)償缺失部分?jǐn)?shù)據(jù)绘盟。
s1從庫(kù):
stop slave;
reset slave all; //清除所有原從庫(kù)信息
s2從庫(kù):
stop slave;
reset slave all; //清除所有原從庫(kù)信息
兩個(gè)從庫(kù)數(shù)據(jù)一致鸠真,s2從庫(kù) change master to s1 從庫(kù)
(2) 當(dāng)SSH不能連接原主庫(kù), 對(duì)比從庫(kù)之間的relaylog的差異(apply_diff_relay_logs) ,s2從庫(kù)應(yīng)用差異日志
s1從庫(kù):
stop slave;
reset slave all; //清除所有原從庫(kù)信息
s2從庫(kù):
stop slave;
reset slave all; //清除所有原從庫(kù)信息
兩個(gè)從庫(kù)數(shù)據(jù)一致龄毡,s2從庫(kù) change master to s1 從庫(kù)vip 會(huì)切換到新的主庫(kù)吠卷,實(shí)現(xiàn)應(yīng)用透明。
manager 做完以上操作后稚虎,自動(dòng)關(guān)閉撤嫩,清理掉配置文件中的故障節(jié)點(diǎn)信息。
2蠢终、架構(gòu)介紹
1主2從序攘,master:db01 slave:db02 db03
必須是獨(dú)立節(jié)點(diǎn),不支持多實(shí)例寻拂。
3程奠、MHA 高可用方案軟件構(gòu)成
Manager軟件:選擇一個(gè)從節(jié)點(diǎn)安裝(主節(jié)點(diǎn)除外)
Node軟件:所有節(jié)點(diǎn)都要安裝
Manager工具包主要包括以下幾個(gè)工具:
masterha_manger 啟動(dòng)MHA
masterha_check_ssh 檢查MHA的SSH配置狀況
masterha_check_repl 檢查MySQL復(fù)制狀況
masterha_master_monitor 檢測(cè)master是否宕機(jī)
masterha_check_status 檢測(cè)當(dāng)前MHA運(yùn)行狀態(tài)
masterha_master_switch 控制故障轉(zhuǎn)移(自動(dòng)或者手動(dòng))
masterha_conf_host 添加或刪除配置的server信息
Node工具包主要包括以下幾個(gè)工具:
這些工具通常由MHA Manager的腳本觸發(fā),無(wú)需人為操作
save_binary_logs 保存和復(fù)制master的二進(jìn)制日志
apply_diff_relay_logs 識(shí)別差異的中繼日志事件并將其差異的事件應(yīng)用于其他的
purge_relay_logs 清除中繼日志(不會(huì)阻塞SQL線程)
2祭钉、MHA環(huán)境搭建
2.1 規(guī)劃
主庫(kù): 51 node
從庫(kù): 52 node
53 node manager
2.2 環(huán)境準(zhǔn)備(略瞄沙。1主2從GTID復(fù)制,上文有)
- MHA上不能搭建過(guò)濾復(fù)制
2.3 搭建前準(zhǔn)備
1、設(shè)置relay-log的自動(dòng)刪除功能距境,從庫(kù)設(shè)置只讀
# 臨時(shí)關(guān)閉自動(dòng)清除relay log功能(所有庫(kù))
set global relay_log_purge = 0;
# 永久關(guān)閉自動(dòng)清除relay log功能(所有庫(kù)
[root@db01 data]# vim /etc/my.cnf
[mysqld]
relay_log_purge = 0
# 臨時(shí)開(kāi)啟只讀功能(所有從庫(kù))
set global read_only=1;
2申尼、配置關(guān)鍵程序軟連接(所有庫(kù))
ln -s /data/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /data/mysql/bin/mysql /usr/bin/mysql
3、配置各節(jié)點(diǎn)互信
# db01:
rm -rf /root/.ssh
ssh-keygen
cd /root/.ssh
mv id_rsa.pub authorized_keys
scp -r /root/.ssh 10.0.0.52:/root
scp -r /root/.ssh 10.0.0.53:/root
# 各節(jié)點(diǎn)驗(yàn)證垫桂,不需要輸密碼即成功
db01:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
db02:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
db03:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
4师幕、在db01主庫(kù)中創(chuàng)建mha需要的用戶
grant all privileges on *.* to mha@'10.0.0.%' identified by 'mha';
2.4 下載MHA軟件
mha官網(wǎng):https://code.google.com/archive/p/mysql-master-ha/
github下載地址:https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
安裝軟件:
# 所有節(jié)點(diǎn)安裝Node軟件依賴包
yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
# db03 安裝manager軟件
yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
2.5 db03配置文件準(zhǔn)備
# 創(chuàng)建配置文件目錄
mkdir -p /etc/mha
創(chuàng)建日志目錄
mkdir -p /var/log/mha/app1
# 編輯mha配置文件
vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/mysql/
user=mha
password=mha
ping_interval=2 //2秒種連接一次主庫(kù)
repl_password=123 //切換主從時(shí),change master 用到的復(fù)制用戶
repl_user=repl
ssh_user=root
[server1]
hostname=10.0.0.51
port=3306
[server2]
hostname=10.0.0.52
port=3306
[server3]
hostname=10.0.0.53
port=3306
2.6 狀態(tài)檢查
# 1诬滩、互信檢查
[root@db03 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Apr 19 16:39:34 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 19 16:39:34 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:39:34 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:39:34 2019 - [info] Starting SSH connection tests..
Fri Apr 19 16:39:35 2019 - [debug]
Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.52(10.0.0.52:22)..
Fri Apr 19 16:39:34 2019 - [debug] ok.
Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.53(10.0.0.53:22)..
Fri Apr 19 16:39:35 2019 - [debug] ok.
Fri Apr 19 16:39:36 2019 - [debug]
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.51(10.0.0.51:22)..
Fri Apr 19 16:39:35 2019 - [debug] ok.
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.53(10.0.0.53:22)..
Fri Apr 19 16:39:35 2019 - [debug] ok.
Fri Apr 19 16:39:37 2019 - [debug]
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.51(10.0.0.51:22)..
Fri Apr 19 16:39:35 2019 - [debug] ok.
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.52(10.0.0.52:22)..
Fri Apr 19 16:39:36 2019 - [debug] ok.
Fri Apr 19 16:39:37 2019 - [info] All SSH connection tests passed successfully.
# 2霹粥、主從檢查
[root@db03 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Apr 19 16:40:50 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 19 16:40:50 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:40:50 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:40:50 2019 - [info] MHA::MasterMonitor version 0.56.
Fri Apr 19 16:40:51 2019 - [info] GTID failover mode = 1
Fri Apr 19 16:40:51 2019 - [info] Dead Servers:
Fri Apr 19 16:40:51 2019 - [info] Alive Servers:
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.52(10.0.0.52:3306)
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.53(10.0.0.53:3306)
Fri Apr 19 16:40:51 2019 - [info] Alive Slaves:
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.52(10.0.0.52:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Fri Apr 19 16:40:51 2019 - [info] GTID ON
Fri Apr 19 16:40:51 2019 - [info] Replicating from 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.53(10.0.0.53:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Fri Apr 19 16:40:51 2019 - [info] GTID ON
Fri Apr 19 16:40:51 2019 - [info] Replicating from 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] Current Alive Master: 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] Checking slave configurations..
Fri Apr 19 16:40:51 2019 - [info] read_only=1 is not set on slave 10.0.0.52(10.0.0.52:3306).
Fri Apr 19 16:40:51 2019 - [info] read_only=1 is not set on slave 10.0.0.53(10.0.0.53:3306).
Fri Apr 19 16:40:51 2019 - [info] Checking replication filtering settings..
Fri Apr 19 16:40:51 2019 - [info] binlog_do_db= , binlog_ignore_db=
Fri Apr 19 16:40:51 2019 - [info] Replication filtering check ok.
Fri Apr 19 16:40:51 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Apr 19 16:40:51 2019 - [info] Checking SSH publickey authentication settings on the current master..
Fri Apr 19 16:40:51 2019 - [info] HealthCheck: SSH to 10.0.0.51 is reachable.
Fri Apr 19 16:40:51 2019 - [info]
10.0.0.51(10.0.0.51:3306) (current master)
+--10.0.0.52(10.0.0.52:3306)
+--10.0.0.53(10.0.0.53:3306)
Fri Apr 19 16:40:51 2019 - [info] Checking replication health on 10.0.0.52..
Fri Apr 19 16:40:51 2019 - [info] ok.
Fri Apr 19 16:40:51 2019 - [info] Checking replication health on 10.0.0.53..
Fri Apr 19 16:40:51 2019 - [info] ok.
Fri Apr 19 16:40:51 2019 - [warning] master_ip_failover_script is not defined.
Fri Apr 19 16:40:51 2019 - [warning] shutdown_script is not defined.
Fri Apr 19 16:40:51 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
2.7 db03 啟動(dòng)MHA
# db03啟動(dòng)
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
# 查看mha
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4719) is running(0:PING_OK), master:10.0.0.51
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.51 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 51 |
+---------------+-------+
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.52 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 52 |
+---------------+-------+
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.53 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 53 |
+---------------+-------+
2.8 Manager 參數(shù)介紹
# 啟動(dòng)命令參數(shù)
nohup //后臺(tái)管理進(jìn)程
masterha_manager //MHA啟動(dòng)命令腳本
--conf=/etc/mha/app1.cnf //指定配置文件
--remove_dead_master_conf //從配置文件中,移除宕機(jī)的主庫(kù)信息
--ignore_last_failover //忽略最后一次切換
# MHA配置文件額外參數(shù)
# 設(shè)置監(jiān)控主庫(kù)疼鸟,發(fā)送ping包(mysqldadmin ping)的時(shí)間間隔后控,嘗試三次沒(méi)有回應(yīng)的時(shí)候自動(dòng)進(jìn)行failover
ping_interval=1
# 設(shè)置為候選master,如果設(shè)置該參數(shù)以后空镜,發(fā)生主從切換以后將會(huì)將此從庫(kù)提升為主庫(kù)浩淘,即使這個(gè)主庫(kù)不是集群中事件最新的slave;放在節(jié)點(diǎn)標(biāo)簽下姑裂。
candidate_master=1
# 默認(rèn)情況下如果一個(gè)slave落后master 100M的relay logs的話馋袜,
MHA將不會(huì)選擇該slave作為一個(gè)新的master,因?yàn)閷?duì)于這個(gè)slave的恢復(fù)需要花費(fèi)很長(zhǎng)時(shí)間舶斧,通過(guò)設(shè)置check_repl_delay=0,MHA觸發(fā)切換在選擇一個(gè)新的master的時(shí)候?qū)?huì)忽略復(fù)制延時(shí),這個(gè)參數(shù)對(duì)于設(shè)置了candidate_master=1的主機(jī)非常有用察皇,因?yàn)檫@個(gè)候選主在切換的過(guò)程中一定是新的master茴厉;配合上個(gè)參數(shù)使用。
check_repl_delay=0
MHA工作機(jī)制:在MHA做完一次切換后什荣,會(huì)在工作目錄(/etc/mha/app1)中矾缓,會(huì)生成app1.failover.complete
,可以理解是鎖文件,8小時(shí)之內(nèi)稻爬,不會(huì)做第二次切換嗜闻。
- 傳統(tǒng)主從復(fù)制與基于MHA的主從復(fù)制的區(qū)別
傳統(tǒng)主從復(fù)制
1.主庫(kù)要開(kāi)啟binlog,從庫(kù)不用開(kāi)啟binlog
2.主庫(kù)要有中從復(fù)制用戶桅锄,從庫(kù)不需要?jiǎng)?chuàng)建主從復(fù)制用戶
3.主庫(kù)server_id=5,從庫(kù)只要不等于5即可琉雳,從庫(kù)之間可以相同
基于MHA的主從復(fù)制(從庫(kù)隨時(shí)會(huì)被提升為新的主庫(kù))
1.主庫(kù)要開(kāi)啟binlog,從庫(kù)也要開(kāi)啟binlog
2.主庫(kù)要?jiǎng)?chuàng)建主從復(fù)制用戶友瘤,從庫(kù)也必須創(chuàng)建主從復(fù)制用戶
3.主庫(kù)server_id=5翠肘,從庫(kù)只要不等于5即可,從庫(kù)之間必須不相同
2.9 MHA切換優(yōu)先級(jí)
- 所有從節(jié)點(diǎn)日志都是一致的辫秧,默認(rèn)會(huì)以配置文件的順序去選擇一個(gè)新主束倍。
- 從節(jié)點(diǎn)日志不一致,自動(dòng)選擇最接近于主庫(kù)的從庫(kù)
- candidate_master=1
check_repl_delay=0
如果對(duì)于某節(jié)點(diǎn)設(shè)定了權(quán)重(candidate_master=1),權(quán)重節(jié)點(diǎn)會(huì)優(yōu)先選擇绪妹。
但是此節(jié)點(diǎn)日志量落后主庫(kù)100M日志的話甥桂,也不會(huì)被選擇∮士酰可以配合check_repl_delay=0黄选,關(guān)閉日志量的檢查,強(qiáng)制選擇候選節(jié)點(diǎn)廊移。
2.10 故障模擬及處理
模擬故障
# 停主庫(kù)db01:
/etc/init.d/mysqld stop
# 觀察manager日志糕簿;末尾必須顯示如下,才算正常切換成功狡孔。
tail -f /var/log/mha/app1/manager
···
Master failover to 10.0.0.52(10.0.0.52:3306) completed successfully.
修復(fù)MHA架構(gòu)
# 修復(fù)主庫(kù)懂诗;觀察日志,當(dāng)看到切換主庫(kù) successfully 成功苗膝,再啟動(dòng)主庫(kù)殃恒。
/etc/init.d/mysqld start
# 修復(fù)主從結(jié)構(gòu)
CHANGE MASTER TO
MASTER_HOST='10.0.0.52',
MASTER_PORT=3306,
MASTER_AUTO_POSITION=1,
MASTER_USER='repl',
MASTER_PASSWORD='123';
start slave ;
# 修復(fù)配置文件
vim /etc/mha/app1.cnf
[server1]
hostname=10.0.0.51
port=3306
# db03 manager啟動(dòng)
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
3、MHA的vip功能
3.1 準(zhǔn)備腳本及配置文件
注意:腳本master_ip_failover必須事先準(zhǔn)備好辱揭,放在manager節(jié)點(diǎn) /usr/local/bin/下
# 修改腳本配置
vi /usr/local/bin/master_ip_failover
my $vip = '10.0.0.55/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
# 更改manager配置文件
vi /etc/mha/app1.cnf
添加:
master_ip_failover_script=/usr/local/bin/master_ip_failover
# 轉(zhuǎn)換中文字符格式
yum install -y dos2unix
dos2unix /usr/local/bin/master_ip_failover
# 添加執(zhí)行權(quán)限
[root@db03 ~]# chmod +x /usr/local/bin/master_ip_failover
3.2 手工生成第一個(gè)vip地址
- 手工在主庫(kù)上綁定vip离唐,注意一定要和腳本文件中的ethN一致,我的是eth0:1(1是key指定的值)
ifconfig eth0:1 10.0.0.55/24
3.3 重啟mha
masterha_stop --conf=/etc/mha/app1.cnf //關(guān)閉
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 & //啟動(dòng)
masterha_check_status --conf=/etc/mha/app1.cnf //檢查狀態(tài)
故障模擬:
# 停主庫(kù)db01
/etc/init.d/mysqld stop
# 查看db02 vip 是否漂移问窃;
# 查看管理機(jī)日志
tail -f /var/log/mha/app1/manager
···
Sat Feb 8 22:20:56 2020 - [info] Master failover to 10.0.0.52(10.0.0.52:3306) completed successfully.
# 修復(fù)db01亥鬓,登錄;重新復(fù)制
stop slave;
reset slave all; //清除所有原從庫(kù)信息
change master to
master_host='10.0.0.51',
master_user='repl',
master_password='123' ,
MASTER_AUTO_POSITION=1;
start slave;
# MHA配置文件將刪除的節(jié)點(diǎn)補(bǔ)上域庇;如果db02是主庫(kù)發(fā)生故障嵌戈,則補(bǔ)上db02節(jié)點(diǎn)以此類(lèi)推。
[root@db03 ~]# vim /etc/mha/app1.cnf
[server1]
hostname=10.0.0.51
port=3306
# 啟動(dòng)MHA
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
# 檢查
masterha_check_status --conf=/etc/mha/app1.cnf
- 生產(chǎn)中主庫(kù)故障听皿,要確定原主庫(kù)與新主庫(kù)數(shù)據(jù)的一致性熟呛,如不一致,需要將原主庫(kù)數(shù)據(jù)備份恢復(fù)到新主庫(kù)尉姨,原主庫(kù)變?yōu)閺膸?kù)復(fù)制新主庫(kù)庵朝。
3.4 調(diào)用自動(dòng)修復(fù)節(jié)點(diǎn)腳本
[root@db03 ~]# cat recovery_mha.sh
#! /bin/bash
# has down master IP
# old_master= `ssh 10.0.0.53 "sed -nr 's#^Master (.*)\(.*!$#\1#gp' /var/log/mha/app1/manager"`
# replication:change master to ...
change=`ssh 10.0.0.53 "grep -i 'change master to' /var/log/mha/app1/manager"|awk -F: 'END{print $NF}'|sed 's#xxx#123#g'`
# has down master mysqld start
/etc/init.d/mysqld start
# do change master to and start slave;
mysql -e "$change start slave;"
# recover /etc/mha/app1.cnf
ssh 10.0.0.53 '\cp /etc/mha/app1.cnf.ori /etc/mha/app1.cnf' //需要提前將MHA配置文件拷貝一份為 /etc/mha/app1.cnf.ori,作覆蓋之用
- 該腳本執(zhí)行完后需手動(dòng)啟動(dòng)MHA又厉;
每個(gè)節(jié)點(diǎn)都存放一個(gè) recovery_mha.sh 腳本九府,故障節(jié)點(diǎn)執(zhí)行該腳本;
用master_ip_failover腳本添加命令調(diào)用此腳本馋没;添加位置如下:
73 sub start_vip() {
74 `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
75 `ssh $ssh_user\@$orig_master_host \" /bin/sh /root/recovery_mha.sh \"`; //此行為調(diào)用命令
76 }
3.5 郵件提醒
1. 參數(shù):
vim /etc/mha/app1.cnf
report_script=/usr/local/bin/send
2. 準(zhǔn)備郵件腳本
send_report
(1)準(zhǔn)備發(fā)郵件的腳本(上傳 email_2019-最新.zip中的腳本昔逗,到/usr/local/bin/中)
(2)將準(zhǔn)備好的腳本添加到mha配置文件中,讓其調(diào)用
3. 修改manager配置文件,調(diào)用郵件腳本
vi /etc/mha/app1.cnf
report_script=/usr/local/bin/send
(3)停止MHA
masterha_stop --conf=/etc/mha/app1.cnf
(4)開(kāi)啟MHA
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
(5) 關(guān)閉主庫(kù),看警告郵件
故障修復(fù):
1. 恢復(fù)故障節(jié)點(diǎn)
(1)實(shí)例宕掉
/etc/init.d/mysqld start
(2)主機(jī)損壞篷朵,有可能數(shù)據(jù)也損壞了
備份并恢復(fù)故障節(jié)點(diǎn)勾怒。
2.恢復(fù)主從環(huán)境
看日志文件:
CHANGE MASTER TO MASTER_HOST='10.0.0.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
start slave ;
3.恢復(fù)manager
3.1 修好的故障節(jié)點(diǎn)配置信息婆排,加入到配置文件
[server1]
hostname=10.0.0.51
port=3306
3.2 啟動(dòng)manager
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
3.6 binlog Server
作用:實(shí)時(shí)保存主庫(kù)二進(jìn)制日志的服務(wù),主庫(kù)有的二進(jìn)制日志笔链,binlog Server一定有
部署原因:如果主庫(kù)宕機(jī)段只,從庫(kù)無(wú)法ssh連接主庫(kù)纠永,從庫(kù)會(huì)缺事務(wù)
使用前提:mysql 5.6以上版本扁眯,支持gtid并開(kāi)啟茎毁,MHA開(kāi)啟
db03配置:
vim /etc/mha/app1.cnf
[binlog1] //定義一個(gè)新的標(biāo)簽
no_master=1 //永遠(yuǎn)不會(huì)切換為主庫(kù)
hostname=10.0.0.53 //有binlog Server 的機(jī)器IP
master_binlog_dir=/data/mysql/binlog //日志存放位置停撞,與MHA配置文件里的master_binlog_dir=位置不能一致
# 創(chuàng)建目錄
mkdir -p /data/mysql/binlog
chown -R mysql.mysql /data/*
# 刪除參數(shù)
vim /etc/my.cnf
prompt=\\ pc [\\d]> //此參數(shù)刪除,否則會(huì)報(bào)錯(cuò)
# 看一下主庫(kù)的二進(jìn)制日志
show binary logs;
# 將主庫(kù)binlog拉過(guò)來(lái)(從000001開(kāi)始拉承绸,之后的binlog會(huì)自動(dòng)按順序過(guò)來(lái))
cd /data/mysql/binlog //必須進(jìn)入到自己創(chuàng)建好的目錄
mysqlbinlog -R --host=10.0.0.52 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &
# 注意:拉取日志的起點(diǎn),需要按照目前從庫(kù)的已經(jīng)獲取到的二進(jìn)制日志點(diǎn)為起點(diǎn)
# 重啟MHA
masterha_stop --conf=/etc/mha/app1.cnf
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
故障處理
# 主庫(kù)宕機(jī)兜蠕,binlogserver 自動(dòng)停掉唐片,manager 也會(huì)自動(dòng)停止莱预。
處理思路:
1柠掂、重新拉取新主庫(kù)的binlog到binlogserver存儲(chǔ)目錄中
2、重新配置文件binlog server主機(jī)信息
3依沮、最后再啟動(dòng)MHA
# binlog Server報(bào)錯(cuò)示例
···
Mon Feb 10 18:36:06 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/binlog --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000033
Mon Feb 10 18:36:06 2020 - [info] Connecting to root@10.0.0.53(10.0.0.53:22)..
Failed to save binary log: Binlog not found from /data/mysql/binlog! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
at /usr/bin/save_binary_logs line 123.
eval {...} called at /usr/bin/save_binary_logs line 70
main::main() called at /usr/bin/save_binary_logs line 66
Mon Feb 10 18:36:06 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln158] Binlog setting check failed!
Mon Feb 10 18:36:06 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln264] Binlog server configuration failed.
Mon Feb 10 18:36:06 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48.
Mon Feb 10 18:36:06 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Mon Feb 10 18:36:06 2020 - [info] Got exit code 1 (Not master dead).
# 報(bào)錯(cuò)原因:連接53管理機(jī)涯贞,在/data/mysql/binlog/目錄沒(méi)有找到binlog文件,對(duì)比主庫(kù)和管理機(jī)的/data/mysql/binlog/目錄下的binlog文件是否一致危喉。
4宋渔、管理員在高可用架構(gòu)維護(hù)的職責(zé)
- 搭建:MHA+VIP+SendReport+BinlogServer
- 監(jiān)控及故障處理
- 高可用架構(gòu)的優(yōu)化
核心是:盡可能降低主從的延時(shí),讓MHA花在數(shù)據(jù)補(bǔ)償上的時(shí)間盡量減少辜限。
5.7 版本皇拣,開(kāi)啟GTID模式,開(kāi)啟從庫(kù)SQL并發(fā)復(fù)制薄嫡。