【高可用】MHA

[TOC]

MHA部署安裝

環(huán)境

描述 IP hostname Desc
Master 192.168.20.1 initnode101 Mysql Master節(jié)點,read-write
Slave1 192.168.20.2 initnode102 Mysql Slave 1節(jié)點,read-only
Slave2 192.168.20.3 initnode103 Mysql Slave 3節(jié)點,read-only,當Master宕機后,被提升為Master(New)
MHA Manager 192.168.20.4 initnode104 Mysql Master節(jié)點

配置主從

簡單介紹下配置好主從卡儒,已開啟GTID,建議在5.7環(huán)境下配置為無損復(fù)制。

  • 創(chuàng)建復(fù)制用戶
#
### mysql 101執(zhí)行
#
mysql> create user 'rpl'@'%' identified by '123456';
mysql> grant replication slave on *.* to rpl;
mysql> flush privileges;
  • 備份同步
#
# master 101
#
[root@initnode ~]# mysqldump --single-transaction --master-data=1 -A > mysql101_99.sql

# 102、103 需要去執(zhí)行下
flush privileges;

# 將文件傳輸?shù)絪lave服務(wù)器上
[root@initnode ~]# scp mysql101_99.sql root@192.168.20.102:/root
[root@initnode ~]# scp mysql101_99.sql root@192.168.20.103:/root

# 導(dǎo)入數(shù)據(jù)同步
#
# slave 101邪意、102
#
mysql> reset master;
[root@initnode ~]# mysql < mysql101_99.sql
  • 設(shè)置主從關(guān)系
#
# slave 101、102 分別執(zhí)行
#
mysql> change master to master_host='192.168.20.101',master_user='rpl',master_password='123456',master_port=3306,master_auto_position=1;

mysql> start slave;

mysql> show slave status\G

下載MHA安裝包

這邊下載的是 RPM 包反砌,個人喜歡簡單點的

wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm

下載安裝依賴包

#
# MHA Manager
#

# 下載 EPEL擴展包  地址:https://pkgs.org/download/epel-release
wget http://mirror.centos.org/centos/7/extras/x86_64/Packages/epel-release-7-11.noarch.rpm
# 安裝補充包EPEL
yum -y localinstall --nogpgcheck epel-release-7-11.noarch.rpm
# 安裝依賴包
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager -y

#
# node節(jié)點 101雾鬼、102、103
#
yum install perl-DBD-MySQL -y

安裝MHA的RPM包

#
# 所有服務(wù)器安裝宴树,包括 MHA Manager
#
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

#

# 僅在 MHA Manager 上安裝 manager包
#MHA節(jié)點服務(wù)器安裝依賴
yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
# 僅在從節(jié)點安裝 master 包策菜,本例安裝在 104 上
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

注意: node 需要在所有節(jié)點上安裝!~

  • 可以查看安裝好的腳本
[root@initnode4 ~]# ll /usr/bin/ | grep masterha
-rwxr-xr-x    1 root root       1995 Mar 23  2018 masterha_check_repl
-rwxr-xr-x    1 root root       1779 Mar 23  2018 masterha_check_ssh
-rwxr-xr-x    1 root root       1865 Mar 23  2018 masterha_check_status
-rwxr-xr-x    1 root root       3201 Mar 23  2018 masterha_conf_host
-rwxr-xr-x    1 root root       2517 Mar 23  2018 masterha_manager
-rwxr-xr-x    1 root root       2165 Mar 23  2018 masterha_master_monitor
-rwxr-xr-x    1 root root       2373 Mar 23  2018 masterha_master_switch
-rwxr-xr-x    1 root root       5172 Mar 23  2018 masterha_secondary_check
-rwxr-xr-x    1 root root       1739 Mar 23  2018 masterha_stop

主庫上創(chuàng)建MHA管理用戶

#
# master 101端
#
mysql> create user 'manager'@'192.168.20.%' identified by '1122333';
Query OK, 0 rows affected (0.01 sec)

mysql> grant all privileges on *.* to 'manager'@'192.168.20.%' identified by '1122333';
Query OK, 0 rows affected, 1 warning (0.10 sec)

SSH免密登陸配置

SSH免密碼登錄是為了能夠讓 MHA Manager 可以遠程登錄到各個MySQL Server,以及各個MySQL Server之間可以相互遠程登錄 做入,然后進行相關(guān)操作冒晰,避免輸入密碼。

生成密鑰

#
# 所有服務(wù)端生成密鑰竟块,包括 MHA Manager
#一路回車
[root@initnode4 ~]# ssh-keygen -t rsa

公鑰分發(fā)

# MHA Manager 104端
[root@initnode4 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.101
[root@initnode4 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.102
[root@initnode4 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.103

# Master 101端
[root@initnode1 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.102
[root@initnode1 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.103

# Slave 102端
[root@initnode2 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.101
[root@initnode2 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.103

# Slave 103端 
[root@initnode3 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.101
[root@initnode3 .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.20.102

注意:僅僅需要 MySQL Server節(jié)點之間 壶运,以及 MHA Manager到所有MySQL Servers 之間相互打通SSH免密碼登錄;MySQL Servers到MHA Manager并不需要

修改hosts

為了方便使用主機名登錄浪秘, 同時MHA Manager在Failover階段會做resolve name 的操作蒋情,所以需要把下列信息加入到所有的節(jié)點中。

#
# 所有節(jié)點耸携,包括 MHA Manager
#
Shell> vim /etc/hosts
# 添加如下信息
192.168.20.101 initnode1
192.168.20.102 initnode2
192.168.20.103 initnode3

確認SSH已配置完成

在每個服務(wù)器上棵癣,執(zhí)行ssh@IP 和 ssh@hostname 的操作(MySQL Server 節(jié)點之間相互連接,以及MHA Manager 到所有的 MySQL Server)夺衍,確認可以免密碼登錄成功狈谊。

Shell> ssh 192.168.20.101
Shell> ssh 192.168.20.102
Shell> ssh 192.168.20.103

Shell> ssh initnode1
Shell> ssh initnode2
Shell> ssh initnode3

MHA配置

配置文件統(tǒng)一放在 /etc/masterha/ 中,針對當前這個MySQL復(fù)制組沟沙,我們定義為 app1.conf 河劝,如果還有 其他MySQL復(fù)制組 ,可以 繼續(xù)定義app2.conf / app3.conf等等

創(chuàng)建MHA目錄

#
# MHA Manager端
#
[root@initnode4 ~]# mkdir -p /etc/masterha
[root@initnode4 ~]# mkdir -p /var/log/masterha/app1

MHA配置文件設(shè)置

#
# MHA Manager端
#
[root@initnode4]# vi /etc/masterha/app1.conf
# 寫入如下矛紫,需更改對應(yīng)IP與binlog的實際路徑
[server default]
# 這兩個參數(shù)需要根據(jù)不同的集群進行修改赎瞎,工作目錄/日志目錄
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log

# 按照master服務(wù)器存放binlog的實際路徑進行修改,主要為了讓MHA拉取binlog
master_binlog_dir=/data/mysql_data/

# 設(shè)置自動failover的腳本
master_ip_failover_script= /usr/bin/master_ip_failover

# 設(shè)置手動切換時候的腳本(供(masterha_master_switch使用)
master_ip_online_change_script= /usr/bin/master_ip_online_change

# 切換郵件發(fā)送
report_script=/usr/bin/send_report

# 日志級別
log_level=debug

#監(jiān)控的用戶颊咬,因為要執(zhí)行一些必要的管理命令务甥,比如:Stop Slave、Change Master喳篇、#Reset Slave敞临,所以該賬戶應(yīng)該為高權(quán)限帳號,這也是缺省值麸澜。
user=manager
password=1122333

# 監(jiān)控主庫的時間間隔挺尿,默認是3秒,嘗試三次沒有回應(yīng)的時候自動進行failover
ping_interval=3

# 檢測方式是insert痰憎,MHA-0.56開始支持insert,默認select
# 會在Master中生成一個 infra 數(shù)據(jù)庫
ping_type=INSERT
# 設(shè)置遠端mysql在發(fā)生切換時binlog的保存位置,默認為 /var/tmp
remote_workdir=/tmp

# 復(fù)制的用戶
repl_user=rpl
# 復(fù)制用的密碼
repl_password=123456

# 告警腳本票髓,可自行修改,這里沒有使用
#report_script=/usr/send_report

# 通過從機進行二次探測的腳本铣耘, IP地址按照實際的情況進行修改洽沟,強烈建議使用兩個或多個網(wǎng)絡(luò)路由來檢查MySQL主服務(wù)器的可用性
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.20.102 -s 192.168.20.103 --user=manager --master_host=192.168.20.101 --master_port=3306

# 設(shè)置故障發(fā)生后關(guān)閉故障主機的腳本(主要作用是關(guān)閉主機防止發(fā)生腦裂,這里沒有使用,類似Fence功能)
#shutdown_script="/usr/bin/power_manager --command=stopssh2 --host=test-1 --ssh_user=root"

# 定義ssh的用戶
ssh_user=root

[server1]
# 這個hostname也可以配置成IP地址蜗细,同 ip 參數(shù)一樣
# 如果這里寫名字裆操,需要DNS配合怒详,或者使用 /etc/hosts
hostname=192.168.20.101
port=3306
# candidate_master參數(shù)的意思為:設(shè)置為候選Master,如果發(fā)生主從切換踪区,該主機會被提升為Master昆烁,即使這個服務(wù)器上的數(shù)據(jù)不是最新的(會用relay-log補全)
candidate_master=1

[server2]
hostname=192.168.20.102
port=3306
candidate_master=1

# check_repl_delay參數(shù)的意思為:默認情況下如果一個slave落后master 100M的relay logs的話,MHA將不會選擇該slave作為一個新的master缎岗;
# 因為對于這個slave的恢復(fù)需要花費很長時間;
# 通過設(shè)置check_repl_delay=0,MHA觸發(fā)切換在選擇一個新的master的時候?qū)雎詮?fù)制延時;# 這個參數(shù)對于設(shè)置了candidate_master=1的主機非常有用静尼,因為這個候選主在切換的過程中一定是新的master
check_repl_delay=0

[server3]
hostname=192.168.20.103
port=3306

# no_master 表示該主機不會被提升為Master
no_master=1

#[binlog1]
#hostname=192.168.20.101
#master_binlog_dir=/data/mysql_data/
#no_master=1

MHA腳本說明

  • 修改下面兩個腳本中的 $vip$eth,以符合自己實際的需求传泊;
  • 將下面兩個文件復(fù)制到 MHA Manager 節(jié)點的 /usr/bin中(同masterha_*在同一個目錄);
  • 賦予執(zhí)行權(quán)限
    • chmod +x master_ip_failover
    • chmod +x master_ip_online_change

master_ip_failover

vi /usr/bin/master_ip_failover

#!/usr/bin/env perl

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
#use MHA::DBHelper;
#my $if = "eth0";
my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

#自定義該組機器的vip
my $vip = '192.168.20.111/24';
# 廣播域(ip a 可查看)
my $brd='192.168.20.255';
#網(wǎng)卡名稱
my $eth = 'ens32';
# 網(wǎng)卡的別名編號
my $key = "1";
# 使用的系統(tǒng)是centos7 鼠渺,默認沒有安裝ifconfig命令,改為使用ip addr 添加
#my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
#my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";


my $ssh_start_vip = "/usr/sbin/ip addr add $vip broadcast  $brd dev $eth";
my $ssh_stop_vip = "/usr/sbin/ip addr del $vip broadcast $brd dev $eth";


GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

#sub add_vip {
#    my $output1 = `ssh -o ConnectTimeout=15  -o ConnectionAttempts=3 $orig_master_host /sbin/ip addr del $vip/32 dev $if`;
#    my $output2 = `ssh -o ConnectTimeout=15  -o ConnectionAttempts=3 $new_master_host /sbin/ip addr add $vip/32 dev $if`;
#}


sub main {

  print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
      print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();
      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      ###my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      ###$new_master_handler->connect( $new_master_ip, $new_master_port,
       ### $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      ###$new_master_handler->disable_log_bin_local();
      ###print "Set read_only=0 on the new master.\n";
      ###$new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print "Creating app user on the new master..\n";
      #FIXME_xxx_create_user( $new_master_handler->{dbh} );
      ###$new_master_handler->enable_log_bin_local();
      ###$new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      ###&add_vip();

      print "Enable the VIP -$vip on the new master - $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    print "Checking the Status of the script.. OK \n";
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip {
    `ssh -p 22 $ssh_user\@$new_master_host \" $ssh_start_vip \"`;  # 按需修改端口
}

sub stop_vip {
    return 0 unless ($ssh_user);
    `ssh -p 22 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

master_ip_online_change

vi /usr/bin/master_ip_online_change

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user,
);

#自定義該組機器的vip
my $vip = '192.168.20.111/24';
#網(wǎng)卡名稱
my $eth = 'ens32';
# 網(wǎng)卡的別名編號
#my $key = "1";
#my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
#my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";

my $ssh_start_vip = "/usr/sbin/ip addr add $vip dev $eth";
my $ssh_stop_vip = "/usr/sbin/ip addr del $vip dev $eth";
my $ssh_user = "root";

GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
      print current_time_us() . " Drpping app user on the orig master..\n";
      FIXME_xxx_drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      print current_time_us() . " Creating app user on the new master..\n";
      FIXME_xxx_create_app_user($new_master_handler);
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
#  die;
}

郵件腳本send_report

vim /usr/bin/send_report126眷细、163郵箱測試了下有問題拦盹,發(fā)不出郵件,換成新浪郵箱即可

#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );

# 需要修改此處信息
my $smtp='smtp.sina.com';
my $mail_from='yangdaxianyx@sina.com';
my $mail_user='yangdaxianyx@sina.com';
my $mail_pass='db01c4d285980b6f';
#my $mail_to=['511504418@qq.com','qqwweee1@126.com'];
my $mail_to='511504418@qq.com';

GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

# Do whatever you want here
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
        my ($smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $msg ) = @_;
        open my $DEBUG, ">/var/log/masterha/app1/mail.log"
                or die "Can't open the debug    file:$!\n";
        my $sender = new Mail::Sender {
                ctype           => 'text/plain;charset=utf-8',
                encoding        => 'utf-8',
                smtp            => $smtp,
                from            => $mail_from,
                auth            => 'LOGIN',
                TLS_allowed     => '0',
                authid          => $mail_user,
                authpwd         => $mail_pass,
                to              => $mail_to,
                subject         => $subject,
                debug           => $DEBUG
        };
        $sender->MailMsg(
                {
                        msg => $msg,
                        debug => $DEBUG
                }
        ) or print $Mail::Sender::Error;
        return 1;
}

exit 0;

Slave需要的設(shè)置【5.7開啟增強半同步不用設(shè)置】

從 MHA Failover 的過程中可以了解到溪椎, MHA Manager 在恢復(fù)(補齊)其他Slave數(shù)據(jù)時會用到 relay-log 普舆,因此這些 relay-log 需要被保留。
而默認情況下校读,SQL線程在回放完畢后沼侣,MySQL會 主動刪除relay-log ,需要 禁用 該功能地熄,確保 relay-log 不被自動刪除华临。

# 關(guān)閉relay_log自動清除功能
mysql> set global relay_log_purge = 0;
Query OK, 0 rows affected (0.00 sec)

# 設(shè)置只讀庫為 read_only
mysql> set global read_only = 1;
Query OK, 0 rows affected (0.01 sec)

但是這樣做了以后又帶來另外一個問題芯杀, relay-log會大量堆積 端考,導(dǎo)致磁盤空間緊張,所以需要 定時清空 過時的 relay-log 揭厚。
所幸的是MHA幫我們實現(xiàn)了這個功能却特, MHA Node 的安裝包中有一個 pure_relay_logs 工具,提供 刪除大量relay-log 的功能筛圆。

定時清理relay-log的腳本【5.7開啟增強半同步不用設(shè)置】

將下面這個腳本以及定時任務(wù)放入 所有的Slave節(jié)點

#!/bin/bash
# Filename:/usr/bin/cron_purge_relay_logs.sh
user=manager
passwd=1122333
port=3306
log_dir='/data/mysql_data/purge_relay_logs'
work_dir='/data/mysql_data/relay_log_hardlink'
purge='/usr/bin/purge_relay_logs'
if [[ ! -d ${work_dir} ]];then
mkdir ${work_dir} -p
fi
if [[ ! -d ${log_dir} ]];then
mkdir ${log_dir} -p
fi
${purge} --user=${user} --password=${passwd} --port=${port} --workdir=${work_dir} --disable_relay_log_purge >> ${log_dir}/purge_relay_logs.log 2>&1
  • 增加執(zhí)行權(quán)限
Shell> chmod +x /usr/bin/cron_purge_relay_logs.sh
  • 添加到計劃任務(wù)
Shell> crontab -e # 編輯定時任務(wù)
# 輸入如下
0 5 * * * /bin/bash /usr/bin/cron_purge_relay_logs.sh
# 保存退出
Shell> crontab -l # 查看當前cron信息
0 5 * * * /bin/bash /usr/bin/cron_purge_relay_logs.sh

pure_relay_logs的參數(shù)

  • --user : 用戶名
  • --password : 密碼
  • --port : 端口號
  • --workdir : 指定 創(chuàng)建relay-log的硬鏈接 的位置
    • 默認是 /var/tmp 裂明,由于 硬連接不能跨分區(qū) ,所以請 確保這個目錄和你的relay-log在同一個分區(qū) 太援;
    • 當腳本執(zhí)行成功后闽晦,硬鏈接會被刪除
  • --disable_relay_log_purge : 默認情況下,如果 relay_log_purge=1 提岔,腳本會什么都不清理仙蛉,自動退出
    • 通過設(shè)定這個參數(shù),當 relay_log_purge=1 的情況下碱蒙,會執(zhí)行腳本荠瘪,并將該參數(shù)relay_log_purge設(shè)置為0夯巷。
    • 但是最好還是要在 /etc/my.cnf 中顯示配置 relay_log_purge=0 ,避免重啟服務(wù)后被還原哀墓。

MHA配置檢測

MHA Master上操作

SSH關(guān)系檢測

[root@initnode4 bin]# masterha_check_ssh --conf=/etc/masterha/app1.conf

Tue Sep 10 00:45:20 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Sep 10 00:45:20 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Tue Sep 10 00:45:20 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
...省略...
Tue Sep 10 00:45:23 2019 - [info] All SSH connection tests passed successfully.    # 看到最后這一行為 successfully

主從復(fù)制關(guān)系檢測

[root@initnode4 bin]# masterha_check_repl --conf=/etc/masterha/app1.conf

Tue Sep 10 00:46:23 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Sep 10 00:46:23 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Tue Sep 10 00:46:23 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
Tue Sep 10 00:46:23 2019 - [info] MHA::MasterMonitor version 0.58.
Tue Sep 10 00:46:23 2019 - [debug] Connecting to servers..
...省略...
MySQL Replication Health is OK.  # 最后一行為檢測OK

啟動或關(guān)閉MHA

添加VIP

目前大概有如下幾種訪問方式:
1-虛擬IP

  • 方式一: Keepalived :Keepalived是Linux下的一個高可用組件趁餐,可以讓兩臺或多臺Linux主機組成一個高可用集群,對外提供一個虛擬IP進行訪問篮绰,但是 Keepalived 可能由于網(wǎng)絡(luò)問題導(dǎo)致腦裂后雷。
  • 方式二:使用腳本 masterha_ip_failover ;由于這個腳本的start/stop操作只有在切換(Failover)期間才會執(zhí)行吠各。即便你設(shè)置了該腳本喷面,MHA也不會在一開始( masterha_manager啟動 )的時候,幫你在Master上設(shè)置VIP走孽,初次在Master上設(shè)置VIP需要人工操作惧辈,后期如果有Failover操作,MHA會執(zhí)行腳本磕瓷,幫你切換盒齿。

2-智能DNS
此種模式下,MySQL集群本身只做高可用困食,維持復(fù)制關(guān)系边翁,Apps通過內(nèi)網(wǎng)DNS來訪問MySQL。此種情況下硕盹,需要DNS端可以檢測MySQL服務(wù)是否可用符匾,從而決定推送Master還是Slave1的IP地址給Apps,類似Smart Client的效果瘩例;

推薦使用的方式是方式二啊胶,生成VIP

#
# Master 101端
#
#在我的虛擬機中,網(wǎng)卡名稱為ens32垛贤,初始化VIP
Shell> ip addr add 192.168.20.111/24 dev ens32

## 刪除 VIP 如下
ip addr del 192.168.20.111/24 dev ens32

啟動MHA

#
# MHA Manager 104端
# 啟動如果出錯去配置的日志文件下查看
[root@initnode4 bin]# nohup masterha_manager --conf=/etc/masterha/app1.conf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 22390

# 查看啟動狀態(tài)
[root@initnode4 bin]# masterha_check_status --conf=/etc/masterha/app1.conf
app1 (pid:22390) is running(0:PING_OK), master:192.168.20.101  # 當前狀態(tài)正常焰坪,master是 101

masterha_manager參數(shù)解釋

  • --conf :當前MySQL集群的配置文件,可以有多個聘惦,應(yīng)用于不同的集群
  • --remove_dead_master_conf :當發(fā)生切換操作后(Failover)某饰,需要把之前的 Dead Master 從配置文件中刪除。如果不刪除善绎,且沒有恢復(fù)的話黔漂,此時 masterha_manager 重啟后,會報錯 "there is a dead slave"
  • --ignore_last_failover :如果前一次Failover失敗了禀酱,MHA不會去再一次去做Failover操作炬守,除非人為的刪除 (manager_workdir)/(app_name).failover.error ,或者增加此參數(shù)比勉,MHA會繼續(xù)進行Failover操作劳较。

自動Failover測試

測試步驟如下:
1. 停掉 Slave的IO線程 驹止,模擬復(fù)制延時(Slave1保持不變);
2. 使用sysbench對Master進行測試观蜗,生成測試數(shù)據(jù)(可以 產(chǎn)生大量的binlog )臊恋;
3. 等待步驟2完成 后, 開啟Slave上的IO線程 墓捻,去追Master的binlog抖仅,同時立即操作第四步;
4. 關(guān)閉Master上的MySQL 砖第,讓MHA 產(chǎn)生 Failover 操作撤卢;
5. 觀察最終狀態(tài);

停止Slave的IO線程

#
#Slave 102梧兼,103端
#
# show slave status\G 會看到 Slave_IO_Running: No
mysql> stop slave io_thread;

生成測試數(shù)據(jù)放吩,模擬大量寫入

主上沒有安裝 sysbench 可以使用yum安裝,需要安裝EPEL源

  • sysbench 生成測試數(shù)據(jù)
#
# Master 101端
# 若主庫不存在 sbtest 需要創(chuàng)建羽杰,給sysbench使用
mysql> create database sbtest;

# 寫入測試數(shù)據(jù)渡紫,產(chǎn)生測試的binlig
[root@initnode1 ~]# sysbench /usr/share/sysbench/oltp_insert.lua --mysql-user=test --mysql-password=123456 --mysql-host=192.168.20.101 --mysql-port=3306 --threads=2 --events=0 --time=30 --table-size=10000 --tables=2 --report-interval=3 prepare
[root@initnode1 ~]# sysbench /usr/share/sysbench/oltp_insert.lua --mysql-user=test --mysql-password=123456 --mysql-host=192.168.20.101 --mysql-port=3306 --threads=2 --events=0 --time=30 --table-size=10000 --tables=2 --report-interval=3 run
  • sysbench跑完后,繼續(xù)手動寫入一些數(shù)據(jù)
mysql> create database aaa;
Query OK, 1 row affected (0.00 sec)

mysql> use aaa;
Database changed

mysql> create table t(id int);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t select 1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t select 2;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t select 3;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

開啟Slave的IO線程考赛,同時kill掉master上mysql進程

#
# Slave 1惕澎、Slave 2上執(zhí)行
#
mysql> start slave io_thread;

#
# master(101)上執(zhí)行
#
[root@initnode1 ~]# ps -ef|grep mysql
root       6641      1  0 Sep09 ?        00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql_data --pid-file=/data/mysql_data/initnode1.pid
mysql      7644   6641  1 Sep09 ?        00:02:20 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql_data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=error.log --pid-file=/data/mysql_data/initnode1.pid --port=3306
root       8306   7875  0 01:20 pts/1    00:00:00 grep --color=auto mysql
[root@initnode1 ~]# kill -9 6641 #記得先kill mysqld_safe進程,否則kill掉的mysql進程會被自動拉起
[root@initnode1 ~]# kill -9 7644

MHA切換日志明細

Failover過程颜骤,大致包含如下步驟:
1. 多次探測(含通過Slave進行二次探測)Master唧喉,確認是否宕機;
2. 檢查配置文件階段忍抽, 包括MySQL實例的當前的配置
3. 處理Master八孝,前提是還能ssh到Master,否則跳過該步驟梯找; 包括刪除虛擬IP唆阿,還有關(guān)機操作(防止腦裂益涧,但是關(guān)機配置我們注釋了)锈锤;
4. 如果還能SSH到Master上,則 復(fù)制binlog 到MHA Manager上闲询,否則跳過? 這一步在源碼和之前網(wǎng)上別人的日志中是能看到 scp 信息的久免,但是我這里沒出現(xiàn)scp信息,不知道是否是MHA0.57的問題扭弧,但是最終數(shù)據(jù)是能一致(表明確實是獲取binlog了)
從這里看出阎姥,單純的靠MHA是不能完全保證數(shù)據(jù)不丟的;
5. 確認包含最新更新的Slave 鸽捻;
6. 應(yīng)用從master保存的binlog 呼巴;
7. 將配置中配置為 candidate_master=1 的Slave泽腮,提升為 New Master;
8. 配置其他的Slave連接( CHANGE MASTER TO ) New Master衣赶;

Tue Sep 10 02:48:12 2019 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
# ----------------MHA 發(fā)現(xiàn)Master已經(jīng)不通了-------------------
Tue Sep 10 02:48:33 2019 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Tue Sep 10 02:48:33 2019 - [info] Executing SSH check script: exit 0
Tue Sep 10 02:48:33 2019 - [debug] SSH connection test to 192.168.20.101, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
# ----------啟用二次探測的方式诊赊,進一步確認Master是否掛了---------
Tue Sep 10 02:48:33 2019 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 192.168.20.102 -s 192.168.20.103 --user=root --master_host=192.168.20.101 --master_port=3306  --user=root  --master_host=192.168.20.101  --master_i
p=192.168.20.101  --master_port=3306 --master_user=root --master_password=1122333 --ping_type=INSERT
Tue Sep 10 02:48:34 2019 - [info] HealthCheck: SSH to 192.168.20.101 is reachable.
Monitoring server 192.168.20.102 is reachable, Master is not reachable from 192.168.20.102. OK.
Monitoring server 192.168.20.103 is reachable, Master is not reachable from 192.168.20.103. OK.
Tue Sep 10 02:48:34 2019 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Sep 10 02:48:36 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.101' (111))
Tue Sep 10 02:48:36 2019 - [warning] Connection failed 2 time(s)..
Tue Sep 10 02:48:39 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.101' (111))
Tue Sep 10 02:48:39 2019 - [warning] Connection failed 3 time(s)..
Tue Sep 10 02:48:42 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.20.101' (111))
Tue Sep 10 02:48:42 2019 - [warning] Connection failed 4 time(s)..
# ----------------------確認Master已經(jīng)掛了----------------------
Tue Sep 10 02:48:42 2019 - [warning] Master is not reachable from health checker!
Tue Sep 10 02:48:42 2019 - [warning] Master 192.168.20.101(192.168.20.101:3306) is not reachable!
Tue Sep 10 02:48:42 2019 - [warning] SSH is reachable.
Tue Sep 10 02:48:42 2019 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.conf again, and trying to connect to all servers to check server status..
Tue Sep 10 02:48:42 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
# --------------------檢查MHA配置----------------------
Tue Sep 10 02:48:42 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Tue Sep 10 02:48:42 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
Tue Sep 10 02:48:42 2019 - [debug] Skipping connecting to dead master 192.168.20.101(192.168.20.101:3306).
Tue Sep 10 02:48:42 2019 - [debug] Connecting to servers..
Tue Sep 10 02:48:43 2019 - [debug]  Connected to: 192.168.20.102(192.168.20.102:3306), user=root
Tue Sep 10 02:48:43 2019 - [debug]  Number of slave worker threads on host 192.168.20.102(192.168.20.102:3306): 0
Tue Sep 10 02:48:43 2019 - [debug]  Connected to: 192.168.20.103(192.168.20.103:3306), user=root
Tue Sep 10 02:48:43 2019 - [debug]  Number of slave worker threads on host 192.168.20.103(192.168.20.103:3306): 0
Tue Sep 10 02:48:43 2019 - [debug]  Comparing MySQL versions..
Tue Sep 10 02:48:43 2019 - [debug]   Comparing MySQL versions done.
Tue Sep 10 02:48:43 2019 - [debug] Connecting to servers done.
Tue Sep 10 02:48:43 2019 - [info] GTID failover mode = 1
Tue Sep 10 02:48:43 2019 - [info] Dead Servers:
Tue Sep 10 02:48:43 2019 - [info]   192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:43 2019 - [info] Alive Servers:
Tue Sep 10 02:48:43 2019 - [info]   192.168.20.102(192.168.20.102:3306)
Tue Sep 10 02:48:43 2019 - [info]   192.168.20.103(192.168.20.103:3306)
Tue Sep 10 02:48:43 2019 - [info] Alive Slaves:
Tue Sep 10 02:48:43 2019 - [info]   192.168.20.102(192.168.20.102:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:43 2019 - [info]     GTID ON
Tue Sep 10 02:48:43 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:43 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:43 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Sep 10 02:48:43 2019 - [info]   192.168.20.103(192.168.20.103:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:43 2019 - [info]     GTID ON
Tue Sep 10 02:48:43 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:43 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:43 2019 - [info]     Not candidate for the new Master (no_master is set)
Tue Sep 10 02:48:43 2019 - [info] Checking slave configurations..
Tue Sep 10 02:48:43 2019 - [info] Checking replication filtering settings..
Tue Sep 10 02:48:43 2019 - [info]  Replication filtering check ok.
Tue Sep 10 02:48:43 2019 - [info] Master is down!
Tue Sep 10 02:48:43 2019 - [info] Terminating monitoring script.
Tue Sep 10 02:48:43 2019 - [debug]  Disconnected from 192.168.20.102(192.168.20.102:3306)
Tue Sep 10 02:48:43 2019 - [debug]  Disconnected from 192.168.20.103(192.168.20.103:3306)
Tue Sep 10 02:48:43 2019 - [info] Got exit code 20 (Master dead).
Tue Sep 10 02:48:43 2019 - [info] MHA::MasterFailover version 0.58.
# --------------------開始進行切換----------------------
Tue Sep 10 02:48:43 2019 - [info] Starting master failover.
Tue Sep 10 02:48:43 2019 - [info] 
#==================第一階段:配置檢查(包括MySQL的配置)==============
Tue Sep 10 02:48:43 2019 - [info] * Phase 1: Configuration Check Phase..
Tue Sep 10 02:48:43 2019 - [info] 
Tue Sep 10 02:48:44 2019 - [debug] Skipping connecting to dead master 192.168.20.101.
Tue Sep 10 02:48:44 2019 - [debug] Connecting to servers..
Tue Sep 10 02:48:45 2019 - [debug]  Connected to: 192.168.20.102(192.168.20.102:3306), user=root
Tue Sep 10 02:48:45 2019 - [debug]  Number of slave worker threads on host 192.168.20.102(192.168.20.102:3306): 0
Tue Sep 10 02:48:45 2019 - [debug]  Connected to: 192.168.20.103(192.168.20.103:3306), user=root
Tue Sep 10 02:48:45 2019 - [debug]  Number of slave worker threads on host 192.168.20.103(192.168.20.103:3306): 0
Tue Sep 10 02:48:45 2019 - [debug]  Comparing MySQL versions..
Tue Sep 10 02:48:45 2019 - [debug]   Comparing MySQL versions done.
Tue Sep 10 02:48:45 2019 - [debug] Connecting to servers done.
Tue Sep 10 02:48:45 2019 - [info] GTID failover mode = 1
Tue Sep 10 02:48:45 2019 - [info] Dead Servers:
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info] Alive Servers:
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.102(192.168.20.102:3306)
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.103(192.168.20.103:3306)
Tue Sep 10 02:48:45 2019 - [info] Alive Slaves:
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.102(192.168.20.102:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:45 2019 - [info]     GTID ON
Tue Sep 10 02:48:45 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:45 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.103(192.168.20.103:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:45 2019 - [info]     GTID ON
Tue Sep 10 02:48:45 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:45 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info]     Not candidate for the new Master (no_master is set)
Tue Sep 10 02:48:45 2019 - [info] Starting GTID based failover.
Tue Sep 10 02:48:45 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [info] ** Phase 1: Configuration Check Phase completed.
# ==========================第一階段完成=======================
Tue Sep 10 02:48:45 2019 - [info] 
# ====================第二階段: 開始清理Master===================
Tue Sep 10 02:48:45 2019 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Sep 10 02:48:45 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Sep 10 02:48:45 2019 - [info] Executing master IP deactivation script:
Tue Sep 10 02:48:45 2019 - [info]   /usr/bin/master_ip_failover --orig_master_host=192.168.20.101 --orig_master_ip=192.168.20.101 --orig_master_port=3306 --command=stopssh --ssh_user=root  
Tue Sep 10 02:48:45 2019 - [debug]  Stopping IO thread on 192.168.20.102(192.168.20.102:3306)..
Tue Sep 10 02:48:45 2019 - [debug]  Stopping IO thread on 192.168.20.103(192.168.20.103:3306)..
Tue Sep 10 02:48:45 2019 - [debug]  Stop IO thread on 192.168.20.103(192.168.20.103:3306) done.
Tue Sep 10 02:48:45 2019 - [debug]  Stop IO thread on 192.168.20.102(192.168.20.102:3306) done.
# ====================第二階段完成===================

IN SCRIPT TEST====/usr/sbin/ip addr del 192.168.20.111/24 dev ens32==/usr/sbin/ip addr add 192.168.20.111/24 dev ens32===
# --------------------清除當前主上的VIP----------------------
Disabling the VIP on old master: 192.168.20.101 
Tue Sep 10 02:48:45 2019 - [info]  done.
# shutdown_script 是關(guān)機腳本,為了防止腦裂府瞄,由于配置中被注釋了碧磅,所以這里忽略
Tue Sep 10 02:48:45 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Sep 10 02:48:45 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Sep 10 02:48:45 2019 - [info] 
# ====================第三階段:Master日志拉取===================
# 如果Master服務(wù)器掛了,這個階段應(yīng)該跳過
Tue Sep 10 02:48:45 2019 - [info] * Phase 3: Master Recovery Phase..
Tue Sep 10 02:48:45 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Sep 10 02:48:45 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [debug] Fetching current slave status..
Tue Sep 10 02:48:45 2019 - [debug]  Fetching current slave status done.
Tue Sep 10 02:48:45 2019 - [info] The latest binary log file/position on all slaves is bin.000014:5757
Tue Sep 10 02:48:45 2019 - [info] Retrieved Gtid Set: a7776f71-c8be-11e9-838f-0050563bb195:751444-751708
Tue Sep 10 02:48:45 2019 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.102(192.168.20.102:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:45 2019 - [info]     GTID ON
Tue Sep 10 02:48:45 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:45 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.103(192.168.20.103:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:45 2019 - [info]     GTID ON
Tue Sep 10 02:48:45 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:45 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info]     Not candidate for the new Master (no_master is set)
Tue Sep 10 02:48:45 2019 - [info] The oldest binary log file/position on all slaves is bin.000014:5757
Tue Sep 10 02:48:45 2019 - [info] Retrieved Gtid Set: a7776f71-c8be-11e9-838f-0050563bb195:751444-751708
Tue Sep 10 02:48:45 2019 - [info] Oldest slaves:
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.102(192.168.20.102:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:45 2019 - [info]     GTID ON
Tue Sep 10 02:48:45 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:45 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.103(192.168.20.103:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:45 2019 - [info]     GTID ON
Tue Sep 10 02:48:45 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:45 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info]     Not candidate for the new Master (no_master is set)
Tue Sep 10 02:48:45 2019 - [info] 
# -----------------------選舉新的Master------------------
Tue Sep 10 02:48:45 2019 - [info] * Phase 3.3: Determining New Master Phase..
Tue Sep 10 02:48:45 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [info] Searching new master from slaves..
Tue Sep 10 02:48:45 2019 - [info]  Candidate masters from the configuration file:
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.102(192.168.20.102:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:45 2019 - [info]     GTID ON
Tue Sep 10 02:48:45 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:45 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Sep 10 02:48:45 2019 - [info]  Non-candidate masters:
Tue Sep 10 02:48:45 2019 - [info]   192.168.20.103(192.168.20.103:3306)  Version=5.7.27-log (oldest major version between slaves) log-bin:enabled
Tue Sep 10 02:48:45 2019 - [info]     GTID ON
Tue Sep 10 02:48:45 2019 - [debug]    Relay log info repository: TABLE
Tue Sep 10 02:48:45 2019 - [info]     Replicating from 192.168.20.101(192.168.20.101:3306)
Tue Sep 10 02:48:45 2019 - [info]     Not candidate for the new Master (no_master is set)
Tue Sep 10 02:48:45 2019 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Sep 10 02:48:45 2019 - [info] New master is 192.168.20.102(192.168.20.102:3306)
#----------------------開始切換 new master----------------
Tue Sep 10 02:48:45 2019 - [info] Starting master failover..
Tue Sep 10 02:48:45 2019 - [info] 
From:
192.168.20.101(192.168.20.101:3306) (current master)
 +--192.168.20.102(192.168.20.102:3306)
 +--192.168.20.103(192.168.20.103:3306)

To:
192.168.20.102(192.168.20.102:3306) (new master)
 +--192.168.20.103(192.168.20.103:3306)
Tue Sep 10 02:48:45 2019 - [info] 
# -------------------新master恢復(fù)---------------
Tue Sep 10 02:48:45 2019 - [info] * Phase 3.3: New Master Recovery Phase..
Tue Sep 10 02:48:45 2019 - [info] 
# --------------------等待日志被應(yīng)用-------------
Tue Sep 10 02:48:45 2019 - [info]  Waiting all logs to be applied.. 
Tue Sep 10 02:48:45 2019 - [info]   done.
Tue Sep 10 02:48:45 2019 - [debug]  Stopping slave IO/SQL thread on 192.168.20.102(192.168.20.102:3306)..
Tue Sep 10 02:48:45 2019 - [debug]   done.
Tue Sep 10 02:48:45 2019 - [info] Getting new master's binlog name and position..
Tue Sep 10 02:48:45 2019 - [info]  bin.000001:61448
# ------------------執(zhí)行Change Master操作--------------
Tue Sep 10 02:48:45 2019 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.20.102', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rpl', MASTER_PASSWORD='xxx';
Tue Sep 10 02:48:45 2019 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: bin.000001, 61448, a7776f71-c8be-11e9-838f-0050563bb195:1-751708
Tue Sep 10 02:48:45 2019 - [info] Executing master IP activate script:
Tue Sep 10 02:48:45 2019 - [info]   /usr/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.20.101 --orig_master_ip=192.168.20.101 --orig_master_port=3306 --new_master_host=192.168.20.102 --new_master_ip=192.168.20.102 --new_mast
er_port=3306 --new_master_user='root'   --new_master_password=xxx


IN SCRIPT TEST====/usr/sbin/ip addr del 192.168.20.111/24 dev ens32==/usr/sbin/ip addr add 192.168.20.111/24 dev ens32===
# -----------------將 VIP 漂移到 Slave1 上----------------
Enable the VIP -192.168.20.111/24 on the new master - 192.168.20.102 
Tue Sep 10 02:48:45 2019 - [info]  OK.
Tue Sep 10 02:48:45 2019 - [info] Setting read_only=0 on 192.168.20.102(192.168.20.102:3306)..
Tue Sep 10 02:48:45 2019 - [info]  ok.
Tue Sep 10 02:48:45 2019 - [info] ** Finished master recovery successfully.
# ------------------Master恢復(fù)完成----------------
Tue Sep 10 02:48:45 2019 - [info] * Phase 3: Master Recovery Phase completed.
# ====================== 第三階段 完成 =======================
a":{}}}
Tue Sep 10 02:48:45 2019 - [info] 
# ====================== 第四階段 slave恢復(fù) =======================
Tue Sep 10 02:48:45 2019 - [info] * Phase 4: Slaves Recovery Phase..
Tue Sep 10 02:48:45 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [info] * Phase 4.1: Starting Slaves in parallel..
Tue Sep 10 02:48:45 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [info] -- Slave recovery on host 192.168.20.103(192.168.20.103:3306) started, pid: 24015. Check tmp log /var/log/masterha/app1/192.168.20.103_3306_20190910024843.log if it takes time..
Tue Sep 10 02:48:46 2019 - [info] 
Tue Sep 10 02:48:46 2019 - [info] Log messages from 192.168.20.103 ...
Tue Sep 10 02:48:46 2019 - [info] 
Tue Sep 10 02:48:45 2019 - [info]  Resetting slave 192.168.20.103(192.168.20.103:3306) and starting replication from the new master 192.168.20.102(192.168.20.102:3306)..
Tue Sep 10 02:48:45 2019 - [debug]  Stopping slave IO/SQL thread on 192.168.20.103(192.168.20.103:3306)..
Tue Sep 10 02:48:45 2019 - [debug]   done.
Tue Sep 10 02:48:45 2019 - [info]  Executed CHANGE MASTER.
Tue Sep 10 02:48:45 2019 - [debug]  Starting slave IO/SQL thread on 192.168.20.103(192.168.20.103:3306)..
Tue Sep 10 02:48:45 2019 - [debug]   done.
Tue Sep 10 02:48:45 2019 - [info]  Slave started.
Tue Sep 10 02:48:45 2019 - [info]  gtid_wait(a7776f71-c8be-11e9-838f-0050563bb195:1-751708) completed on 192.168.20.103(192.168.20.103:3306). Executed 0 events.
Tue Sep 10 02:48:46 2019 - [info] End of log messages from 192.168.20.103.
Tue Sep 10 02:48:46 2019 - [info] -- Slave on host 192.168.20.103(192.168.20.103:3306) started.
Tue Sep 10 02:48:46 2019 - [info] All new slave servers recovered successfully.
# ====================== 第四階段 完成遵馆,所有從機恢復(fù)完成 =======================
Tue Sep 10 02:48:46 2019 - [info] 
# ====================== 第五階段 New Master清理 =======================
Tue Sep 10 02:48:46 2019 - [info] * Phase 5: New master cleanup phase..
Tue Sep 10 02:48:46 2019 - [info] 
Tue Sep 10 02:48:46 2019 - [info] Resetting slave info on the new master..
Tue Sep 10 02:48:46 2019 - [debug]  Clearing slave info..
Tue Sep 10 02:48:46 2019 - [debug]  Stopping slave IO/SQL thread on 192.168.20.102(192.168.20.102:3306)..
Tue Sep 10 02:48:46 2019 - [debug]   done.
Tue Sep 10 02:48:46 2019 - [debug]  SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Tue Sep 10 02:48:46 2019 - [info]  192.168.20.102: Resetting slave info succeeded.
Tue Sep 10 02:48:46 2019 - [info] Master failover to 192.168.20.102(192.168.20.102:3306) completed successfully.
Tue Sep 10 02:48:46 2019 - [info] Deleted server1 entry from /etc/masterha/app1.conf .
Tue Sep 10 02:48:46 2019 - [debug]  Disconnected from 192.168.20.102(192.168.20.102:3306)
Tue Sep 10 02:48:46 2019 - [debug]  Disconnected from 192.168.20.103(192.168.20.103:3306)
Tue Sep 10 02:48:46 2019 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.20.101(192.168.20.101:3306) to 192.168.20.102(192.168.20.102:3306) succeeded

Master 192.168.20.101(192.168.20.101:3306) is down!

Check MHA Manager logs at initnode4:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.20.101(192.168.20.101:3306)
Selected 192.168.20.102(192.168.20.102:3306) as a new master.
192.168.20.102(192.168.20.102:3306): OK: Applying all logs succeeded.
192.168.20.102(192.168.20.102:3306): OK: Activated master IP address.
192.168.20.103(192.168.20.103:3306): OK: Slave started, replicating from 192.168.20.102(192.168.20.102:3306)
192.168.20.102(192.168.20.102:3306): Resetting slave info succeeded.
Master failover to 192.168.20.102(192.168.20.102:3306) completed successfully.
# =======================在線切換成功======================

查看最終復(fù)制關(guān)系

  • VIP 查看
    VIP從101端漂移到了102端
#
#101鲸郊、102端查看
#
shell> ip a
  • Slave 查看
    可見102已被設(shè)置為新主,103的主從關(guān)系修改為102為主
#
# Slave 103 端
#
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.20.102
                  Master_User: rpl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin.000002
          Read_Master_Log_Pos: 194
               Relay_Log_File: relay.000004
                Relay_Log_Pos: 355
        Relay_Master_Log_File: bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

查看切換后masterha_manager的配置

現(xiàn)在由于 masterha_manager 配置了--remove_dead_master_conf的參數(shù)货邓,在 Failover完成后 秆撮,自動將 Dead Master 從配置文件中去除,這樣可以確保 重新啟動masterha_manager 時换况,不會把 Dead Master 加入到MySQL集群中去像吻,從而引發(fā)錯誤。

[root@initnode4 app1]# more /etc/masterha/app1.conf 
[server default]
log_level=debug
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/data/mysql_data/
master_ip_failover_script=/usr/bin/master_ip_failover
master_ip_online_change_script=/usr/bin/master_ip_online_change
password=1122333
ping_interval=3
ping_type=INSERT
remote_workdir=/tmp
repl_password=123456
repl_user=rpl
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.20.102 -s 192.168.20.103 --user=root --master_host=192.168.20.101 --master_port=3306
ssh_user=root
user=root

# ---------------- 之前配置的 [server1] 被移除了 -----------

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.20.102
port=3306

[server3]
hostname=192.168.20.103
no_master=1
port=3306

注意事項

Failover后MHA_Manager自動退出

當自動Failover完成后复隆,MHA Manager服務(wù)器上的 masterha_manager 進程自動退出拨匆。
可以使用 supervisor 來進行托管(python程序員用的很多),該軟件的主要作用就是托管程序挽拂,并且可以簡單的檢測程序狀態(tài)惭每,發(fā)現(xiàn)停止后可以自動重啟運行;

注意:在MHA重啟后亏栈, 日志部分會被自動回滾覆蓋 台腥,MHA日志中的binlog的filename和pos、以及GTID信息也會消失绒北。所以是否自動重啟還是需要看DBA和業(yè)務(wù)需要黎侈。
如果 masterha_manager可以自動重啟 后,那一定要開啟 --remove_dead_master_conf 參數(shù)闷游。因為既然切換(Failover)了(masterha_manager默認自動退出)峻汉,說明配置文件列表中有server是宕機了,需要踢出MySQL集群脐往,那如果此時沒有配置該參數(shù)休吠,當masterha_manager自動重啟時, 仍能讀取到宕機的server的配置 业簿,就會啟動失敗瘤礁,然后一直循環(huán)啟動。

# 安裝
Shell> yum install supervisor

? 配置文件/etc/supervisord.conf
[unix_http_server]
file=/var/run/supervisor.sock ; (the path to the socket file)
[supervisord]
logfile=/var/log/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (log level;default info; others: debug,warn,trace)
pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket
[program:mha_manager]
command=nohup /usr/local/bin/masterha_manager --conf=/etc/masterha/app1.conf --remove_dead_master_conf --ignore_last_failover
process_name=%(program_name)s
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/masterha/app1/manager.log
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=5
user=root

#啟動supervisord
Shell> systemctl start supervisord.service

# 開啟自啟動
Shell> systemctl enabled supervisord.service

# 測試梅尤,kill掉MHA進程觀察是否被拉起

手動Failover測試

在使用手動Failover的時候柜思,masterha_manager(在線Failover功能)需要關(guān)閉粟耻,這個也是MHA留給我們自己決定的地方惠拭,線上使用可以選擇使用自動切換,也可以通過報警后,處理問題大猛,然后決定是否手工切換误窖。

注意事項
1. 如果你的 app1.conf 中配置的 hostname 為 Master阿趁、Slave1 等僵芹,則上述命令中的 --dead_master_host 和 --new_master_host 也需要寫成 Master、Slave1 霉咨,而 不能寫成IP地址
2. 如果你的 app1.conf 中配置的 hostname 為IP地址蛙紫,則 --dead_master_host 和 --new_master_host 也要配置成IP地址
3.在進行手動Failover的時候,請確保Master上的MySQL掛了,否則會提示服務(wù)沒掛

#
# MHA Manager 端
#
Shell> masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.conf --dead_master_host=192.168.20.101 --dead_master_port=3306 --new_master_host=192.168.20.103 --new_master_port=3306 --ignore_last_failover

... 省略 ...
# 會問你倆個問題途戒,都回答 yes
Master 192.168.20.101(192.168.20.101:3306) is dead. Proceed? (yes/NO): yes
Starting master switch from 192.168.20.101(192.168.20.101:3306) to 192.168.20.103(192.168.20.103:3306)? (yes/NO): yes
... 省略 ...
192.168.20.103(192.168.20.103:3306): Resetting slave info succeeded.
Master failover to 192.168.20.103(192.168.20.103:3306) completed successfully.

MHA總結(jié)

部署操作

1. 配置好MySQL 復(fù)制關(guān)系 坑傅,至少三個節(jié)點,且確保 rpl用戶 傳遞到Slave上喷斋;
2. 在 MHA Manager 上安裝 Manager 組件唁毒;
3. 在 所有節(jié)點 上安裝 Node 組件;
4. 配置好 /etc/masterha/app1.conf 星爪;
5. 驗證主從配置masterha_check_repl --conf=/etc/masterha/app1.conf浆西;
6. 啟動 masterha_manager (或supervisord托管);
7. 確認 masterha_check_status 狀態(tài)顽腾;

Old_Master恢復(fù)

當切換完成后近零, 假如Old Master修復(fù)完成 ,這時需要對比一下數(shù)據(jù)是否一致抄肖;如果主從不一致久信,即Old Master 宕機時刻的binlog 沒有傳到Slave, 且MHA也無法獲取到這部分binlog 漓摩,需要通過 Flashback 工具將這部分數(shù)據(jù)切除裙士;

  • Old Master 的binlog信息可以通過 show master status\G 看到,或者通過 mysqlbinlog 進行查看管毙;
  • New Master 上查看執(zhí)行到的 Old Master 復(fù)制過來的binlog信息可以通過 show global variables like "%gtid%"; 來進行查看(mysql會保留之前執(zhí)行過的GTID信息)腿椎;
    • 在沒有使用MHA,或者使用MHA手動Failover的時候锅风,可以通過在Slave上執(zhí)行 show slave status\G 通過 Exec_Master_Log_Pos 觀察到執(zhí)行到的位置(等待回放完畢)
    • 而使用MHA切換后酥诽, show slave status\G 的信息會被MHA給reset掉(除非自己給MHA打補丁,將信息記錄下來);

一般情況下皱埠,只需要將掛掉的機器啟動后,查看下gtid值咖驮,直接change master到新主就能追平數(shù)據(jù)

mysql> change master to master_host='192.168.20.102',master_user='rpl',master_password='123456',master_port=3306,master_auto_position=1;

# 啟動復(fù)制
start slave;
# 查看狀態(tài)
show slave status\G
image.png

如果中間產(chǎn)生問題边器,和之前處理GTID復(fù)制出錯一樣训枢, 跳過執(zhí)行的部分 即可(注意,需要知道自己跳過的是什么忘巧,小數(shù)據(jù)量若出現(xiàn)無法正常復(fù)制建議重建從庫):

mysql> stop slave;
mysql> reset master;
mysql> set @@global.gtid_purged='22302bb1-c8bb-11e9-8669-00505620a3cf:1-308';
mysql> change master to master_host='192.168.20.102',master_user='rpl',master_password='123456',master_port=3306,master_auto_position=1;
mysql> start slave;

MHA進程恢復(fù)

在發(fā)生切換后恒界,MHA默認會退出進程,需要配置后再次啟動

  • 在從庫[原掛掉的主庫]配置上去后砚嘴,修改MHA配置文件
# MHA 104上操作
vi /etc/masterha/app1.conf
image.png

因為我們已經(jīng)將原來的主庫作為從庫掛載上去了十酣,而MySQL主機已更換,因此需要修改配置文件际长,假設(shè)主現(xiàn)在切換為 192.168.20.102 了耸采,修改如下:

[server default]
log_level=debug
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/data/mysql_data/
master_ip_failover_script=/usr/bin/master_ip_failover
master_ip_online_change_script=/usr/bin/master_ip_online_change
password=1122333
ping_interval=3
ping_type=INSERT
remote_workdir=/tmp
repl_password=123456
repl_user=rpl

### 這邊需要配置為倆個從庫地址-s 192.168.20.101 -s 192.168.20.10
### master_host 需要更改為新主地址
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.20.101 -s 192.168.20.103 --user=root --master_host=192.168.20.102 --master_port=3306
ssh_user=root
user=manager

### 添加 [server1]
[server1]
candidate_master=1
hostname=192.168.20.101
port=3306

[server2]
candidate_master=1
hostname=192.168.20.102
port=3306

[server3]
candidate_master=1
check_repl_delay=0
hostname=192.168.20.103
port=3306
  • MHA配置健康監(jiān)測
masterha_check_ssh --conf=/etc/masterha/app1.conf
masterha_check_repl --conf=/etc/masterha/app1.conf
  • 啟動MHA
nohup masterha_manager --conf=/etc/masterha/app1.conf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
image.png
  • 查看狀態(tài)
masterha_check_status --conf=/etc/masterha/app1.conf
image.png

自動恢復(fù)MHA配置與自動掛載已掛從庫

有個奇葩需求,客戶那邊的機器會被插拔網(wǎng)線工育,又需要保證高可用虾宇,而MHA特性是切換完成即退出監(jiān)控進程。

  • MHA配置文件修改如绸,若切換了則啟動修改為雙機切換
  • 修復(fù)被踢出的原主
  • 比對GTID嘱朽,掛載原主為從庫
    • 原主GTID = 新主 GTID ,直接原主change過去
    • 原主GTID > 新主 GTID 記錄,修正原主GTID后掛載過去
  • 修改MHA配置文件
  • 檢測SSH狀態(tài)
  • 檢測主從狀態(tài)
  • 啟動MHA進程
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末怔接,一起剝皮案震驚了整個濱河市搪泳,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌扼脐,老刑警劉巖森书,帶你破解...
    沈念sama閱讀 206,968評論 6 482
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異谎势,居然都是意外死亡凛膏,警方通過查閱死者的電腦和手機,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,601評論 2 382
  • 文/潘曉璐 我一進店門脏榆,熙熙樓的掌柜王于貴愁眉苦臉地迎上來猖毫,“玉大人,你說我怎么就攤上這事须喂∮醵希” “怎么了?”我有些...
    開封第一講書人閱讀 153,220評論 0 344
  • 文/不壞的土叔 我叫張陵坞生,是天一觀的道長仔役。 經(jīng)常有香客問我,道長是己,這世上最難降的妖魔是什么又兵? 我笑而不...
    開封第一講書人閱讀 55,416評論 1 279
  • 正文 為了忘掉前任,我火速辦了婚禮,結(jié)果婚禮上沛厨,老公的妹妹穿的比我還像新娘宙地。我一直安慰自己,他們只是感情好逆皮,可當我...
    茶點故事閱讀 64,425評論 5 374
  • 文/花漫 我一把揭開白布宅粥。 她就那樣靜靜地躺著,像睡著了一般电谣。 火紅的嫁衣襯著肌膚如雪秽梅。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 49,144評論 1 285
  • 那天剿牺,我揣著相機與錄音企垦,去河邊找鬼。 笑死牢贸,一個胖子當著我的面吹牛竹观,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播潜索,決...
    沈念sama閱讀 38,432評論 3 401
  • 文/蒼蘭香墨 我猛地睜開眼臭增,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了竹习?” 一聲冷哼從身側(cè)響起誊抛,我...
    開封第一講書人閱讀 37,088評論 0 261
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎整陌,沒想到半個月后拗窃,有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 43,586評論 1 300
  • 正文 獨居荒郊野嶺守林人離奇死亡泌辫,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 36,028評論 2 325
  • 正文 我和宋清朗相戀三年随夸,在試婚紗的時候發(fā)現(xiàn)自己被綠了。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片震放。...
    茶點故事閱讀 38,137評論 1 334
  • 序言:一個原本活蹦亂跳的男人離奇死亡宾毒,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出殿遂,到底是詐尸還是另有隱情诈铛,我是刑警寧澤,帶...
    沈念sama閱讀 33,783評論 4 324
  • 正文 年R本政府宣布墨礁,位于F島的核電站幢竹,受9級特大地震影響,放射性物質(zhì)發(fā)生泄漏恩静。R本人自食惡果不足惜焕毫,卻給世界環(huán)境...
    茶點故事閱讀 39,343評論 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧咬荷,春花似錦冠句、人聲如沸轻掩。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,333評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽唇牧。三九已至罕扎,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間丐重,已是汗流浹背腔召。 一陣腳步聲響...
    開封第一講書人閱讀 31,559評論 1 262
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留扮惦,地道東北人臀蛛。 一個月前我還...
    沈念sama閱讀 45,595評論 2 355
  • 正文 我出身青樓,卻偏偏與公主長得像崖蜜,于是被迫代替她去往敵國和親浊仆。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當晚...
    茶點故事閱讀 42,901評論 2 345

推薦閱讀更多精彩內(nèi)容