I. How it works
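We use pgpool-II to make PostgreSQL streaming replication highly available. The standby is read-only and cannot accept writes, so when the primary fails, the standby has to be promoted to primary automatically so that it can take over service.
Other HA tools offer the same capability. pgpool-II provides it through the failover_command parameter in pgpool.conf, which lets the user configure a script that pgpool runs whenever a failover occurs.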
II. Walkthrough
This example uses PostgreSQL 12 + pgpool-II 4.
Goals:
- Build a pgpool cluster
- Test database high availability
- Repair the old primary and rejoin it to the cluster
2.1 Environment planning
1. PostgreSQL hosts, IPs and ports
|Hostname|Role|IP|Port|Data directory|
|:----:|:----|:----:|:----|:----:|
|node3|pgpool|192.168.1.221|9999| |
|node3|primary|192.168.1.221|6000|/data1/postgres/data|
|node4|standby|192.168.1.202|6000|/data1/postgres/data|
2. Database users
|User|Password|Purpose|
|:----:|:----|:----:|
|postgres|123456|Online recovery|
|replica|replica|Streaming replication user|
|pgpool|123456|Pgpool-II health check (health_check_user) replication delay check (sr_check_user)|
2.2 Preparing the database environment
1. Install PostgreSQL
N/A
2. Install pgpool-II
See the earlier post "pgpool-II installation".
This example uses online recovery, so the pgpool_recovery extension must also be installed:
-- On the primary
psql -c "create extension pgpool_recovery" template1
3. Configure PostgreSQL
On the primary:
Create the database users:
alter user postgres password '123456';
CREATE ROLE pgpool WITH LOGIN password '123456';
CREATE ROLE replica WITH REPLICATION LOGIN password 'replica';
-- To show the replication_state and replication_sync_state columns in the SHOW POOL NODES result, role pgpool must be a PostgreSQL superuser or a member of pg_monitor (Pgpool-II 4.1 or later)
GRANT pg_monitor TO pgpool;
Configure archiving
Streaming replication itself does not need WAL archiving, but online recovery does.
$ mkdir /data1/archivedir
$ vi postgresql.conf
archive_mode = on
archive_command = 'cp %p /data1/archivedir/%f'
wal_log_hints = on    # needed by pg_rewind when the old primary is rejoined later
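The `cp` in archive_command above silently overwrites a segment that is already in the archive. The PostgreSQL documentation suggests a variant that fails instead, e.g. `archive_command = 'test ! -f /data1/archivedir/%f && cp %p /data1/archivedir/%f'`. The sketch below spells out that same logic as a shell function; the name `archive_wal` is hypothetical, and its three arguments stand in for `%p`, `%f` and the archive directory. Note that archive_mode and wal_log_hints only take effect after a server restart.

```shell
#!/usr/bin/env bash
# archive_wal SRC_PATH FILE_NAME ARCHIVE_DIR   (hypothetical helper)
# Mirrors: test ! -f DIR/%f && cp %p DIR/%f
archive_wal() {
    local src="$1" name="$2" dir="$3"
    # Never overwrite an archived segment. A non-zero exit tells PostgreSQL
    # the attempt failed, so the segment stays in pg_wal and is retried.
    if [ -f "$dir/$name" ]; then
        echo "archive_wal: $dir/$name already exists" >&2
        return 1
    fi
    cp "$src" "$dir/$name"
}
```

Failing loudly matters here because a stale or foreign file under the same name would otherwise replace a WAL segment that online recovery may later need.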
4. Set up streaming replication
-- On the standby
# As the root OS user on 192.168.1.202, create the PostgreSQL data directory
mkdir -p /data1/postgres/data
chown -R postgres:postgres /data1/postgres/data
chmod 700 /data1/postgres/data
# As the postgres OS user, run pg_basebackup to clone the primary into the standby's data directory
pg_basebackup -F p -R --progress -D /data1/postgres/data -h 192.168.1.221 -p 6000 -U replica
# As the postgres OS user, start the standby
pg_ctl -D /data1/postgres/data start
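With PostgreSQL 12, `pg_basebackup -R` creates `standby.signal` and appends a `primary_conninfo` line to `postgresql.auto.conf`. A small hypothetical helper, `is_standby_configured`, can confirm both before the standby is started:

```shell
#!/usr/bin/env bash
# is_standby_configured DATA_DIR   (hypothetical helper)
# Succeeds if DATA_DIR looks like a PostgreSQL 12 standby prepared by
# pg_basebackup -R: standby.signal exists and primary_conninfo is set.
is_standby_configured() {
    local pgdata="$1"
    if [ ! -f "$pgdata/standby.signal" ]; then
        echo "missing $pgdata/standby.signal" >&2
        return 1
    fi
    if ! grep -q '^primary_conninfo' "$pgdata/postgresql.auto.conf" 2>/dev/null; then
        echo "primary_conninfo not found in postgresql.auto.conf" >&2
        return 1
    fi
    echo "standby config OK"
}
```

Usage: `is_standby_configured /data1/postgres/data`. Once the standby is running, `psql -p 6000 -c 'select client_addr, state from pg_stat_replication'` on the primary should list 192.168.1.202 in the streaming state.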
5. Configure passwordless SSH
As described under "How it works", automatic failover and online recovery in Pgpool-II require pgpool to run commands on every server without a password, so passwordless SSH must be set up first. Here we use the postgres OS user.
-- On the pgpool node
$ cd ~/.ssh
$ ssh-keygen -t rsa -f id_rsa_pgpool
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@node3
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@node4
-- Verify passwordless login (replace serverX with node3 and node4)
ssh postgres@serverX -i ~/.ssh/id_rsa_pgpool
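To check every node in one pass without ever falling back to a password prompt, `ssh -o BatchMode=yes` can be looped over the hosts. `check_ssh_trust` and the `SSH_BIN` override (there only so the function can be tested without real hosts) are hypothetical names:

```shell
#!/usr/bin/env bash
# check_ssh_trust KEY HOST...   (hypothetical helper)
# BatchMode=yes makes ssh fail instead of prompting, so a missing or
# rejected key shows up as FAILED rather than a hung password prompt.
check_ssh_trust() {
    local key="$1"; shift
    local host failed=0
    for host in "$@"; do
        if "${SSH_BIN:-ssh}" -i "$key" -o BatchMode=yes -o ConnectTimeout=5 \
               "postgres@$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: `check_ssh_trust ~/.ssh/id_rsa_pgpool node3 node4`.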
6. Configure pgpool
For reference, see the earlier post "pgpool configuration"; here pgpool is installed and run as the postgres OS user.
Set the environment variables:
export PGHOME=/opt/pg12
export PGDATA=/data1/postgres/data
export PGPOOLHOME=/opt/pgpool
export PATH=$PGHOME/bin:$PATH:$HOME/bin:$PGPOOLHOME/bin
1. Set up the pcp admin user/password file pcp.conf
User/password used here: pcpadm/pgpool123
#1 Enter the configuration directory
[postgres@node3 ~]$ cd /opt/pgpool/etc
[postgres@node3 etc]$ cp pcp.conf.sample pcp.conf
# Each line of this file holds one user/password pair: USERID:MD5PASSWD
#2 Use pg_md5 to hash the password pgpool123
[postgres@node3 etc]$ pg_md5 pgpool123
fa039bd52c3b2090d86b0904021a5e33
#3 Edit pcp.conf; the user configured here is pcpadm
[postgres@node3 etc]$ vi pcp.conf
# USERID:MD5PASSWD
pcpadm:fa039bd52c3b2090d86b0904021a5e33
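`pg_md5 pgpool123` should print the plain MD5 digest of the password, so the same pcp.conf line can also be produced with `md5sum` (GNU coreutils assumed). `pcp_conf_line` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# pcp_conf_line USER PASSWORD   (hypothetical helper)
# Prints USERID:MD5PASSWD, where MD5PASSWD is the plain MD5 of the password.
pcp_conf_line() {
    local user="$1" pass="$2" hash
    # printf (not echo) so no trailing newline is hashed with the password.
    hash=$(printf '%s' "$pass" | md5sum | cut -d' ' -f1)
    printf '%s:%s\n' "$user" "$hash"
}
```

Usage: `pcp_conf_line pcpadm pgpool123 >> /opt/pgpool/etc/pcp.conf`; the hash should match what pg_md5 printed above.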
2. Configure pool_hba.conf
This file controls how clients authenticate (for example, which client IPs may connect), similar to PostgreSQL's pg_hba.conf.
[postgres@node3 ~]$ cd /opt/pgpool/etc/
[postgres@node3 etc]$ vi pool_hba.conf
# Add the following line
host all all 0.0.0.0/0 md5
3. Generate pool_passwd
pool_passwd is pgpool's password file: clients that connect through pgpool are authenticated against it.
For now, only the database user pgpool is added:
[postgres@node3 ~]$ cd /opt/pgpool/etc/
[postgres@node3 etc]$ pg_md5 --md5auth -u pgpool -p
password:
[postgres@node3 etc]$ ll pool_passwd
-rw-r--r--. 1 postgres postgres 132 Nov 30 10:43 pool_passwd
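With `--md5auth`, the entry written to pool_passwd should use PostgreSQL's md5 password format: the string `md5` followed by the MD5 of the password concatenated with the user name. Under that assumption, a hypothetical helper reproducing one line looks like this:

```shell
#!/usr/bin/env bash
# pool_passwd_line USER PASSWORD   (hypothetical helper)
# Prints USER:md5<md5(password || username)>, PostgreSQL's md5 format.
pool_passwd_line() {
    local user="$1" pass="$2" hash
    hash=$(printf '%s%s' "$pass" "$user" | md5sum | cut -d' ' -f1)
    printf '%s:md5%s\n' "$user" "$hash"
}
```

Because the user name is part of the hash, the same password yields different entries for different users, which is why pg_md5 needs `-u` here.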
4. Configure .pgpass
When pgpool-II performs automatic failover or online recovery (online recovery: after a failover, the repaired old primary is brought back as a standby; note that this is online recovery, not automatic recovery, and it has to be started by hand), the scripts it runs use tools such as pg_basebackup or pg_rewind that connect to the other PostgreSQL servers as database users. Passwordless SSH alone does not cover those connections; they must also work without a password prompt. To achieve this, create a .pgpass file in the postgres user's home directory on every PostgreSQL server and set its permissions to 600.
# su - postgres
$ vi ~/.pgpass
192.168.1.221:6000:replication:replica:replica
192.168.1.202:6000:replication:replica:replica
$ chmod 600 ~/.pgpass
If pg_hba.conf already trusts this subnet for replication connections, this step can be skipped:
host replication replica 192.168.1.0/24 trust
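libpq silently ignores a password file that grants any access to group or others, which makes a too-permissive .pgpass look like a wrong password. A quick hypothetical check, assuming GNU `stat`:

```shell
#!/usr/bin/env bash
# check_pgpass FILE   (hypothetical helper)
# Fails if FILE is missing or has any group/other permission bit set.
check_pgpass() {
    local file="$1" mode
    if [ ! -f "$file" ]; then
        echo "$file: not found" >&2
        return 1
    fi
    mode=$(stat -c '%a' "$file")          # octal mode, e.g. 600
    # Any bit in 077 means libpq will ignore the file.
    if (( 8#$mode & 8#077 )); then
        echo "$file: mode $mode gives group/other access; run chmod 600" >&2
        return 1
    fi
    echo "$file: ok (mode $mode)"
}
```

Usage: `check_pgpass ~/.pgpass` on each PostgreSQL server.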
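5. Configure pcp's .pcppass
If the follow_master_command script is used (renamed follow_primary_command in Pgpool-II 4.2), it must run pcp commands without entering a password, so create .pcppass in the postgres user's home directory. Each line has the form hostname:port:username:password, using the pcp user from pcp.conf:
$ echo 'localhost:9898:pcpadm:pgpool123' > ~/.pcppass
$ chmod 600 ~/.pcppass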
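The same file can be written under a restrictive umask so that it is never world-readable, not even briefly between the write and the chmod. `write_pcppass` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# write_pcppass FILE HOST PORT USER PASSWORD   (hypothetical helper)
# Writes one hostname:port:username:password line and restricts the file.
write_pcppass() {
    local file="$1"
    # umask 077 in a subshell: a newly created file starts as 600.
    ( umask 077; printf '%s:%s:%s:%s\n' "$2" "$3" "$4" "$5" > "$file" )
    # Also tighten a file that already existed with a looser mode.
    chmod 600 "$file"
}
```

Usage: `write_pcppass ~/.pcppass localhost 9898 pcpadm pgpool123`. After that, a pcp command run with `-w` (never prompt for a password), such as `pcp_node_info -h localhost -p 9898 -U pcpadm -w -n 0`, should work without asking; check the flags against the pcp documentation of the installed pgpool-II version.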
6. Configure pgpool.conf
listen_addresses = '*'
port = 9999
backend_hostname0 = '192.168.1.221'
backend_port0 = 6000
backend_weight0 = 1
backend_data_directory0 = '/data1/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'server0'
backend_hostname1 = '192.168.1.202'
backend_port1 = 6000
backend_weight1 = 1
backend_data_directory1 = '/data1/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'server1'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
pid_file_name = '/opt/pgpool/pgpool.pid'
logdir = '/opt/pgpool'
replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 10
sr_check_user = 'pgpool'
sr_check_password = '123456'
sr_check_database = 'postgres'
delay_threshold = 10000000
health_check_period = 5
health_check_user = 'pgpool'
health_check_password = '123456'
health_check_database = 'postgres'
health_check_max_retries = 3
failover_command = '/opt/pgpool/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
# If we use 3 PostgreSQL servers, we need to specify follow_primary_command to run after a failover of the primary node.
# In case of two PostgreSQL servers, follow_primary_command setting is not necessary
# follow_primary_command = '/opt/pgpool/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'
# online recovery
recovery_user = 'postgres'
recovery_password = '123456'
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
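Before starting pgpool, it can help to cross-check the backend entries against the environment plan in 2.1. The hypothetical `list_backends` function below extracts the `backend_hostnameN`/`backend_portN` pairs with awk and ignores commented-out lines:

```shell
#!/usr/bin/env bash
# list_backends PGPOOL_CONF   (hypothetical helper)
# Prints "N host:port" for every active backend_hostnameN entry.
list_backends() {
    awk -F'=' -v q="'" '
        # Drop trailing comments, blanks and single-quote characters.
        function clean(s) { sub(/#.*/, "", s); gsub(/[ \t]/, "", s); gsub(q, "", s); return s }
        /^[ \t]*backend_hostname[0-9]+[ \t]*=/ {
            id = clean($1); sub(/^backend_hostname/, "", id); host[id] = clean($2)
        }
        /^[ \t]*backend_port[0-9]+[ \t]*=/ {
            id = clean($1); sub(/^backend_port/, "", id); port[id] = clean($2)
        }
        END { for (id in host) print id, host[id] ":" port[id] }
    ' "$1" | sort -n
}
```

Usage, assuming pgpool.conf lives next to pcp.conf in /opt/pgpool/etc: `list_backends /opt/pgpool/etc/pgpool.conf` should print `0 192.168.1.221:6000` and `1 192.168.1.202:6000`.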
7. Configure the failover_command script
[postgres@node3 ~]$ cd $PGPOOLHOME
[postgres@node3 pgpool]$ cp etc/failover.sh.sample failover.sh
[postgres@node3 pgpool]$ vi failover.sh
Set the PGHOME variable in the script to this installation's path (/opt/pg12):
[postgres@node3 pgpool]$ chmod +x failover.sh
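The core decision a failover script makes can be sketched as follows. This is a simplified illustration, not the shipped failover.sh.sample: it assumes the argument order from failover_command above (`%d %h %p %D %m %H %M %P %r %R %N %S`) and promotes the new main node over SSH only when the failed node was the primary. The `DRY_RUN` switch is a hypothetical addition for testing.

```shell
#!/usr/bin/env bash
# failover %d %h %p %D %m %H %M %P %r %R %N %S   (simplified sketch)
failover() {
    local failed_node_id="$1"        # %d failed node id
    local failed_host="$2"           # %h failed node host
    local new_main_node_id="$5"      # %m new main node id (-1 if none)
    local new_main_host="$6"         # %H new main node host
    local old_primary_node_id="$8"   # %P old primary node id
    local new_main_pgdata="${10}"    # %R new main node data directory
    local pghome="${PGHOME:-/opt/pg12}"

    # No surviving backend: nothing can be promoted.
    if [ "$new_main_node_id" -lt 0 ]; then
        echo "all nodes are down; skipping failover"
        return 0
    fi
    # A failed standby needs no promotion.
    if [ "$failed_node_id" != "$old_primary_node_id" ]; then
        echo "standby $failed_host (node $failed_node_id) down; no promotion needed"
        return 0
    fi
    local cmd="$pghome/bin/pg_ctl -D $new_main_pgdata -w promote"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ssh postgres@$new_main_host $cmd"
        return 0
    fi
    ssh -i ~/.ssh/id_rsa_pgpool -o BatchMode=yes "postgres@$new_main_host" "$cmd"
}
```

In a real script the function would be invoked as `failover "$@"`. Skipping promotion when only a standby fails is what keeps a standby outage from triggering an unnecessary role change.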
2.3 Start pgpool
[postgres@node3 ~]$ pgpool -n > /tmp/pgpool.log 2>&1 &
[postgres@node3 ~]$ psql -p 9999 postgres pgpool
2020-12-01 14:50:09: pid 2422: LOG: new connection received
2020-12-01 14:50:09: pid 2422: DETAIL: connecting host=[local]
psql (12.2)
Type "help" for help.
postgres=> show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 192.168.1.221 | 6000 | up | 0.500000 | primary | 0 | false | 0 | | | 2020-12-01 14:38:09
1 | 192.168.1.202 | 6000 | up | 0.500000 | standby | 0 | true | 0 | | | 2020-12-01 14:38:09
(2 rows)
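When scripting the HA test, the primary can be read from `show pool_nodes` instead of eyeballing the table. Column positions differ between pgpool-II versions, so this hypothetical helper locates the columns by header name in psql's unaligned (`-A`) output:

```shell
#!/usr/bin/env bash
# primary_from_pool_nodes   (hypothetical helper)
# Reads `psql -A -c 'show pool_nodes'` output on stdin and prints
# "node_id host:port status" for the node whose role is primary.
primary_from_pool_nodes() {
    awk -F'|' '
        # Header row: remember the position of each column by name.
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows; the "(N rows)" footer has a single field and is skipped.
        NF > 1 && $(col["role"]) == "primary" {
            print $(col["node_id"]), $(col["hostname"]) ":" $(col["port"]), $(col["status"])
        }
    '
}
```

Usage: `psql -p 9999 -U pgpool -d postgres -A -c 'show pool_nodes' | primary_from_pool_nodes` should report node 0 before the failover test and node 1 afterwards.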
2.4 Test high availability
1. The standby is promoted to primary automatically
First stop the primary and check whether the standby gets promoted:
[postgres@node3 ~]$ pg_ctl stop
waiting for server to shut down..... done
server stopped
# Check the node status again
[postgres@node3 ~]$ psql -p 9999 postgres pgpool
2020-12-01 14:53:57: pid 2591: LOG: new connection received
2020-12-01 14:53:57: pid 2591: DETAIL: connecting host=[local]
psql (12.2)
Type "help" for help.
postgres=> show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 192.168.1.221 | 6000 | down | 0.500000 | standby | 0 | false | 0 | | | 2020-12-01 14:53:07
1 | 192.168.1.202 | 6000 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-12-01 14:53:07
(2 rows)
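Result: the standby was successfully promoted to the new primary.
The output above shows that the role of node_id=1 has changed to primary, while node 0 is now down.
2. Rejoin the old primary to the cluster
Now bring the old primary back into the cluster as a standby. Online recovery will be demonstrated later; here it is done by hand.
1. Resynchronize the timeline
When 192.168.1.202 was promoted to primary, it switched to a new timeline (timeline ID + 1), so it has diverged from 192.168.1.221. pg_rewind is used to resynchronize 221's data directory from 202: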
[postgres@node3 ~]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.1.202 port=6000 user=postgres dbname=postgres password=123456'
pg_rewind: servers diverged at WAL location 0/18000000 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/17000148 on timeline 1
pg_rewind: Done!
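pg_rewind only works if the target cluster has data checksums enabled or ran with wal_log_hints = on (set earlier in postgresql.conf), and the target must have been shut down cleanly. Both settings are recorded in the control file, so a hypothetical pre-check can parse pg_controldata output (run with LC_ALL=C so the labels are not translated):

```shell
#!/usr/bin/env bash
# rewind_ready   (hypothetical helper)
# Reads pg_controldata output on stdin; succeeds if wal_log_hints is on
# or data page checksums are enabled (checksum version other than 0).
rewind_ready() {
    awk -F':' '
        /^wal_log_hints setting/      { v = $2; gsub(/[ \t]/, "", v); hints = v }
        /^Data page checksum version/ { v = $2; gsub(/[ \t]/, "", v); csum = v }
        END { exit !(hints == "on" || (csum != "" && csum != "0")) }
    '
}
```

Usage: `LC_ALL=C pg_controldata -D "$PGDATA" | rewind_ready && pg_rewind ...` with the same pg_rewind options as above.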
2. Configure postgresql.conf
# 192.168.1.221
$ cd $PGDATA
$ touch standby.signal
$ vi postgresql.conf
primary_conninfo = 'host=192.168.1.202 port=6000 user=replica'
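One caveat worth checking at this step: pg_rewind also copies configuration files such as postgresql.auto.conf from the source server, and the earlier `pg_basebackup -R` on 202 likely left a primary_conninfo pointing at 192.168.1.221 in 202's copy. Because postgresql.auto.conf is read after postgresql.conf, such a line would override the setting above. The hypothetical `make_standby` helper performs the same two steps as the commands above and reports that conflict (exit status 2) instead of letting it pass silently:

```shell
#!/usr/bin/env bash
# make_standby PGDATA CONNINFO   (hypothetical helper)
make_standby() {
    local pgdata="$1" conninfo="$2"
    # standby.signal tells PostgreSQL 12 to start in standby mode.
    touch "$pgdata/standby.signal"
    # CONNINFO is written verbatim, so it must not contain single quotes.
    printf "primary_conninfo = '%s'\n" "$conninfo" >> "$pgdata/postgresql.conf"
    # postgresql.auto.conf is read last, so a primary_conninfo there wins.
    if grep -q '^[[:space:]]*primary_conninfo' "$pgdata/postgresql.auto.conf" 2>/dev/null; then
        echo "warning: postgresql.auto.conf also sets primary_conninfo and overrides postgresql.conf" >&2
        return 2
    fi
}
```

Usage: `make_standby "$PGDATA" 'host=192.168.1.202 port=6000 user=replica'`. If it warns, remove or correct the primary_conninfo line in postgresql.auto.conf while the server is still stopped.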
3. Start PostgreSQL
[postgres@node3 ~]$ pg_ctl start
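Even with the old primary running as a standby, pgpool keeps reporting node 0 as down until the node is attached again. Below is a sketch using pcp_attach_node with the pcpadm account and the .pcppass file configured earlier. The flags follow the Pgpool-II 4.x pcp documentation (check them against the installed version), and `attach_node`/`DRY_RUN` are hypothetical names:

```shell
#!/usr/bin/env bash
# attach_node NODE_ID   (hypothetical helper)
# -w never prompts for a password, relying on ~/.pcppass instead.
attach_node() {
    local cmd=(pcp_attach_node -h localhost -p 9898 -U pcpadm -w -n "$1")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
        return 0
    fi
    "${cmd[@]}"
}
```

Usage: `attach_node 0`; afterwards `show pool_nodes` through port 9999 should list node 0 as up with role standby.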
Online recovery will be covered in a later post.