PostgreSQL-Patroni高可用啟動(dòng)報(bào)錯(cuò)

環(huán)境說明

一主兩從:

角色 主機(jī) 組件
pa-pg-1 172.22.138.220 etcd趣避、patroni睹簇、postgresql
pa-pg-2 172.22.138.219 etcd、patroni、postgresql
pa-pg-3 172.22.138.218 etcd、patroni创肥、postgresql

安裝部署好Patroni高可用后,主庫pa-pg-1和從庫pa-pg-3均啟動(dòng)正常值朋,而pa-pg-2的patroni日志一直輸出以下報(bào)錯(cuò):

...
Aug 15 10:09:56 DB-test2 patroni: 2024-08-15 11:09:56,419 INFO: Lock owner: pa-pg-1; I am pa-pg-2
Aug 15 10:09:56 DB-test2 patroni: 2024-08-15 11:09:56,421 INFO: Local timeline=3 lsn=0/DB000110
Aug 15 10:09:56 DB-test2 patroni: 2024-08-15 11:09:56,422 ERROR: Exception when working with leader
Aug 15 10:09:56 DB-test2 patroni: Traceback (most recent call last):
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/rewind.py", line 80, in check_leader_is_not_in_recovery
Aug 15 10:09:56 DB-test2 patroni: with get_connection_cursor(connect_timeout=3, options='-c statement_timeout=2000', **conn_kwargs) as cur:
Aug 15 10:09:56 DB-test2 patroni: File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
Aug 15 10:09:56 DB-test2 patroni: return next(self.gen)
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 157, in get_connection_cursor
Aug 15 10:09:56 DB-test2 patroni: conn = psycopg.connect(**kwargs)
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/psycopg.py", line 104, in connect
Aug 15 10:09:56 DB-test2 patroni: ret = _connect(*args, **kwargs)
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib64/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
Aug 15 10:09:56 DB-test2 patroni: File "/usr/local/lib64/python3.6/site-packages/psycopg2/extensions.py", line 175, in make_dsn
Aug 15 10:09:56 DB-test2 patroni: psycopg2.ProgrammingError: invalid dsn: invalid connection option "target_session_attrs"
Aug 15 10:09:56 DB-test2 patroni: 2024-08-15 11:09:56,425 INFO: no action. I am (pa-pg-2), a secondary, and following a leader (pa-pg-1)
...

排查思路

1.首先一直關(guān)注的報(bào)錯(cuò)是"ERROR: Exception when working with leader"所以按照這個(gè)報(bào)錯(cuò)背蟆,首先去排查了patroni各節(jié)點(diǎn)的狀態(tài)难咕,但是通過節(jié)點(diǎn)狀態(tài)并沒有發(fā)現(xiàn)什么明顯異常窑业。

postgres # patronictl -c /etc/patroni.yml list
+ Cluster: pgsql (7400977424401590005) ----------+----+-----------+-----------------+-----------------------------+
| Member  | Host           | Role    | State     | TL | Lag in MB | Pending restart | Pending restart reason      |
+---------+----------------+---------+-----------+----+-----------+-----------------+-----------------------------+
| pa-pg-1 | 172.22.138.220 | Leader  | running   |  3 |           |                 |                             |
| pa-pg-2 | 172.22.138.219 | Replica | streaming |  3 |         0 |                 |                             |
| pa-pg-3 | 172.22.138.218 | Replica | streaming |  3 |         0 | *               | max_connections: 10000->100 |
+---------+----------------+---------+-----------+----+-----------+-----------------+-----------------------------+

2.排查主庫的pg_stat_replication和從庫的pg_wal_stat_reciver恋沃,檢查后未見明顯異常

postgres=# select * from pg_stat_replication;
+-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------+
|  pid  | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           |
+-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------+
| 12368 |    16384 | replica | pa-pg-2          | 172.22.138.219 |                 |       47170 | 2024-08-15 10:23:36.860733+08 |              | streaming | 0/DC000000 | 0/DC000000 | 0/DC000000 | 0/DC000000 |           |           |            |             0 | async      | 2024-08-15 11:10:18.855529+08 |
| 11101 |    16384 | replica | pa-pg-3          | 172.22.138.218 |                 |       55800 | 2024-08-15 09:57:31.073827+08 |              | streaming | 0/DC000000 | 0/DC000000 | 0/DC000000 | 0/DC000000 |           |           |            |             0 | async      | 2024-08-15 11:10:19.992588+08 |
+-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------+
...
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 18720
status                | streaming
receive_start_lsn     | 0/DB000000
receive_start_tli     | 3
written_lsn           | 0/DC000000
flushed_lsn           | 0/DC000000
received_tli          | 3
last_msg_send_time    | 2024-08-15 11:11:08.623733+08
last_msg_receipt_time | 2024-08-15 11:11:08.623328+08
latest_end_lsn        | 0/DC000000
latest_end_time       | 2024-08-15 11:09:52.14472+08
slot_name             | pa_pg_2
sender_host           | 172.22.138.220
sender_port           | 5432
conninfo              | user=replica passfile=/home/postgres/pgpass channel_binding=prefer dbname=replication host=172.22.138.220 port=5432 application_name=pa-pg-2 fallback_application_name=pgsql sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any

3.查看postgresql的server日志搪锣,更是一點(diǎn)信息都沒有休建。

2024-08-15 11:07:18 CST--- :LOG:  redo starts at 0/DA0025B8
2024-08-15 11:07:18 CST--- :LOG:  consistent recovery state reached at 0/DB000110
2024-08-15 11:07:18 CST--- :LOG:  invalid record length at 0/DB000110: wanted 24, got 0
2024-08-15 11:07:18 CST--- :LOG:  database system is ready to accept read-only connections
2024-08-15 11:07:18 CST--- :LOG:  started streaming WAL from primary at 0/DB000000 on timeline 3
2024-08-15 11:10:06 CST--- :LOG:  received fast shutdown request
2024-08-15 11:10:06 CST--- :LOG:  aborting any active transactions
2024-08-15 11:10:06 CST-postgres-127.0.0.1-Patroni heartbeat :FATAL:  terminating connection due to administrator command
2024-08-15 11:10:06 CST--- :FATAL:  terminating walreceiver process due to administrator command
2024-08-15 11:10:06 CST--- :LOG:  shutting down
2024-08-15 11:10:06 CST--- :LOG:  database system is shut down
2024-08-15 11:10:10 CST--- :LOG:  starting PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2024-08-15 11:10:10 CST--- :LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-15 11:10:10 CST--- :LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-08-15 11:10:10 CST--- :LOG:  database system was shut down in recovery at 2024-08-15 11:10:06 CST
2024-08-15 11:10:10 CST--- :LOG:  entering standby mode
2024-08-15 11:10:10 CST--- :LOG:  redo starts at 0/DA0025B8
2024-08-15 11:10:10 CST--- :LOG:  consistent recovery state reached at 0/DB000110
2024-08-15 11:10:10 CST--- :LOG:  invalid record length at 0/DB000110: wanted 24, got 0
2024-08-15 11:10:10 CST--- :LOG:  database system is ready to accept read-only connections
2024-08-15 11:10:10 CST--- :LOG:  started streaming WAL from primary at 0/DB000000 on timeline 3

4.隨后尋求外援盛杰,通過官方github的issues翼闽,有類似報(bào)錯(cuò)禽捆,建議為復(fù)制用戶replica添加postgres庫的connect權(quán)限,司馬當(dāng)活馬醫(yī)了笙什。去添加了相關(guān)的權(quán)限。

https://github.com/patroni/patroni/issues/2256

postgres=# grant CONNECT on database postgres to replica;
GRANT

5.重啟pa-pg-2服務(wù)胚想,查看patroni的日志發(fā)現(xiàn)琐凭,報(bào)錯(cuò)是一點(diǎn)沒變。此時(shí)把報(bào)錯(cuò)的重點(diǎn)放在了這條上面"psycopg2.ProgrammingError: invalid dsn: invalid connection option "target_session_attrs""

Aug 15 11:09:56 DB-test2 patroni: 2024-08-15 11:09:56,422 ERROR: Exception when working with leader
Aug 15 11:09:56 DB-test2 patroni: Traceback (most recent call last):
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/rewind.py", line 80, in check_leader_is_not_in_recovery
Aug 15 11:09:56 DB-test2 patroni: with get_connection_cursor(connect_timeout=3, options='-c statement_timeout=2000', **conn_kwargs) as cur:
Aug 15 11:09:56 DB-test2 patroni: File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
Aug 15 11:09:56 DB-test2 patroni: return next(self.gen)
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 157, in get_connection_cursor
Aug 15 11:09:56 DB-test2 patroni: conn = psycopg.connect(**kwargs)
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib/python3.6/site-packages/patroni/psycopg.py", line 104, in connect
Aug 15 11:09:56 DB-test2 patroni: ret = _connect(*args, **kwargs)
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib64/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
Aug 15 11:09:56 DB-test2 patroni: File "/usr/local/lib64/python3.6/site-packages/psycopg2/extensions.py", line 175, in make_dsn
Aug 15 11:09:56 DB-test2 patroni: psycopg2.ProgrammingError: invalid dsn: invalid connection option "target_session_attrs"
Aug 15 11:09:56 DB-test2 patroni: 2024-08-15 11:09:56,425 INFO: no action. I am (pa-pg-2), a secondary, and following a leader (pa-pg-1)

6.通過官方github的issues浊服,建議將psycopg2模塊的版本修改為psycopg2==2.8.6统屈,但是我的機(jī)服務(wù)器上胚吁,沒有這個(gè)模塊,正常啟動(dòng)的兩臺(tái)也沒有這個(gè)模塊愁憔。但是當(dāng)我檢查這個(gè)相關(guān)模塊時(shí)發(fā)現(xiàn)了問題腕扶。

https://github.com/patroni/patroni/issues/1969

#pa-pg-1和pa-pg-3
root # pip3 list|grep psycopg2
psycopg2-binary    2.9.8

#pa-pg-2
psycopg2-binary    2.9.5

7.通過上方發(fā)現(xiàn)patroni的所需要的模塊psycopg2-binary,三臺(tái)節(jié)點(diǎn)唯有啟動(dòng)報(bào)錯(cuò)的pa-pg-2版本與其他兩臺(tái)不一致吨掌。所以決定在pa-pg-2節(jié)點(diǎn)重裝這個(gè)模塊半抱。重裝后pa-pg-2的patroni日志不再輸出報(bào)錯(cuò)。

root # pip3 uninstall psycopg2-binary
root # pip3 install psycopg2-binary==2.9.8
...
Aug 15 11:23:14 DB-test2 systemd: Started patroni - a high-availability PostgreSQL.
Aug 15 11:23:14 DB-test2 systemd-logind: New session 886 of user root.
Aug 15 11:23:14 DB-test2 systemd: Started Session 886 of user root.
Aug 15 11:23:14 DB-test2 systemd-logind: Removed session 886.
Aug 15 11:23:15 DB-test2 patroni: postgresql parameter listen_addresses=0.0.0.0 failed validation, defaulting to None
Aug 15 11:23:15 DB-test2 patroni: postgresql parameter port=5432 failed validation, defaulting to None
Aug 15 11:23:15 DB-test2 patroni: 2024-08-15 11:23:15,225 INFO: No PostgreSQL configuration items changed, nothing to reload.
Aug 15 11:23:15 DB-test2 patroni: localhost:5432 - accepting connections
Aug 15 11:23:15 DB-test2 patroni: 2024-08-15 11:23:15,233 INFO: establishing a new patroni heartbeat connection to postgres
Aug 15 11:23:15 DB-test2 patroni: 2024-08-15 11:23:15,254 INFO: no action. I am (pa-pg-2), a secondary, and following a leader (pa-pg-1)
Aug 15 11:23:16 DB-test2 systemd-logind: New session 887 of user root.
Aug 15 11:23:16 DB-test2 systemd: Started Session 887 of user root.
Aug 15 11:23:16 DB-test2 systemd-logind: Removed session 887.
Aug 15 11:23:16 DB-test2 patroni: 2024-08-15 11:23:16,423 INFO: no action. I am (pa-pg-2), a secondary, and following a leader (pa-pg-1)

總結(jié)

在進(jìn)行集群相關(guān)的應(yīng)用部署時(shí)膜宋,只是檢查好相關(guān)模塊安裝成功與否還不夠窿侈,還需要更細(xì)致的檢查相關(guān)的版本號(hào)是否一致。

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末激蹲,一起剝皮案震驚了整個(gè)濱河市棉磨,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌学辱,老刑警劉巖乘瓤,帶你破解...
    沈念sama閱讀 219,366評(píng)論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場(chǎng)離奇詭異策泣,居然都是意外死亡衙傀,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,521評(píng)論 3 395
  • 文/潘曉璐 我一進(jìn)店門萨咕,熙熙樓的掌柜王于貴愁眉苦臉地迎上來统抬,“玉大人,你說我怎么就攤上這事危队〈辖ǎ” “怎么了?”我有些...
    開封第一講書人閱讀 165,689評(píng)論 0 356
  • 文/不壞的土叔 我叫張陵茫陆,是天一觀的道長(zhǎng)金麸。 經(jīng)常有香客問我,道長(zhǎng)簿盅,這世上最難降的妖魔是什么挥下? 我笑而不...
    開封第一講書人閱讀 58,925評(píng)論 1 295
  • 正文 為了忘掉前任,我火速辦了婚禮桨醋,結(jié)果婚禮上棚瘟,老公的妹妹穿的比我還像新娘。我一直安慰自己喜最,他們只是感情好偎蘸,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,942評(píng)論 6 392
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著,像睡著了一般禀苦。 火紅的嫁衣襯著肌膚如雪蔓肯。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 51,727評(píng)論 1 305
  • 那天振乏,我揣著相機(jī)與錄音蔗包,去河邊找鬼。 笑死慧邮,一個(gè)胖子當(dāng)著我的面吹牛调限,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播误澳,決...
    沈念sama閱讀 40,447評(píng)論 3 420
  • 文/蒼蘭香墨 我猛地睜開眼耻矮,長(zhǎng)吁一口氣:“原來是場(chǎng)噩夢(mèng)啊……” “哼!你這毒婦竟也來了忆谓?” 一聲冷哼從身側(cè)響起裆装,我...
    開封第一講書人閱讀 39,349評(píng)論 0 276
  • 序言:老撾萬榮一對(duì)情侶失蹤,失蹤者是張志新(化名)和其女友劉穎倡缠,沒想到半個(gè)月后哨免,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 45,820評(píng)論 1 317
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡昙沦,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,990評(píng)論 3 337
  • 正文 我和宋清朗相戀三年琢唾,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片盾饮。...
    茶點(diǎn)故事閱讀 40,127評(píng)論 1 351
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡采桃,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出丘损,到底是詐尸還是另有隱情普办,我是刑警寧澤,帶...
    沈念sama閱讀 35,812評(píng)論 5 346
  • 正文 年R本政府宣布徘钥,位于F島的核電站泌豆,受9級(jí)特大地震影響,放射性物質(zhì)發(fā)生泄漏吏饿。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,471評(píng)論 3 331
  • 文/蒙蒙 一蔬浙、第九天 我趴在偏房一處隱蔽的房頂上張望猪落。 院中可真熱鬧,春花似錦畴博、人聲如沸笨忌。這莊子的主人今日做“春日...
    開封第一講書人閱讀 32,017評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽官疲。三九已至袱结,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間途凫,已是汗流浹背垢夹。 一陣腳步聲響...
    開封第一講書人閱讀 33,142評(píng)論 1 272
  • 我被黑心中介騙來泰國(guó)打工, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留维费,地道東北人果元。 一個(gè)月前我還...
    沈念sama閱讀 48,388評(píng)論 3 373
  • 正文 我出身青樓,卻偏偏與公主長(zhǎng)得像犀盟,于是被迫代替她去往敵國(guó)和親而晒。 傳聞我的和親對(duì)象是個(gè)殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 45,066評(píng)論 2 355

推薦閱讀更多精彩內(nèi)容