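In Oracle, when a table is dropped or rows are deleted by mistake, the Flashback feature can bring the data back without any downtime. Other recovery tools, such as odu and gdul, can also retrieve the data. PostgreSQL currently has no flashback feature, so how do you recover mistakenly deleted data without stopping the database? Fortunately, a complete hot backup is enough.
The method described here restores the hot backup on a separate server and then imports the recovered data into the production environment, so normal database operation is not affected. The same approach also works for Oracle. Two conditions must be met:
- A complete base backup of the data files plus a backup of the archived WAL files. This is why backups matter.
- A server with the same PostgreSQL software installed.
Worked example
The walkthrough simulates dropping the table tbl_lottu_drop by mistake and then running further DML/DDL, to show that the production database keeps working normally. On a second server, point-in-time recovery (PITR) is used to restore the data of tbl_lottu_drop.
- Postgres201: production database server
- Postgres202: recovery server
1. Create a valid backup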
postgres=# select pg_start_backup(now()::text);
pg_start_backup
-----------------
0/F000060
(1 row)
[postgres@Postgres201 ~]$ rsync -acvz -L --exclude "pg_xlog" --exclude "pg_log" $PGDATA /data/backup/20180428
postgres=# select pg_stop_backup();
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
pg_stop_backup
----------------
0/F000168
(1 row)
2. Simulate the mistake
2.1 Create the table to be recovered, tbl_lottu_drop, and insert 1000 rows. Also make sure the data is written from the buffer cache to disk.
lottu=> create table tbl_lottu_drop (id int);
CREATE TABLE
lottu=> insert into tbl_lottu_drop select generate_series(1,1000);
INSERT 0 1000
lottu=> \c lottu postgres
You are now connected to database "lottu" as user "postgres".
2.2 Record a timestamp here for the later PITR. In a real incident you usually remember only an approximate time, and it is often inaccurate; the time you remember may even be after the mistake. How to find the right recovery point is explained later.
lottu=# select now();
now
-------------------------------
2018-04-28 20:47:31.617808+08
(1 row)
lottu=# checkpoint;
CHECKPOINT
lottu=# select pg_xlogfile_name(pg_switch_xlog());
pg_xlogfile_name
--------------------------
000000030000000000000010
(1 row)
2.3 Drop the table
lottu=# drop table tbl_lottu_drop;
DROP TABLE
2.4 Run further DML/DDL to show that the production database keeps working normally
lottu=# create table tbl_lottu_log (id int);
CREATE TABLE
lottu=# insert into tbl_lottu_log values (1),(2);
INSERT 0 2
lottu=# checkpoint;
CHECKPOINT
lottu=# select pg_xlogfile_name(pg_switch_xlog());
pg_xlogfile_name
--------------------------
000000030000000000000011
(1 row)
3. Recovery
3.1 Copy the backup to the Postgres202 server
[postgres@Postgres201 20180428]$ cd /data/backup/20180428
[postgres@Postgres201 20180428]$ ll
total 4
drwx------. 18 postgres postgres 4096 Apr 28 20:42 data
[postgres@Postgres201 20180428]$ rsync -acvz -L data postgres@192.168.1.202:/data/postgres
3.2 Remove unneeded files
[postgres@Postgres202 data]$ cd $PGDATA
[postgres@Postgres202 data]$ rm backup_label.old postmaster.pid tablespace_map.old
3.3 Restore the tablespace symbolic links from the backup
[postgres@Postgres202 data]$ cat tablespace_map
16385 /data/pg_data/lottu
[postgres@Postgres202 data]$ mkdir -p /data/pg_data
[postgres@Postgres202 data]$ cd pg_tblspc/
[postgres@Postgres202 pg_tblspc]$ mv 16385/ /data/pg_data/lottu
[postgres@Postgres202 pg_tblspc]$ ln -s /data/pg_data/lottu ./16385
[postgres@Postgres202 pg_tblspc]$ ll
total 0
lrwxrwxrwx. 1 postgres postgres 19 Apr 28 23:12 16385 -> /data/pg_data/lottu
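The manual steps in 3.3 can be sketched as a small bash helper for the case where several tablespaces exist. This is only a sketch: the function name restore_tablespace_links is hypothetical, and it assumes each line of tablespace_map has the form "<oid> <path>" and that the backup was copied with rsync -L, so pg_tblspc/<oid> is a real directory rather than a link.

```shell
# Sketch: recreate tablespace symlinks from tablespace_map (bash).
# Assumption: every line of tablespace_map is "<oid> <path>".
restore_tablespace_links() {
    local pgdata=$1 oid path
    while read -r oid path; do
        [ -n "$oid" ] || continue                      # skip blank lines
        # only handle real directories; leave existing symlinks alone
        [ -d "$pgdata/pg_tblspc/$oid" ] && [ ! -L "$pgdata/pg_tblspc/$oid" ] || continue
        if [ -e "$path" ]; then                        # never overwrite an existing target
            echo "target already exists: $path" >&2
            return 1
        fi
        mkdir -p "$(dirname "$path")"
        mv "$pgdata/pg_tblspc/$oid" "$path"            # move the data to its original location
        ln -s "$path" "$pgdata/pg_tblspc/$oid"         # and point the link at it
    done < "$pgdata/tablespace_map"
}
```

It would be called as restore_tablespace_links "$PGDATA" before the server is started.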
3.4 Copy the WAL files into the pg_xlog directory on Postgres202. The backup_label file shows which segment to start from.
[postgres@Postgres202 data]$ mkdir -p pg_xlog/archive_status
[postgres@Postgres202 data]$ cat backup_label
START WAL LOCATION: 0/F000060 (file 00000003000000000000000F)
CHECKPOINT LOCATION: 0/F000098
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2018-04-28 20:42:15 CST
LABEL: 2018-04-28 20:42:13.244358+08
# backup_label shows that the segments needed run from 00000003000000000000000F up to the WAL file currently being written.
[postgres@Postgres202 pg_xlog]$ ll
total 65540
-rw-------. 1 postgres postgres 16777216 Apr 28 20:42 00000003000000000000000F
-rw-------. 1 postgres postgres 313 Apr 28 20:42 00000003000000000000000F.00000060.backup
-rw-------. 1 postgres postgres 16777216 Apr 28 20:48 000000030000000000000010
-rw-------. 1 postgres postgres 16777216 Apr 28 20:50 000000030000000000000011
-rw-------. 1 postgres postgres 16777216 Apr 28 20:55 000000030000000000000012
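Picking the segments by eye works for four files; with a larger archive, the selection in 3.4 could be scripted roughly as follows. The function names are hypothetical, and the sketch assumes all segments sit on one timeline, so that file names compare in WAL order. It lists plain segment files only and skips the .backup history file.

```shell
# Sketch (bash): work out which WAL segments a base backup needs.
# Print the first WAL segment named in backup_label.
start_wal_segment() {
    sed -n 's/^START WAL LOCATION: .*(file \([0-9A-F]\{24\}\))$/\1/p' "$1"
}

# List segments in the archive directory at or after the start segment.
# Assumption: a single timeline, so names sort in WAL order.
wal_segments_needed() {
    local label=$1 archive=$2 start f name
    start=$(start_wal_segment "$label")
    [ -n "$start" ] || return 1
    for f in "$archive"/*; do
        name=$(basename "$f")
        case $name in
            *[!0-9A-F]*) continue ;;       # not a plain segment name (e.g. *.backup)
        esac
        [ "${#name}" -eq 24 ] || continue
        [[ $name < $start ]] || echo "$name"
    done
}
```

For example, wal_segments_needed backup_label /data/arch prints one segment name per line, which can then be fed to cp or rsync.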
3.5 Edit recovery.conf
[postgres@Postgres202 data]$ vi recovery.conf
restore_command = 'cp /data/arch/%f %p' # e.g. 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2018-04-28 20:47:31.617808+08'
recovery_target_inclusive = false
recovery_target_timeline = 'latest'
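When the same recovery is rehearsed several times with different target times, the file from 3.5 can be generated instead of edited by hand. The helper below is a hypothetical sketch that writes the same four settings; it applies to the recovery.conf layout used by the 9.6 server in this article (PostgreSQL 12 and later dropped recovery.conf in favour of postgresql.conf settings plus a recovery.signal file).

```shell
# Sketch (bash): write a recovery.conf for time-based PITR.
# Arguments: data directory, WAL archive directory, target timestamp.
write_recovery_conf() {
    local pgdata=$1 archive_dir=$2 target_time=$3
    cat > "$pgdata/recovery.conf" <<EOF
restore_command = 'cp $archive_dir/%f %p'
recovery_target_time = '$target_time'
recovery_target_inclusive = false
recovery_target_timeline = 'latest'
EOF
}
```

Calling write_recovery_conf "$PGDATA" /data/arch '2018-04-28 20:47:31.617808+08' reproduces the file shown above, minus the trailing comment.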
3.6 Start the database and verify the data
[postgres@Postgres202 data]$ pg_start
server starting
[postgres@Postgres202 data]$ ps -ef | grep postgres
root 1098 1083 0 22:32 pts/0 00:00:00 su - postgres
postgres 1099 1098 0 22:32 pts/0 00:00:00 -bash
root 1210 1195 0 22:55 pts/1 00:00:00 su - postgres
postgres 1211 1210 0 22:55 pts/1 00:00:00 -bash
postgres 1442 1 1 23:16 pts/0 00:00:00 /opt/pgsql96/bin/postgres
postgres 1450 1442 0 23:16 ? 00:00:00 postgres: checkpointer process
postgres 1451 1442 0 23:16 ? 00:00:00 postgres: writer process
postgres 1459 1442 0 23:16 ? 00:00:00 postgres: wal writer process
postgres 1460 1442 0 23:16 ? 00:00:00 postgres: autovacuum launcher process
postgres 1461 1442 0 23:16 ? 00:00:00 postgres: archiver process last was 00000005.history
postgres 1462 1442 0 23:16 ? 00:00:00 postgres: stats collector process
postgres 1464 1099 0 23:16 pts/0 00:00:00 ps -ef
postgres 1465 1099 0 23:16 pts/0 00:00:00 grep postgres
[postgres@Postgres202 data]$ psql
psql (9.6.0)
Type "help" for help.
postgres=# \c lottu lottu
You are now connected to database "lottu" as user "lottu".
lottu=> \dt
List of relations
Schema | Name | Type | Owner
--------+----------------+-------+-------
public | pitr_test | table | lottu
public | tbl_lottu_drop | table | lottu
lottu=> select count(1) from tbl_lottu_drop;
count
-------
1000
(1 row)
This shows the data has been recovered. Copying it back to the production database is omitted here.
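One possible way to do that copy, which the article leaves out, is to pipe pg_dump on the recovery server into psql on production. This is a sketch under assumptions, not the author's procedure: the function name is hypothetical, the production host is passed in as a parameter because the article gives no address for Postgres201, and connection permissions between the two servers are assumed to be in place.

```shell
# Sketch (bash): copy one recovered table from the recovery server to production.
# Arguments: database name, table name, production host (assumed reachable).
copy_table_back() {
    local db=$1 table=$2 prod_host=$3
    (
        set -o pipefail    # fail if either side of the pipe fails
        pg_dump -d "$db" -t "$table" \
            | psql -h "$prod_host" -d "$db" -v ON_ERROR_STOP=1
    )
}
```

Run on Postgres202 as, for example, copy_table_back lottu tbl_lottu_drop <production-host>. Because the table no longer exists on production, the CREATE TABLE in the dump should apply cleanly.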
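Going further
This section explains how to find the time of the mistaken operation, that is, the value used in recovery_target_time = '2018-04-28 20:47:31.617808+08'. In the walkthrough above it was recorded beforehand.
- Use pg_xlogdump to decode this range of WAL.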
[postgres@Postgres201 pg_xlog]$ pg_xlogdump -b 00000003000000000000000F 000000030000000000000012 > lottu.log
pg_xlogdump: FATAL: error in WAL record at 0/12000648: invalid record length at 0/12000680: wanted 24, got 0
- The relevant records can then be found in lottu.log, using the table's filenode:
[postgres@Postgres202 lottu]$ oid2name -d lottu -t tbl_lottu_drop
From database "lottu":
Filenode Table Name
--------------------------
32784 tbl_lottu_drop
# The records that mention 32784 show that the 1000 rows in tbl_lottu_drop were in place by 2018-04-28 20:46:37.718442, the commit time of transaction 1690 (so choosing 2018-04-28 20:47:31.617808+08 as the recovery point is fine). The drop was then carried out by transaction 1691.
rmgr: Transaction len (rec/tot): 8/ 34, tx: 1689, lsn: 0/100244A0, prev 0/10024460, desc: COMMIT 2018-04-28 20:45:49.736013 CST
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/100244C8, prev 0/100244A0, desc: RUNNING_XACTS nextXid 1690 latestCompletedXid 1689 oldestRunningXid 1690
rmgr: Heap len (rec/tot): 3/ 3130, tx: 1690, lsn: 0/10024500, prev 0/100244C8, desc: INSERT off 9
blkref #0: rel 16385/16386/2619 fork main blk 15 (FPW); hole: offset: 60, length: 5116
rmgr: Btree len (rec/tot): 2/ 7793, tx: 1690, lsn: 0/10025140, prev 0/10024500, desc: INSERT_LEAF off 385
blkref #0: rel 16385/16386/2696 fork main blk 1 (FPW); hole: offset: 1564, length: 452
rmgr: Heap len (rec/tot): 2/ 184, tx: 1690, lsn: 0/10026FD0, prev 0/10025140, desc: INPLACE off 16
blkref #0: rel 16385/16386/1259 fork main blk 0
rmgr: Transaction len (rec/tot): 88/ 114, tx: 1690, lsn: 0/10027088, prev 0/10026FD0, desc: COMMIT 2018-04-28 20:46:37.718442 CST; inval msgs: catcache 49 catcache 45 catcache 44 relcache 32784
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/10027100, prev 0/10027088, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/10027138, prev 0/10027100, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 0/10027170, prev 0/10027138, desc: CHECKPOINT_ONLINE redo 0/10027138; tli 3; prev tli 3; fpw true; xid 0:1691; oid 40976; multi 1; offset 0; oldest xid 1668 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1691; online
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/100271E0, prev 0/10027170, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/10027218, prev 0/100271E0, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 0/10027250, prev 0/10027218, desc: CHECKPOINT_ONLINE redo 0/10027218; tli 3; prev tli 3; fpw true; xid 0:1691; oid 40976; multi 1; offset 0; oldest xid 1668 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1691; online
rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn: 0/100272C0, prev 0/10027250, desc: SWITCH
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/11000028, prev 0/100272C0, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: Standby len (rec/tot): 16/ 42, tx: 1691, lsn: 0/11000060, prev 0/11000028, desc: LOCK xid 1691 db 16386 rel 32784
rmgr: Heap len (rec/tot): 8/ 2963, tx: 1691, lsn: 0/11000090, prev 0/11000060, desc: DELETE off 16 KEYS_UPDATED
blkref #0: rel 16385/16386/1247 fork main blk 8 (FPW); hole: offset: 88, length: 5288
- So the recovery.conf above can also be written with a transaction ID as the target:
restore_command = 'cp /data/arch/%f %p' # e.g. 'cp /mnt/server/archivedir/%f %p'
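recovery_target_xid = '1691' # xid of the DROP TABLE; with recovery_target_inclusive = false, replay stops just before it commits, so transaction 1690 is kept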
recovery_target_inclusive = false
recovery_target_timeline = 'latest'
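Reading lottu.log by eye gets tedious on a busy system. The search described above could be narrowed with two small filters over the pg_xlogdump text output. Both function names are hypothetical, and the patterns assume the 9.6 output format shown in this article. The second filter only looks at "rel <oid>" and "relcache <oid>" in the record description, not at blkref lines.

```shell
# Sketch (bash + awk): shortlist recovery targets from pg_xlogdump output.
# Print "xid<TAB>commit time" for every COMMIT record in the dump file.
list_commits() {
    awk '/^rmgr: Transaction/ && /desc: COMMIT/ {
        xid = $0; sub(/.*tx: */, "", xid); sub(/,.*/, "", xid)
        ts = $0;  sub(/.*desc: COMMIT /, "", ts); split(ts, p, " ")
        print xid "\t" p[1] " " p[2]
    }' "$1"
}

# Print each xid whose record description mentions the given relation number.
xids_touching_rel() {
    awk -v rel="$2" '
        /^rmgr:/ && ($0 ~ ("(rel|relcache) " rel "([^0-9]|$)")) {
            xid = $0; sub(/.*tx: */, "", xid); sub(/,.*/, "", xid)
            if (xid != "0" && !(xid in seen)) { seen[xid] = 1; print xid }
        }' "$1"
}
```

For example, xids_touching_rel lottu.log 32784 lists the transactions that mention the table, and list_commits lottu.log gives the commit time of each transaction for comparison.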