技術(shù)前刊:PostgreSQL12 COPY和bulkloading提升
業(yè)務(wù)是否依賴COPY命令加載數(shù)據(jù)蚯窥?PostgreSQL12提供了一個新特性,大大加快了加載速度截粗。
COPY:Loading and unloading data as fast as possible
細(xì)看PostgreSQL12的COPY語法缚陷,發(fā)現(xiàn)有兩處變動:
1)\h 會有手冊文檔鏈接
2)COPY支持WHERE條件
下面是完整語法:
db12=# \h COPY
Command: ????COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column_name [, ...] ) ]
????FROM { 'filename' | PROGRAM 'command' | STDIN }
????[ [ WITH ] ( option [, ...] ) ]
????[ WHERE condition ]
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
????TO { 'filename' | PROGRAM 'command' | STDOUT }
????[ [ WITH ] ( option [, ...] ) ]
where option can be one of:
????FORMAT format_name
????FREEZE [ boolean ]
????DELIMITER 'delimiter_character'
????NULL 'null_string'
????HEADER [ boolean ]
????QUOTE 'quote_character'
????ESCAPE 'escape_character'
????FORCE_QUOTE { ( column_name [, ...] ) | * }
????FORCE_NOT_NULL ( column_name [, ...] )
????FORCE_NULL ( column_name [, ...] )
????ENCODING 'encoding_name'
URL: https://www.postgresql.org/docs/12/sql-copy.html
雖然添加文檔鏈接功能帶來便利,但是WHERE過濾功能更加有用昵观。這個功能的目的是什么晾腔?當(dāng)前,COPY只能導(dǎo)入整個文件啊犬。但是某些場景下會有問題:很多場景下灼擂,人們只想加載數(shù)據(jù)的一個子集,并且在導(dǎo)出前必須編寫大量代碼進(jìn)行過濾觉至。
COPY...WHERE: 導(dǎo)入時過濾
PostgreSQL通過該條件可以輕松地進(jìn)行過濾剔应。下面講述一個簡單的例子:
db12=# CREATE TABLE t_demo AS
????????SELECT * FROM generate_series(1, 1000) AS id;
SELECT 1000
首先先插入1000行數(shù)據(jù),然后導(dǎo)出到一個文件:
db12=# COPY t_demo TO '/tmp/file.txt';
COPY 1000
最后语御,再重新導(dǎo)入數(shù)據(jù):
db12=# CREATE TABLE t_import (x int);
CREATE TABLE
db12=# COPY t_import FROM '/tmp/file.txt' WHERE x < 5;
COPY 4
db12=# SELECT * FROM t_import;
?x
---
?1
?2
?3
?4
(4 rows)
如上說是峻贮,過濾數(shù)據(jù)非常簡單直接。需要注意沃暗,導(dǎo)出列是“id”月洛,導(dǎo)入列是“x”何恶。文件文件不知道導(dǎo)入表的表結(jié)構(gòu)---需要確保過濾導(dǎo)入表的列名孽锥。
其他特性
COPY可以將數(shù)據(jù)發(fā)送到UNIX pipe,還可以從pipe中讀取數(shù)據(jù)细层。如下所示:
db12=# COPY t_demo TO PROGRAM 'gzip -c > /tmp/file.txt.gz';
COPY 1000
db12=# COPY t_import FROM PROGRAM 'gunzip -c /tmp/file.txt.gz'
????WHERE x BETWEEN 100 AND 103;
COPY 4
db12=# SELECT * FROM t_import WHERE x >= 100;
??x ?
-----
?100
?101
?102
?103
(4 rows)
某些場景下惜辑,不止想將數(shù)據(jù)導(dǎo)出。上面場景疫赎,我們想導(dǎo)出數(shù)據(jù)時將數(shù)據(jù)壓縮盛撑;導(dǎo)入時將數(shù)據(jù)解壓并過濾。如上所示捧搞,這種方法非常簡單抵卫。
原文
https://www.cybertec-postgresql.com/en/tech-preview-improving-copy-and-bulkloading-in-postgresql-12/