前言
ClickHouse是“俄羅斯Google”——Yandex公司在2016年開(kāi)源的面向OLAP的列式存儲(chǔ)數(shù)據(jù)庫(kù)思灌,近來(lái)發(fā)展非常迅猛,國(guó)內(nèi)很多大廠都在生產(chǎn)環(huán)境中廣泛使用恭取。隨著業(yè)務(wù)體量的擴(kuò)張泰偿,我們的業(yè)務(wù)人員逐漸有了分析海量用戶行為和點(diǎn)擊流數(shù)據(jù)的需求,經(jīng)過(guò)各種調(diào)研秽荤,最終敲定ClickHouse為最佳方案(成功擠掉了之前有豐富實(shí)操經(jīng)驗(yàn)的Kudu哈哈)甜奄。
本文就來(lái)記錄一個(gè)ClickHouse開(kāi)發(fā)測(cè)試集群的安裝與部署過(guò)程。
前置條件
- 7臺(tái)阿里云ECS窃款,每臺(tái)52 vCPU课兄,192GB內(nèi)存,3TB * 4云盤(pán)(測(cè)試業(yè)務(wù)都是混布的晨继,所以請(qǐng)不要太在意節(jié)點(diǎn)規(guī)格=烟阐。=)
- 操作系統(tǒng)CentOS 7.5
- 配置SSH互信,修改ulimit紊扬,關(guān)閉防火墻和虛擬內(nèi)存等等蜒茄,不再贅述
- ClickHouse版本19.16.14.65
下載并安裝RPM包
先安裝依賴項(xiàng)。
yum -y install libicu perl-JSON-XS
ClickHouse的RPM包由Altinity提供餐屎,可以直接wget到服務(wù)器上檀葛。
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-common-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-common-static-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-debuginfo-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-test-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.16.14.65-1.el7.x86_64.rpm/download.rpm
然后安裝之。
rpm -ivh clickhouse-server-common-19.16.14.65-1.el7.x86_64.rpm
rpm -ivh clickhouse-common-static-19.16.14.65-1.el7.x86_64.rpm
rpm -ivh clickhouse-server-19.16.14.65-1.el7.x86_64.rpm
rpm -ivh clickhouse-debuginfo-19.16.14.65-1.el7.x86_64.rpm
rpm -ivh clickhouse-test-19.16.14.65-1.el7.x86_64.rpm
rpm -ivh clickhouse-client-19.16.14.65-1.el7.x86_64.rpm
安裝完畢后腹缩,啟動(dòng)腳本clickhouse-server位于/etc/init.d目錄下屿聋,而主要的配置文件config.xml、users.xml位于/etc/clickhouse-server目錄下藏鹊。
高可用拓?fù)?/h3>
ClickHouse與ElasticSearch類似润讥,也有數(shù)據(jù)分片(shard)和副本(replica)的概念。但是ClickHouse只允許一個(gè)實(shí)例持有一個(gè)分片盘寡,所以在生產(chǎn)環(huán)境中楚殿,一般采用兩個(gè)甚至多個(gè)對(duì)等的集群互相復(fù)制和熱備(依靠ReplicatedMergeTree引擎族實(shí)現(xiàn)復(fù)制表),當(dāng)某集群上的某節(jié)點(diǎn)掛掉后竿痰,可以由其他集群上持有對(duì)應(yīng)分片的節(jié)點(diǎn)頂上脆粥,實(shí)現(xiàn)高可用。
但是影涉,我們現(xiàn)在手上只有一個(gè)7節(jié)點(diǎn)的小集群冠绢,并且尚處在試運(yùn)行階段,沒(méi)有必要大動(dòng)干戈常潮,所以可以采用每個(gè)節(jié)點(diǎn)啟動(dòng)兩個(gè)ClickHouse實(shí)例,且每個(gè)分片的兩個(gè)副本都位于相鄰節(jié)點(diǎn)上的“環(huán)形副本”(circular replication)拓?fù)浞桨缚Γ缦聢D所示喊式,很容易理解孵户。
當(dāng)然,在實(shí)際生產(chǎn)環(huán)境中岔留,還是強(qiáng)烈推薦使用對(duì)等集群實(shí)現(xiàn)高可用夏哭。
配置config.xml
config.xml包含了除用戶、配額設(shè)置之外的所有與ClickHouse服務(wù)相關(guān)的配置項(xiàng)献联。首先將它復(fù)制一份出來(lái)并命名為config2.xml竖配,作為節(jié)點(diǎn)上另一個(gè)實(shí)例的配置。
cp config.xml config2.xml
注意里逆,config.xml中可以使用<include_from>標(biāo)簽引入外部XML文件的配置(ClickHouse習(xí)慣上將其命名為metrika.xml)进胯,如集群、ZooKeeper配置等原押,并在其他標(biāo)簽中使用incl屬性直接引用之胁镐。但這并非強(qiáng)制的規(guī)范,因此我們沒(méi)有新建metrika.xml文件诸衔,全部直接在config.xml里寫(xiě)了盯漂。
config.xml、config2.xml相同的配置項(xiàng)
ClickHouse集群節(jié)點(diǎn)配置
按照上文的圖示來(lái)配置如下的7分片2副本方案笨农,兩個(gè)實(shí)例分別占用9000和9001端口就缆。這兩個(gè)端口也分別是同機(jī)兩個(gè)ClickHouse實(shí)例的TCP端口,下文會(huì)說(shuō)谒亦。
<remote_servers>
<sht_ck_cluster_1>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ck-node001</host>
<port>9000</port>
<user>default</user>
<password>default</password>
</replica>
<replica>
<host>ck-node002</host>
<port>9001</port>
<user>default</user>
<password>default</password>
</replica>
</shard>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ck-node002</host>
<port>9000</port>
<user>default</user>
<password>default</password>
</replica>
<replica>
<host>ck-node003</host>
<port>9001</port>
<user>default</user>
<password>default</password>
</replica>
</shard>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ck-node003</host>
<port>9000</port>
<user>default</user>
<password>default</password>
</replica>
<replica>
<host>ck-node004</host>
<port>9001</port>
<user>default</user>
<password>default</password>
</replica>
</shard>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ck-node004</host>
<port>9000</port>
<user>default</user>
<password>default</password>
</replica>
<replica>
<host>ck-node005</host>
<port>9001</port>
<user>default</user>
<password>default</password>
</replica>
</shard>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ck-node005</host>
<port>9000</port>
<user>default</user>
<password>default</password>
</replica>
<replica>
<host>ck-node006</host>
<port>9001</port>
<user>default</user>
<password>default</password>
</replica>
</shard>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ck-node006</host>
<port>9000</port>
<user>default</user>
<password>default</password>
</replica>
<replica>
<host>ck-node007</host>
<port>9001</port>
<user>default</user>
<password>default</password>
</replica>
</shard>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ck-node007</host>
<port>9000</port>
<user>default</user>
<password>default</password>
</replica>
<replica>
<host>ck-node001</host>
<port>9001</port>
<user>default</user>
<password>default</password>
</replica>
</shard>
</sht_ck_cluster_1>
</remote_servers>
- sht_ck_cluster_1是集群標(biāo)識(shí)竭宰,可以自行規(guī)定,在創(chuàng)建分布式表(引擎為Distributed)時(shí)需要用到诊霹。
- weight表示每個(gè)分片的寫(xiě)入權(quán)重值羞延,數(shù)據(jù)寫(xiě)入時(shí)會(huì)有較大概率落到weight值較大的分片,這里全部設(shè)為1脾还。
- internal_replication表示是否啟用內(nèi)部復(fù)制伴箩,即寫(xiě)入數(shù)據(jù)時(shí)只寫(xiě)入到一個(gè)副本,其他副本的同步工作靠復(fù)制表和ZooKeeper異步進(jìn)行鄙漏,顯然設(shè)為true是科學(xué)的嗤谚。
ZK集群節(jié)點(diǎn)配置
ClickHouse的復(fù)制表強(qiáng)依賴于ZooKeeper(作為元數(shù)據(jù)存儲(chǔ))。我們直接復(fù)用了原本存在的3節(jié)點(diǎn)ZK集群怔蚌。
<zookeeper>
<node index="1">
<host>zk-node001</host>
<port>2181</port>
</node>
<node index="2">
<host>zk-node002</host>
<port>2181</port>
</node>
<node index="3">
<host>zk-node003</host>
<port>2181</port>
</node>
</zookeeper>
數(shù)據(jù)壓縮配置
這里只是取消了原來(lái)的注釋而已巩步。
<compression>
<!-- Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. -->
<case>
<!-- Conditions. All must be satisfied. Some conditions may be omitted. -->
<min_part_size>10000000000</min_part_size> <!-- Min part size in bytes. -->
<min_part_size_ratio>0.01</min_part_size_ratio> <!-- Min size of part relative to whole table size. -->
<!-- What compression method to use. -->
<method>lz4</method>
</case>
</compression>
- min_part_size是可被壓縮的數(shù)據(jù)塊的最小大小,默認(rèn)值10GB桦踊。
- min_part_size_ratio是可被壓縮的數(shù)據(jù)塊占全表大小的最小比例椅野,默認(rèn)值1%。
- method是壓縮算法,可選lz4和zstd竟闪。
連接离福、并發(fā)查詢配置
由于并非真正的線上集群,并且集群里還有HDFS等服務(wù)在跑炼蛤,所以設(shè)得比較小一些妖爷。
<max_connections>128</max_connections>
<keep_alive_timeout>3</keep_alive_timeout>
<max_concurrent_queries>16</max_concurrent_queries>
ClickHouse本身也不是為了密集OLTP查詢而設(shè)計(jì)的,所以最大并發(fā)查詢數(shù)不會(huì)很大理朋,默認(rèn)值為100絮识。
時(shí)區(qū)配置
<timezone>Asia/Shanghai</timezone>
config.xml、config2.xml不同的配置項(xiàng)
為了方便對(duì)比嗽上,以下都貼截圖次舌,左側(cè)為config.xml,右側(cè)為config2.xml炸裆。 截圖會(huì)被壓糊垃它,還是文本吧= =
日志配置
注意要寫(xiě)到不同的路徑下。
- config.xml
<logger>
<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>trace</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
<!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
</logger>
- config2.xml
<logger>
<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>trace</level>
<log>/var/log/clickhouse-server2/clickhouse-server2.log</log>
<errorlog>/var/log/clickhouse-server2/clickhouse-server2.err.log</errorlog>
<size>1000M</size>
<count>10</count>
<!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
</logger>
端口配置
包含HTTP烹看、TCP国拇、節(jié)點(diǎn)間通信的端口。注意TCP端口分別與上文副本配置的端口號(hào)一致惯殊,HTTP和節(jié)點(diǎn)間通信端口不要與副本的端口沖突酱吝。
- config.xml
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<interserver_http_port>9009</interserver_http_port>
- config2.xml
<http_port>8124</http_port>
<tcp_port>9001</tcp_port>
<interserver_http_port>9010</interserver_http_port>
目錄路徑配置
包含數(shù)據(jù)目錄、臨時(shí)目錄土思、用戶文件目錄和輸入schema目錄务热,兩個(gè)實(shí)例各使用一塊硬盤(pán)(分別掛載到/data3和/data4)。
- config.xml
<path>/data3/clickhouse/data/</path>
<tmp_path>/data3/clickhouse/tmp/</tmp_path>
<user_files_path>/data3/clickhouse/user_files/</user_files_path>
<format_schema_path>/data3/clickhouse/format_schemas/</format_schema_path>
- config2.xml
<path>/data4/clickhouse/data/</path>
<tmp_path>/data4/clickhouse/tmp/</tmp_path>
<user_files_path>/data4/clickhouse/user_files/</user_files_path>
<format_schema_path>/data4/clickhouse/format_schemas/</format_schema_path>
復(fù)制表宏(macros)配置
該配置在每個(gè)節(jié)點(diǎn)的每個(gè)實(shí)例上都不同己儒,且可以自有配置崎岂。這里可以包含3個(gè)元素:
- layer:復(fù)制表的層級(jí)。在大型集群(比如ClickHouse的發(fā)源地Yandex.Metrica)中可能會(huì)有多級(jí)備份闪湾,不過(guò)我們自然只有1級(jí)了冲甘。
- shard:實(shí)例所持有的分片ID。
- replica:實(shí)例所持有的副本ID途样。
這樣江醇,我們?cè)谑褂肦eplicatedMergeTree引擎族創(chuàng)建表時(shí),就不必在ZK路徑里分別指定這些值何暇,直接用{layer}陶夜、{shard}、{replica}等替代即可裆站。通過(guò)“高可用規(guī)劃”一節(jié)的圖条辟,就可以寫(xiě)出對(duì)應(yīng)的配置黔夭。以node001節(jié)點(diǎn)為例:
- config.xml
<macros>
<layer>01</layer>
<shard>01</shard>
<replica>01-01-1</replica>
</macros>
- config2.xml
<macros>
<layer>01</layer>
<shard>07</shard>
<replica>01-07-2</replica>
</macros>
副本的命名采用了{(lán)layer}-{shard}-{replica序號(hào)}的方式,更易讀捂贿。
配置users.xml
users.xml包含了用戶纠修、配額設(shè)置。結(jié)構(gòu)比較簡(jiǎn)單厂僧,直接貼全文吧。
<?xml version="1.0"?>
<yandex>
<!-- Profiles of settings. -->
<profiles>
<!-- Default settings. -->
<default>
<max_threads>8</max_threads>
<!-- Maximum memory usage for processing single query, in bytes. -->
<max_memory_usage>8000000000</max_memory_usage>
<!-- Use cache of uncompressed blocks of data. Meaningful only for processing many of very short queries. -->
<use_uncompressed_cache>0</use_uncompressed_cache>
<load_balancing>random</load_balancing>
</default>
<!-- Profile that allows only read queries. -->
<readonly>
<max_threads>8</max_threads>
<max_memory_usage>8000000000</max_memory_usage>
<use_uncompressed_cache>0</use_uncompressed_cache>
<load_balancing>random</load_balancing>
<readonly>1</readonly>
</readonly>
</profiles>
<!-- Users and ACL. -->
<users>
<!-- If user name was not specified, 'default' user is used. -->
<default>
<password_sha256_hex>37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f</password_sha256_hex>
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<!-- Settings profile for user. -->
<profile>default</profile>
<!-- Quota for user. -->
<quota>default</quota>
</default>
</users>
<!-- Quotas. -->
<quotas>
<!-- Name of quota. -->
<default>
<!-- Limits for time interval. You could specify many intervals with different limits. -->
<interval>
<!-- Length of interval. -->
<duration>3600</duration>
<!-- No limits. Just calculate resource usage for time interval. -->
<queries>0</queries>
<errors>0</errors>
<result_rows>0</result_rows>
<read_rows>0</read_rows>
<execution_time>0</execution_time>
</interval>
</default>
</quotas>
</yandex>
這里基本上采用了默認(rèn)設(shè)置了牛,即用戶名颜屠、密碼、profile和配額設(shè)置均為default鹰祸。注意寫(xiě)密碼時(shí)用SHA256加密后的值:
echo -n 'default' | sha256sum | tr -d '-'
另外甫窟,需要根據(jù)集群實(shí)際情況配置max_threads(單個(gè)查詢能使用的最多線程數(shù))以及max_memory_usage(單個(gè)查詢能使用的最多內(nèi)存量)參數(shù)。
修改啟動(dòng)腳本
將原始的啟動(dòng)腳本clickhouse-server復(fù)制一份蛙婴。
cp clickhouse-server clickhouse-server2
clickhouse-server修改數(shù)據(jù)目錄:
CLICKHOUSE_DATADIR_OLD=/data3/clickhouse/data_old
CLICKHOUSE_DATADIR=/data3/clickhouse/data
clickhouse-server2除了修改數(shù)據(jù)目錄之外粗井,還得修改日志、配置街图、PID和cron文件:
CLICKHOUSE_LOGDIR=/var/log/clickhouse-server2
CLICKHOUSE_DATADIR_OLD=/data4/clickhouse/data_old
CLICKHOUSE_DATADIR=/data4/clickhouse/data
CLICKHOUSE_CRONFILE=/etc/cron.d/clickhouse-server2
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config2.xml
CLICKHOUSE_PIDFILE="$CLICKHOUSE_PIDDIR/$PROGRAM-2.pid"
啟動(dòng)浇衬、試運(yùn)行
推薦先手動(dòng)把前文涉及到的各種目錄(數(shù)據(jù)目錄之類的)mkdir出來(lái),然后再啟動(dòng)餐济。
service clickhouse-server start
service clickhouse-server2 start
在任一節(jié)點(diǎn)上打開(kāi)一個(gè)客戶端耘擂,查詢system.clusters系統(tǒng)表,可以得到正確的集群分片和副本信息絮姆。
~ clickhouse-client -m -h 127.0.0.1 --password default --port 9000
ClickHouse client version 19.16.14.65.
Connecting to 127.0.0.1:9000 as user default.
Connected to ClickHouse server version 19.16.14 revision 54427.
testdev1-bigdata-ck-node001 :) select * from system.clusters;
SELECT *
FROM system.clusters
┌─cluster──────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name───────────────────┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ sht_ck_cluster_1 │ 1 │ 1 │ 1 │ testdev1-bigdata-ck-node001 │ 10.1.17.3 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 1 │ 1 │ 2 │ testdev1-bigdata-ck-node002 │ 10.1.17.4 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 2 │ 1 │ 1 │ testdev1-bigdata-ck-node002 │ 10.1.17.4 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 2 │ 1 │ 2 │ testdev1-bigdata-ck-node003 │ 10.1.17.252 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 3 │ 1 │ 1 │ testdev1-bigdata-ck-node003 │ 10.1.17.252 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 3 │ 1 │ 2 │ testdev1-bigdata-ck-node004 │ 10.1.17.251 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 4 │ 1 │ 1 │ testdev1-bigdata-ck-node004 │ 10.1.17.251 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 4 │ 1 │ 2 │ testdev1-bigdata-ck-node005 │ 10.1.17.2 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 5 │ 1 │ 1 │ testdev1-bigdata-ck-node005 │ 10.1.17.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 5 │ 1 │ 2 │ testdev1-bigdata-ck-node006 │ 10.1.17.1 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 6 │ 1 │ 1 │ testdev1-bigdata-ck-node006 │ 10.1.17.1 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 6 │ 1 │ 2 │ testdev1-bigdata-ck-node007 │ 10.1.17.250 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 7 │ 1 │ 1 │ testdev1-bigdata-ck-node007 │ 10.1.17.250 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ sht_ck_cluster_1 │ 7 │ 1 │ 2 │ testdev1-bigdata-ck-node001 │ 10.1.17.3 │ 9001 │ 0 │ default │ │ 0 │ 0 │
└──────────────────┴───────────┴──────────────┴─────────────┴─────────────────────────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘
14 rows in set. Elapsed: 0.001 sec.
到此為止醉冤,一個(gè)高可用的ClickHouse集群就建立好了。
The End
關(guān)于復(fù)制表篙悯、分布式表的原理和演示蚁阳,就留到下一篇來(lái)寫(xiě)吧松却,這篇已經(jīng)夠長(zhǎng)了昂芜。
看天氣好像要下雨了,春雨貴如油讽营。
民那晚安移宅。