0x0 背景
近期要做一個數(shù)據(jù)統(tǒng)計功能屯断,公司選擇了clickhouse作為數(shù)據(jù)庫贮喧;下面記錄一下該數(shù)據(jù)庫的特性和使用教程。
0x1 介紹
ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).
clickhouse是一個列式數(shù)據(jù)庫东涡,主要用于數(shù)據(jù)分析隆嗅;
從目前使用看來错忱,特點如下:
- 列式存儲查詢效率高
- 不支持事務儡率;
- 適用于一寫多讀
- 支持特殊的修改和刪除語句,sql標準的刪改語句不支持
0x2 安裝教程
1.從官網(wǎng)下載最新的tgz包以清,然后解壓執(zhí)行sh腳本即可儿普,比較簡單:
export LATEST_VERSION=`curl https://api.github.com/repos/ClickHouse/ClickHouse/tags 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1`
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/clickhouse-client-$LATEST_VERSION.tgz
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
sudo /etc/init.d/clickhouse-server start
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
2.啟動clickhouse服務端:
root@ubuntu:/etc/clickhouse-server# clickhouse-server
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Logging trace to console
2020.04.29 14:26:41.942431 [ 1 ] {} <Trace> Pipe: Pipe capacity is 1.00 MiB
2020.04.29 14:26:41.945559 [ 1 ] {} <Information> : Starting ClickHouse 20.2.1.2183 with revision 54432
2020.04.29 14:26:41.945635 [ 1 ] {} <Information> Application: starting up
2020.04.29 14:26:41.952544 [ 1 ] {} <Debug> Application: Set max number of file descriptors to 1048576 (was 1024).
2020.04.29 14:26:41.952591 [ 1 ] {} <Debug> Application: Initializing DateLUT.
2020.04.29 14:26:41.952600 [ 1 ] {} <Trace> Application: Initialized DateLUT with time zone 'PRC'.
2020.04.29 14:26:41.953286 [ 1 ] {} <Debug> Application: Configuration parameter 'interserver_http_host' doesn't exist or exists and empty. Will use 'localhost' as replica host.
2020.04.29 14:26:41.956136 [ 1 ] {} <Debug> ConfigReloader: Loading config 'users.xml'
Include not found: networks
2020.04.29 14:26:41.957092 [ 1 ] {} <Information> Application: Uncompressed cache size was lowered to 991.13 MiB because the system has low amount of memory
2020.04.29 14:26:41.957443 [ 1 ] {} <Information> Application: Mark cache size was lowered to 991.13 MiB because the system has low amount of memory
......
3.配置允許遠程連接
進入clickhouse配置文件
/etc/clickhouse-server/config.xml
將<listen_host>::</listen_host>
取消注釋;
重啟服務:service clickhouse-server restart
4.啟動clickhouse-client
root@ubuntu:~# clickhouse-client
ClickHouse client version 20.2.1.2183 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.2.1 revision 54432.
localhost :) show databases;
SHOW DATABASES
┌─name────┐
│ default │
│ system │
│ test │
└─────────┘
3 rows in set. Elapsed: 0.008 sec.
localhost :)
0x3 簡單的demo
create table if not exists test.tb_test
(
id Int64,
datetime DateTime,
content Nullable(String),
value Nullable(Float64),
date Date
)
engine = MergeTree --使用mergeTree引擎,ch主要引擎
partition by toYYYYMM(datetime) --按照datetime這個字段的月進行分區(qū)
order by id --按照id進行排序
TTL datetime + INTERVAL 3 DAY ; --三天過期
--修改表中數(shù)據(jù)過期時間掷倔,到期后數(shù)據(jù)會在merge時被刪除
ALTER TABLE test.tb_test
MODIFY TTL datetime + INTERVAL 1 DAY;
--查詢
select * from tb_test order by id;
--刪除分區(qū)眉孩,可用于定時任務刪除舊數(shù)據(jù)
alter table tb_test drop partition '202005';
--插入數(shù)據(jù)
insert into tb_test values (5, '2020-02-29 12:38:37', 'abcde', 12.553, '2020-04-25');
--修改數(shù)據(jù),不推薦使用
alter table tb_test update content = 'hello click' where id=52;
--刪除數(shù)據(jù),不推薦使用
alter table tb_test delete WHERE id=56;
0x4 高級用法
1.求和引擎SummingMergeTree
這種引擎可以自動聚合非主鍵數(shù)字列浪汪,可以用于事件統(tǒng)計
--自動求和聚合表
CREATE TABLE IF NOT EXISTS tb_stat
(
regionId String, --門店id
groupId String, --統(tǒng)計組id
in int, --進客流
out int, --出客流
statDate DateTime --統(tǒng)計時間
)
ENGINE = SummingMergeTree
partition by (toYYYYMM(statDate), regionId)
ORDER BY (toStartOfHour(statDate), regionId, groupId);
insert into tb_stat values ('1232364', '111', 32, 2, '2020-03-25 12:56:00');
insert into tb_stat values ('1232364', '111', 34, 44, '2020-03-25 12:21:00');
insert into tb_stat values ('1232364', '111', 54, 12, '2020-03-25 12:20:00');
insert into tb_stat values ('1232364', '222', 45, 11, '2020-03-25 12:13:00');
insert into tb_stat values ('1232364', '222', 32, 33, '2020-03-25 12:44:00');
insert into tb_stat values ('1232364', '222', 12, 23, '2020-03-25 12:22:00');
insert into tb_stat values ('1232364', '333', 54, 54, '2020-03-25 12:11:00');
insert into tb_stat values ('1232364', '333', 22, 74, '2020-03-25 12:55:00');
insert into tb_stat values ('1232364', '333', 12, 15, '2020-03-25 12:34:00');
select toStartOfHour(statDate), regionId, groupId, sum(in), sum(out)
from tb_stat group by toStartOfHour(statDate), regionId, groupId;
數(shù)據(jù)插入后巴柿,大概過1分鐘,在此查詢該表可以發(fā)現(xiàn)死遭,只剩下3調(diào)數(shù)據(jù):
select * from tb_stat;
1232364 111 480 232 2020-03-25 04:56:00
1232364 222 356 268 2020-03-25 04:13:00
1232364 333 352 572 2020-03-25 04:11:00