ClickHouse Basics

ClickHouse is a columnar database management system (columnar DBMS) for online analytical processing (OLAP).
A traditional database serves well while the data set is small, the indexes fit in memory, and the cache hit rate is high enough. The hard truth is that this ideal state ends as the business grows, and queries get slower and slower. You can add more memory or buy faster disks (vertical scaling), but that only postpones the underlying problem. If what you need is fast query results, ClickHouse may be the answer.

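Use cases:
1. The vast majority of requests are reads
2. Data is updated in large batches (more than 1000 rows) rather than row by row, or is never updated at all
3. Data is only appended to the database and never needs to be modified
4. Reads pull a large number of rows from the database but use only a small subset of the columns
5. Tables are "wide", meaning they contain a large number of columns
6. Query frequency is relatively low (typically a few hundred queries per second per server, or fewer)
7. For simple queries, a latency of about 50 ms is acceptable
8. Column values are small numbers and short strings (for example, 60 bytes per URL)
9. A single query needs high throughput (up to billions of rows per second per server)
10. Transactions are not required
11. Data consistency requirements are low
12. Each query touches one large table; all other tables involved are small
13. The query result is significantly smaller than the source data, that is, the data is filtered or aggregated, and the result fits in a single server's memory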

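Correspondingly, ClickHouse has limitations of its own (as of the version covered here):

1. No true DELETE/UPDATE support and no transactions (support is hoped for in later versions)
2. No secondary indexes
3. Limited SQL support; the JOIN implementation is unconventional
4. No window functions
5. Metadata management requires manual maintenance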

Common SQL syntax

-- list databases
show databases;

-- list the tables in the current database
show tables;

-- create a database
create database test;

-- drop a table
drop table if exists test.t1;

-- create a first table
create /*temporary*/ table /*if not exists*/ test.m1 (
 id UInt16
,name String
) ENGINE = Memory
;
-- insert test data
insert into test.m1 (id, name) values (1, 'abc'), (2, 'bbbb');

-- query
select * from test.m1;

Default values

In ClickHouse a column always has a default value. If none is specified explicitly, the default depends on the column type:

Numeric types: 0
String: empty string
Array: empty array
Date: 0000-00-00
DateTime: 0000-00-00 00:00:00
Note: NULLs are not supported

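Data types

1. Integers: UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64
Range: types prefixed with U are unsigned, 0 to 2^N-1; the others are signed, -2^(N-1) to 2^(N-1)-1
2. Enumerations: Enum8, Enum16
Enum('hello'=1,'test'=-1): an Enum maps onto a signed integer, so negative values are allowed
3. Strings: FixedString(N), String
N is a length in bytes, not in characters; a Chinese character, for example, takes 3 bytes in UTF-8 and 2 bytes in GBK. String can stand in for types such as VARCHAR, BLOB, and CLOB
4. Date and time: Date
5. Arrays: Array(T)
T is a basic type, and may itself be an array, although multidimensional arrays are officially discouraged
6. Tuples: Tuple
7. Nested structures: Nested(name1 Type1, name2 Type2, ...)
A map-like structure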
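The integer ranges and the bytes-versus-characters point can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `int_range` and `byte_len` and the sample string are made up for the example and are not part of ClickHouse.

```python
def int_range(bits, signed):
    """Return (min, max) for an N-bit integer type."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Build the range table for the eight integer types listed above.
ranges = {}
for bits in (8, 16, 32, 64):
    ranges[f"Int{bits}"] = int_range(bits, signed=True)
    ranges[f"UInt{bits}"] = int_range(bits, signed=False)

for name, (low, high) in ranges.items():
    print(f"{name}: {low} .. {high}")


def byte_len(text, encoding):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


sample = "数据库"  # three Chinese characters, chosen only as an example
utf8_len = byte_len(sample, "utf-8")
gbk_len = byte_len(sample, "gbk")
print(len(sample), utf8_len, gbk_len)
```

Under these assumptions a three-character Chinese string would need FixedString(9) when stored as UTF-8 and FixedString(6) when stored as GBK.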

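Materialized columns

Declaring a column with a MATERIALIZED expression makes it a materialized column: its value cannot be supplied by an insert statement and is always computed from the expression. A materialized column also does not appear in the result of select *: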

drop table if exists test.m2;
create table test.m2 (
 a MATERIALIZED (b+1)
,b UInt16
) ENGINE = Memory;
insert into test.m2 (b) values (1);
select * from test.m2;
select a, b from test.m2;

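Alias columns

An ALIAS column resembles a materialized column in that its value cannot be supplied by an insert statement. The difference is that a materialized column physically stores its data (so nothing has to be recomputed at query time), whereas an alias column stores nothing and evaluates its expression on every query.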

create table test.m3 (a ALIAS (b+1), b UInt16) ENGINE = Memory;
insert into test.m3(b) values (1);
select * from test.m3;
select a, b from test.m3;

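Engines

Table engines are the heart of ClickHouse's design.

TinyLog

The simplest engine. Each column is stored in its own compressed file, and indexes are not supported.
This engine has no concurrency control: reading while a write is in progress makes the read fail, and concurrent writes corrupt the data.

Use cases:
a. Data that is written once
b. and from then on only read
c. Not suitable for large volumes of data; the official recommendation is at most 1 million rows per table with this engine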

drop table if exists test.tinylog;
create table test.tinylog (a UInt16, b UInt16) ENGINE = TinyLog;
insert into test.tinylog(a,b) values (7,13);

At this point the data directory /var/lib/clickhouse/data/test/tinylog looks like this:

├── a.bin
├── b.bin
└── sizes.json

a.bin and b.bin hold the compressed data of the corresponding columns, and sizes.json records the size of each *.bin file

Log

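This engine is essentially the same as TinyLog.
Its improvement is an extra __marks.mrk file that records the offset of every data block.
That makes it possible to split a read into precise ranges, which enables concurrent reads.
Concurrent writes are still not supported: a write blocks all other reads and writes.
Log has no index support, and because __marks.mrk is redundant data that must stay in sync, a failure in the middle of a write leaves the table unusable.

Use cases:
Much like TinyLog, it suits data that is written once and then only read; it also works for temporary data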

drop table if exists test.log;
create table test.log (a UInt16, b UInt16) ENGINE = Log;
insert into test.log(a,b) values (7,13);

At this point the data directory /var/lib/clickhouse/data/test/log looks like this:

├── __marks.mrk
├── a.bin
├── b.bin
└── sizes.json

Memory

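The in-memory engine. Data is kept in memory in raw, uncompressed form and is lost when the server restarts.
Reads run in parallel, and the lock between reads and writes is held only very briefly.
There is no index support, yet simple queries perform extremely well.

Use cases:
a. Testing
b. Workloads that need very high performance on a data set that is not too large (up to roughly 100 million rows)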

Merge

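A utility engine that stores no data itself and only chains together a chosen set of tables from a chosen database.
Reads can then run concurrently and still use the indexes of the underlying tables, but the engine does not support writes.
The engine definition names the database and tables to link: the database may be given as an expression, and the tables as a regular expression.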

create table test.tinylog1 (id UInt16, name String) ENGINE=TinyLog;
create table test.tinylog2 (id UInt16, name String) ENGINE=TinyLog;
create table test.tinylog3 (id UInt16, name String) ENGINE=TinyLog;

insert into test.tinylog1(id, name) values (1, 'tinylog1');
insert into test.tinylog2(id, name) values (2, 'tinylog2');
insert into test.tinylog3(id, name) values (3, 'tinylog3');

use test;
create table test.merge (id UInt16, name String) ENGINE=Merge(currentDatabase(), '^tinylog[0-9]+');
select _table,* from test.merge order by id desc;

┌─_table───┬─id─┬─name─────┐
│ tinylog3 │  3 │ tinylog3 │
│ tinylog2 │  2 │ tinylog2 │
│ tinylog1 │  1 │ tinylog1 │
└──────────┴────┴──────────┘

Note: _table is a virtual column that exists because the table uses the Merge engine

a. It identifies the source table of each row, and it does not appear in the output of show table
b. select * does not include it
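To see which table names a pattern like '^tinylog[0-9]+' would pick up, the selection can be imitated in Python. This is a rough sketch that assumes the engine applies the pattern with partial-match ("search") semantics, as Python's `re.search` does; the candidate table list is hypothetical.

```python
import re

# Same pattern as in the Merge engine definition above.
pattern = re.compile(r"^tinylog[0-9]+")

# Hypothetical set of tables living in the same database.
tables = ["tinylog", "tinylog1", "tinylog2", "tinylog3", "tinylog_d1", "log", "merge"]

# Keep only the names the pattern matches.
selected = [name for name in tables if pattern.search(name)]
print(selected)
```

Note that the pattern is anchored only at the start, so under the same assumption a name such as tinylog1_backup would also be selected; appending `$` to the pattern would exclude it.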

Distributed

Like Merge, Distributed uses one logical table to reach a set of physical tables. The engine declaration looks like this:

Distributed(remote_group, database, table [, sharding_key])

Where:

remote_group is the name of a cluster defined under remote_servers in /etc/clickhouse-server/config.xml
database is the database name on each server
table is the table name
sharding_key is a routing expression, either a column name or a function call such as rand(); together with the weight values in remote_servers it decides which shard a write goes to

The remote_servers section of the configuration file:

<remote_servers>
   <log>
       <shard>
           <weight>1</weight>
           <internal_replication>false</internal_replication>
           <replica>
               <host>172.17.0.3</host>
               <port>9000</port>
           </replica>
       </shard>
       <shard>
           <weight>2</weight>
           <internal_replication>false</internal_replication>
           <replica>
               <host>172.17.0.4</host>
               <port>9000</port>
           </replica>
       </shard>
   </log>
</remote_servers>

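log is the name of a shard group, the value used for remote_group above
shard is a fixed tag
weight is the shard's weight, which is what the sharding_key mentioned earlier works against.
Put simply, with the configuration above, in theory:
the first shard is "chosen" with probability 1 / (1 + 2) and the second with probability 2 / (1 + 2), which is easy to understand. In practice, however, sharding_key works by which interval the actual number falls into, taken modulo the total weight: the first shard owns the interval [0, 1) and the second owns [1, 1+2), repeating with period 3. With sharding_key set to id, rows with id=0 or id=3 always go to the first shard. With sharding_key set to rand(), the system presumably reduces the value to the same range itself, and the shard choice becomes probabilistic.
internal_replication defines how writes behave when a shard has several replicas.
If false, data is written to every replica, with no guarantee that the writes stay consistent, so over time the replicas are likely to drift apart. If true, data is written only to the first writable replica (the "physical tables" handle the rest themselves).
replica defines each redundant copy; its options include host, port, user, and password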
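The interval rule for weights can be sketched in Python as follows. This is a simplified model of the routing described above, not ClickHouse code: the function name `pick_shard` is made up, and shards are numbered from 0 here.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_shard(key, weights):
    """Return the 0-based index of the shard that owns the key's remainder."""
    bounds = list(accumulate(weights))  # weights [1, 2] -> upper bounds [1, 3]
    remainder = key % bounds[-1]        # reduce the key modulo the total weight
    return bisect_right(bounds, remainder)


weights = [1, 2]  # the two <weight> values from the configuration above
placement = {key: pick_shard(key, weights) for key in range(6)}
print(placement)
```

With weights 1 and 2, keys whose remainder modulo 3 is 0 land on the first shard and all others on the second, which matches the id=0 and id=3 example.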

For a concrete example, first create the physical table on both machines and insert some test data:

create table test.tinylog_d1(id UInt16, name String) ENGINE=TinyLog;
insert into test.tinylog_d1(id, name) values (1, 'Distributed record 1');
insert into test.tinylog_d1(id, name) values (2, 'Distributed record 2');

Create the logical table on one of them:

create table test.tinylog_d (id UInt16, name String) ENGINE=Distributed(log, test, tinylog_d1, id);

-- insert into the logical table and observe how rows are distributed
insert into test.tinylog_d(id, name) values (0, 'main');
insert into test.tinylog_d(id, name) values (1, 'main');
insert into test.tinylog_d(id, name) values (2, 'main');

select name,sum(id),count(id) from test.tinylog_d group by name;

Note: writes to the logical table are asynchronous. Rows are first buffered on the local file system, and there is no strict handling of unreachable physical tables, so failed writes that lose data are possible

Null

The null engine: anything written is discarded, and reads always return an empty result.

Note that although no data is stored, the structural and data-format constraints still apply as with any ordinary table, and you can create views on top of this engine

Buffer

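1. The Buffer engine behaves like a layer on top of Memory storage (it has no directory on disk either)
2. It acts as a buffer: written data is held in the buffer first, and once a threshold is reached it is flushed automatically to a designated destination table
3. Like Memory, it has many limitations, such as no indexes
4. Buffer sits in front of another table, and reads from it are also applied to the table behind it. Because of the limitations above, though, we usually read straight from the destination table; with sensible settings the small delay introduced by the buffer has little impact
5. Buffer can also have no table behind it, in which case data is simply discarded once the threshold is reached

Some characteristics:

If a single write is too large, exceeding a max condition, it goes directly to the destination table.
When dropping or altering the destination table, it is advisable to drop and recreate the Buffer table.
On a graceful restart, Buffer data is flushed to the destination table first; on a forced restart, the data in the Buffer table is lost.
Even with Buffer, many small writes are far slower than one large write (thousands of rows versus millions of rows)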

-- create the destination table
create table test.mergetree (sdt Date, id UInt16, name String, point UInt16) ENGINE=MergeTree(sdt, (id, name), 10);
-- create the Buffer table
-- Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
create table test.mergetree_buffer as test.mergetree ENGINE=Buffer(test, mergetree, 16, 3, 20, 2, 10, 1, 10000);

insert into test.mergetree (sdt, id, name, point) values ('2017-07-10', 1, 'a', 20);
insert into test.mergetree_buffer (sdt, id, name, point) values ('2017-07-10', 1, 'b', 10);
select * from test.mergetree;
select '------';
select * from test.mergetree_buffer;

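database: the database name
table: the destination table; besides a string constant, a variable can be used here as well.
num_layers: akin to "partitions"; the min / max conditions that follow are evaluated independently for each layer. The officially recommended value is 16.
min / max: this group of settings defines the thresholds for time (seconds), row count, and size (bytes).

Threshold rule: flush when "all min conditions are met, or at least one max condition is met".

For the table created above, all the min conditions means: 3 seconds have passed, 2 rows, and 1 byte. A max condition means: 20 seconds, or 10 rows, or 10,000 bytes (about 10 KB)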
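The threshold rule can be written down as a small predicate. The sketch below is a model of the rule as stated, using the values from the Buffer(...) definition above; the class and function names are invented, and whether the real engine compares with ">" or ">=" is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class BufferLimits:
    min_time: float
    max_time: float
    min_rows: int
    max_rows: int
    min_bytes: int
    max_bytes: int


def should_flush(limits, age_seconds, rows, size_bytes):
    """Flush when all min conditions hold, or when any max condition holds."""
    all_min = (
        age_seconds >= limits.min_time
        and rows >= limits.min_rows
        and size_bytes >= limits.min_bytes
    )
    any_max = (
        age_seconds >= limits.max_time
        or rows >= limits.max_rows
        or size_bytes >= limits.max_bytes
    )
    return all_min or any_max


# Values taken from Buffer(test, mergetree, 16, 3, 20, 2, 10, 1, 10000).
limits = BufferLimits(min_time=3, max_time=20, min_rows=2, max_rows=10,
                      min_bytes=1, max_bytes=10000)

print(should_flush(limits, age_seconds=5, rows=1, size_bytes=50))   # one min condition unmet
print(should_flush(limits, age_seconds=5, rows=2, size_bytes=50))   # all min conditions met
print(should_flush(limits, age_seconds=1, rows=10, size_bytes=50))  # max_rows reached
```

A single row that has waited 5 seconds is not flushed yet, because the min_rows condition is still unmet and no max condition has been reached.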

Set

The Set engine is a bit special: it can only be used on the right-hand side of the IN operator, and you cannot select from it

create table test.set(id UInt16, name String) ENGINE=Set;
insert into test.set(id, name) values (1, 'hello');
-- select 1 where (1, 'hello') in test.set; -- the literal defaults to UInt8, so it needs an explicit cast
select 1 where (toUInt16(1), 'hello') in test.set;

Note: a Set table operates entirely in memory, but its data is also saved to disk and loaded into memory at startup. An unexpected interruption or forced restart can therefore lose data

Join

TODO

MergeTree

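This engine is the centerpiece of ClickHouse. It supports a two-level index made of a date and a group of primary key columns, and it accepts data updates in real time. The index granularity is configurable, and sampling is supported directly

MergeTree(EventDate, (CounterID, EventDate), 8192)
MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)

EventDate: the name of a date column
intHash32(UserID): the sampling expression
(CounterID, EventDate): the primary key tuple (it may contain expressions as well as column names), or a single expression
8192: the granularity of the primary key index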

drop table if exists test.mergetree1;
create table test.mergetree1 (sdt Date, id UInt16, name String, cnt UInt16) ENGINE=MergeTree(sdt, (id, name), 10);

-- the date format apparently has to be yyyy-mm-dd
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-01', 1, 'aaa', 10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-02', 4, 'bbb', 10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-03', 5, 'ccc', 11);

At this point the directory /var/lib/clickhouse/data/test/mergetree1 looks like this:

├── 20180601_20180601_1_1_0
│   ├── checksums.txt
│   ├── columns.txt
│   ├── id.bin
│   ├── id.mrk
│   ├── name.bin
│   ├── name.mrk
│   ├── cnt.bin
│   ├── cnt.mrk 
│   ├── cnt.idx
│   ├── primary.idx
│   ├── sdt.bin
│   └── sdt.mrk -- stores block offsets
├── 20180602_20180602_2_2_0
│   └── ...
├── 20180603_20180603_3_3_0
│   └── ...
├── format_version.txt
└── detached
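The part directory names in the listing appear to follow the pattern minDate_maxDate_minBlock_maxBlock_level used by the old date-column MergeTree syntax. The article itself does not spell this out, so treat the field names below as an assumption; the parser is a hypothetical helper for reading such names, not a ClickHouse API.

```python
from typing import NamedTuple


class PartName(NamedTuple):
    min_date: str    # earliest date contained in the part
    max_date: str    # latest date contained in the part
    min_block: int   # first insert block number merged into the part
    max_block: int   # last insert block number merged into the part
    level: int       # how many merge rounds produced the part


def parse_part_name(name):
    """Split a part directory name into its five underscore-separated fields."""
    min_date, max_date, min_block, max_block, level = name.split("_")
    return PartName(min_date, max_date, int(min_block), int(max_block), int(level))


part = parse_part_name("20180601_20180601_1_1_0")
print(part)
```

Read this way, each of the three inserts produced its own unmerged part (level 0) covering a single day and a single block.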

ReplacingMergeTree

1. Builds on MergeTree by adding handling of duplicate data, which suits real-time data scenarios
2. Compared with MergeTree, ReplacingMergeTree takes one more trailing argument, a "version column". Together with the date column it determines which row is the "new" one, and the old rows are discarded. (This happens during a merge, not at write time: until then the duplicates remain stored and are returned by queries as usual.)
3. The primary key columns determine which rows count as duplicates

-- the version column may be an unsigned integer (UInt family), Date, or DateTime
create table test.replacingmergetree (sdt Date, id UInt16, name String, cnt UInt16) ENGINE=ReplacingMergeTree(sdt, (name), 10, cnt);

insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-10', 1, 'a', 20);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-10', 1, 'a', 30);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 20);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 30);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 10);

select * from test.replacingmergetree;

-- if the rows have not been merged yet, trigger a merge manually
optimize table test.replacingmergetree;
select * from test.replacingmergetree;

┌────────sdt─┬─id─┬─name─┬─cnt─┐
│ 2018-06-11 │  1 │ a    │  30 │
└────────────┴────┴──────┴─────┘
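The effect of the merge on this example can be modeled in a few lines of Python. This is a simplified sketch, not the engine's algorithm: it assumes all rows end up in one merge, that rows with the same primary key (name) are duplicates, and that on a version tie the later-inserted row wins, which is consistent with the result shown above.

```python
def replacing_merge(rows, key_index, version_index):
    """Keep, per key, the row with the highest version (later row wins ties)."""
    latest = {}
    for row in rows:
        key = row[key_index]
        kept = latest.get(key)
        if kept is None or row[version_index] >= kept[version_index]:
            latest[key] = row
    return list(latest.values())


# The five rows inserted above: (sdt, id, name, cnt), with cnt as the version.
rows = [
    ("2018-06-10", 1, "a", 20),
    ("2018-06-10", 1, "a", 30),
    ("2018-06-11", 1, "a", 20),
    ("2018-06-11", 1, "a", 30),
    ("2018-06-11", 1, "a", 10),
]

merged = replacing_merge(rows, key_index=2, version_index=3)
print(merged)
```

Because deduplication only happens at merge time, queries that must never see duplicates still have to account for rows that have not been merged yet.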

SummingMergeTree

1. SummingMergeTree sums rows during the merge stage
2. The columns to sum can be specified; for columns that are not summed, the first value encountered is kept

create table test.summingmergetree (sdt Date, name String, a UInt16, b UInt16) ENGINE=SummingMergeTree(sdt, (sdt, name), 8192, (a));

insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-10', 'a', 1, 20);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-10', 'b', 2, 11);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'b', 3, 18);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'b', 3, 82);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'a', 3, 11);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-12', 'c', 1, 35);

-- trigger a merge manually
optimize table test.summingmergetree;

select * from test.summingmergetree;

┌────────sdt─┬─name─┬─a─┬──b─┐
│ 2018-06-10 │ a    │ 1 │ 20 │
│ 2018-06-10 │ b    │ 2 │ 11 │
│ 2018-06-11 │ a    │ 3 │ 11 │
│ 2018-06-11 │ b    │ 6 │ 18 │
│ 2018-06-12 │ c    │ 1 │ 35 │
└────────────┴──────┴───┴────┘

Note: a summed column cannot be part of the primary key, and a row whose summed columns are all zero is deleted
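The merge result above can be reproduced with a small Python model. It is only a sketch of the behavior described in this section: rows are grouped by the primary key (sdt, name), column a is summed, and column b keeps the first value seen, as the article states. The function name is made up.

```python
def summing_merge(rows):
    """Group by (sdt, name), sum column a, keep the first b for each key."""
    merged = {}
    for sdt, name, a, b in rows:
        key = (sdt, name)
        if key in merged:
            merged[key][0] += a      # summed column
        else:
            merged[key] = [a, b]     # first occurrence fixes b
    return [(sdt, name, a, b) for (sdt, name), (a, b) in sorted(merged.items())]


# The six rows inserted above: (sdt, name, a, b).
rows = [
    ("2018-06-10", "a", 1, 20),
    ("2018-06-10", "b", 2, 11),
    ("2018-06-11", "b", 3, 18),
    ("2018-06-11", "b", 3, 82),
    ("2018-06-11", "a", 3, 11),
    ("2018-06-12", "c", 1, 35),
]

summed = summing_merge(rows)
for row in summed:
    print(row)
```

The two rows for ('2018-06-11', 'b') collapse into one with a = 6 and b = 18, matching the table.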

AggregatingMergeTree

AggregatingMergeTree builds on MergeTree with a design optimized for incremental computation of aggregate function results: during a merge it pre-aggregates rows that share a primary key
Besides ordinary aggregate functions such as sum and uniq, AggregatingMergeTree works with two more families: sumState, uniqState and sumMerge, uniqMerge

1. Pre-computing aggregates
This trades space for time, at the cost of dropping dimensions

Suppose the raw data has three dimensions and one measure to count:

dim1	dim2	dim3	measure1
aaaa	a	1	1
aaaa	a	1	1
aaaa	b	2	1
bbbb	b	3	1
cccc	b	2	1
cccc	c	1	1
dddd	c	2	1
dddd	a	1	1

Dropping one dimension (dim1) and aggregating the measure once with count gives:

dim2	dim3	count(measure1)
a	1	3
b	2	2
b	3	1
c	1	1
c	2	1
2.聚合數(shù)據(jù)的增量計(jì)算

A table with the AggregatingMergeTree engine cannot be filled with an ordinary INSERT. Instead you can:
a. Use INSERT SELECT to insert data
b. More commonly, create a materialized view

drop table if exists test.aggregatingmergetree;
create table test.aggregatingmergetree(
sdt Date
, dim1 String
, dim2 String
, dim3 String
, measure1 UInt64
) ENGINE=MergeTree(sdt, (sdt, dim1, dim2, dim3), 8192);

-- create a materialized view that uses AggregatingMergeTree
drop table if exists test.aggregatingmergetree_view;
create materialized view test.aggregatingmergetree_view
ENGINE = AggregatingMergeTree(sdt,(dim2, dim3), 8192)
as
select sdt,dim2, dim3, uniqState(dim1) as uv
from test.aggregatingmergetree
group by sdt,dim2, dim3;

insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'a', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'a', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'b', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'bbbb', 'b', '30', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'cccc', 'b', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'cccc', 'c', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'dddd', 'c', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'dddd', 'a', '10', 1);

-- count(measure1) grouped by dim2 and dim3
select dim2, dim3, count(measure1) from test.aggregatingmergetree group by dim2, dim3;

-- UV grouped by dim2
select dim2, uniq(dim1) from test.aggregatingmergetree group by dim2;

-- trigger a merge manually
OPTIMIZE TABLE test.aggregatingmergetree_view;
select * from test.aggregatingmergetree_view;

-- UV per dim2 from the view
select dim2, uniqMerge(uv) from test.aggregatingmergetree_view group by dim2 order by dim2;
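One way to picture the uniqState / uniqMerge pair is as a partial state that can be combined later. The Python sketch below models the state as an exact set of dim1 values; that is an assumption made for illustration, since the real uniq state is an internal, approximate structure. The data mirrors the INSERT statements above.

```python
from collections import defaultdict

# (dim1, dim2, dim3), mirroring the INSERT statements above.
rows = [
    ("aaaa", "a", "10"), ("aaaa", "a", "10"), ("aaaa", "b", "20"),
    ("bbbb", "b", "30"), ("cccc", "b", "20"), ("cccc", "c", "10"),
    ("dddd", "c", "20"), ("dddd", "a", "10"),
]

# "uniqState": one partial state per (dim2, dim3), as stored in the view.
states = defaultdict(set)
for dim1, dim2, dim3 in rows:
    states[(dim2, dim3)].add(dim1)

# "uniqMerge": combine the partial states per dim2 and read off the count.
merged = defaultdict(set)
for (dim2, _dim3), state in states.items():
    merged[dim2] |= state

uv_by_dim2 = {dim2: len(values) for dim2, values in sorted(merged.items())}
print(uv_by_dim2)
```

The point of the model is that the states stay mergeable: a distinct count per dim2 cannot be obtained by adding up per-(dim2, dim3) counts, but it can be obtained by merging their states.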

CollapsingMergeTree

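Designed specifically for OLAP scenarios as a workaround for how data is stored. Given that data cannot be modified, let alone deleted, the effect of old data is erased by arithmetic: the old row is simply "subtracted" away. This solves "final state" questions such as: how many users are online right now?

This "add instead of delete" incremental storage makes aggregation convenient, but at the price of doubling the storage, and for workloads that only care about the latest state all the intermediate rows are useless

CollapsingMergeTree is created almost exactly like MergeTree, except for one extra trailing argument that names the Sign column (which must be of type Int8)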

create table test.collapsingmergetree(sign Int8, sdt Date, name String, cnt UInt16) ENGINE=CollapsingMergeTree(sdt, (sdt, name), 8192, sign);
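The "subtract the old row" idea can be illustrated with a small Python model. It does not reproduce the engine's merge algorithm; it only shows how a sign column lets an aggregation cancel outdated rows, in the spirit of a query like sum(cnt * sign) with a filter on sum(sign) > 0. The sample rows and names are hypothetical.

```python
from collections import defaultdict


def current_state(rows):
    """Sum cnt * sign per name and keep names that still have a live row."""
    totals = defaultdict(lambda: [0, 0])  # name -> [sum(cnt * sign), sum(sign)]
    for sign, _sdt, name, cnt in rows:
        totals[name][0] += sign * cnt
        totals[name][1] += sign
    return {name: total for name, (total, signs) in totals.items() if signs > 0}


# (sign, sdt, name, cnt): +1 writes a state row, -1 cancels an earlier one.
rows = [
    (1, "2018-06-10", "alice", 5),    # initial state
    (-1, "2018-06-10", "alice", 5),   # cancel the initial state
    (1, "2018-06-10", "alice", 8),    # new state
    (1, "2018-06-10", "bob", 3),      # initial state
    (-1, "2018-06-10", "bob", 3),     # cancelled and never replaced
]

state = current_state(rows)
print(state)
```

In this model an update costs two extra rows (a cancel row plus the new state), which is the storage overhead the text above refers to.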
