Apache Kylin CPU 使用率異常
Kylin 版本 2.6.1
創(chuàng)建測試數(shù)據(jù)
首先造一點測試數(shù)據(jù):
rm data.csv
for i in {0..1000}
do
echo "$i,$i,2333
$i,$i,2333
$i,$i,2333
$i,$i,2333
$i,$i,2333
$i,$i,2333
$i,$i,2333
$i,$i,2333
$i,$i,2333
$i,$i,2333" >> data.csv
done
use temp;
drop table test_filter;
create table test_filter(a int, b int, value int)
row format delimited
FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH "/home/kylin/data.csv" into table test_filter;
創(chuàng)建測試 Cube
略
創(chuàng)建包含大量與或邏輯的 SQL
生成 sql 的腳本
#!/bin/bash
echo "select * from temp.test_filter
where ((a >= 0 and a <= 1)"
for i in {1..3}
do
echo "or (a >= $i and a <= $((i+1)))"
done
echo ") and ((b >= 0 and b <= 1)"
for i in {1..3}
do
echo "or (b >= $i and b <= $((i+1)))"
done
echo ")"
先來個簡單點的 sql, 看一下執(zhí)行步驟:
select * from temp.test_filter
where ((a >= 0 and a <= 1)
or (a >= 1 and a <= 2)
or (a >= 2 and a <= 3)
or (a >= 3 and a <= 4)
) and ((b >= 0 and b <= 1)
or (b >= 1 and b <= 2)
or (b >= 2 and b <= 3)
or (b >= 3 and b <= 4)
)
大量與或操作導(dǎo)致的結(jié)果
org.apache.kylin.metadata.filter.TupleFilter 240行左右:
// boolean algebra flatten
if (op == FilterOperatorEnum.AND) {
flatFilter = new LogicalTupleFilter(FilterOperatorEnum.AND);
for (TupleFilter andChild : andChildren) {
flatFilter.addChildren(andChild.getChildren());
}
if (!orChildren.isEmpty()) {
List<TupleFilter> fullAndFilters = cartesianProduct(orChildren, flatFilter);
flatFilter = new LogicalTupleFilter(FilterOperatorEnum.OR);
flatFilter.addChildren(fullAndFilters);
}
}
這段代碼就解釋了 CPU 跑滿的原因,
這里的 op 就是
((a >= 0 and a <= 1)
or (a >= 1 and a <= 2)
or (a >= 2 and a <= 3)
or (a >= 3 and a <= 4)
)
與
((b >= 0 and b <= 1)
or (b >= 1 and b <= 2)
or (b >= 2 and b <= 3)
or (b >= 3 and b <= 4)
)
之間的 AND.
orChildren 就是
((a >= 0 and a <= 1)
or (a >= 1 and a <= 2)
or (a >= 2 and a <= 3)
or (a >= 3 and a <= 4)
)
和
((b >= 0 and b <= 1)
or (b >= 1 and b <= 2)
or (b >= 2 and b <= 3)
or (b >= 3 and b <= 4)
)
所以最終會走到這里:
List<TupleFilter> fullAndFilters = cartesianProduct(orChildren, flatFilter);
做一個笛卡爾積, 目的是把這個 where 字句后面的條件打平成如下的樣子:
((a >= 0 and a <= 1 and b >= 0 and b <= 1)
or (a >= 0 and a <= 1 and b >= 1 and b <= 2)
or (a >= 0 and a <= 1 and b >= 2 and b <= 3)
or (a >= 0 and a <= 1 and b >= 3 and b <= 4)
or (a >= 1 and a <= 2 and b >= 0 and b <= 1)
or (a >= 1 and a <= 2 and b >= 1 and b <= 2)
or (a >= 1 and a <= 2 and b >= 2 and b <= 3)
or (a >= 1 and a <= 2 and b >= 3 and b <= 4)
or (a >= 2 and a <= 3 and b >= 0 and b <= 1)
or (a >= 2 and a <= 3 and b >= 1 and b <= 2)
or (a >= 2 and a <= 3 and b >= 2 and b <= 3)
or (a >= 2 and a <= 3 and b >= 3 and b <= 4)
or (a >= 3 and a <= 4 and b >= 0 and b <= 1)
or (a >= 3 and a <= 4 and b >= 1 and b <= 2)
or (a >= 3 and a <= 4 and b >= 2 and b <= 3)
or (a >= 3 and a <= 4 and b >= 3 and b <= 4)
)
所以, 問題來了, 為什么要把兩個 and 連接的條件打平成這么多個條件?
通過這段代碼, 我推測這么做的原因和構(gòu)建 HTable 維度組合時的默認(rèn)排列方式有關(guān), 這么一打平然后通過上面的算法就可以達(dá)到一次性掃出所有相關(guān)的行.
這樣以來可以很好的優(yōu)化執(zhí)行計劃, 但是這么打平的壞處是什么呢?
上面的例子可以看到, and([4], [4]) 最終的到了一個 or([16]) 的 List.
那么如果把這 4 個條件改成 200 個條件組合起來呢?
最終 List 的長度會變成 200 * 200 = 40000.
所以, 可以使用上述腳本生成個 200 個條件的 SQL 跑一把:
效果立竿見影, 僅一條 SQL 幾乎就可以讓整個系統(tǒng)卡到無法正常使用, 更不用說 Tableau 可能會一次性發(fā)送多條類似 SQL (Tableau 報表中的過濾器使用僅相關(guān)值即可觸發(fā)這種 SQL ). 這無疑是非常致命的.
因為如果過濾字段在多那么幾個, 這個最終的 List 長度則會成指數(shù)型增長, 這么一來就會發(fā)現(xiàn) CPU 全部被用來往 List 里追加元素了, 永無止盡的往 List 里插入幾萬幾十萬個元素.
此類 SQL 明顯是不合理的, 而且一旦出現(xiàn)一條這樣的 SQL 可能就會導(dǎo)致整個實例的 CPU 使用率發(fā)生異常.
解決方案
調(diào)低 kylin.query.flat-filter-max-children 的值, 在迪卡爾積數(shù)量可能過大的時候, 直接拋出異常拒絕執(zhí)行.
相關(guān) Issue: KYLIN-3797, KYLIN-4180