Problem:
After retention rules were configured, a Druid Kafka Indexing Service task that creates one segment per day lost 8 hours of data every day.
Cause:
Druid works in UTC, and this is hard-coded. With daily segments, each segment's start and end times are taken from the data timestamps. Once the drop policy in the rules is set to Drop Forever, the Coordinator does not trigger handoff, so the data for hours 0 through 7 cannot be loaded onto the Historical nodes and is lost. The corresponding task also shows as FAILED in the Coordinator Console.
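The 8-hour figure matches the offset between UTC and a UTC+8 local clock. The note does not name the local time zone, so UTC+8 is an assumption here, and the function name and sample dates are made up for illustration. The sketch below only shows how an event timestamp maps to a UTC day bucket; it is not Druid's own code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local clock is UTC+8 (the note only says 8 hours go missing).
LOCAL_TZ = timezone(timedelta(hours=8))


def utc_day_bucket(event_time):
    """Return the (start, end) of the UTC day that contains event_time."""
    utc = event_time.astimezone(timezone.utc)
    start = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


# Two events on the same local day, one before and one after 08:00 local time.
early = datetime(2018, 6, 15, 3, 0, tzinfo=LOCAL_TZ)
late = datetime(2018, 6, 15, 9, 0, tzinfo=LOCAL_TZ)

for label, event in (("early", early), ("late", late)):
    start, end = utc_day_bucket(event)
    print(label, event.isoformat(), "->", start.isoformat(), "/", end.isoformat())
```

Under that assumption, the first 8 local hours of each day fall into the previous UTC day's segment, which is the slice of data the note reports as missing.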
Solution:
The bug has not been fixed yet; the related issues are https://github.com/apache/incubator-druid/issues/4137 and https://github.com/apache/incubator-druid/issues/5868 . The current workaround is to change Drop Forever to a period-based drop rule, for example: Load P30D, Drop P31D
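As a rough sketch, the workaround rule chain could be written as a JSON rule list like the one built below. Mapping "Load P30D" to a loadByPeriod rule and "Drop P31D" to a dropByPeriod rule is an assumption about what the note means, and the replica count of 2 on _default_tier and the helper name build_retention_rules are illustrative only.

```python
import json


def build_retention_rules(load_period, drop_period, replicas=2):
    """Build a rule list: load recent data, then drop by period instead of forever."""
    return [
        {
            "type": "loadByPeriod",
            "period": load_period,
            # Assumption: a single default tier; adjust to the cluster's tiers.
            "tieredReplicants": {"_default_tier": replicas},
        },
        {
            # Replaces the {"type": "dropForever"} rule that triggers the problem.
            "type": "dropByPeriod",
            "period": drop_period,
        },
    ]


rules = build_retention_rules("P30D", "P31D")
print(json.dumps(rules, indent=2))
```

The printed JSON can be entered through the Coordinator Console rule editor. Posting it to the Coordinator rules endpoint should also work, but check the path and payload against the documentation for the Druid version in use.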
For data that the Historical nodes have not loaded, set the used field to 1 for the affected rows in the druid_segments table of the metadata database.
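The update itself is a single SQL statement. The sketch below runs it against an in-memory SQLite table that stands in for the real metadata store (typically MySQL or PostgreSQL). The table is reduced to a few columns, and the datasource name, segment ids, and helper name are hypothetical. Back up the real table before changing it.

```python
import sqlite3


def mark_segments_used(conn, datasource, start_iso, end_iso):
    """Set used = 1 for unused segments of datasource inside [start_iso, end_iso]."""
    cur = conn.execute(
        "UPDATE druid_segments SET used = 1 "
        'WHERE dataSource = ? AND used = 0 AND start >= ? AND "end" <= ?',
        (datasource, start_iso, end_iso),
    )
    conn.commit()
    return cur.rowcount


# Stand-in for the metadata database, with only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE druid_segments (id TEXT PRIMARY KEY, dataSource TEXT, '
    'start TEXT, "end" TEXT, used INTEGER)'
)
conn.executemany(
    "INSERT INTO druid_segments VALUES (?, ?, ?, ?, ?)",
    [
        ("seg_a", "my_datasource", "2018-06-14T00:00:00.000Z", "2018-06-15T00:00:00.000Z", 0),
        ("seg_b", "my_datasource", "2018-06-20T00:00:00.000Z", "2018-06-21T00:00:00.000Z", 0),
    ],
)

fixed = mark_segments_used(
    conn, "my_datasource", "2018-06-14T00:00:00.000Z", "2018-06-15T00:00:00.000Z"
)
print("segments re-enabled:", fixed)
```

Restricting the update by datasource and interval keeps it from re-enabling segments that were disabled on purpose. The timestamps compare correctly as strings only because they share one ISO 8601 format.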