Apache Drill

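These are notes from working through the official documentation, plus some material I looked up along the way.
The official documentation is here, and a Chinese version is here. (For installation, just follow the tutorial, which is very detailed. To start the service, the only command you need to remember is bin/drill-embedded.)

FYI: this post is as dull as most introductions to Drill. Perhaps there simply is not much more to say about Drill, and they all draw on the same translation.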

Running in embedded mode

After installation, you can reach Drill at http://localhost:8047/, or start a shell:

  1. cd (path)/drill
  2. bin/sqlline -u jdbc:drill:zk=local
  3. Run a query (below).

To change the configuration, go to the conf folder under drill; settings can be added in drill-env.sh.

Introduction

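  1. Apache Drill is a low-latency, distributed, interactive query engine for large-scale data (structured, semi-structured, and nested). It is distributed and schema-free.
  2. It is an open-source implementation of Google Dremel. In essence it is a distributed MPP (massively parallel processing) query layer that supports SQL and some of the languages used with NoSQL and Hadoop data storage systems.
  3. It queries massive data sets faster, completing analyses by rapidly scanning petabytes of data (1 PB is 2 to the 50th power bytes).
  4. Drill is plug-and-play and can be integrated with existing Hive and HBase deployments at any time.
  5. It makes up for MapReduce's weak interactive query capability.
  6. Its data model is nested.
  7. It uses columnar storage.
  8. It combines web search and parallel DBMS techniques.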

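Note on Hive: Hive puts a SQL interface on top of Hadoop. It translates SQL into MapReduce jobs that run on Hadoop, so data developers and analysts can use SQL for statistics and analysis over massive data instead of writing MapReduce programs by hand.
There is a set of notes on Hive here.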

Drill's core service is the Drillbit.

When a Drillbit runs on every data node in the cluster, Drill can maximize data locality during query execution, without moving data over the network or between nodes.

Interfaces

  • Drill Shell
  • Drill Web Console
  • ODBC/JDBC
  • C++ API

Dynamic schema discovery

Drill discovers the schema while it processes the data, so no schema has to be declared before a query runs.
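
To make the idea concrete, here is a minimal Python sketch of discovering a schema while reading records. It is only an illustration of the concept, not how Drill implements it; the function name discover_schema and the sample records are made up for this example.

```python
import json


def discover_schema(json_lines):
    """Build a column -> set-of-type-names map while reading records one by one."""
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for column, value in record.items():
            # A column is registered the first time any record mentions it.
            schema.setdefault(column, set()).add(type(value).__name__)
    return schema


# Hypothetical sample records; the second one introduces a new column.
records = [
    '{"id": 1, "name": "Cake"}',
    '{"id": 2, "name": "Raised", "ppu": 0.55}',
]
schema = discover_schema(records)
```

The column ppu only appears once the second record has been read, which is the point: the schema is a by-product of processing rather than a prerequisite for it.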

Flexible data model

Drill allows data attributes to be nested. From an architectural point of view, Drill provides a flexible columnar data model.

No centralized metadata

Drill does not depend on a single Hive repository. It can query several Hive repositories and combine the results.

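Query execution

When you submit a Drill query, the client or application sends it as a SQL statement to a Drillbit. The Drillbit is the entry point for execution: it plans and runs the query.

The Drillbit that receives the query becomes the Foreman and drives the whole query. It first parses the SQL, then converts it into a logical operator syntax that Drill understands.

The logical plan describes the work required to produce the query results and defines the data sources and the operations to apply. It is made up of a collection of logical operators.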

Workflow

Major Fragments

  • A major fragment is a concept that represents a phase of the query execution.
  • A phase can consist of one or multiple operations that Drill must perform to execute the query.
  • Drill assigns each major fragment a MajorFragmentID.
  • Drill uses an exchange operator to separate major fragments. An exchange is a change in data location and/or parallelization of the physical plan. An exchange is composed of a sender and a receiver to allow data to move between nodes.

Minor Fragments

  • Each major fragment is parallelized into minor fragments.
  • A minor fragment is a logical unit of work that runs inside a thread. A logical unit of work in Drill is also referred to as a slice.
  • The execution plan that Drill creates is composed of minor fragments. Drill assigns each minor fragment a MinorFragmentID.


  • Process
    The parallelizer in the Foreman creates one or more minor fragments from a major fragment at execution time, by breaking a major fragment into as many minor fragments as it can usefully run at the same time on the cluster.
    Drill executes each minor fragment in its own thread as quickly as possible based on its upstream data requirements. Drill schedules the minor fragments on nodes with data locality. Otherwise, Drill schedules them in a round-robin (RR, rotating through the nodes in turn) fashion on the existing, available Drillbits.
  • Minor fragments contain one or more relational operators. An operator performs a relational operation, such as scan, filter, join, or group by. Each operator has a particular operator type and an OperatorID. Each OperatorID defines its relationship within the minor fragment to which it belongs.
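
The scheduling rule described above (prefer the node that holds the data, otherwise fall back to round-robin) can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Drill's parallelizer: the function assign_minor_fragments, the block and node names, and the "major-minor" ID format are all hypothetical.

```python
from itertools import cycle


def assign_minor_fragments(major_fragment_id, data_blocks, drillbits):
    """Create one minor fragment per data block and pick a node for each.

    data_blocks: list of (block_name, node_holding_the_block_or_None)
    drillbits:   list of available node names
    """
    fallback = cycle(drillbits)  # round-robin over the available Drillbits
    assignments = []
    for minor_id, (block, local_node) in enumerate(data_blocks):
        if local_node in drillbits:
            node = local_node      # data locality wins
        else:
            node = next(fallback)  # otherwise take the next Drillbit in turn
        assignments.append((f"{major_fragment_id}-{minor_id}", block, node))
    return assignments


plan = assign_minor_fragments(
    1,
    [("block-a", "node2"), ("block-b", None), ("block-c", "node9"), ("block-d", "node1")],
    ["node1", "node2", "node3"],
)
```

Here block-a and block-d stay on the nodes that hold them, while block-b (no known location) and block-c (held by a node that runs no Drillbit) are handed out in turn.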

Execution of Minor Fragments

Minor fragments can run as root, intermediate, or leaf fragments. An execution tree contains only one root fragment. The coordinates of the execution tree are numbered from the root, with the root being zero. Data flows downstream from the leaf fragments to the root fragment.

The root fragment runs in the Foreman and receives incoming queries, reads metadata from tables, rewrites the queries and routes them to the next level in the serving tree. The other fragments become intermediate or leaf fragments.

Intermediate fragments start work when data is available or fed to them from other fragments. They perform operations on the data and then send the data downstream. They also pass the aggregated results to the root fragment, which performs further aggregation and provides the query results to the client or application.

The leaf fragments scan tables in parallel and communicate with the storage layer or access data on local disk. The leaf fragments pass partial results to the intermediate fragments, which perform parallel operations on intermediate results.
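
The leaf, intermediate, and root roles map naturally onto a small tree of partial aggregations. The following Python sketch computes an average that way; it only illustrates the data flow, and every function name and the sample data are my own assumptions rather than anything from Drill.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_fragment(rows):
    """Scan one slice of the table and return a partial (sum, count)."""
    return sum(rows), len(rows)


def intermediate_fragment(partials):
    """Merge the partial results coming from several upstream fragments."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


def root_fragment(partials):
    """Final aggregation; this value is what the client would receive."""
    total, count = intermediate_fragment(partials)
    return total / count


# Four hypothetical table slices, scanned in parallel by the leaf fragments.
slices = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
with ThreadPoolExecutor() as pool:
    leaf_results = list(pool.map(leaf_fragment, slices))

# Two intermediate fragments each merge the output of two leaves.
intermediate_results = [
    intermediate_fragment(leaf_results[:2]),
    intermediate_fragment(leaf_results[2:]),
]
average = root_fragment(intermediate_results)
```

Passing (sum, count) pairs instead of averages is what makes the merge steps safe: partial averages cannot be combined correctly, partial sums and counts can.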

Query

  1. An example query: select id, type, name, ppu
    from dfs.`/Users/brumsby/drill/donuts.json`;

Note that dfs is the schema name, the path to the file is enclosed by backticks, and the query must end with a semicolon.

  1. Note that the alias (as top) is required.

    For nested arrays, chain the indexes. For example, this returns the third value of the second inner array:
    select group[1][2]
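
Array indexes in Drill start at 0, which is why [1][2] means "second inner array, third value". The same lookup in plain Python, on a record I made up for illustration, looks like this:

```python
import json

# Hypothetical record with a nested array column named "group".
record = json.loads('{"group": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}')

# Equivalent of group[1][2]: second inner array, third value.
value = record["group"][1][2]
```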

  2. A more complex SQL statement

SELECT * FROM (SELECT t.trans_id,
                      t.trans_info.prod_id[0] AS prod_id,
                      t.trans_info.purch_flag AS purchased
               FROM `clicks/clicks.json` t) sq
WHERE sq.prod_id BETWEEN 700 AND 750 AND
      sq.purchased = 'true'
ORDER BY sq.prod_id;
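
To see what this statement does step by step, here is a rough Python equivalent run against a few made-up records. The field names follow the query; the sample data and the variable names clicks, sq, and result are assumptions for illustration only.

```python
# Hypothetical contents of clicks/clicks.json.
clicks = [
    {"trans_id": 1, "trans_info": {"prod_id": [710, 12], "purch_flag": "true"}},
    {"trans_id": 2, "trans_info": {"prod_id": [705], "purch_flag": "false"}},
    {"trans_id": 3, "trans_info": {"prod_id": [800], "purch_flag": "true"}},
    {"trans_id": 4, "trans_info": {"prod_id": [702, 3], "purch_flag": "true"}},
]

# Inner SELECT: pull the nested fields up into plain columns.
sq = [
    {
        "trans_id": t["trans_id"],
        "prod_id": t["trans_info"]["prod_id"][0],
        "purchased": t["trans_info"]["purch_flag"],
    }
    for t in clicks
]

# Outer SELECT: BETWEEN is inclusive on both ends, then ORDER BY prod_id.
result = sorted(
    (row for row in sq if 700 <= row["prod_id"] <= 750 and row["purchased"] == "true"),
    key=lambda row: row["prod_id"],
)
```

The subquery exists so that the outer WHERE and ORDER BY can refer to the flattened columns prod_id and purchased by their aliases.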

REST API

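The REST API uses GET and POST; the documentation is here. There is also a JSON API document that is very well written and includes code. It was my reference when I was reading up on this.

To be continued.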
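
As a starting point, here is a minimal Python sketch of submitting a query over REST using only the standard library. It assumes an embedded Drillbit listening on http://localhost:8047 and uses the /query.json endpoint with a queryType and query field in the JSON body; the helper names build_query_request and run_query are my own, and the sample table cp.`employee.json` is assumed to be available through the default classpath storage plugin.

```python
import json
import urllib.request


def build_query_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for the /query.json endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url="http://localhost:8047"):
    """Send the query to a running Drillbit and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; run_query() does.
request = build_query_request("SELECT * FROM cp.`employee.json` LIMIT 5")
```

Splitting request construction from sending keeps the first half easy to inspect without a running Drillbit. Call run_query with the same SQL once the service has been started with bin/drill-embedded.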
