Hardware and software requirements:
- Java 8 or higher
- Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
- 8 GB of RAM
- 2 vCPUs
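The Java requirement can be checked from a shell before installing anything. The following is a minimal sketch, assuming a POSIX shell; the helper names java_major_version, installed_java_version, and java_is_supported are hypothetical and not part of Druid.

```shell
# Sketch: derive the Java major version from a version string.
# All function names here are hypothetical helpers, not Druid tooling.
java_major_version() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;               # legacy scheme: 1.8.0_181 -> 8
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;; # newer scheme: 11.0.2 -> 11
  esac
}

# Extract the quoted version string that `java -version` writes to stderr,
# for example: java version "1.8.0_181"
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\(.*\)".*/\1/p' | head -n 1
}

# Succeeds when the given version string is Java 8 or higher.
java_is_supported() {
  [ "$(java_major_version "$1")" -ge 8 ]
}
```

A possible use is `java_is_supported "$(installed_java_version)" && echo "Java OK"`.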
1. Download and extract the package
curl -O http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz
tar -xzf druid-0.12.3-bin.tar.gz
cd druid-0.12.3
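The package contains the following files and directories:
- LICENSE - the license file.
- bin/ - scripts related to the quickstart.
- conf/* - configuration templates for a clustered setup.
- conf-quickstart/* - configuration files for the quickstart.
- extensions/* - all Druid extensions.
- hadoop-dependencies/* - Druid Hadoop dependencies.
- lib/* - all core Druid packages.
- quickstart/* - files related to the quickstart.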
2. Download the tutorial examples
curl -O http://druid.io/docs/0.12.3/tutorials/tutorial-examples.tar.gz
tar zxvf tutorial-examples.tar.gz
3. Start ZooKeeper
curl http://mirror.bit.edu.cn/apache/zookeeper/stable/zookeeper-3.4.12.tar.gz -o zookeeper-3.4.12.tar.gz
tar -xzf zookeeper-3.4.12.tar.gz
cd zookeeper-3.4.12
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
4. Start the Druid services
From the druid-0.12.3 directory, run the following command:
bin/init
The init script performs some initial setup. Its contents are as follows:
#!/bin/bash -eu
gzip -c -d quickstart/wikiticker-2015-09-12-sampled.json.gz > "quickstart/wikiticker-2015-09-12-sampled.json"
LOG_DIR=var
mkdir log
mkdir -p $LOG_DIR/tmp;
mkdir -p $LOG_DIR/druid/indexing-logs;
mkdir -p $LOG_DIR/druid/segments;
mkdir -p $LOG_DIR/druid/segment-cache;
mkdir -p $LOG_DIR/druid/task;
mkdir -p $LOG_DIR/druid/hadoop-tmp;
mkdir -p $LOG_DIR/druid/pids;
Start each Druid process in its own terminal window. This tutorial runs all Druid processes on a single machine; in a large distributed production cluster, some Druid processes can still be co-located.
java `cat examples/conf/druid/coordinator/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
java `cat examples/conf/druid/overlord/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/overlord:lib/*" io.druid.cli.Main server overlord
java `cat examples/conf/druid/historical/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/historical:lib/*" io.druid.cli.Main server historical
java `cat examples/conf/druid/middleManager/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
java `cat examples/conf/druid/broker/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/broker:lib/*" io.druid.cli.Main server broker
jvm.config holds the JVM arguments for each Java process. The output of cat examples/conf/druid/coordinator/jvm.config is as follows:
-server
-Xms256m
-Xmx256m
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=var/druid/derby.log
The commands above, each run in its own terminal window, start the coordinator, overlord, historical, middleManager, and broker processes respectively.
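The five launch commands differ only in the service name, so they can be generated from one template. The following is a rough sketch, assuming a POSIX shell run from the druid-0.12.3 directory after bin/init has created log/. The helpers druid_classpath and druid_start are hypothetical. Unlike the one-terminal-per-process approach above, druid_start runs each process in the background and redirects its output to a log file.

```shell
# Sketch: build the classpath used by the launch commands for one service.
# druid_classpath and druid_start are hypothetical helpers; the paths and the
# main class are copied from the commands in this tutorial.
druid_classpath() {
  printf '%s\n' "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/$1:lib/*"
}

# Start one service in the background, writing its output to log/<service>.log.
# The JVM arguments are read from the service's jvm.config, as in the original commands.
druid_start() {
  java $(xargs < "examples/conf/druid/$1/jvm.config") \
    -cp "$(druid_classpath "$1")" \
    io.druid.cli.Main server "$1" > "log/$1.log" 2>&1 &
}

# Example (not executed here):
#   for s in coordinator overlord historical middleManager broker; do
#     druid_start "$s"
#   done
```

Processes started this way have to be stopped with kill rather than CTRL-C, which is one reason separate terminals are simpler for a first run.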
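5. Reset Druid
All persistent state, such as the cluster metadata store and the segments for the services, is kept in the druid-0.12.3/var directory.
To stop the services, press CTRL-C to exit the running Java processes. If you want a clean start after stopping the services, delete the log and var directories and run the init script again, then shut down ZooKeeper and delete the ZooKeeper data directory at /tmp/zookeeper.
From the druid-0.12.3 directory: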
rm -rf log
rm -rf var
bin/init
If you have worked through the Loading stream data from Kafka tutorial, shut down Kafka before shutting down ZooKeeper, and delete the Kafka log directory at /tmp/kafka-logs.
Press CTRL-C to stop the Kafka broker, then delete the log directory:
rm -rf /tmp/kafka-logs
Now shut down ZooKeeper and clean up its state. From the zookeeper-3.4.12 directory:
./bin/zkServer.sh stop
rm -rf /tmp/zookeeper
After cleaning up the Druid and ZooKeeper state, restart the ZooKeeper and Druid services.
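The Druid part of the reset can be wrapped in a small function. This is a minimal sketch, assuming a POSIX shell. The helper reset_druid_state is hypothetical, takes the Druid directory as its argument, and only touches Druid's log and var directories; ZooKeeper and Kafka still need the manual steps above.

```shell
# Sketch: wipe Druid's local state under the given Druid directory.
# reset_druid_state is a hypothetical helper that mirrors the rm and init steps above.
# The body runs in a subshell so the cd does not affect the caller.
reset_druid_state() (
  cd "$1" || exit 1   # refuse to delete anything if the directory is missing
  rm -rf log var
  # Re-create the directory layout only when the init script is present.
  if [ -x bin/init ]; then
    bin/init
  fi
)
```

For example, `reset_druid_state ~/druid-0.12.3` (the path is an assumption about where the package was extracted) would be run after all Druid processes have been stopped.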
6. Dataset
The data loading tutorials that follow use a single data file, quickstart/wikiticker-2015-09-12-sampled.json.gz in the Druid package root, which contains Wikipedia page edit events from 2015-09-12. The page edit events are stored as JSON objects in a text file.
The data contains the following columns:
- added
- channel
- cityName
- comment
- countryIsoCode
- countryName
- deleted
- delta
- isAnonymous
- isMinor
- isNew
- isRobot
- isUnpatrolled
- metroCode
- namespace
- page
- regionIsoCode
- regionName
- user
The following is a sample record:
{
"time":"2015-09-12T20:03:45.018Z",
"channel":"#en.wikipedia",
"namespace":"Main",
"page":"Spider-Man's powers and equipment",
"user":"foobar",
"comment":"/* Artificial web-shooters */",
"cityName":"New York",
"regionName":"New York",
"regionIsoCode":"NY",
"countryName":"United States",
"countryIsoCode":"US",
"isAnonymous":false,
"isNew":false,
"isMinor":false,
"isRobot":false,
"isUnpatrolled":false,
"added":99,
"delta":99,
"deleted":0
}
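To compare a sample record with the real file, the compressed data can be inspected without unpacking it to disk. The sketch below assumes a POSIX shell and assumes the file holds one JSON object per line; peek_events and count_events are hypothetical helper names.

```shell
# Sketch: inspect a gzip-compressed file of line-delimited JSON events.
# peek_events and count_events are hypothetical helpers.

# Print the first N events (default 1).
peek_events() {
  gzip -c -d < "$1" | head -n "${2:-1}"
}

# Count the events, assuming one JSON object per line.
count_events() {
  gzip -c -d < "$1" | wc -l | tr -d ' '
}
```

From the Druid package root, `peek_events quickstart/wikiticker-2015-09-12-sampled.json.gz` would print the first event.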
7. Load data from a file
1) Prepare the data and define an ingestion task
A data load is initiated by submitting an ingestion task spec to the Druid overlord. For this tutorial, we'll be loading the sample Wikipedia page edits data.
examples/wikipedia-index.json defines an ingestion task that reads the data in quickstart/wikiticker-2015-09-12-sampled.json.gz:
{
"type" : "index",
"spec" : {
"dataSchema" : {
"dataSource" : "wikipedia",
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : [
"channel",
"cityName",
"comment",
"countryIsoCode",
"countryName",
"isAnonymous",
"isMinor",
"isNew",
"isRobot",
"isUnpatrolled",
"metroCode",
"namespace",
"page",
"regionIsoCode",
"regionName",
"user",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }
]
},
"timestampSpec": {
"column": "time",
"format": "iso"
}
}
},
"metricsSpec" : [],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "day",
"queryGranularity" : "none",
"intervals" : ["2015-09-12/2015-09-13"],
"rollup" : false
}
},
"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "local",
"baseDir" : "quickstart/",
"filter" : "wikiticker-2015-09-12-sampled.json.gz"
},
"appendToExisting" : false
},
"tuningConfig" : {
"type" : "index",
"targetPartitionSize" : 5000000,
"maxRowsInMemory" : 25000,
"forceExtendableShardSpecs" : true
}
}
}
The spec above creates a datasource named "wikipedia".
2) Load the batch data
From the druid-0.12.3 directory, submit the ingestion task with a POST request:
curl -X 'POST' -H 'Content-Type:application/json' -d @examples/wikipedia-index.json http://localhost:8090/druid/indexer/v1/task
If the task is submitted successfully, the task ID is printed to the console:
{"task":"index_wikipedia_2018-06-09T21:30:32.802Z"}
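When the submission is scripted, the task ID can be pulled out of that response for later use. This is a minimal sketch, assuming a POSIX shell and a response of the shape shown above; task_id_from_response is a hypothetical helper that uses sed rather than a real JSON parser.

```shell
# Sketch: extract the value of the "task" field from the overlord's response.
# task_id_from_response is a hypothetical helper; it assumes the response looks
# like {"task":"<id>"} and does not handle escaped quotes inside the ID.
task_id_from_response() {
  printf '%s\n' "$1" | sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (not executed here):
#   response=$(curl -s -X POST -H 'Content-Type:application/json' \
#     -d @examples/wikipedia-index.json http://localhost:8090/druid/indexer/v1/task)
#   task_id=$(task_id_from_response "$response")
```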
You can check the status of the submitted ingestion task in the overlord console at http://localhost:8090/console.html. Refresh the console periodically; once the task succeeds, its status changes to "SUCCESS".
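Instead of refreshing the console, the task status can also be polled over HTTP. The sketch below assumes a POSIX shell and the overlord's task status endpoint at /druid/indexer/v1/task/<taskId>/status. It also assumes the response nests the state as "status":{"id":...,"status":"SUCCESS",...}, so verify that shape against your own overlord's output. The helpers task_state_from_status and wait_for_task are hypothetical.

```shell
# Sketch: read the task state out of a status response.
# Assumed response shape (verify against your overlord):
#   {"task":"<id>","status":{"id":"<id>","status":"SUCCESS","duration":1234}}
task_state_from_status() {
  printf '%s\n' "$1" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p'
}

# Poll every 5 seconds, up to 60 times.
# Returns 0 on SUCCESS, 1 on FAILED, 2 if the task is still unfinished at the end.
wait_for_task() {
  task_id="$1"
  overlord="${2:-http://localhost:8090}"
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    body=$(curl -s "$overlord/druid/indexer/v1/task/$task_id/status" || true)
    case "$(task_state_from_status "$body")" in
      SUCCESS) return 0 ;;
      FAILED)  return 1 ;;
    esac
    attempts=$((attempts + 1))
    sleep 5
  done
  return 2
}
```

A call such as `wait_for_task "index_wikipedia_2018-06-09T21:30:32.802Z"` would block until the task finishes or the attempts run out.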
Once the ingestion task finishes, the data is loaded by the historical nodes and becomes queryable within a minute or two. You can monitor the loading progress in the coordinator console at http://localhost:8081/#/; a blue circle next to the "wikipedia" datasource indicates that it is "fully available".
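The same availability check can be scripted against the coordinator. This sketch assumes a POSIX shell and assumes that the coordinator's /druid/coordinator/v1/loadstatus endpoint returns a JSON map of datasource name to percentage loaded, such as {"wikipedia":100.0}. The helpers load_percent and is_fully_loaded are hypothetical.

```shell
# Sketch: read one datasource's load percentage from a loadstatus response.
# $1 = JSON body, $2 = datasource name (plain characters only, no regex escaping).
load_percent() {
  printf '%s\n' "$1" | sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*\([0-9.]*\).*/\1/p'
}

# Succeeds when the datasource is reported as 100 percent loaded.
is_fully_loaded() {
  case "$(load_percent "$1" "$2")" in
    100|100.*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Example (not executed here):
#   body=$(curl -s http://localhost:8081/druid/coordinator/v1/loadstatus)
#   is_fully_loaded "$body" wikipedia && echo "wikipedia is fully available"
```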
8.