Reference links:
DataX official GitHub repository: https://github.com/alibaba/DataX
1, Installation and usage
1.1, Download link
http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
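1.2, Usage (prebuilt DataX package, not built from source)
- After downloading, extract the archive to a local directory and change into its bin directory; you can then run a sync job: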
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
- Self-check script (runs the sample job shipped with the package):
$ python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
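The same invocation can be scripted when jobs need to be launched from other tooling. The following is a minimal sketch, not part of DataX: build_datax_command and run_job are hypothetical helper names, and the only thing taken from the text above is the command shape python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_JOB.json}. It assumes the interpreter named by python_exe is one your DataX launcher can run under; pass a different interpreter if your package needs one.

```python
import os
import subprocess
import sys


def build_datax_command(datax_home, job_file, python_exe=sys.executable):
    """Return the argument list for: python {DATAX_HOME}/bin/datax.py {JOB}."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python_exe, launcher, job_file]


def run_job(datax_home, job_file, python_exe=sys.executable):
    """Run one DataX job and return the launcher's exit code (hypothetical helper)."""
    command = build_datax_command(datax_home, job_file, python_exe)
    return subprocess.call(command)


# Example (paths are placeholders, adjust to your installation):
# exit_code = run_job("/opt/datax", "/opt/datax/job/job.json")
```

Passing the command as a list rather than a single string avoids shell quoting problems when the DataX home or job path contains spaces.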
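1.3, Configuration example: read data from stream and print it to the console
1.3.1, Step 1: find the configuration file template (JSON format)
You can print a configuration template with the command: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}
Templates can also be downloaded from the official GitHub repository, which also explains each field in detail.
For example, python datax.py -r streamreader -w streamwriter prints the following: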
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the streamreader document:
https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md
Please refer to the streamwriter document:
https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [],
                        "sliceRecordCount": ""
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
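The template leaves several values empty ("" or []), and those are the ones you are expected to fill in. As a rough sketch, the script below walks the template and lists the paths of the empty values. blank_fields is a hypothetical helper written for this illustration, not a DataX feature, and the template is embedded as a string only to keep the example self-contained.

```python
import json

# The streamreader/streamwriter template printed above, in compact form.
TEMPLATE = """
{"job": {
    "content": [{
        "reader": {"name": "streamreader",
                   "parameter": {"column": [], "sliceRecordCount": ""}},
        "writer": {"name": "streamwriter",
                   "parameter": {"encoding": "", "print": true}}
    }],
    "setting": {"speed": {"channel": ""}}
}}
"""


def blank_fields(node, path=""):
    """Yield the paths of values the template leaves empty ("" or [])."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = key if not path else path + "." + key
            for found in blank_fields(value, child):
                yield found
    elif isinstance(node, list) and node:
        # Non-empty list: descend into each element.
        for index, value in enumerate(node):
            for found in blank_fields(value, "%s[%d]" % (path, index)):
                yield found
    elif node == "" or node == []:
        yield path


for field in blank_fields(json.loads(TEMPLATE)):
    print(field)
```

For this template the listed paths are the reader's column and sliceRecordCount, the writer's encoding, and the channel setting.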
1.3.2, Step 2: customize the configuration file based on the template
#stream2stream.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
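If you prefer to generate this file rather than edit it by hand, something like the following could work. It is a sketch under stated assumptions: build_stream_job and save_job are hypothetical helper names, and the field names and values are copied from the configuration above. ensure_ascii=False keeps the non-ASCII sample string readable in the output file.

```python
import io
import json


def build_stream_job(record_count, channels, columns):
    """Build a streamreader -> streamwriter job dict (hypothetical helper)."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": record_count,
                            "column": columns,
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channels}},
        }
    }


def save_job(job, path):
    """Write the job as UTF-8 JSON so DataX can read it from disk."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(json.dumps(job, indent=4, ensure_ascii=False))


job = build_stream_job(
    record_count=10,
    channels=5,
    columns=[
        {"type": "long", "value": "10"},
        {"type": "string", "value": "hello,你好,世界-DataX"},
    ],
)
# save_job(job, "stream2stream.json")  # uncomment to write the file
```

Building the job as a dict and serializing it with the json module guarantees the file is valid JSON, which hand editing does not.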
1.3.3, Step 3: start DataX and run the job with the JSON configuration file
- Running the command below makes streamwriter print to the console the records that streamreader generates in memory:
$ python ../bin/datax.py stream2stream.json