These are learning notes; they will be updated as I learn more, and are for reference only.
Scenario: MySQL to HDFS; HDFS to Doris.
1. Reference configuration: MySQL to HDFS
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "${username}",
                        "password": "${password}",
                        "column": [
                            "id"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "${table}"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://${host}:3306/${database}?useSSL=false&allowPublicKeyRetrieval=true"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://${hdfs_host}:${hdfs_port}",
                        "fileType": "text",
                        "path": "${path}",
                        "fileName": "${table}",
                        "column": [
                            {"name": "id", "type": "bigint"}
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "\t",
                        "compress": "gzip",
                        "hadoopConfig": {
                            "dfs.replication": "1"
                        }
                    }
                }
            }
        ]
    }
}
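Placeholders such as ${username} and ${table} in the config above are filled in at run time through DataX's -p option, which passes -Dname=value pairs. A minimal launch sketch, assuming DataX is installed under /opt/datax and the config is saved as mysql_to_hdfs.json (all hostnames, credentials, and paths below are made-up examples):

```shell
# Build the -p argument: each -Dname=value fills the matching ${name}
# placeholder in the job JSON. All values here are made-up examples.
PARAMS="-Dusername=root -Dpassword=123456 -Dtable=user_info"
PARAMS="$PARAMS -Dhost=mysql01 -Ddatabase=test_db"
PARAMS="$PARAMS -Dhdfs_host=namenode01 -Dhdfs_port=8020 -Dpath=/warehouse/user_info"

# Print the full command (dry run); drop the leading echo to actually submit the job.
echo python /opt/datax/bin/datax.py -p \"$PARAMS\" mysql_to_hdfs.json
```

In a scheduler such as DolphinScheduler, these same -D pairs are usually supplied as task parameters rather than hard-coded in a script.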
2讹语、hdfs到doris參考配置文件
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "path": "${hive_path}/${hive_table}/dt=${day}",
                        "defaultFS": "hdfs://${hdfs_host}:${hdfs_port}",
                        "fileType": "orc",
                        "column": [
                            {
                                "index": 0,
                                "name": "agg_time",
                                "type": "string"
                            },
                            {
                                "name": "dt",
                                "type": "string",
                                "value": "${day}"
                            }
                        ],
                        "fieldDelimiter": "\t",
                        "encoding": "UTF-8",
                        "nullFormat": "\\N"
                    }
                },
                "writer": {
                    "name": "doriswriter",
                    "parameter": {
                        "loadUrl": [
                            "${doris_host}:8030"
                        ],
                        "column": [
                            "agg_time",
                            "time_day"
                        ],
                        "username": "${doris_user}",
                        "password": "${doris_password}",
                        "postSql": [],
                        "preSql": [],
                        "flushInterval": 30000,
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://${doris_host}:9030/${doris_db}",
                                "table": [
                                    "${doris_table}"
                                ],
                                "selectedDatabase": "${doris_db}"
                            }
                        ]
                    }
                }
            }
        ]
    }
}
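This job is normally driven by a scheduler that passes yesterday's date as ${day}, since today's run processes yesterday's partition. A sketch of such a wrapper, assuming GNU date and made-up hostnames, credentials, and table names:

```shell
# Yesterday's date for the dt=${day} partition (GNU date syntax;
# on macOS/BSD the equivalent is: date -v-1d +%Y-%m-%d).
day=$(date -d "-1 day" +%Y-%m-%d)

# All hosts, credentials, and table names below are made-up examples.
PARAMS="-Dday=$day -Dhive_path=/warehouse/hive -Dhive_table=ads_agg"
PARAMS="$PARAMS -Dhdfs_host=namenode01 -Dhdfs_port=8020"
PARAMS="$PARAMS -Ddoris_host=doris-fe01 -Ddoris_user=root -Ddoris_password=123456"
PARAMS="$PARAMS -Ddoris_db=dw -Ddoris_table=ads_agg"

# Print the full command (dry run); drop the leading echo to actually submit the job.
echo python /opt/datax/bin/datax.py -p \"$PARAMS\" hdfs_to_doris.json
```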
Notes:
1. The Hive partition date (dt) is extracted here and written to a time field (time_day) in Doris. The approach is to pass in the target date explicitly: for example, when today's run processes yesterday's data, ${day} is yesterday's date.
2. The port in Doris's loadUrl is the FE HTTP port (8030 by default); the port in the jdbcUrl below it is the MySQL-protocol query port (9030 by default).
3. Data is written here in append fashion: new rows are added rather than overwriting existing data.
4. The JSON must be well-formed, otherwise the job fails with an error.
5. If the official release package is missing some readers or writers, you need to build and install DataX from source instead.
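Point 4 above can be guarded against cheaply: run the job file through a JSON parser before submitting it. A minimal pre-flight sketch using python3 -m json.tool (the file path here is a throwaway example; point it at your real job file):

```shell
# Write a tiny job fragment to validate; in practice point this at your real job file.
cat > /tmp/job_check.json <<'EOF'
{"job": {"setting": {"speed": {"channel": 3}}}}
EOF

# json.tool exits non-zero on malformed JSON, so it works as a pre-flight check.
if python3 -m json.tool /tmp/job_check.json > /dev/null 2>&1; then
    echo "valid"
else
    echo "invalid"
fi
```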
References
1. DataX source code
2. A case study of scheduling DataX tasks in DolphinScheduler to read Hive partitioned tables
3. Doris official website
4. Handling the "Content-Length header already present" exception when writing to Doris