說明
StreamingPro有非常多的模塊可以直接在配置文件中使用,本文主要針對(duì)流式計(jì)算中涉及到的模塊。
Kafka Compositor
{
"name": "streaming.core.compositor.spark.streaming.source.KafkaStreamingCompositor",
"params": [{
"topics":"your topic",
"metadata.broker.list":"brokers",
"auto.offset.reset": "smallest|largest"
}]
}
參數(shù)說明:
Property Name | Meaning |
---|---|
topics | Kafka主題,可以多個(gè)瞻惋,按 逗號(hào)分隔 |
metadata.broker.list | Kafka Broker地址 |
auto.offset.reset | 重頭消費(fèi)還是從最新消費(fèi) |
MockInputStreamCompositor
模擬數(shù)據(jù)源熄求,主要為了方便測(cè)試。
{
"name": "streaming.core.compositor.spark.streaming.source.MockInputStreamCompositor",
"params": [{
"batch-1":["1","2","3"],
"batch-2":["1","2","3"],
"batch-3":["1","2","3"],
"batch-4":["1","2","3"]
}]
}
MockInputStreamFromPathCompositor
模擬數(shù)據(jù)源器贩,主要為了方便測(cè)試颅夺。可以接入一個(gè)外部文件作為mock數(shù)據(jù)
{
"name": "streaming.core.compositor.spark.streaming.source.MockInputStreamFromPathCompositor",
"params": [{"path":"file:///tmp/test.txt"}]
}
SingleColumnJSONCompositor
把一條日志轉(zhuǎn)化一個(gè)單列的json文件蛹稍。
{
"name": "streaming.core.compositor.spark.streaming.transformation.SingleColumnJSONCompositor",
"params": [{
"name": "a"
}]
}
params.name 則是列名吧黄,方便后續(xù)的sql使用。
ScalaMapToJSONCompositor
{
"name": "streaming.core.compositor.spark.streaming.transformation.ScalaMapToJSONCompositor",
"params": [{}]
}
可以把scala Map轉(zhuǎn)化為JSon
JavaMapToJSONCompositor
{
"name": "streaming.core.compositor.spark.streaming.transformation.JavaMapToJSONCompositor",
"params": [{}]
}
可以把java Map轉(zhuǎn)化為JSon
FlatJSONCompositor
{
"name": "streaming.core.compositor.spark.streaming.transformation.FlatJSONCompositor",
"params": [{"a":"$['store']['book'][0]['title']"}]
}
從JSON里抽取字段唆姐,映射到新的列名上拗慨。主要是對(duì)復(fù)雜JSON結(jié)構(gòu)進(jìn)行扁平化。語法參考該庫JsonPath
NginxParserCompositor
{
"name": "streaming.core.compositor.spark.streaming.transformation.NginxParserCompositor",
"params": [{"time":0,"url":1}]
}
Nginx 日志解析工具奉芦,按位置給列進(jìn)行命名赵抢。
SQLCompositor
{
"name": "streaming.core.compositor.spark.streaming.transformation.SQLCompositor",
"params": [
{
"sql": "select a, \"5\" as b from test",
"outputTableName": "test2"
}
]
}
Property Name | Meaning |
---|---|
sql | sql 語句 |
outputTableName | 輸出的表名,方便后續(xù)的SQL語句可以銜接 |
SQLESOutputCompositor
將數(shù)據(jù)存儲(chǔ)到ES中
{
"name":"streaming.core.compositor.spark.streaming.output.SQLESOutputCompositor",
"params":[
{
"es.nodes":"",
"es.resource":"",
"es.mapping.include":"",
"timeFormat":"yyyyMMdd"
}
]
}
Property Name | Meaning |
---|---|
es.nodes | 節(jié)點(diǎn)仗阅,多個(gè)節(jié)點(diǎn)用逗號(hào)分隔 |
es.resource | 索引名稱以及類型名稱 |
.... | 其他一些elasticsearch-hadoop的配置 |
SQLPrintOutputCompositor(output)
{
"name": "streaming.core.compositor.spark.streaming.output.SQLPrintOutputCompositor",
"params": [{}]
}
把處理結(jié)果打印到終端控制臺(tái)昌讲。主要是為了調(diào)試使用
JSONTableCompositor
{
"name": "streaming.core.compositor.spark.streaming.transformation.JSONTableCompositor",
"params": [{
"tableName": "test"
}]
}
把字符串(JSON格式)的數(shù)據(jù)注冊(cè)成一張表。 params.tableName可以讓你指定表名减噪。
ConsoleOutputCompositor
{
"name": "streaming.core.compositor.spark.streaming.output.ConsoleOutputCompositor",
"params": [{ }]
}
控制臺(tái)打印短绸,非SQL類。
SQLCSVOutputCompositor
{
"name": "streaming.core.compositor.spark.streaming.output.SQLCSVOutputCompositor",
"params": [{
"path":"",
"mode":""
}]
}
Property Name | Meaning |
---|---|
path | cvs 存儲(chǔ)路徑 |
mode | ErrorIfExists 或者Overwrite 或者Append或者Ignore |
作為CSV 輸出筹裕,需要前面是一張表醋闭。
SQLParquetOutputCompositor
{
"name": "streaming.core.compositor.spark.streaming.output.SQLParquetOutputCompositor",
"params": [{
"path":"",
"mode":""
}]
}
Property Name | Meaning |
---|---|
path | parquet 存儲(chǔ)路徑 |
mode | ErrorIfExists 或者Overwrite 或者Append或者Ignore |
作為parquet 輸出,需要前面是一張表朝卒。