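Flume load balancing is configured through a sink group: for each delivery, a sink is chosen from the group by a selection algorithm and the output is sent to that sink's destination. When the volume of file output is large, load balancing is well worth having, because spreading the output over several paths relieves the pressure on any single one.
Flume's built-in load-balancing algorithm defaults to round robin.
In this setup, files are transferred from a host to HDFS.
Cluster information:
The Flume cluster uses 4 hosts: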
Flumeapp1 load_balance
Flumeapp2 sink1
Flumeapp3 sink2
Flumeapp4 sink3
The load_balance configuration is as follows (the file uses the default configuration file name, conf/flume-conf.properties):
agent1.sources=source1
agent1.sinks=sink1 sink2 sink3
agent1.channels = channel1
# source
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /e3base/spooldir
# Keep the target file name the same as the original file name
agent1.sources.source1.basenameHeader=true
agent1.sources.source1.basenameHeaderKey=fileName
# sink group
agent1.sinkgroups=group1
agent1.sinkgroups.group1.sinks=sink1 sink2 sink3
agent1.sinkgroups.group1.processor.type=load_balance
agent1.sinkgroups.group1.processor.backoff=true
agent1.sinkgroups.group1.processor.selector=round_robin
# sink1
agent1.sinks.sink1.type=avro
agent1.sinks.sink1.hostname=134.32.50.13
agent1.sinks.sink1.port=21000
# sink2
agent1.sinks.sink2.type=avro
agent1.sinks.sink2.hostname=134.32.50.14
agent1.sinks.sink2.port=21000
# sink3
agent1.sinks.sink3.type=avro
agent1.sinks.sink3.hostname=134.32.152.49
agent1.sinks.sink3.port=21000
# channel
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity=100
# bind
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink2.channel = channel1
agent1.sinks.sink3.channel = channel1
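The sink group above uses round_robin selection with backoff enabled. As a hedged variation that keeps the same agent and group names, the selector could be switched to random and the backoff period capped with processor.selector.maxTimeOut. The 10000 ms value is an illustrative assumption, not something taken from the original setup.

```properties
# Variation on the sink group above (a sketch, not the original configuration)
agent1.sinkgroups = group1
agent1.sinkgroups.group1.sinks = sink1 sink2 sink3
agent1.sinkgroups.group1.processor.type = load_balance
# Temporarily skip a sink that has failed instead of retrying it on every pass
agent1.sinkgroups.group1.processor.backoff = true
# Pick a sink at random rather than in rotation
agent1.sinkgroups.group1.processor.selector = random
# Upper bound on the backoff period, in milliseconds (assumed value)
agent1.sinkgroups.group1.processor.selector.maxTimeOut = 10000
```

Only the last two properties differ from the original group definition, so either form can be swapped in without touching the sinks or the channel.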
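Flumeapp2 through Flumeapp4 use the same configuration, shown below (again in the default configuration file, conf/flume-conf.properties). The bind address shown is Flumeapp4's; presumably each host binds its own address, matching the sink hostnames configured on Flumeapp1.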
agent1.sources=source1
agent1.channels=channel1
agent1.sinks = sink1
# source
agent1.sources.source1.type=avro
agent1.sources.source1.bind= 134.32.152.49
agent1.sources.source1.port=21000
agent1.sources.source1.basenameHeader=true
agent1.sources.source1.basenameHeaderKey=fileName
# channels
agent1.channels.channel1.type=memory
agent1.channels.channel1.capacity=1000
agent1.channels.channel1.transactionCapacity=100
# sinks
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://drmcluster/test_bak/flume/
agent1.sinks.sink1.hdfs.filePrefix=%{fileName}
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.rollCount=0
agent1.sinks.sink1.hdfs.rollSize=134217728
agent1.sinks.sink1.hdfs.rollInterval=60
agent1.sinks.sink1.hdfs.writeFormat=Text
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true
agent1.sources.source1.channels=channel1
agent1.sinks.sink1.channel=channel1
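Since the three receiving hosts are meant to share one file, the Avro source's bind address is the only line that would differ between them. One way to keep the file identical everywhere, offered here as an assumption rather than what the original cluster used, is to listen on all local interfaces:

```properties
# Sketch: listen on every local interface so the same file works on each host
agent1.sources.source1.type = avro
agent1.sources.source1.bind = 0.0.0.0
agent1.sources.source1.port = 21000
agent1.sources.source1.channels = channel1
```

The Avro sinks on Flumeapp1 would still point at each host's real address and port 21000.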
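Start flume-ng on all four hosts in the cluster.
Start command (use absolute paths for the conf directory and the properties file wherever possible, otherwise you may run into unexpected errors):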
./flume-ng agent -c /e3base/flume/conf -f /e3base/flume/conf/flume-conf.properties -n agent1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
Startup order: start sink1 through sink3 first, then start load_balance; otherwise the load_balance host reports that it cannot find the port.