Downloading Flume:
wget http://www.apache.org/dyn/closer.lua/flume/1.6.0/apache-flume-1.6.0-bin.tar.
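After the download completes, extract the archive with tar:
tar -zvxf apache-flume-1.6.0-bin.tar.
Go into Flume's conf directory and create the configuration file: run touch flume.conf, then cp flume-conf.properties.template flume.conf (the cp alone is enough, since it creates the file).
Open the file with vim flume.conf or gedit flume.conf. Note that a Flume configuration file uses the key-value format of Java properties files.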
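To make the key-value format concrete, here is a minimal Python sketch of a reader for a simplified subset of the properties format. It is an illustration, not Flume's own loader: it only handles `key = value` lines, skips lines that start with `#` or `!`, and trims whitespace around keys and values. The agent name `a1` and source name `r1` are hypothetical.

```python
def parse_properties(text):
    """Parse a simplified subset of the Java properties format into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' or '!' are skipped.
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # this sketch only handles 'key = value' lines
        props[key.strip()] = value.strip()
    return props


# Hypothetical names: agent a1 with one source r1.
sample = """
# the agent is named a1
a1.sources = r1
a1.sources.r1.type = avro
a1.sources.r1.port = 9696
"""

props = parse_properties(sample)
print(props["a1.sources.r1.type"])  # avro
print(props["a1.sources.r1.port"])  # 9696
```

A `#` only starts a comment at the beginning of a line, so a comment appended after a value on the same line becomes part of that value. Keep comments on their own lines.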
In the Flume configuration file we need to:
1. Name the agent being configured.
2. Name the agent's sources.
3. Name the agent's channels.
4. Name the agent's sinks.
5. Bind each source and sink together through a channel.
A Flume deployment usually contains several agents, so each one needs a name to tell them apart. Agent names must be unique.
For example:
# The agent is named agent_name
# The source is named source_name, and so on
agent_name.sources = source_name
agent_name.channels = channel_name
agent_name.sinks = sink_name
The configuration above describes a single agent with one source, one channel, and one sink, as shown in the figure below.
To configure n sinks and m channels on one agent (n > 1, m > 1), list the names separated by spaces:
# The agent is named agent_name
# The sources are named source_name and source_name1, and so on
agent_name.sources = source_name source_name1
agent_name.channels = channel_name channel_name1
agent_name.sinks = sink_name sink_name1
The configuration above describes one agent with two sources, two channels, and two sinks, as shown in the figure.
That covers multiple sources, channels, and sinks within one agent. For multiple agents, simply give each agent its own unique name.
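The naming steps above can be sketched with a small Python helper that prints the three declaration lines for each agent. The helper `declare_components` and the second agent name `agent_name2` are hypothetical, introduced only for illustration. Component lists are joined with spaces, which is how Flume expects a list of names to be written.

```python
def declare_components(agent, sources, channels, sinks):
    """Return the three declaration lines for one agent."""
    return [
        f"{agent}.sources = {' '.join(sources)}",
        f"{agent}.channels = {' '.join(channels)}",
        f"{agent}.sinks = {' '.join(sinks)}",
    ]


# Two agents in one file: the unique agent name is the key prefix.
agents = {
    "agent_name": (["source_name", "source_name1"],
                   ["channel_name", "channel_name1"],
                   ["sink_name", "sink_name1"]),
    "agent_name2": (["source_name"], ["channel_name"], ["sink_name"]),
}

lines = []
for agent, (sources, channels, sinks) in agents.items():
    lines.extend(declare_components(agent, sources, channels, sinks))

print("\n".join(lines))
```

Component names only need to be unique within their own agent, because every key is prefixed with the agent name. That is why `agent_name2` can reuse `source_name` here.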
Flume supports many kinds of sources, sinks, and channels. The supported types are as follows:
The types above can be combined freely to suit your needs. For example, with an Avro source, a memory channel, and an HDFS sink, the configuration continues from the one above like this:
# The agent is named agent_name
# The source is named Avro, the channel MemoryChannel, and the sink HDFS
agent_name.sources = Avro
agent_name.channels = MemoryChannel
agent_name.sinks = HDFS
Once the agent's components are named, each source, sink, and channel still has to be described individually. The sections below go through them one at a time.
Source configuration
Note: when an agent has N (N > 1) sources, each one must be configured separately. First set the source's type, then set the properties that belong to that type. The general pattern is:
agent_name.sources.source_name.type = value
agent_name.sources.source_name.property2 = value
agent_name.sources.source_name.property3 = value
As a concrete example, suppose the source uses Avro:
# The agent is named agent_name
# The source is named Avro, the channel MemoryChannel, and the sink HDFS
agent_name.sources = Avro
agent_name.channels = MemoryChannel
agent_name.sinks = HDFS
#---------- source configuration ----------#
agent_name.sources.Avro.type = avro
agent_name.sources.Avro.bind = localhost
agent_name.sources.Avro.port = 9696
# Bind the source to the MemoryChannel channel
agent_name.sources.Avro.channels = MemoryChannel
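Channel configuration
Flume provides several kinds of channels to carry data between sources and sinks. Like sources, channels need their properties set, and with N (N > 0) channels each one is configured individually. The general template is:
agent_name.channels.channel_name.type = value
agent_name.channels.channel_name.property2 = value
agent_name.channels.channel_name.property3 = value
As a concrete example, to use the memory channel type, first set the channel's type:
agent_name.channels.MemoryChannel.type = memory
So far this only sets the channel's own properties. The channel still has to be bound to the source and the sink. The binding lines can be written next to the source and sink definitions, or grouped together with the channel definition. Note that a source takes channels (plural), while a sink takes channel (singular), because a sink reads from exactly one channel:
agent_name.sources.Avro.channels = MemoryChannel
agent_name.sinks.HDFS.channel = MemoryChannel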
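Because a binding is just a channel name repeated in several places, a misspelled name is easy to miss. The Python sketch below is not part of Flume. It is a hypothetical checker that verifies every channel named by a source (`channels`) or a sink (`channel`) was declared on the agent. It assumes a simplified `key = value` file with comments on their own lines.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and '#' comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_bindings(props, agent):
    """Return a list of problems in one agent's channel bindings."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    # A source may write to several channels (space-separated list).
    for source in props.get(f"{agent}.sources", "").split():
        key = f"{agent}.sources.{source}.channels"
        if key not in props:
            problems.append(f"{key}: missing")
        for channel in props.get(key, "").split():
            if channel not in declared:
                problems.append(f"{key}: unknown channel {channel}")
    # A sink reads from exactly one channel.
    for sink in props.get(f"{agent}.sinks", "").split():
        key = f"{agent}.sinks.{sink}.channel"
        channel = props.get(key)
        if channel is None:
            problems.append(f"{key}: missing")
        elif channel not in declared:
            problems.append(f"{key}: unknown channel {channel}")
    return problems


# The sink binding below deliberately misspells the channel name.
sample = """
agent_name.sources = Avro
agent_name.channels = MemoryChannel
agent_name.sinks = HDFS
agent_name.sources.Avro.channels = MemoryChannel
agent_name.sinks.HDFS.channel = MemoryCHannel
"""

print(check_bindings(parse_properties(sample), "agent_name"))
# ['agent_name.sinks.HDFS.channel: unknown channel MemoryCHannel']
```

Names are compared exactly, including case, so `MemoryCHannel` is reported as unknown.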
Sink configuration
Sink configuration is similar to source configuration. The general format is:
agent_name.sinks.sink_name.type = value
agent_name.sinks.sink_name.property2 = value
agent_name.sinks.sink_name.property3 = value
As a concrete example, with the sink type set to HDFS the configuration looks like this:
agent_name.sinks.HDFS.type = hdfs
agent_name.sinks.HDFS.hdfs.path = HDFS's path
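The source, channel, and sink templates share one key shape: agent name, component kind, component name, then the property. As a rough illustration, the hypothetical Python helper `split_key` below takes such a key apart. It splits on the first three dots only, so a property name that itself contains dots (such as the HDFS sink's `hdfs.path`) stays in one piece.

```python
def split_key(key):
    """Split 'agent.kind.name.property' into its four parts.

    Returns None for shorter keys such as the declaration 'agent.sources'.
    """
    parts = key.split(".", 3)  # at most three splits, so four parts
    if len(parts) < 4:
        return None
    agent, kind, name, prop = parts
    return agent, kind, name, prop


print(split_key("agent_name.channels.MemoryChannel.type"))
# ('agent_name', 'channels', 'MemoryChannel', 'type')
print(split_key("agent_name.sinks.HDFS.hdfs.path"))
# ('agent_name', 'sinks', 'HDFS', 'hdfs.path')
print(split_key("agent_name.sources"))
# None
```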
That covers the Flume configuration file in detail. A complete configuration follows:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'
# define agent
agent.sources=seqGenSrc
agent.channels=memoryChannel
agent.sinks=loggerSink kafkaSink
#
# For each one of the sources, the type is defined
# Default mode: agent.sources.seqGenSrc.type = seq / netcat / avro
agent.sources.seqGenSrc.type=avro
agent.sources.seqGenSrc.bind=localhost
agent.sources.seqGenSrc.port=9696
##### Data source #####
#agent.sources.seqGenSrc.command= tail -F /home/gongxijun/Qunar/data/data.log
# The channel can be defined as follows.
agent.sources.seqGenSrc.channels=memoryChannel
#+++++++++++++++ Define sink +++++++++++++++++++++#
# Each sink's type must be defined
#agent.sinks.loggerSink.type=logger
agent.sinks.loggerSink.type=hbase
agent.sinks.loggerSink.channel=memoryChannel
# Table name
agent.sinks.loggerSink.table=flume
# Column family name
agent.sinks.loggerSink.columnFamily=gxjun
agent.sinks.loggerSink.serializer=org.apache.flume.sink.hbase.MyHbaseEventSerializer
#agent.sinks.loggerSink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.loggerSink.zookeeperQuorum=localhost:2181
agent.sinks.loggerSink.znodeParent=/hbase
#Specify the channel the sink should use
agent.sinks.loggerSink.channel=memoryChannel
# Each channel's type is defined.
# memory
agent.channels.memoryChannel.type=memory
agent.channels.memoryChannel.keep-alive = 10
# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
#agent.channels.memoryChannel.checkpointDir= /home/gongxijun/Qunar/data
#agent.channels.memoryChannel.dataDirs= /home/gongxijun/Qunar/data , /home/gongxijun/Qunar/tmpData
agent.channels.memoryChannel.capacity=10000000
agent.channels.memoryChannel.transactionCapacity=10000
# define the sink2 kafka
#+++++++++++++++ Define sink +++++++++++++++++++++#
# Each sink's type must be defined
#agent.sinks.kafkaSink.type=logger
agent.sinks.kafkaSink.type=org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.channel=memoryChannel
#agent.sinks.kafkaSink.server=localhost:9092
agent.sinks.kafkaSink.topic=kafka-topic
agent.sinks.kafkaSink.batchSize=20
agent.sinks.kafkaSink.brokerList=localhost:9092
# Specify the channel the sink should use
agent.sinks.kafkaSink.channel= memoryChannel
This configuration is illustrated in the figure below:
References:
http://www.tutorialspoint.com/apache_flume/apache_flume_configuration.htm