1. Environment preparation
maven
gcc-c++
lzo-devel
zlib-devel
autoconf
automake
libtool
Installing Maven:
1) Download
wget https://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
2) Extract
tar -zxvf apache-maven-3.6.3-bin.tar.gz
3) Configure environment variables
vim /root/.bash_profile
MAVEN_HOME=/usr/local/src/apache-maven-3.6.3
export MAVEN_HOME
PATH=$MAVEN_HOME/bin:$PATH
export PATH
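The profile changes only take effect in a new login shell or after the file is sourced. The following is a minimal sketch of a sanity check, assuming the archive was unpacked under /usr/local/src as in the MAVEN_HOME line above; the function name check_maven_env is hypothetical and not part of Maven.

```shell
# Hypothetical helper: report whether MAVEN_HOME is set and whether its
# bin directory is on PATH. Prints "ok" or a short reason.
check_maven_env() {
    if [ -z "${MAVEN_HOME:-}" ]; then
        echo "MAVEN_HOME is not set"
        return 1
    fi
    case ":$PATH:" in
        *":$MAVEN_HOME/bin:"*) echo "ok" ;;
        *) echo "$MAVEN_HOME/bin is not on PATH"; return 1 ;;
    esac
}

# Typical use after editing the profile:
#   source /root/.bash_profile
#   check_maven_env && mvn -v
```

If the check passes, mvn -v should report Maven 3.6.3 together with the JDK it found.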
4) Edit settings.xml to add the Aliyun mirror
vim conf/settings.xml
<mirrors>
<mirror>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
</mirrors>
Installing the other dependencies:
yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool
2. Download, build, and install LZO
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
tar -zxvf lzo-2.10.tar.gz
cd lzo-2.10
./configure --prefix=/usr/local/hadoop/lzo
make
make install
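Before moving on, it can help to confirm that make install actually populated the prefix, since the hadoop-lzo build later depends on these files. A minimal sketch, assuming the prefix used above; the function name check_lzo_prefix is hypothetical, and it checks only one header and the library file that an LZO install normally places under the prefix.

```shell
# Hypothetical helper: check that an LZO install prefix contains a header
# and a library file. Prints "ok" or which piece is missing.
check_lzo_prefix() {
    prefix="$1"
    if [ ! -f "$prefix/include/lzo/lzo1x.h" ]; then
        echo "missing header: $prefix/include/lzo/lzo1x.h"
        return 1
    fi
    # Matches liblzo2.a, liblzo2.so, liblzo2.so.2, ...
    for lib in "$prefix"/lib/liblzo2.*; do
        if [ -e "$lib" ]; then
            echo "ok"
            return 0
        fi
    done
    echo "missing library: $prefix/lib/liblzo2.*"
    return 1
}

# Usage:
#   check_lzo_prefix /usr/local/hadoop/lzo
```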
3. Build hadoop-lzo from source
1) Download the source: wget https://github.com/twitter/hadoop-lzo/archive/master.zip
2) Extract: unzip master.zip
3) Go into /usr/local/src/hadoop-lzo-master and set the Hadoop version in pom.xml:
<hadoop.current.version>2.7.3</hadoop.current.version>
4) Export two temporary environment variables that point at the LZO install prefix from section 2
export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
5) Build
Run the following inside hadoop-lzo-master:
mvn package -Dmaven.test.skip=true
6) Go into target; hadoop-lzo-0.4.21-SNAPSHOT.jar is the successfully built hadoop-lzo component.
4. Configure Hadoop for LZO
1) Copy the built jar into Hadoop's common directory and distribute it to the slave nodes
cp hadoop-lzo-0.4.21-SNAPSHOT.jar /usr/local/src/hadoop-2.7.3/share/hadoop/common/
scp hadoop-lzo-0.4.21-SNAPSHOT.jar slave1:/usr/local/src/hadoop-2.7.3/share/hadoop/common/
scp hadoop-lzo-0.4.21-SNAPSHOT.jar slave2:/usr/local/src/hadoop-2.7.3/share/hadoop/common/
scp hadoop-lzo-0.4.21-SNAPSHOT.jar slave3:/usr/local/src/hadoop-2.7.3/share/hadoop/common/
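The three scp commands above differ only in the host name, so they can be folded into a loop. A minimal sketch, assuming passwordless SSH to the hosts slave1 to slave3 named above; the function name distribute_file and the DRY_RUN switch are hypothetical conveniences, not Hadoop tooling.

```shell
# Hypothetical helper: print (dry run) or execute one scp per host.
# Usage: distribute_file FILE DEST_DIR HOST...
# Set DRY_RUN=1 to print the commands instead of running them.
distribute_file() {
    file="$1"; dest="$2"; shift 2
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp $file $host:$dest"
        else
            # Stop at the first host that fails.
            scp "$file" "$host:$dest" || return 1
        fi
    done
}

# Example (prints the three commands without copying anything):
#   DRY_RUN=1 distribute_file hadoop-lzo-0.4.21-SNAPSHOT.jar \
#       /usr/local/src/hadoop-2.7.3/share/hadoop/common/ slave1 slave2 slave3
```

Running it once with DRY_RUN=1 is a cheap way to review the target paths before anything is copied.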
2) Edit Hadoop's core-site.xml and add the following inside <configuration>:
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
After adding the properties, distribute the file:
scp core-site.xml slave1:$PWD
scp core-site.xml slave2:$PWD
scp core-site.xml slave3:$PWD
3) Start Hadoop
start-dfs.sh
start-yarn.sh
5. Create an LZO index
1) Upload the file
hadoop fs -put bigtable.lzo /input
2) Build an index for the uploaded LZO file
An LZO-compressed file is splittable only through its index, so the index has to be created manually.
Without an index, an LZO file produces only a single split.
hadoop jar /usr/local/src/hadoop-2.7.3/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo
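Once the indexing job finishes, the index is expected to sit next to the data file under the same name with .index appended. A minimal sketch for checking that, assuming the hadoop CLI is on PATH and the cluster is running; the function names lzo_index_path and lzo_index_exists are hypothetical.

```shell
# The indexer writes its output next to the input, with ".index" appended.
lzo_index_path() {
    echo "$1.index"
}

# Returns 0 if the index for the given LZO file exists in HDFS.
# Requires the hadoop CLI on PATH and a running cluster.
lzo_index_exists() {
    hadoop fs -test -e "$(lzo_index_path "$1")"
}

# Usage:
#   lzo_index_exists /input/bigtable.lzo && echo "indexed"
```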
3) Run the wordcount program
hadoop jar /usr/local/src/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output1
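With the default input format, the job is likely to treat the .lzo file as one unsplittable input and to read the .index file as ordinary input as well. A commonly used variant overrides the input format with the LzoTextInputFormat class shipped in hadoop-lzo. The sketch below only prints that command for review; the function name lzo_wordcount_cmd and the output directory /output2 are assumptions chosen for illustration.

```shell
# Hypothetical helper: print the wordcount command with the input format
# overridden. Arguments: examples jar, input dir, output dir.
lzo_wordcount_cmd() {
    echo "hadoop jar $1 wordcount" \
         "-Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat" \
         "$2 $3"
}

# Print the command; the output directory must not exist yet.
lzo_wordcount_cmd \
    /usr/local/src/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
    /input /output2
```

Comparing the number of splits reported in the job log for this run and for the plain run is one way to see whether the index is being used.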