Install JDK 8
Install Oracle JDK via PPA:
sudo apt-add-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
Or install OpenJDK instead:
sudo apt-get install default-jdk
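Either way, check that the JDK is visible before continuing:
java -version    # should report a 1.8.x runtime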
Create a hadoop user
sudo useradd -m hadoop -s /bin/bash    # create the user with a home directory and bash as its shell
sudo passwd hadoop                     # set its password
sudo adduser hadoop sudo               # grant it sudo rights
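The rest of the setup assumes you are working as this user, e.g. after switching with:
su - hadoop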
Install the OpenSSH server
sudo apt-get install openssh-server
Set up SSH key authorization:
cd ~/.ssh/
ssh-keygen -t rsa                        # press Enter at every prompt to accept the defaults
cat ./id_rsa.pub >> ./authorized_keys    # authorize the key for login to localhost
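If ~/.ssh does not exist yet, logging in to localhost once will create it; the same command also verifies that key-based login now works without a password:
ssh localhost    # should not prompt for a password
exit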
Download Hadoop
http://hadoop.apache.org/releases.html
Choose the stable 3.0 binary release and download it.
Install Hadoop
tar -xzvf hadoop-3.0.0.tar.gz
sudo mv hadoop-3.0.0 /opt/hadoop
sudo chown -R hadoop:hadoop /opt/hadoop    # make sure the hadoop user owns the install tree
Add Hadoop to the PATH:
export PATH=$PATH:/opt/hadoop/sbin:/opt/hadoop/bin
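To keep the PATH setting across sessions, it can be appended to ~/.bashrc (same /opt/hadoop layout assumed as above):
echo 'export PATH=$PATH:/opt/hadoop/sbin:/opt/hadoop/bin' >> ~/.bashrc
source ~/.bashrc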
Set JAVA_HOME for Hadoop
readlink -f /usr/bin/java | sed "s:bin/java::"    # prints the JDK directory, e.g.:
/usr/lib/jvm/java-8-oracle/jre/
sudo vi /opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre/
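A quick sanity check that Hadoop picks up the JDK (run from /opt/hadoop):
./bin/hadoop version    # prints the Hadoop 3.0.0 build information if JAVA_HOME is valid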
Run Hadoop (standalone mode)
cd /opt/hadoop    # the relative ./bin and ./sbin paths below assume this directory
./bin/hadoop      # with no arguments this prints the usage help, confirming the install works
mkdir ~/input
cp /opt/hadoop/etc/hadoop/*.xml ~/input
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep ~/input ~/grep_example 'principal[.]*'
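The matches are written to ~/grep_example and can be inspected with:
cat ~/grep_example/*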
Pseudo-distributed configuration
vi /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
vi /opt/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
Format the NameNode:
./bin/hdfs namenode -format
Start the NameNode and DataNode daemons:
./sbin/start-dfs.sh    # start HDFS
./sbin/stop-dfs.sh     # stop HDFS again when you are done
Check the running daemons with jps.
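If the daemons started cleanly, jps should list them alongside its own process:
jps    # expect NameNode, DataNode and SecondaryNameNode in the output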
Web console (NameNode UI):
http://localhost:9870
Run a pseudo-distributed example
Create a user directory in HDFS:
./bin/hdfs dfs -mkdir -p /user/hadoop
Copy the sample XML files into the distributed file system as input (relative HDFS paths such as input resolve to /user/hadoop/input):
./bin/hdfs dfs -mkdir input
./bin/hdfs dfs -put /opt/hadoop/etc/hadoop/*.xml input
List the uploaded files:
./bin/hdfs dfs -ls input
Run the MapReduce job in pseudo-distributed mode:
./bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
View the results:
./bin/hdfs dfs -cat output/*
Copy the output back to the local file system:
./bin/hdfs dfs -get output /opt/hadoop/output
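Hadoop will not overwrite an existing output directory, so remove it in HDFS before re-running the job:
./bin/hdfs dfs -rm -r output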
Start YARN
vi /opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
vi /opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Start YARN:
./sbin/start-yarn.sh                                   # start YARN
./sbin/mr-jobhistory-daemon.sh start historyserver     # start the JobHistory Server so finished jobs show up in the web UI
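With YARN and the history server running, job progress can be followed in the web UIs (default ports):
ResourceManager: http://localhost:8088
JobHistory Server: http://localhost:19888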
Stop YARN:
./sbin/stop-yarn.sh
./sbin/mr-jobhistory-daemon.sh stop historyserver
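On Hadoop 3.x the mr-jobhistory-daemon.sh script still works but prints a deprecation warning; an equivalent form (assuming the same /opt/hadoop layout) is:
./bin/mapred --daemon start historyserver
./bin/mapred --daemon stop historyserver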
References
https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-in-stand-alone-mode-on-ubuntu-16-04
http://www.powerxing.com/install-hadoop/
http://www.powerxing.com/hadoop-build-project-using-eclipse/