準(zhǔn)備工作
vmware + centos7安裝略
hostname 設(shè)置
hostnamectl set-hostname "master" // 設(shè)置主機(jī)名字
hostnamectl status --transient // 查看臨時(shí)主機(jī)名字
hostnamectl status --static // 查看主機(jī)名字
hostnamectl status // 查看主機(jī)目前基礎(chǔ)信息
防火墻
查看防火墻
systemctl status firewalld.service // 查看防火墻狀態(tài)
systemctl stop firewalld.service // 關(guān)閉防火墻
systemctl disable firewalld.service //永久停用防火墻
systemctl status firewalld.service // 再次查看防火墻狀態(tài)
更新并安裝必要的工具
測(cè)試是否聯(lián)網(wǎng)ping www.baidu.com
如果可以上網(wǎng)那么更新yum并安裝ifconfig类腮,也可以直接使用ip addr
sudo yum update
sudo yum install -y net-tools // 安裝ifconfig 工具
sudo yum install -y vim
配置靜態(tài)IP
使用ifconfig查看網(wǎng)卡是什么,排除掉lo,我這邊是ens33(NAT方式)
首先在虛擬機(jī)添加一塊bridge網(wǎng)卡,然后直接編輯這個(gè)網(wǎng)卡如果沒(méi)有發(fā)現(xiàn)需要復(fù)制下ens33(最好先安裝一下vmware-tools)
cp /etc/sysconfig/network-scripts/ifcfg-ens33 /etc/sysconfig/network-scripts/ifcfg-ens37
ens37配置
TYPE=Ethernet
iOOTPROTO=static
DEVICE=ens37
NAME=ens37
ONBOOT=yes
IPADDR="192.168.15.100"
NETMASK="255.255.255.0"
完成以后service network restart重啟網(wǎng)絡(luò)鉴裹,如果顯示Ok饱狂,那么就表示配置沒(méi)有問(wèn)題了痛侍。
最后使用ifconfig查看下我們的ip是否已經(jīng)按照我們?cè)O(shè)定的配置好了朝氓。
修改host
vim /etc/hosts
192.168.15.100 master // 新添加,機(jī)器IP 機(jī)器名字
192.168.15.101 slave1 // 新添加主届,ip地址可以在所有機(jī)器修改完之后再進(jìn)行修改
192.168.15.102 slave2 // 新添加赵哲,ip地址可以在所有機(jī)器修改完之后再進(jìn)行修改
安裝JAVA
Hadoop和spark都運(yùn)行在java7以上版本,這里下載java8,這里說(shuō)明下tar.gz君丁,這種類型僅僅需要解壓以及配置環(huán)境變量即可枫夺,便于管理
wget下載jdk,這是一個(gè)大坑
wget http://download.oracle.com/otn-pub/java/jdk/8u151-b12/e758a0de34e24606bca991d704f6dcbf/jdk-8u151-linux-x64.tar.gz
tar zxvf jdk-8u151-linux-x64.tar
執(zhí)行一直報(bào)錯(cuò)绘闷,而且莫名其妙
tar (child): jdk-8u151-linux-x64.tar: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
查了下文檔筷屡,是因?yàn)槲覜](méi)有同意java license直接下載了,所以不能解壓tの埂毙死! 坑爹啊
正解:下載java到本地以后在上傳到服務(wù)器上去
搭建java
cp -r jdk1.8/ /usr/java/
配置環(huán)境變量
vim /etc/profile
文件結(jié)尾添加
export JAVA_HOME=/usr/java
export JRE_HOME=/usr/java/jre
export JAVA_BIN=/usr/java/bin
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME JAVA_BIN PATH CLASSPATH
讓其立即生效source /etc/profile
,測(cè)試java -version
Hadoop安裝
下載hadoop,這里選擇stable2的喻鳄,跟java一樣還是tar.gz
吸取了上次的經(jīng)驗(yàn)扼倘,現(xiàn)在我都是下載到本地以后在上傳到服務(wù)器上去
tar zxvf hadoop-2.9.0.tar.gz
將hadoop放入/home/hadoop/hadoop-2.9.0,在/etc/profile中添加
export HADOOP_HOME=/home/hadoop/hadoop-2.9.0
export PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
然后讓其立即生效 source /etc/profile
Hadoop配置
這里需要注意一點(diǎn)下面所有的配置均在/home/hadoop/hadoop-2.9.0/etc/hadoop中完成
這里我僅僅說(shuō)明下/home/hadoop/hadoop-2.9.0為hadoop的主目錄
-
配置slaves
僅僅有兩臺(tái)slave除呵,配置如下
slave1
slave2
2.修改hadoop-env.sh和yarn-env.sh中java_home
將export JAVA_HOME=${JAVA_HOME}
修改為對(duì)應(yīng)的JAVA_HOME路徑
3.修改core-site.xml配置
創(chuàng)建一個(gè)目錄(/home/hadoop/hadoop-2.9.0/tmp)存放tmp文件
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop-2.9.0/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131702</value>
</property>
</configuration>
備注:前兩個(gè)設(shè)置是必須的再菊,后面可以不加
4.修改hdfs-site.xml配置
在hadoop_home目錄下創(chuàng)建hdfs文件夾, 完成后在hdfs目錄下創(chuàng)建name以及data文件夾
<property>
<name>dfs.namenode.name.dir</name
> <value>file:/home/hadoop/hadoop-2.9.0/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-2.9.0/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
5.修改map-red.xml配置
/etc/hadoop/目錄下沒(méi)有這個(gè)文件只有它的模板文件(mapred-site.xml.template),需要復(fù)制出來(lái)這個(gè)文件
cp mapred-site.xml.template mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
6.修改yarn-site.xml配置
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
目前僅僅完成簡(jiǎn)單的配置,三臺(tái)hadoop才可以進(jìn)行集群測(cè)試
Hadoop 集群搭建
slaves創(chuàng)建
copy兩個(gè)虛擬機(jī)(用于slave1 slave2),然后進(jìn)行修改虛擬機(jī)參數(shù)
1.修改靜態(tài)IP
vim /etc/sysconfig/network-scripts/ifcfg-ens37
TYPE=Ethernet
iOOTPROTO=static
DEVICE=ens37
NAME=ens37
ONBOOT=yes
IPADDR="192.168.15.101" # slave1 101 slave2 102
NETMASK="255.255.255.0"
2.修改機(jī)器名字
hostnamectl set-hostname "slave1" // 設(shè)置主機(jī)名字 "slave1"/"slave2"
hostnamectl status --transient // 查看臨時(shí)主機(jī)名字
hostnamectl status --static // 查看主機(jī)名字
hostnamectl status // 查看主機(jī)目前基礎(chǔ)信息
集群測(cè)試
1.啟動(dòng)master&slaves集群
cd $HADOOP_HOME
./bin/hdfs namenode -formate // 一次即可 格式化namenode
./sbin/start-all.sh // 啟動(dòng) dfs 和yarn
緊接著會(huì)讓你依次輸入master root用戶密碼,以及slave1 & slave2 的root用戶密碼颜曾,這樣帶來(lái)很大的麻煩纠拔;
下面來(lái)設(shè)置免密碼登錄
master&slave1&slave2均需要設(shè)置
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub>> ~/.ssh/authorized_keys // 記錄機(jī)器key的文件,下次免認(rèn)證
chmod 600 ~/.ssh/authorized_keys
測(cè)試ssh localhost是否可以本地成功(必須先做)泛豪,下面配置遠(yuǎn)程連接
vim /etc/ssh/sshd_config
PubkeyAuthentication yes // 取消這行的注釋
完成以后重啟服務(wù)service sshd restart
下面將master中的~/.ssh/id_rsa.pub下載到本地稠诲,重命名為id_rsa_master.pub侦鹏;分別上傳到slave1&slave2上。
分別切換到slave1&slave2執(zhí)行下面命令
cat ~/.ssh/id_rsa_master.pub>> ~/.ssh/authorized_keys
如果沒(méi)有什么問(wèn)題的話臀叙,顯示如下
master: starting namenode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-root-nodemanager-slave2.out
2.查看啟動(dòng)情況
查看master運(yùn)行情況
切換到master輸入jps
[root@master hadoop]# jps
2373 SecondaryNameNode
3735 Jps
2526 ResourceManager
切換到slave1輸入jps
[root@slave1 ~]# jps
2304 DataNode
2405 NodeManager
2538 Jps
切換到slave2輸入jps
[root@slave2 ~]# jps
1987 DataNode
2293 NodeManager
2614 Jps
如果均啟動(dòng)成功那我們宿主機(jī)器看下鏈接略水,如果可以在Active Nodes看到有2個(gè)那么就代表對(duì)了
關(guān)機(jī)前需要在master機(jī)器上執(zhí)行,用于停hadoop
cd $HADOOP_HOME
./sbin/stop-all.sh
spark 配置
下載并安裝scala
下載scala,這里我們選擇的是scala-2.11.6.tgz劝萤。老樣子還是下載到本地渊涝,然后在上傳到服務(wù)器。
解壓上傳文件并一起放入/usr/scala
tar zxvf scala-2.11.6.tgz
mkdir /usr/scala
mv scala-2.11.6 /usr/scala
編輯/etc/profile*床嫌,加入scala目錄到環(huán)境變量中去
export SCALA_HOME=/usr/scala/scala-2.11.6
export PATH=$PATH:$SCALA_HOME/bin
source /etc/profile
scala -version
檢查是否scala安裝成功
下載并安裝spark
1.下載spark
下載spark,這里我們選擇
spark-2.2.1-bin-hadoop2.6.tgz跨释。
解壓文件放入/home/hadoop/目錄下
[root@master hadoop]# ls
hadoop-2.9.0 spark-2.2.1
將spark加入環(huán)境變量中
export SPARK_HOME=/home/hadoop/spark-2.2.1
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
2.修改spark參數(shù)(集群)
切換到/home/hadoop/spark-2.2.1/conf目錄下
修改spark-env.sh
因?yàn)閟park-env.sh不存在,所以我們需要copy一份(spark-env.sh.template)
cp spark-env.sh.template spark-env.sh
編輯這個(gè)文件,依次加入java厌处、Scala鳖谈、hadoop、spark的環(huán)境變量蚯姆,以使其能夠正常到運(yùn)行
export JAVA_HOME=/usr/java
export SCALA_HOME=/usr/scala/scala-2.11.6
export SPARK_MASTER=192.168.15.100 // master ip address
export SPARK_WORKER_MEMORY=1g
export HADOOP_HOME=/home/hadoop/hadoop-2.9.0
修改slaves
同樣slaves文件也不存在洒敏,我們需要cp slaves.template slaves
,完成后添加下面內(nèi)容
master
slave1
slave2
到這里master節(jié)點(diǎn)都已經(jīng)完全配置完成凶伙。下面我們需要按照上面配置依次完成slave1 & slave2
3.測(cè)試spark集群
切換到master主機(jī)它碎,首先啟動(dòng)haddop
$HADOOP_HOME/sbin/start-all.sh
完成后啟動(dòng)spark
$HADOOP_HOME/../spark-2.2.1/sbin/start-all.sh
運(yùn)行結(jié)果
[root@master sbin]# $HADOOP_HOME/../spark-2.2.1/sbin/start-all.sh
rsync from 192.168.15.100
/home/hadoop/spark-2.2.1/sbin/spark-daemon.sh: line 170: rsync: command not found
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-2.2.1/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave2: rsync from 192.168.15.102
slave2: /home/hadoop/spark-2.2.1/sbin/spark-daemon.sh: line 170: rsync: command not found
slave1: rsync from 192.168.15.101
slave1: /home/hadoop/spark-2.2.1/sbin/spark-daemon.sh: line 170: rsync: command not found
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-2.2.1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-2.2.1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
master: rsync from 192.168.15.100
master: /home/hadoop/spark-2.2.1/sbin/spark-daemon.sh: line 170: rsync: command not found
master: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-2.2.1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
這里有一個(gè)小報(bào)錯(cuò)扳肛,/home/hadoop/spark-2.2.1/sbin/spark-daemon.sh: line 170: rsync: command not found
;這里我們使用yum install rsync挖息,安裝下rsync(錯(cuò)誤即可消失)。
如果沒(méi)有問(wèn)題的話套腹,可以訪問(wèn)hadoop以及spark
關(guān)閉服務(wù)器先關(guān)閉spark, 然后在關(guān)閉hadoop
[root@master sbin]# $HADOOP_HOME/../spark-2.2.1/sbin/stop-all.sh
slave1: stopping org.apache.spark.deploy.worker.Worker
slave2: stopping org.apache.spark.deploy.worker.Worker
master: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
[root@master sbin]# $HADOOP_HOME/sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: no namenode to stop
slave1: stopping datanode
slave2: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave2: stopping nodemanager
slave1: stopping nodemanager
slave2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
搭建jupyter環(huán)境
1.下載Anaconda3
下載Anaconda3,懶得在上傳了直接wget吧
yum install -y bzip2 // 先安裝bunzip2
wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
./Anaconda3-5.0.1-Linux-x86_64.sh // 安裝Anaconda3
這里默認(rèn)目錄是/root/anaconda3 我這里調(diào)整為/usr/anaconda3
同樣編輯/etc/profile添加
export PATH=$PATH:/usr/anaconda3/bin
測(cè)試是否可以運(yùn)行成功jupyter notebook
电禀,如果命令可以運(yùn)行不報(bào)錯(cuò),那么就表示OK了
下面創(chuàng)建一個(gè)密文的密碼
from notebook.auth import passwd
passwd()
Enter password: ········
Verify password: ········
Out[1]:
'sha1:3da3aa9aedb0:deb470c78d2857a1f7d1c11138e4d4d8ad5ecbaf'
2.配置jupyter屬性
首先配置jupyter可以外網(wǎng)訪問(wèn)
[root@master bin]# jupyter notebook --generate-config --allow-root
Writing default config to: /root/.jupyter/jupyter_notebook_config.py
編輯/root/.jupyter/jupyter_notebook_config.py并釋放出如下參數(shù)尖飞,以及修改如下
vim /root/.jupyter/jupyter_notebook_config.py
小技巧 v編輯的時(shí)候可以輸入/進(jìn)行search字符串
c.NotebookApp.open_browser = False # 禁止在運(yùn)行ipython的同時(shí)彈出瀏覽器
c.NotebookApp.password = u`sha1:3da3aa9aedb0:deb470c78d2857a1f7d1c11138e4d4d8ad5ecbaf` # 如果不配置會(huì)出現(xiàn)?token=
c.NotebookApp.port = 5000 # 指定port
c.NotebookApp.ip = '192.168.15.100' # 本機(jī)ip
c.NotebookApp.allow_root=True # 允許root否則又一神坑
touch /root/.jupyter/jupyter_notebook_config.py
測(cè)試是否安裝成功 jupyter notebook
[I 11:23:55.382 NotebookApp] JupyterLab alpha preview extension loaded from /usr/anaconda3/lib/python3.6/site-packages/jupyterlab
JupyterLab v0.27.0
Known labextensions:
[I 11:23:55.383 NotebookApp] Running the core application with no additional extensions or settings
[I 11:23:55.384 NotebookApp] Serving notebooks from local directory: /home/hadoop/spark-2.2.1/bin
[I 11:23:55.384 NotebookApp] 0 active kernels
[I 11:23:55.384 NotebookApp] The Jupyter Notebook is running at: http://192.168.15.100:5000/
[I 11:23:55.384 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
復(fù)制http://192.168.15.100:5000/到本機(jī)進(jìn)行測(cè)試。
- 配置jupyter啟用spark kernel
查看目前啟用python用到的kernel
[root@master bin]# jupyter kernelspec list
Available kernels:
python3 /usr/anaconda3/share/jupyter/kernels/python3
配置:在運(yùn)行pyspark(切換到/home/hadoop/spark-2.2.1/bin)的時(shí)候其實(shí)是運(yùn)行jupyter notebook
在./bashrc文件中添加下面設(shè)置瓦糕,設(shè)置默認(rèn)pyspark屬性
vim ~/.bashrc
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
完成后pyspark
, 即可啟動(dòng)pyspark
然后開(kāi)始我們的pyspark之旅吧!9韭Α!圣勒!
致謝
Hadoop Cluster Setup
Hadoop: Setting up a Single Node Cluster
Spark 開(kāi)發(fā)環(huán)境搭建系列
利用docker快速搭建Spark集群
部署Jupyter/IPython Notebook Memo
基于pyspark 和scala spark的jupyter notebook 安裝