Spark
Installing Spark
-
Install Scala
-
Download:
wget http://www.scala-lang.org/files/archive/scala-2.13.0-M3.tgz
-
Extract:
tar xvf scala-2.13.0-M3.tgz
-
Install:
sudo mv scala-2.13.0-M3 /usr/local/scala
-
Environment variables:
~/.bashrc
# Scala variables
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
source ~/.bashrc
-
Start Scala
scala
-
Install Spark
-
Download:
wget http://mirror.bit.edu.cn/apache/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
-
Extract:
tar zxf spark-2.3.0-bin-hadoop2.7.tgz
-
Install:
sudo mv ./spark-2.3.0-bin-hadoop2.7 /usr/local/spark
-
Environment variables:
~/.bashrc
# Spark variables
export PYSPARK_PYTHON=python3  # Python interpreter used by pyspark
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
-
Start the pyspark interactive shell
pyspark
-
Configure pyspark log output
cd /usr/local/spark/conf
cp log4j.properties.template log4j.properties
nano log4j.properties
# change the log level to WARN
log4j.rootCategory=WARN, console
Run pyspark locally
pyspark --master local[*]
Test commands:
sc.master
textFile = sc.textFile("file:/usr/local/spark/README.md")
textFile.count()
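As a slightly bigger smoke test, here is a minimal word-count sketch to run in the same pyspark shell (sc and textFile already exist there; the other variable names are illustrative):
# split the README into words, count each word, show the 5 most frequent
words = textFile.flatMap(lambda line: line.split())
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
counts.takeOrdered(5, key=lambda kv: -kv[1])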
Spark Standalone Cluster runtime environment
- Configure spark-env.sh on the master
Copy the template file
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
Edit spark-env.sh
export SPARK_MASTER_IP=master  # master IP/hostname
export SPARK_WORKER_CORES=1  # CPU cores per worker
export SPARK_WORKER_MEMORY=512m  # memory per worker
export SPARK_WORKER_INSTANCES=1  # worker instances per node
- Copy Spark to data1, data2, data3
ssh data1
sudo mkdir /usr/local/spark
sudo chown hduser:hduser /usr/local/spark
exit
sudo scp -r /usr/local/spark hduser@data1:/usr/local
Configure data2 and data3 the same way
Edit the slaves file
sudo nano /usr/local/spark/conf/slaves
data1
data2
data3
- Start the Spark Standalone Cluster
/usr/local/spark/sbin/start-all.sh
- Or start master and slaves separately
/usr/local/spark/sbin/start-master.sh
/usr/local/spark/sbin/start-slaves.sh
- Run pyspark
pyspark --master spark://master:7077 --num-executors 1 --total-executor-cores 3 --executor-memory 512m
- Test commands
sc.master
textFile = sc.textFile("file:/usr/local/spark/README.md")
textFile.count()
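To confirm that work is actually spread over the cluster rather than run locally, a small check in the same shell (a sketch; the expected outputs assume the settings above):
sc.master  # should print 'spark://master:7077'
rdd = sc.parallelize(range(1000), 6)  # distribute 0..999 over 6 partitions
rdd.getNumPartitions()  # 6
rdd.map(lambda x: x * x).sum()  # 332833500, computed on the executors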
- Spark web UI: http://master:8080
Run Python Spark in IPython Notebook
Install Jupyter
sudo pip3 install jupyter
Configure remote access to jupyter
-
Create a login password
In [1]: from IPython.lib import passwd
In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:................................' # sha1 hash generated from your password
-
Create a jupyter notebook server profile
ipython3 profile create myserver
Here myserver is a server name you choose.
The command prints where the generated files are placed, normally under /home/yourname/.ipython/profile_myserver/.
Go into that folder and check the generated files:
If all went well, three files are generated: ipython_config.py, ipython_kernel_config.py, and ipython_notebook_config.py.
The server is configured by editing ipython_notebook_config.py. In my test this file was not generated, so I simply created it by hand.
-
Edit the configuration file ipython_notebook_config.py
c = get_config()
c.IPKernelApp.pylab = 'inline'
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = u'...........' # the sha1 hash generated in step 2
c.NotebookApp.port = 8888 # pick a port that is not already in use
-
Start the jupyter notebook server
jupyter notebook --config=/home/hduser/.ipython/profile_myserver/ipython_notebook_config.py
The jupyter notebook can now be reached from a remote browser
Running pyspark in a jupyter notebook under different modes
-
Local mode:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*]
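Once the notebook opens, a quick cell to verify the SparkContext that the pyspark launcher created (a minimal sketch):
sc.master  # 'local[*]'
sc.parallelize([1, 2, 3, 4]).map(lambda x: x * 2).collect()  # [2, 4, 6, 8]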
-
Standalone cluster mode
/usr/local/spark/sbin/start-all.sh
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 2 --executor-memory 512m
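In a notebook cell the same smoke test should now report the cluster master; the SparkSession spark is also created by the pyspark launcher in Spark 2.x (a minimal sketch):
sc.master  # 'spark://master:7077'
spark.range(100).selectExpr('sum(id) as total').show()  # total = 4950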