Download the required components
Download links:
Spark: http://spark.apache.org/downloads.html
Hadoop: http://hadoop.apache.org/releases.html
JDK: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html
hadoop-common: https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip (for Windows 7)
Scala: http://www.scala-lang.org/downloads
Installation
The Scala download is only for those who want to work in the Scala language; if you don't need it, you can skip installing it.
a. Install the JDK and Scala; the default installer steps are fine.
b. Unzip Spark (D:\spark-2.0.0-bin-hadoop2.7)
c. Unzip Hadoop (D:\hadoop2.7)
d. Unzip hadoop-common (for Windows 7)
e. Copy hadoop-common/bin into hadoop/bin (for Windows 7)
That is, copy winutils.exe into the bin directory of hadoop-2.7.4.
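As a quick sanity check after step e, you can verify from Python that the files landed where Spark expects them. A minimal sketch, assuming the example paths used in this guide (adjust them to your own layout):

import os

# Example install locations from steps b and c; adjust to your machine.
hadoop_home = r"D:\hadoop2.7"
spark_home = r"D:\spark-2.0.0-bin-hadoop2.7"

# winutils.exe must end up in Hadoop's bin directory for Spark to run on Windows.
print(os.path.exists(os.path.join(hadoop_home, "bin", "winutils.exe")))
print(os.path.exists(os.path.join(spark_home, "bin", "spark-submit.cmd")))

Both should print True if the files are in place.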
Configure environment variables
Some reference screenshots from other posts:
https://blog.csdn.net/HHTNAN/article/details/78391409
https://blog.csdn.net/qq_38799155/article/details/78254580
https://blog.csdn.net/hjxinkkl/article/details/57083549?winzoom=1
JAVA_HOME: (set to your JDK install directory)
and add to Path:
%JAVA_HOME%\bin
CLASSPATH:
SPARK_HOME: (set to the unzipped Spark directory, e.g. D:\spark-2.0.0-bin-hadoop2.7)
and add to Path:
%SPARK_HOME%\bin
%SPARK_HOME%\sbin
PYTHONPATH:
Copy the entire spark\python\pyspark folder into the Anaconda3\Lib\site-packages folder (see the alternative sketch after this list).
HADOOP_HOME: (set to the unzipped Hadoop directory, e.g. D:\hadoop2.7)
Note: winutils.exe must already have been copied into the bin directory of hadoop-2.7.4, so that Python is supported.
and add to Path:
%HADOOP_HOME%\bin
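If you'd rather not copy pyspark into site-packages, one alternative (an addition here, not part of the original guide) is the third-party findspark package (pip install findspark), which reads SPARK_HOME and adds Spark's Python libraries to sys.path at runtime. A minimal sketch, assuming the example paths above:

import os

# Assumed example paths from this guide; adjust to your layout.
os.environ.setdefault("SPARK_HOME", r"D:\spark-2.0.0-bin-hadoop2.7")
os.environ.setdefault("HADOOP_HOME", r"D:\hadoop2.7")

import findspark  # third-party: pip install findspark
findspark.init()  # puts %SPARK_HOME%\python (and py4j) on sys.path

from pyspark import SparkContext  # now importable without copying files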
Overview screenshots (the user-added variables and the system Path variable) are omitted here; see the reference posts linked above.
Finally, a test program:
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    # Allow the Spark master to be passed on the command line; default to local mode.
    master = "local"
    if len(sys.argv) == 2:
        master = sys.argv[1]
    # Stop any SparkContext left over from a previous run (e.g. in a notebook);
    # in a fresh script sc does not exist yet, hence the NameError guard.
    try:
        sc.stop()
    except NameError:
        pass
    sc = SparkContext(master, "WordCount")
    # Split each line into words and count how often each word occurs.
    lines = sc.parallelize(["pandas", "i like pandas"])
    result = lines.flatMap(lambda x: x.split(" ")).countByValue()
    for key, value in result.items():
        print("%s %i" % (key, value))
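Save the program as, say, wordcount.py (the filename is just an example) and run it with python wordcount.py, or submit it to Spark with spark-submit wordcount.py. For the two sample lines it should print each word with its count: pandas 2, i 1, like 1 (the order of the keys may vary).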