1. spark-submit: upload the jar to the cluster, then go to the bin/ directory and launch the Spark job with spark-submit:
Format:
spark-submit --master <Spark master URL> --class <fully qualified main class> <jar path> <arguments>
Example: run the SparkPi test program that ships with Spark to estimate the value of pi.
./spark-submit --master spark://node3:7077 --class org.apache.spark.examples.SparkPi /usr/local/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 500
Output:
Pi is roughly 3.1414508628290174
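In practice you will usually also size the job's resources when submitting. A minimal sketch against the same standalone master (the memory and core values below are illustrative, not recommendations):
./spark-submit --master spark://node3:7077 \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 1g \
  --total-executor-cores 4 \
  /usr/local/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 500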
2. spark-shell: a command-line REPL tool; the shell itself is also a Spark application.
2.1 Local mode: runs directly on the local machine without connecting to a Spark cluster; useful for testing.
Launch command: bin/spark-shell with no arguments, which starts local mode:
[root@bigdata111 bin]# ./spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/06/18 17:52:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/06/18 17:52:27 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/06/18 17:52:27 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
19/06/18 17:52:29 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.226.111:4040
Spark context available as 'sc' (master = local[*], app id = local-1560851538355).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.
scala> [root@bigdata111 bin]#
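At the prompt, sc is already bound to a local[*] context, so the installation can be sanity-checked immediately. A minimal sketch (any small computation will do):
scala> sc.parallelize(1 to 100).reduce(_ + _)   // sums 1..100 on local threads; expect 5050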
2.2 Cluster mode
Launch command: bin/spark-shell --master spark://.....
After launching it:
[root@bigdata111 spark-2.1.0-bin-hadoop2.7]# ./bin/spark-shell --master spark://bigdata111:7077
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/06/18 22:47:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/06/18 22:48:07 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.226.111:4040
Spark context available as 'sc' (master = spark://bigdata111:7077, app id = app-20190618224755-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Explanation:
Spark context available as 'sc' (master = spark://bigdata111:7077, app id = app-20190618224755-0000).
Spark session available as 'spark'.
Spark session: available since Spark 2.0; through the session you can access all Spark components (Core, SQL, ...).
The two objects 'spark' and 'sc' can be used directly.
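For instance, both entry points can be exercised straight from the prompt; a minimal sketch (the numbers are arbitrary):
scala> spark.range(5).show()                  // 'spark' (SparkSession) drives the SQL/DataFrame API
scala> sc.parallelize(Seq(1, 2, 3)).count()   // 'sc' (SparkContext) drives the core RDD API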
Example: develop a WordCount program in the Spark shell.
(*) Read a local file and print the result to the screen.
Note: this example requires that the cluster have only one worker, and that the local file live on the same server as that worker.
scala> sc.textFile("/usr/local/tmp_files/test_WordCount.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
Result:
res0: Array[(String, Int)] = Array((is,1), (love,2), (capital,1), (Beijing,2), (China,2), (hehehehehe,1), (I,2), (of,1), (the,1))
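The one-liner above chains four transformations and one action; spelling it out with named intermediate RDDs makes each step visible (the variable names are illustrative):
scala> val lines  = sc.textFile("/usr/local/tmp_files/test_WordCount.txt")  // one RDD element per line
scala> val words  = lines.flatMap(_.split(" "))                             // split every line into words
scala> val pairs  = words.map((_, 1))                                       // pair each word with a count of 1
scala> val counts = pairs.reduceByKey(_ + _)                                // sum the counts per word
scala> counts.collect                                                       // action: pull the result to the driver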
(*) Read a file from HDFS, run WordCount on it, and write the result back to HDFS.
scala> sc.textFile("hdfs://bigdata111:9000/word.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("hdfs://bigdata111:9000/result")
Note: the path passed to textFile() here is an HDFS path.
After the Spark job completes, it stores the result in the result directory on HDFS.
Inspect it:
[root@bigdata111 opt]# hdfs dfs -ls /result/
Found 3 items
-rw-r--r-- 3 root supergroup 0 2019-06-18 23:02 /result/_SUCCESS
-rw-r--r-- 3 root supergroup 73 2019-06-18 23:02 /result/part-00000
-rw-r--r-- 3 root supergroup 22 2019-06-18 23:02 /result/part-00001
[root@bigdata111 opt]# hdfs dfs -cat /result/*
(shuai,1)
(are,1)
(b,1)
(best,1)
(zouzou,1)
(word,1)
(hello,1)
(world,1)
(you,1)
(a,1)
(the,1)
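To tie the two launch methods together: the same WordCount can be packaged into a jar and run with spark-submit as in section 1 instead of being typed into the shell. A minimal sketch, where the object name and the jar/input/output paths are placeholder assumptions to adapt:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The master URL is supplied by spark-submit, so it is not hard-coded here.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    sc.textFile(args(0))            // input path, e.g. hdfs://bigdata111:9000/word.txt
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))      // output path; must not already exist
    sc.stop()
  }
}

Submitted the same way as the SparkPi example:
./spark-submit --master spark://bigdata111:7077 --class WordCount /usr/local/tmp_files/wordcount.jar hdfs://bigdata111:9000/word.txt hdfs://bigdata111:9000/result2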