Main topics of this section:
Spark HelloWorld lab:
Use Spark to count how many times each word appears
Preparing the wordcount data
       # echo "Hello World Bye World" > file0
       # echo "Hello Hadoop Goodbye Hadoop" > file1
       # sudo -u hdfs hdfs dfs -mkdir -p /user/spark/wordcount/input
       # sudo -u hdfs hdfs dfs -put file* /user/spark/wordcount/input
       # sudo -u hdfs hdfs dfs -chmod 1777 /user/spark/wordcount/input
       # sudo -u hdfs hdfs dfs -chown -R spark:spark /user/spark/wordcount/input
Enter spark-shell and run the script
       # sudo -u spark spark-shell
Setting default log level to "WARN".
scala>
scala> val file = sc.textFile("hdfs://cluster1/user/spark/wordcount/input")  // define the variable file, pointing at the input path on HDFS
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)  // split each line on spaces into words, map each word to (word, 1), then sum the counts per word
scala> counts.saveAsTextFile("hdfs://cluster1/user/spark/wordcount/output")  // write the result to HDFS
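The transformation chain above can be followed without a cluster. A minimal sketch using plain Scala collections, where `groupBy` plus a per-key sum stands in for Spark's `reduceByKey` (the object and method names here are illustrative, not part of the session above):

```scala
// Local stand-in for the Spark wordcount pipeline, using the same
// two input lines written to file0 and file1.
object LocalWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                              // split each line into words
      .map(word => (word, 1))                             // pair each word with a count of 1
      .groupBy(_._1)                                      // group pairs by word (reduceByKey stand-in)
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // sum the counts per word

  def main(args: Array[String]): Unit = {
    val lines = Seq("Hello World Bye World", "Hello Hadoop Goodbye Hadoop")
    count(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The sorted output matches the tuples seen later in the part-00000 and part-00001 files, merged into a single map since there is no partitioning here.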
View the output in Pig
       # sudo -u hdfs pig
grunt> ls
hdfs://cluster1/user/spark/wordcount/output/_SUCCESS<r 3> 0
hdfs://cluster1/user/spark/wordcount/output/part-00000<r 3> 28
hdfs://cluster1/user/spark/wordcount/output/part-00001<r 3> 23
grunt> cat part-00000
(Bye,1)
(Hello,2)
(World,2)
grunt> cat part-00001
(Goodbye,1)
(Hadoop,2)
grunt>