1 Introduction to Transformations
Transformations
A transformation builds a new RDD from an existing RDD, for example map() and filter().
map()
map() takes a function, applies it to every element of the RDD, and returns a new RDD.
val lines = sc.parallelize(Array("hello","spark","hello","world","!"))
lines.foreach(println)
val lines2 = lines.map(word=>(word,1))
lines2.foreach(println)
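A minimal sketch of the same idea, assuming a spark-shell session where `sc` is already defined. It maps each word to a (word, length) pair and uses collect() to bring the result back to the driver before printing, because foreach(println) runs on the executors and its output may appear in a different order, or in the executor logs, when running on a cluster. The names `words` and `lengths` are illustrative and do not come from the original notes.

```scala
// Assumes spark-shell, where `sc` (SparkContext) already exists.
val words = sc.parallelize(Seq("hello", "spark", "world"))
// map() returns a new RDD; `words` itself is left unchanged.
val lengths = words.map(word => (word, word.length))
// collect() returns an Array on the driver, so printing happens locally.
lengths.collect().foreach(println)
```

collect() is only reasonable for small results like this one, since it loads the whole RDD into driver memory.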
filter()
filter() takes a predicate function and returns a new RDD containing only the elements for which the predicate returns true.
val lines3=lines.filter(word=>word.contains("hello"))
lines3.foreach(println)
flatMap()
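For each input element, flatMap() can produce multiple output elements.
"flat" means to flatten: the collections produced for each input element are flattened into a single new RDD.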
val inputs=sc.textFile("/home/helloSpark.txt")
inputs.foreach(println)
val lines = inputs.flatMap(line => line.split(" "))
lines.foreach(println)
lines.foreach(print)
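To make the difference between map() and flatMap() concrete, here is a small sketch that does not depend on the text file above. It assumes a spark-shell session where `sc` is already defined; the names `sentences`, `nested`, and `flat` are illustrative.

```scala
// Assumes spark-shell, where `sc` (SparkContext) already exists.
val sentences = sc.parallelize(Seq("hello spark", "hello world"))
// map(): one output element per input element, here one Array[String] per line.
val nested = sentences.map(line => line.split(" "))
// flatMap(): the per-line arrays are flattened into a single RDD[String].
val flat = sentences.flatMap(line => line.split(" "))
println(nested.count()) // 2
println(flat.count())   // 4
```

With the same split function, map() keeps the two-line structure while flatMap() yields one element per word.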
Set operations
val rdd1 = sc.parallelize(Array("coffe","coffe","panda","monkey","tea"))
rdd1.foreach(println)
val rdd2 =sc.parallelize(Array("coffe","monkey","kitty"))
rdd2.foreach(println)
val rdd_distinct = rdd1.distinct() // remove duplicates
rdd_distinct.foreach(println)
val rdd_union = rdd1.union(rdd2) // union (duplicates are kept)
rdd_union.foreach(println)
val rdd_inter = rdd1.intersection(rdd2) // intersection
rdd_inter.foreach(println)
val rdd_sub = rdd1.subtract(rdd2) // difference: elements of rdd1 that are not in rdd2
rdd_sub.foreach(println)
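The four set operations can be compared side by side with the sketch below. It assumes a spark-shell session where `sc` is already defined. The helper `show` and the names `left` and `right` are hypothetical and only serve this illustration. Results are collected and sorted on the driver so that the printed order is deterministic.

```scala
import org.apache.spark.rdd.RDD

// Assumes spark-shell, where `sc` (SparkContext) already exists.
val left  = sc.parallelize(Seq("coffe", "coffe", "panda", "monkey", "tea"))
val right = sc.parallelize(Seq("coffe", "monkey", "kitty"))

// Hypothetical helper: collect to the driver, sort, and print on one line.
def show(label: String, rdd: RDD[String]): Unit =
  println(s"$label: ${rdd.collect().sorted.mkString(", ")}")

show("distinct", left.distinct())              // coffe, monkey, panda, tea
show("union", left.union(right))               // all 8 elements, duplicates kept
show("intersection", left.intersection(right)) // coffe, monkey
show("subtract", left.subtract(right))         // panda, tea
```

Note that union() does not remove duplicates, while intersection() returns each common element only once.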