1.transformation
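lazy: a transformation only records the lineage (the chain of transformations); no computation is performed
1.map applies a function to each element
2.filter keeps only the elements that satisfy a predicate
3.flatMap maps each element to a collection and flattens the results
Example: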
val nums = sc.parallelize(List(1,2,3,4,5,6,7,8,9))
nums.flatMap(x => 1 to x).collect
res7: Array[Int] = Array(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9)
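4.mapValues transforms only the value of each (key, value) element
5.subtract returns the elements of the first RDD that do not appear in the second
6.intersection returns the elements common to both RDDs
7.cartesian returns the Cartesian product of two RDDs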
scala> val a = sc.parallelize(Array(1,2,3,4,5))
scala> val b = sc.parallelize(Array(1,2,6))
scala> a.subtract(b).collect
res28: Array[Int] = Array(4, 3, 5)
scala> a.intersection(b).collect
res30: Array[Int] = Array(2, 1)
scala> a.cartesian(b).collect
res32: Array[(Int, Int)] = Array((1,1), (2,1), (1,2), (1,6), (2,2), (2,6), (3,1), (4,1), (5,1), (3,2), (3,6), (4,2), (4,6), (5,2), (5,6))
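The notes above show shell output for flatMap, subtract, intersection and cartesian, but not for map, filter or mapValues. The following is a minimal sketch of those three, written as a standalone program rather than a spark-shell session. The object name, the sample data, the application name and the `local[2]` master are assumptions made for illustration; in spark-shell the predefined `sc` can be passed to `run` directly.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BasicTransformations {
  // Applies map, filter and mapValues to small sample RDDs and collects the results.
  def run(sc: SparkContext): (Array[Int], Array[Int], Array[(String, Int)]) = {
    val nums = sc.parallelize(List(1, 2, 3, 4, 5))

    // map: apply a function to every element
    val doubled = nums.map(_ * 2).collect()        // Array(2, 4, 6, 8, 10)

    // filter: keep only the elements that satisfy the predicate
    val evens = nums.filter(_ % 2 == 0).collect()  // Array(2, 4)

    // mapValues: transform the value of each (key, value) pair, keys untouched
    val pairs = sc.parallelize(List(("a", 1), ("b", 2), ("a", 3)))
    val scaled = pairs.mapValues(_ * 10).collect() // Array((a,10), (b,20), (a,30))

    (doubled, evens, scaled)
  }

  def main(args: Array[String]): Unit = {
    // Assumed configuration for a standalone local run.
    val conf = new SparkConf().setAppName("basic-transformations").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (doubled, evens, scaled) = run(sc)
      println(doubled.mkString(", "))
      println(evens.mkString(", "))
      println(scaled.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

All three are narrow transformations, so collect returns the elements in their original order.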
2.action
Computation is triggered only when an action is invoked
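One way to observe this laziness is to count how often the function passed to map is actually called. The sketch below uses a long accumulator for that (available as `sc.longAccumulator` in Spark 2.0 and later); the object name and the accumulator name are assumptions for illustration. Spark may re-apply accumulator updates made inside a transformation if a task is retried, so the count is only meant as a demonstration on a clean local run.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyEvaluationDemo {
  // Returns how many times the map function had run before and after an action.
  def run(sc: SparkContext): (Long, Long) = {
    val touched = sc.longAccumulator("touched")
    val nums = sc.parallelize(1 to 5)

    // Transformation: only recorded in the lineage, nothing is executed yet.
    val mapped = nums.map { x => touched.add(1); x * 2 }
    val before: Long = touched.value

    // Action: forces the recorded map to run over all elements.
    mapped.count()
    val after: Long = touched.value

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Assumed configuration for a standalone local run.
    val conf = new SparkConf().setAppName("lazy-evaluation-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (before, after) = run(sc)
      println(s"calls before action: $before, after action: $after")
    } finally {
      sc.stop()
    }
  }
}
```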
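1.collect returns all elements to the driver as an array
2.count returns the number of elements
3.reduce combines the elements pairwise with a binary function until one value remains
4.top returns the n largest elements; a custom ordering can be supplied
5.takeSample returns a random sample of elements
6.takeOrdered returns the n smallest elements, i.e. the opposite ordering from top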
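The actions listed above can be tried out with a sketch like the following. The object name, the sample data and the seed passed to takeSample are assumptions for illustration; the result of takeSample depends on the seed, so no specific output is shown for it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BasicActions {
  def main(args: Array[String]): Unit = {
    // Assumed configuration for a standalone local run;
    // in spark-shell the predefined `sc` can be used instead.
    val conf = new SparkConf().setAppName("basic-actions").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val nums = sc.parallelize(List(5, 1, 4, 2, 3))

      val all = nums.collect()           // Array(5, 1, 4, 2, 3)
      val n = nums.count()               // 5
      val sum = nums.reduce(_ + _)       // 15
      val largest = nums.top(2)          // Array(5, 4)
      val smallest = nums.takeOrdered(2) // Array(1, 2)

      // top with a reversed ordering behaves like takeOrdered
      val reversedTop = nums.top(2)(Ordering[Int].reverse) // Array(1, 2)

      // Two elements drawn without replacement; the seed is an arbitrary choice
      val sample = nums.takeSample(withReplacement = false, num = 2, seed = 42L)

      println(s"collect: ${all.mkString(", ")}")
      println(s"count: $n, reduce: $sum")
      println(s"top(2): ${largest.mkString(", ")}")
      println(s"takeOrdered(2): ${smallest.mkString(", ")}")
      println(s"top(2) reversed: ${reversedTop.mkString(", ")}")
      println(s"takeSample: ${sample.mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

The function given to reduce should be commutative and associative, because Spark combines partial results from different partitions in no fixed order.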