Raw data
ID Gender Height
1 M 178
2 M 168
3 F 160
4 F 156
5 M 195
6 F 172
7 M 180
8 M 190
9 M 175
10 F 150
11 F 170
12 F 155
13 F 157
14 M 160
15 F 159
16 M 182
17 M 165
Extract the gender (split each line on the tab character and take the element at index 1)
scala> val lines = sc.textFile("file:///home/hadoop/soul/data/m_f_info.txt")
lines: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/soul/data/m_f_info.txt MapPartitionsRDD[15] at textFile at <console>:24
scala> val splits = lines.map(x => x.split("\t")(1))
splits: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[16] at map at <console>:25
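To make the split step concrete, here is a minimal sketch in plain Scala (no Spark needed) of what `x.split("\t")(1)` does to a single row. The helper `genderOf` is hypothetical and not part of the original session; it only illustrates one way to avoid an `ArrayIndexOutOfBoundsException` if a line turns out not to contain a tab.

```scala
// One sample row, written with explicit tab characters between the fields.
val line = "1\tM\t178"

// split returns Array("1", "M", "178"); index 1 is the gender column.
val fields = line.split("\t")
val gender = fields(1)

// Hypothetical defensive variant (an assumption, not from the original post):
// returns None when the line has fewer than two tab-separated fields.
def genderOf(row: String): Option[String] = {
  val parts = row.split("\t")
  if (parts.length > 1) Some(parts(1).trim) else None
}

val ok = genderOf("1\tM\t178")   // Some("M")
val bad = genderOf("1 M 178")    // None, because the fields are separated by spaces
```

With such a helper, `lines.flatMap(genderOf)` would skip malformed rows instead of failing the job, at the cost of silently dropping them.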
Build one RDD holding the values equal to M and another holding the values equal to F
scala> val mRDD = splits.filter(x => (x == "M"))
mRDD: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[17] at filter at <console>:25
scala> val fRDD = splits.filter(_ == "F")
fRDD: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at filter at <console>:25
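The two filters above use different lambda spellings, `x => (x == "M")` and `_ == "F")`, which mean the same thing. A small sketch on a local `Seq` (used here as a stand-in for the RDD, since `RDD.filter` takes the same kind of `String => Boolean` predicate) shows the equivalence; the sample values are made up for illustration.

```scala
// Stand-in for the gender column; these five values are illustrative only.
val genders = Seq("M", "M", "F", "F", "M")

// Explicit parameter name.
val males = genders.filter(x => x == "M")

// Placeholder syntax: `_` stands for the single lambda parameter.
val females = genders.filter(_ == "F")

// Both spellings produce the same result for the same predicate.
val malesAgain = genders.filter(_ == "M")
```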
Use the count operator to get the totals
1. Number of males
scala> mRDD.count
res15: Long = 9
2. Number of females
scala> fRDD.count
res16: Long = 8
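As an alternative to filtering twice and counting each RDD separately, both totals can be obtained in one pass with `countByValue`. The sketch below is not from the original session: it assumes Spark is on the classpath, builds the RDD from the sample rows with `parallelize` instead of reading `m_f_info.txt`, and uses a made-up application name. In spark-shell, `getOrCreate()` simply returns the existing session.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; the app name is an arbitrary choice.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("gender-count-sketch")
  .getOrCreate()
val sc = spark.sparkContext

// The 17 sample rows from the raw data, tab-separated (ID, gender, height).
val rows = Seq(
  "1\tM\t178", "2\tM\t168", "3\tF\t160", "4\tF\t156", "5\tM\t195",
  "6\tF\t172", "7\tM\t180", "8\tM\t190", "9\tM\t175", "10\tF\t150",
  "11\tF\t170", "12\tF\t155", "13\tF\t157", "14\tM\t160", "15\tF\t159",
  "16\tM\t182", "17\tM\t165"
)

// Same extraction as in the session: split on tab, keep index 1.
val genders = sc.parallelize(rows).map(_.split("\t")(1))

// countByValue is an action that returns a local Map of value -> occurrences.
val counts = genders.countByValue()

val maleCount = counts.getOrElse("M", 0L)     // 9 for the sample rows
val femaleCount = counts.getOrElse("F", 0L)   // 8 for the sample rows
```

Because `countByValue` collects the result map to the driver, it suits a column with few distinct values such as this one; for a high-cardinality key, `map(g => (g, 1L)).reduceByKey(_ + _)` keeps the result distributed.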