1. Goal
Read a text file in which each line is a single JSON object (JSON Lines format) and query its contents with SQL.
2. Implementation
2.1 Sample data
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
2.2 Approach
Read the file with spark.read().json(), which parses one JSON object per line, infers the schema from the data, and returns a Dataset<Row> directly. Register that Dataset as a temporary view, then run a SQL query against it through the SparkSession and display the result. The manual route (reading with textFile() into a JavaRDD, splitting each line, wrapping the fields in Row objects, defining a schema, and building a Dataset<Row> from the rows and the schema) is unnecessary here, because the built-in JSON reader handles both parsing and schema inference.
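For comparison, the manual RDD-to-Dataset route (an explicit schema plus Row objects) can be sketched as below. This is a hedged sketch, not part of the original program: it assumes a hypothetical comma-separated input file people.txt with lines like "Andy,30", and the class name and file path are illustrative only.

```java
package com.surfilter.spark.java;

import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RDD2DataFrameProgrammatically {

    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("RDD2DataFrameProgrammatically")
                .master("local")
                .getOrCreate();

        // Read the raw lines; toJavaRDD() converts the Dataset<String> to a JavaRDD
        JavaRDD<String> lines = spark.read().textFile("people.txt").toJavaRDD();

        // Split each comma-separated line and wrap the fields in a Row
        JavaRDD<Row> rows = lines.map(line -> {
            String[] parts = line.split(",");
            return RowFactory.create(parts[0].trim(), Integer.valueOf(parts[1].trim()));
        });

        // Define the schema explicitly: a name column and a nullable age column
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("age", DataTypes.IntegerType, true)));

        // Combine the rows and the schema into a Dataset<Row>, then query it
        Dataset<Row> people = spark.createDataFrame(rows, schema);
        people.createOrReplaceTempView("person");
        spark.sql("select age,name from person where age>=18").show();

        spark.stop();
    }
}
```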
2.3 Implementation
Code:
package com.surfilter.spark.java;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

/**
 * description: RDD2DataFrameByJson <br>
 * date: 2020/3/28 4:10 PM <br>
 * author: yaohao <br>
 * version: 1.0 <br>
 */
public class RDD2DataFrameByJson {

    public static void main(String[] args) {
        // Create a SparkSession, the entry point for the DataFrame/SQL API
        SparkSession spark = SparkSession
                .builder()
                .appName("RDD2DataFrameByJson")
                .master("local")
                .getOrCreate();

        // Read the JSON file; each line holds one record such as {"name":"Justin", "age":19}.
        // spark.read().json() infers the schema and returns a Dataset<Row> directly,
        // so no manual JavaRDD-to-Row conversion is needed.
        Dataset<Row> rfDataset = spark.read().json("/Users/yaohao/tools/spark-2.4.5-bin-hadoop2.7/examples/src/main/resources/people.json");

        // Register the Dataset as a temporary view so it can be queried with SQL
        // (createOrReplaceTempView replaces the deprecated registerTempTable)
        rfDataset.createOrReplaceTempView("person");

        // Run the query and print the result
        Dataset<Row> result = spark.sql("select age,name from person where age>=18");
        result.show();

        spark.stop();
    }
}
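For reference, with the three sample records above, Michael has no age field (so his age is null after schema inference) and is filtered out by age>=18; only Andy and Justin remain. result.show() should then print a table along these lines:

```
+---+------+
|age|  name|
+---+------+
| 30|  Andy|
| 19|Justin|
+---+------+
```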