DataFrame reports a notFoundClass error for the Any type
=> A DataFrame's records are of type Row, and the values stored in each Row object are of type Any; convert the Any values to concrete types such as String before using them.
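A minimal sketch of the conversion, assuming df is an existing DataFrame and the column names name and age are only hypothetical:

import org.apache.spark.sql.Row

// pull typed values out of each Row instead of carrying Any around
val typed = df.rdd.map { row: Row =>
  val name = row.getAs[String]("name")   // hypothetical String column
  val age  = row.getAs[Long]("age")      // hypothetical numeric column
  (name, age)
}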
Spark program reports notFoundClass errors for SparkSession, hadoop.fs, etc.
=> Essentially the missing classes come from missing dependencies; re-declare the packages in the pom file, and their scope must not be set to provided.
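A minimal pom.xml sketch; the artifact and version shown are only illustrative for a Spark 2.x / Scala 2.11 build:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
    <!-- no <scope>provided</scope>, so these classes are packaged with the job -->
</dependency>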
IntelliJ IDEA compile error: "Test is already defined as object Test"
=> The Test object has already been defined, which means there are multiple Test objects. After confirming only one Test had been written, it turned out that when the project was created both the main folder and the scala folder were marked as Sources; removing the Sources attribute from the top-level main directory fixed it.
Running the Spark program on the cluster from a packaged jar, the configuration cannot be loaded
=> spark-submit --master <resource/cluster mode> --class <main class> --files <config files to ship> --num-executors/--executor-memory/--executor-cores <executor settings> (a sketch of reading a shipped config file follows the notes below)
Note: --jars distributes jar dependencies to every executor
Note: --files distributes configuration/parameter files to every executor
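A minimal sketch of reading such a shipped file, assuming a hypothetical config.properties passed with --files config.properties; on YARN the file is localized into each container's working directory, so it can be opened by its bare name:

import java.io.FileInputStream
import java.util.Properties

val props = new Properties()
val in = new FileInputStream("config.properties")   // hypothetical file shipped with --files
try props.load(in) finally in.close()
val jdbcUrl = props.getProperty("jdbc.url")          // hypothetical key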
Finding the web UI pages through a particular node
=> On the machine you are connected to, look up the hosts file to get the hostname-to-IP mappings, and use the whereis command to locate the configuration files for zk, yarn, namenode, resourceManager, etc.
Spark 2.x reading Hive data
=> The key point is to add the hive-site.xml configuration file to --files so that Spark can connect to the Hive metastore; Hive table data can then be read directly with sparkSession.sql("..."). (Understand the Spark-on-Hive execution mode.)
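A minimal sketch, where db.some_table is a hypothetical Hive table:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("readHive")
  .enableHiveSupport()   // needs hive-site.xml on the classpath, e.g. shipped via --files
  .getOrCreate()

spark.sql("select * from db.some_table limit 10").show()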
Spark time data types
=> Timestamp data shows up as a long number when queried on ludp, but at program run time it appears in the "yyyy-MM-dd hh:mm:ss" form.
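A minimal sketch of converting between the two representations with Spark SQL functions, assuming a hypothetical long column ts_col holding seconds:

import org.apache.spark.sql.functions.{col, from_unixtime, unix_timestamp}

val withString = df.withColumn("ts_str", from_unixtime(col("ts_col"), "yyyy-MM-dd HH:mm:ss"))   // HH = 24-hour clock
val backToLong = withString.withColumn("ts_long", unix_timestamp(col("ts_str"), "yyyy-MM-dd HH:mm:ss"))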
Spark fails to load MySQL data: the Driver class is not found
=> When Spark reads data from MySQL, prefer spark.read.format("jdbc").options(Map(...)).load() (see the sketch below)
Using spark.read.jdbc() instead can trigger this error
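A minimal sketch with placeholder URL, table name and credentials; the MySQL connector jar still has to be on the classpath (e.g. via --jars):

val jdbcDF = spark.read
  .format("jdbc")
  .options(Map(
    "url"      -> "jdbc:mysql://localhost:3306/testdb",
    "driver"   -> "com.mysql.jdbc.Driver",   // naming the driver explicitly avoids the class-not-found error
    "dbtable"  -> "some_table",
    "user"     -> "root",
    "password" -> "secret"
  ))
  .load()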
Writing DataFrame data into a Hive partitioned table
spark.sql("insert into insertTableName partition(month=xx, day=xx) select column1, column2 from sourceTable")
Note: because Hive is schema-on-read, the column names of the two tables do not have to match, but the number of columns must be the same; strictly speaking, it is enough that the data lands in the corresponding partition directory.
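A minimal end-to-end sketch with hypothetical table, column and partition values: register the DataFrame as the source table first, then run the insert:

df.createOrReplaceTempView("sourceTable")
spark.sql(
  """insert into insertTableName partition(month='202001', day='01')
    |select column1, column2 from sourceTable""".stripMargin)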
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:D:/IDEA/workspaces/mvn5/spark-warehouse
Solution:
It's the SPARK-15565 issue in Spark 2.0 on Windows with a simple solution (that appears to be part of Spark's codebase that may soon be released as 2.0.2 or 2.1.0).
The solution in Spark 2.0.0 is to set spark.sql.warehouse.dir to some properly-referenced directory, say file:///c:/Spark/spark-2.0.0-bin-hadoop2.7/spark-warehouse that uses /// (triple slashes)
Here spark.sql.warehouse.dir needs to point to a valid local warehouse path; change the code to:
=> SparkSession.builder().appName("1").master("local").config("spark.sql.warehouse.dir", "file:///E:/apache/jartest/mvndatabase").getOrCreate()