Spark Hive

Version: 2.3.0

Preparation

Make sure the Hive package is present on every Spark node.

Copy Hive's configuration file, hive-site.xml, into Spark's conf directory.
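The copy step could be scripted roughly as follows. This is only a sketch: the function name copy_hive_site and both directory arguments are hypothetical, since the post does not give the actual Hive and Spark install paths.

```shell
# Copy hive-site.xml from a Hive conf directory into a Spark conf directory.
# Both directories are passed in by the caller; the real paths are not given
# in the post, so nothing is hard-coded here.
copy_hive_site() {
  hive_conf_dir="$1"
  spark_conf_dir="$2"
  if [ ! -f "$hive_conf_dir/hive-site.xml" ]; then
    echo "hive-site.xml not found in $hive_conf_dir" >&2
    return 1
  fi
  cp "$hive_conf_dir/hive-site.xml" "$spark_conf_dir/"
}

# Example call (assumes HIVE_HOME and SPARK_HOME are set on the node):
# copy_hive_site "$HIVE_HOME/conf" "$SPARK_HOME/conf"
```

The copy has to be repeated on each node that starts a Spark driver, for example with scp or whatever distribution tool the cluster already uses.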

Configuration

The hive.metastore.warehouse.dir parameter in hive-site.xml has been deprecated since Spark 2.0.0. Use spark.sql.warehouse.dir instead to specify the default warehouse directory. The directory must be readable and writable.

(In practice, however, hive.metastore.warehouse.dir still seems to take effect, and tables created through spark-sql also show up under the corresponding directory.)
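One way to set spark.sql.warehouse.dir is a line in Spark's spark-defaults.conf. The sketch below adds or replaces that line; the function name set_warehouse_dir is hypothetical, and the HDFS path in the example call is an assumption pieced together from the hmcluster nameservice and /user/hive/warehouse path that appear in the startup log, not a value the post states.

```shell
# Add (or replace) the spark.sql.warehouse.dir entry in a spark-defaults.conf
# style file, where each line is "key<whitespace>value".
set_warehouse_dir() {
  conf_file="$1"
  warehouse_dir="$2"
  touch "$conf_file"
  # Keep every line except an existing spark.sql.warehouse.dir entry.
  # grep exits non-zero when it prints nothing, hence the "|| true".
  grep -v '^spark\.sql\.warehouse\.dir[[:space:]]' "$conf_file" > "$conf_file.tmp" || true
  mv "$conf_file.tmp" "$conf_file"
  printf 'spark.sql.warehouse.dir %s\n' "$warehouse_dir" >> "$conf_file"
}

# Example call (the path is an assumption, adjust to the real warehouse):
# set_warehouse_dir "$SPARK_HOME/conf/spark-defaults.conf" "hdfs://hmcluster/user/hive/warehouse"
```

The same property can also be passed for a single session with --conf spark.sql.warehouse.dir=... on the spark-sql command line.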

Start Hive

hive  --service  metastore &  
hiveserver2 & 
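Services started with a bare & write their output to the terminal and may be stopped when the shell session ends. A possible alternative, sketched here with a hypothetical helper start_bg and made-up log and PID file names, is to launch them through nohup and keep the output in a file:

```shell
# Run a command in the background under nohup, sending stdout and stderr
# to a log file and recording the background PID in a PID file.
start_bg() {
  log_file="$1"
  pid_file="$2"
  shift 2
  nohup "$@" > "$log_file" 2>&1 &
  echo $! > "$pid_file"
}

# Example calls for the two services above (file names are placeholders):
# start_bg metastore.log metastore.pid hive --service metastore
# start_bg hiveserver2.log hiveserver2.pid hiveserver2
```

The PID files make it easier to stop the services later with kill "$(cat metastore.pid)".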

Start Spark

start-all.sh  
start-history-server.sh  

Start the spark-sql client and verify

./spark-sql --master spark://node202.hmbank.com:7077   

18/05/31 12:00:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/31 12:00:31 WARN HiveConf: HiveConf of name hive.server2.thrift.client.user does not exist
18/05/31 12:00:31 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
18/05/31 12:00:31 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
18/05/31 12:00:31 WARN HiveConf: HiveConf of name hive.server2.thrift.client.password does not exist
18/05/31 12:00:32 INFO metastore: Trying to connect to metastore with URI thrift://node203.hmbank.com:9083
18/05/31 12:00:32 INFO metastore: Connected to metastore.
18/05/31 12:00:32 INFO SessionState: Created local directory: /var/hive/iotmp/0d80c963-6383-42b5-89c6-9c82cbd4e15c_resources
18/05/31 12:00:32 INFO SessionState: Created HDFS directory: /tmp/hive/root/0d80c963-6383-42b5-89c6-9c82cbd4e15c
18/05/31 12:00:32 INFO SessionState: Created local directory: /var/hive/iotmp/hive/0d80c963-6383-42b5-89c6-9c82cbd4e15c
18/05/31 12:00:32 INFO SessionState: Created HDFS directory: /tmp/hive/root/0d80c963-6383-42b5-89c6-9c82cbd4e15c/_tmp_space.db
18/05/31 12:00:32 INFO SparkContext: Running Spark version 2.3.0
18/05/31 12:00:32 INFO SparkContext: Submitted application: SparkSQL::10.30.16.204
18/05/31 12:00:32 INFO SecurityManager: Changing view acls to: root
18/05/31 12:00:32 INFO SecurityManager: Changing modify acls to: root
18/05/31 12:00:32 INFO SecurityManager: Changing view acls groups to: 
18/05/31 12:00:32 INFO SecurityManager: Changing modify acls groups to: 
18/05/31 12:00:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
18/05/31 12:00:33 INFO Utils: Successfully started service 'sparkDriver' on port 33733.
18/05/31 12:00:33 INFO SparkEnv: Registering MapOutputTracker
18/05/31 12:00:33 INFO SparkEnv: Registering BlockManagerMaster
18/05/31 12:00:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/05/31 12:00:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/05/31 12:00:33 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f425c261-2aa0-4063-8fa2-2ff4f106d948
18/05/31 12:00:33 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/05/31 12:00:33 INFO SparkEnv: Registering OutputCommitCoordinator
18/05/31 12:00:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/05/31 12:00:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://node204.hmbank.com:4040
18/05/31 12:00:33 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://node202.hmbank.com:7077...
18/05/31 12:00:33 INFO TransportClientFactory: Successfully created connection to node202.hmbank.com/10.30.16.202:7077 after 30 ms (0 ms spent in bootstraps)
18/05/31 12:00:33 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20180531120033-0000
18/05/31 12:00:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37444.
18/05/31 12:00:33 INFO NettyBlockTransferService: Server created on node204.hmbank.com:37444
18/05/31 12:00:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/05/31 12:00:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, node204.hmbank.com, 37444, None)
18/05/31 12:00:33 INFO BlockManagerMasterEndpoint: Registering block manager node204.hmbank.com:37444 with 366.3 MB RAM, BlockManagerId(driver, node204.hmbank.com, 37444, None)
18/05/31 12:00:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, node204.hmbank.com, 37444, None)
18/05/31 12:00:33 INFO BlockManager: external shuffle service port = 7338
18/05/31 12:00:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, node204.hmbank.com, 37444, None)
18/05/31 12:00:33 INFO EventLoggingListener: Logging events to hdfs://hmcluster/user/spark/eventLog/app-20180531120033-0000
18/05/31 12:00:33 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
18/05/31 12:00:34 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
18/05/31 12:00:34 INFO SharedState: loading hive config file: file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/conf/hive-site.xml
18/05/31 12:00:34 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/bin/spark-warehouse').
18/05/31 12:00:34 INFO SharedState: Warehouse path is 'file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/bin/spark-warehouse'.
18/05/31 12:00:34 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/05/31 12:00:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/bin/spark-warehouse
18/05/31 12:00:34 INFO metastore: Mestastore configuration hive.metastore.warehouse.dir changed from /user/hive/warehouse to file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/bin/spark-warehouse
18/05/31 12:00:34 INFO metastore: Trying to connect to metastore with URI thrift://node203.hmbank.com:9083
18/05/31 12:00:34 INFO metastore: Connected to metastore.
18/05/31 12:00:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
spark-sql> show databases;
18/05/31 12:02:22 INFO CodeGenerator: Code generated in 171.318399 ms
default
hivecluster
Time taken: 1.947 seconds, Fetched 2 row(s)
18/05/31 12:02:22 INFO SparkSQLCLIDriver: Time taken: 1.947 seconds, Fetched 2 row(s)

It works as expected.
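The startup log also reports which warehouse path Spark settled on (the SharedState "Warehouse path is" line). If the client output has been saved to a file, that path can be pulled out with sed. A small sketch, where the function name warehouse_path_from_log and the log file name are hypothetical:

```shell
# Print the warehouse path reported by SharedState in a saved spark-sql log.
# Matches lines of the form:
#   ... INFO SharedState: Warehouse path is '<path>'.
warehouse_path_from_log() {
  sed -n "s/.*SharedState: Warehouse path is '\(.*\)'\.\$/\1/p" "$1"
}

# Example call (assumes the client output was redirected to spark-sql.log):
# warehouse_path_from_log spark-sql.log
```

In the run shown above this would print the local file:/... spark-warehouse path, which suggests spark.sql.warehouse.dir was left at its default for this session.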

Create a table and insert data

> create table spark2 (id int , seq int ,name string) using hive options(fileFormat 'parquet');
Time taken: 0.358 seconds
18/05/31 14:10:47 INFO SparkSQLCLIDriver: Time taken: 0.358 seconds
spark-sql> 
         > desc spark2;
id  int NULL
seq int NULL
name    string  NULL
Time taken: 0.061 seconds, Fetched 3 row(s)
18/05/31 14:10:54 INFO SparkSQLCLIDriver: Time taken: 0.061 seconds, Fetched 3 row(s)
spark-sql> insert into spark2 values( 1,1, 'nn');

Query the table data

spark-sql> select * from spark2;
18/05/31 14:12:08 INFO FileSourceStrategy: Pruning directories with: 
18/05/31 14:12:08 INFO FileSourceStrategy: Post-Scan Filters: 
18/05/31 14:12:08 INFO FileSourceStrategy: Output Data Schema: struct<id: int, seq: int, name: string ... 1 more fields>
18/05/31 14:12:08 INFO FileSourceScanExec: Pushed Filters: 
18/05/31 14:12:08 INFO CodeGenerator: Code generated in 33.608151 ms
18/05/31 14:12:08 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 249.2 KB, free 365.9 MB)
18/05/31 14:12:08 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 24.6 KB, free 365.8 MB)
18/05/31 14:12:08 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on node204.hmbank.com:37444 (size: 24.6 KB, free: 366.2 MB)
18/05/31 14:12:08 INFO SparkContext: Created broadcast 6 from processCmd at CliDriver.java:376
18/05/31 14:12:08 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
18/05/31 14:12:09 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
18/05/31 14:12:09 INFO DAGScheduler: Got job 5 (processCmd at CliDriver.java:376) with 1 output partitions
18/05/31 14:12:09 INFO DAGScheduler: Final stage: ResultStage 3 (processCmd at CliDriver.java:376)
18/05/31 14:12:09 INFO DAGScheduler: Parents of final stage: List()
18/05/31 14:12:09 INFO DAGScheduler: Missing parents: List()
18/05/31 14:12:09 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[21] at processCmd at CliDriver.java:376), which has no missing parents
18/05/31 14:12:09 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 10.1 KB, free 365.8 MB)
18/05/31 14:12:09 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 4.6 KB, free 365.8 MB)
18/05/31 14:12:09 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on node204.hmbank.com:37444 (size: 4.6 KB, free: 366.2 MB)
18/05/31 14:12:09 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1039
18/05/31 14:12:09 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[21] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
18/05/31 14:12:09 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
18/05/31 14:12:09 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, 10.30.16.202, executor 1, partition 0, ANY, 8395 bytes)
18/05/31 14:12:09 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 10.30.16.202:36243 (size: 4.6 KB, free: 366.2 MB)
18/05/31 14:12:09 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 10.30.16.202:36243 (size: 24.6 KB, free: 366.2 MB)
18/05/31 14:12:09 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 595 ms on 10.30.16.202 (executor 1) (1/1)
18/05/31 14:12:09 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
18/05/31 14:12:09 INFO DAGScheduler: ResultStage 3 (processCmd at CliDriver.java:376) finished in 0.603 s
18/05/31 14:12:09 INFO DAGScheduler: Job 5 finished: processCmd at CliDriver.java:376, took 0.607284 s
1   1   nn
Time taken: 0.793 seconds, Fetched 1 row(s)
18/05/31 14:12:09 INFO SparkSQLCLIDriver: Time taken: 0.793 seconds, Fetched 1 row(s)
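The same statements can also be run without the interactive prompt, using the -e option of the spark-sql CLI. A minimal sketch: run_sql, SPARK_SQL_BIN and SPARK_MASTER are hypothetical names, and the defaults simply reuse the command and master URL from this post.

```shell
# Defaults taken from the interactive command used above; override as needed.
SPARK_SQL_BIN="${SPARK_SQL_BIN:-./spark-sql}"
SPARK_MASTER="${SPARK_MASTER:-spark://node202.hmbank.com:7077}"

# Run a single SQL statement through spark-sql and exit.
run_sql() {
  "$SPARK_SQL_BIN" --master "$SPARK_MASTER" -e "$1"
}

# Example calls:
# run_sql "insert into spark2 values (2, 2, 'mm')"
# run_sql "select * from spark2"
```

Each call starts a new Spark application, so this is slower per statement than the interactive session but easier to put in a script.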

Check under the Hive warehouse directory on HDFS:

(Screenshot: listing of the table directory under the Hive warehouse on HDFS)

As you can see, the data has been written to Hive correctly.
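From the command line, the HDFS check might look like the sketch below. The helper list_table_files and the HDFS_BIN variable are hypothetical, and the example path assumes the table sits under /user/hive/warehouse (the value the metastore reports in the startup log) in a directory named after the table.

```shell
# The hdfs launcher to use; override if it is not on the PATH.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# List the files of a table directory under a given warehouse directory.
list_table_files() {
  warehouse_dir="$1"
  table_name="$2"
  "$HDFS_BIN" dfs -ls "$warehouse_dir/$table_name"
}

# Example call (path is an assumption, see above):
# list_table_files /user/hive/warehouse spark2
```

For a parquet-backed table such as spark2, the listing should contain one or more part files once the insert has completed.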

Check how the table is stored in the Hive metastore database:

(Screenshot: the table's storage entry in the Hive metastore database)

Done!
