In an earlier post I covered how to access data in Hive from spark-shell. So how do you connect to Hive and query its data from IDEA?
There are plenty of articles about this online, but probably because my environment is different, I ran into quite a few problems along the way; this post records them.
I. Initial environment
1. To begin with, my pom.xml had the following properties and dependencies:
<properties>
  <scala.version>2.11.8</scala.version>
  <spark.version>2.3.1</spark.version>
  <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
  <hive.version>1.1.0-cdh5.7.0</hive.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.27</version>
  </dependency>
</dependencies>
2. Copy hive-site.xml, core-site.xml and hdfs-site.xml into the resources folder. The contents of the three files are shown below. (Some of the hive-site.xml settings were copied straight off the internet and contain pitfalls; I will go through them one by one. As an aside, I also tried keeping only hive-site.xml without the other two files, and access still worked.)
1)hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<!-- Stores execution plans and intermediate output for the map/reduce stages of a job -->
<name>hive.exec.scratchdir</name>
<value>hdfs://192.168.137.141:9000/hive/tmp</value>
<description>Scratch space for Hive jobs</description>
</property>
<property>
<!-- Directory on HDFS where Hive stores its data -->
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://192.168.137.141:9000/hive/warehouse</value>
<description>location of default database for the warehous</description>
</property>
<property>
#Hive實(shí)時(shí)查詢?nèi)罩舅诘哪夸浧教簦绻撝禐榭眨瑢⒉粍?chuàng)建實(shí)時(shí)的查詢?nèi)罩? <name>hive.querylog.location</name>
<value>hdfs://192.168.137.141:9000/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<!-- Address of the Hive metastore service -->
<name>hive.metastore.uris</name>
<value>thrift://192.168.137.141:9083</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
<description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
<property>
<!-- TCP port HiveServer2 listens on; defaults to 10000 -->
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface.Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>
<!-- Metastore schema verification -->
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
<property>
<!-- JDBC connection string for the metastore database -->
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.137.141:3306/ruozedata_basic03?createDatabaseIfNotExist=true</value>
<description>The default connection string for the database that stores temporary hive statistics.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
</configuration>
2)hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.137.141:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>192.168.137.141:50091</value>
</property>
</configuration>
3)core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.137.141:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>hdfs://192.168.137.141:9000/hadoop/tmp</value>
</property>
</configuration>
3. The code developed in IDEA is as follows:
import org.apache.spark.sql.SparkSession

object HiveConnApp {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() wires the session to the Hive metastore described by hive-site.xml on the classpath
    val spark = SparkSession.builder().master("local[2]").appName("HiveConnApp")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases").show(false)
    spark.sql("use ruozedata")
    spark.sql("show tables").show(false)

    spark.stop()
  }
}
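To sanity-check that the configuration files under resources are actually on the classpath and being picked up, a couple of lines can be dropped into main right after getOrCreate(); just a minimal sketch (fs.defaultFS should come back as hdfs://192.168.137.141:9000):

// Quick check that core-site.xml from resources was loaded, and that the session starts in the default database
println(spark.sparkContext.hadoopConfiguration.get("fs.defaultFS"))
println(spark.catalog.currentDatabase)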
II. Errors you may hit while running, and how to fix them
1. Run the code above for the first time and try to connect:
// Error
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
Looking at the source of enableHiveSupport():
/**
* Enables Hive support, including connectivity to a persistent Hive metastore, support for
* Hive serdes, and Hive user-defined functions.
*
* @since 2.0.0
*/
def enableHiveSupport(): Builder = synchronized {
  if (hiveClassesArePresent) {
    config(CATALOG_IMPLEMENTATION.key, "hive")
  } else {
    throw new IllegalArgumentException(
      "Unable to instantiate SparkSession with Hive support because " +
        "Hive classes are not found.")
  }
}
hiveClassesArePresent has to evaluate to true, otherwise the error above is thrown. Digging a bit deeper:
/**
* @return true if Hive classes can be loaded, otherwise false.
*/
private[spark] def hiveClassesArePresent: Boolean = {
  try {
    Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
    Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
    true
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
}
For hiveClassesArePresent to return true, two classes have to be loadable: HIVE_SESSION_STATE_BUILDER_CLASS_NAME (private val HIVE_SESSION_STATE_BUILDER_CLASS_NAME = "org.apache.spark.sql.hive.HiveSessionStateBuilder") and "org.apache.hadoop.hive.conf.HiveConf".
The spark-hive jar was already in the project, so my guess was that the Hadoop jars were missing. So I added the following dependency (a quick way to verify the fix is sketched right after it):
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
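Before re-running the whole job, one quick way to confirm that both classes can now be loaded is to try them by name; this is just a minimal standalone sketch, with the class names taken from the Spark source quoted above:

object HiveClassCheck {
  def main(args: Array[String]): Unit = {
    // The two classes enableHiveSupport() needs to find (see hiveClassesArePresent above)
    val required = Seq(
      "org.apache.spark.sql.hive.HiveSessionStateBuilder",
      "org.apache.hadoop.hive.conf.HiveConf")

    required.foreach { name =>
      try {
        Class.forName(name)
        println(s"OK   : $name")
      } catch {
        case _: ClassNotFoundException | _: NoClassDefFoundError =>
          println(s"MISS : $name (check the spark-hive / hadoop-client dependencies)")
      }
    }
  }
}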
2. Run again; the previous error is gone, but a new one appears:
Exception in thread "main" java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
Someone has asked about this on Stack Overflow:
https://stackoverflow.com/questions/44151374/getting-either-exception-java-lang-nosuchfielderror-metastore-client-socket-li
The official documentation describes which Hive versions Spark supports (Spark 2.x is built against Hive 1.2.1 by default), so I tried changing the Hive version to 1.2.1:
<hive.version>1.2.1</hive.version>
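If you would rather keep the CDH Hive jars in the project, Spark also lets you point its Hive client at a different metastore version through spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars; a rough sketch of that alternative (not the route taken here, and the values are only illustrative):

// Hypothetical alternative to changing <hive.version>: tell Spark which metastore version/jars to use
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("HiveConnApp")
  .config("spark.sql.hive.metastore.version", "1.1.0") // CDH 5.7 ships Hive 1.1.0
  .config("spark.sql.hive.metastore.jars", "maven")    // or a classpath listing local jars
  .enableHiveSupport()
  .getOrCreate()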
3. Run again; that error is gone and another one shows up:
HiveConf of name hive.metastore.local does not exist
As the warning says, hive.metastore.local no longer exists in HiveConf (whether the metastore is local is now inferred from hive.metastore.uris), so I deleted this property from hive-site.xml:
<property>
<name>hive.metastore.local</name>
<value>false</value>
<description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
4. That error disappears, but the run fails with yet another one:
metastore: Failed to connect to the MetaStore Server...
The client could not reach a metastore service at that thrift address, so I deleted the hive.metastore.uris property from hive-site.xml; without it the client uses an embedded metastore and talks directly to the MySQL database configured below:
<property>
<!-- Address of the Hive metastore service -->
<name>hive.metastore.uris</name>
<value>thrift://192.168.137.141:9083</value>
</property>
5. That error disappears, and the run fails again:
Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: hdfs://192.168.137.141:9000/hive/tmp on HDFS should be writable. Current permissions are: rwxr-xr-x;
The message says /hive/tmp on HDFS is not writable, so change its permissions to 777:
[hadoop@hadoop001 ~]$ hadoop fs -chmod -R 777 /hive/tmp
[hadoop@hadoop001 ~]$ hadoop fs -ls /hive
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2018-11-02 20:08 /hive/log
drwxrwxrwx - hadoop supergroup 0 2018-11-02 20:08 /hive/tmp
drwxr-xr-x - hadoop supergroup 0 2018-11-02 20:08 /hive/warehouse
6. Run again, and the connection finally succeeds:
// IDEA accesses Hive successfully
// spark.sql("show databases").show(false)
+---------------+
|databaseName |
+---------------+
|default |
|hive |
|hive2_ruozedata|
|hive3 |
|ruozedata |
+---------------+
// spark.sql("use ruozedata")
// spark.sql("show tables").show(false)
+---------+-----------------------+-----------+
|database |tableName |isTemporary|
+---------+-----------------------+-----------+
|ruozedata|a |false |
|ruozedata|b |false |
|ruozedata|city_info |false |
|ruozedata|dual |false |
|ruozedata|emp_sqoop |false |
|ruozedata|order_4_partition |false |
|ruozedata|order_mulit_partition |false |
|ruozedata|order_partition |false |
|ruozedata|product_info |false |
|ruozedata|product_rank |false |
|ruozedata|ruoze_dept |false |
|ruozedata|ruozedata_dynamic_emp |false |
|ruozedata|ruozedata_emp |false |
|ruozedata|ruozedata_emp2 |false |
|ruozedata|ruozedata_emp3_new |false |
|ruozedata|ruozedata_emp4 |false |
|ruozedata|ruozedata_emp_partition|false |
|ruozedata|ruozedata_person |false |
|ruozedata|ruozedata_static_emp |false |
|ruozedata|user_click |false |
+---------+-----------------------+-----------+
only showing top 20 rows
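As a side note, the same checks can also be done through the Catalog API instead of SQL strings; a small equivalent sketch:

// Equivalent to the SHOW statements above, via the Catalog API
spark.catalog.listDatabases().show(false)
spark.catalog.setCurrentDatabase("ruozedata")
spark.catalog.listTables().show(false)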
// Check from the hive CLI
hive> show databases;
OK
default
hive
hive2_ruozedata
hive3
ruozedata
hive> use ruozedata;
OK
hive> show tables;
OK
a
b
city_info
dual
emp_sqoop
order_4_partition
order_mulit_partition
order_partition
product_info
product_rank
ruoze_dept
ruozedata_dynamic_emp
ruozedata_emp
ruozedata_emp2
ruozedata_emp3_new
ruozedata_emp4
ruozedata_emp_partition
ruozedata_person
ruozedata_static_emp
user_click
user_click_tmp
Now create a new database:
spark.sql("create database test_1")
Running it fails with:
ERROR log: Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=zh, access=WRITE, inode="/hive/warehouse":hadoop:supergroup:drwxr-xr-x
Another missing write permission; so just change all three directories under /hive to 777:
[hadoop@hadoop001 bin]$ hadoop fs -ls /hive
Found 3 items
drwxrwxrwx - hadoop supergroup 0 2018-11-02 20:08 /hive/log
drwxrwxrwx - hadoop supergroup 0 2018-11-02 23:59 /hive/tmp
drwxrwxrwx - hadoop supergroup 0 2018-11-02 20:08 /hive/warehouse
Run spark.sql("create database test_1") again:
[hadoop@hadoop001 bin]$ hadoop fs -ls /hive/warehouse
Found 1 items
drwxrwxrwx - zh supergroup 0 2018-11-03 10:48 /hive/warehouse/test_1.db
The database test_1.db has been created and sits under /hive/warehouse/ on HDFS.
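Note that the new directory is owned by zh, the local user IDEA runs as. Rather than opening the directories up with 777, another option (not tested here) is to have the local process identify itself to HDFS as the hadoop user by setting HADOOP_USER_NAME before the SparkSession is created; a minimal sketch:

// Hypothetical alternative to chmod 777: impersonate the HDFS owner from the IDE.
// Must run before any Spark/Hadoop code touches HDFS.
System.setProperty("HADOOP_USER_NAME", "hadoop")

val spark = SparkSession.builder().master("local[2]").appName("HiveConnApp")
  .enableHiveSupport()
  .getOrCreate()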
Checking in Hive, test_1 is now there:
hive> show databases;
OK
default
hive
hive2_ruozedata
hive3
ruozedata
test_1
Check the metadata in MySQL:
mysql> select * from dbs;
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
| 1 | Default Hive database | hdfs://192.168.137.141:9000/user/hive/warehouse | default | public | ROLE |
| 6 | NULL | hdfs://192.168.137.141:9000/user/hive/warehouse/hive.db | hive | hadoop | USER |
| 9 | this is ruozedata 03 test database | hdfs://192.168.137.141:9000/user/hive/warehouse/hive2_ruozedata.db | hive2_ruozedata | hadoop | USER |
| 10 | NULL | hdfs://192.168.137.141:9000/zh | hive3 | hadoop | USER |
| 11 | NULL | hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db | ruozedata | hadoop | USER |
| 16 | | hdfs://192.168.137.141:9000/hive/warehouse/test_1.db | test_1 | NULL | USER |
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
6 rows in set (0.00 sec)
The /hive/tmp and /hive/log directories are still essentially empty for now:
[hadoop@hadoop001 bin]$ hadoop fs -ls /hive/tmp
Found 1 items
drwx------ - zh supergroup 0 2018-11-03 10:48 /hive/tmp/zh
[hadoop@hadoop001 bin]$ hadoop fs -ls /hive/tmp/zh
[hadoop@hadoop001 bin]$ hadoop fs -ls /hive/log
*** The final hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<!-- Stores execution plans and intermediate output for the map/reduce stages of a job -->
<name>hive.exec.scratchdir</name>
<value>hdfs://192.168.137.141:9000/hive/tmp</value>
<description>Scratch space for Hive jobs</description>
</property>
<property>
<!-- Directory on HDFS where Hive stores its data; the default is /user/hive/warehouse, so this can be left unset -->
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://192.168.137.141:9000/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
#Hive實(shí)時(shí)查詢?nèi)罩舅诘哪夸洠绻撝禐榭杖野妫瑢⒉粍?chuàng)建實(shí)時(shí)的查詢?nèi)罩? <name>hive.querylog.location</name>
<value>hdfs://192.168.137.141:9000/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
<description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
<property>
<!-- TCP port HiveServer2 listens on; defaults to 10000 -->
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface.Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>
<!-- Metastore schema verification -->
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
<property>
<!-- JDBC connection string for the metastore database -->
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.137.141:3306/ruozedata_basic03?createDatabaseIfNotExist=true</value>
<description>The default connection string for the database that stores temporary hive statistics.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
</configuration>
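With this final configuration in place, a quick end-to-end check is to read one of the existing tables as a DataFrame and run a small query against it; a minimal sketch using the city_info table listed earlier:

import org.apache.spark.sql.SparkSession

object HiveQueryCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("HiveQueryCheck")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table as a DataFrame and run a couple of simple checks
    val cityInfo = spark.table("ruozedata.city_info")
    cityInfo.printSchema()
    println(s"row count: ${cityInfo.count()}")

    spark.stop()
  }
}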