1. Installation Overview
I plan to use the Spark SQL component to read data from Hive. Building on the previous three articles, I already have Hadoop, Spark, and MySQL installed; all three are prerequisites for reading data on HDFS through Spark SQL. To install the Hive data warehouse, I still need to download the Hive distribution and the MySQL JDBC driver.
二洞难、mysql驅(qū)動(dòng)下載
- Download address: https://downloads.mysql.com/archives/c-j/
My MySQL version is 5.7.17, so picking a 5.x Connector/J release on the official site is sufficient.
- Upload the jar to the hadoop user's home directory using xftp or xsecure
3. Downloading Hive
- The Hive version must be chosen to match your Hadoop version; see the official compatibility notes: https://hive.apache.org/downloads.html
- Download address: https://dlcdn.apache.org/hive/ ; I chose version 2.3.9
- Upload the tarball to the hadoop user's home directory using xftp or xsecure
四柱嫌、hive安裝
- Extract Hive into the hadoop user's home directory
[hadoop@hadoop01 ~]$ tar -zxvf apache-hive-2.3.9-bin.tar.gz
- Extract the MySQL connector package into the hadoop user's home directory
[hadoop@hadoop01 ~]$ tar -zxvf mysql-connector-java-5.1.49.tar.gz
- Copy the MySQL driver jar into Hive's lib directory
[hadoop@hadoop01 ~]$ cp mysql-connector-java-5.1.49/mysql-connector-java-5.1.49.jar apache-hive-2.3.9-bin/lib/
- Edit hive-env.sh
[hadoop@hadoop01 ~]$ echo $HADOOP_HOME
/home/hadoop/hadoop-2.10.1
[hadoop@hadoop01 ~]$ cd apache-hive-2.3.9-bin/conf/
[hadoop@hadoop01 conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@hadoop01 conf]$ vim hive-env.sh
HADOOP_HOME=/home/hadoop/hadoop-2.10.1
- Create hive-site.xml. The bundled hive-default.xml.template is the reference for hive-site.xml, but it carries far too many default properties to be convenient, so I write the file from scratch instead:
[hadoop@hadoop01 conf]$ vim hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://172.16.100.26:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/local/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://172.16.100.26:9083</value>
  </property>
</configuration>
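The JDBC settings above assume a MySQL account named hive with password 123456 that is allowed to create and own the hive metastore database. If no such account exists yet, here is a minimal sketch of creating it as the MySQL root user (the '%' host wildcard is an assumption; tighten it for anything beyond a dev box):
[hadoop@hadoop01 ~]$ mysql -uroot -p
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY '123456';
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
mysql> FLUSH PRIVILEGES;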
- Add Hive's environment variables to .bashrc
[hadoop@hadoop01 conf]$ vim ~/.bashrc
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/home/hadoop/hadoop-2.10.1
export SCALA_HOME=/home/hadoop/scala-2.12.2
export SPARK_HOME=/home/hadoop/spark-3.0.3-bin-hadoop2.7
export HIVE_HOME=/home/hadoop/apache-hive-2.3.9-bin
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$HIVE_HOME/bin
[hadoop@hadoop01 conf]$ source ~/.bashrc
- Initialize Hive's metastore schema
[hadoop@hadoop01 lib]$ cd ../bin
[hadoop@hadoop01 bin]$ ./schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://172.16.100.26:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Mon Dec 06 09:17:23 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Mon Dec 06 09:17:23 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Initialization script completed
Mon Dec 06 09:17:24 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
schemaTool completed
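As a quick sanity check, schematool can also report the schema version it just wrote, using the same -dbType flag as above:
[hadoop@hadoop01 bin]$ ./schematool -info -dbType mysql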
- Log in to MySQL and check whether the hive metastore database was created according to the hive-site.xml configuration
[hadoop@hadoop01 lib]$ mysql -uhive -p123456
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.36 MySQL Community Server (GPL)
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
mysql> use hive;
mysql> show tables;
+---------------------------+
| Tables_in_hive |
+---------------------------+
| AUX_TABLE |
| BUCKETING_COLS |
| CDS |
......
| TYPES |
| TYPE_FIELDS |
| VERSION |
| WRITE_SET |
+---------------------------+
57 rows in set (0.01 sec)
- Start the Hadoop services
mysql> exit
Bye
[hadoop@hadoop01 lib]$ cd $HADOOP_HOME/sbin
[hadoop@hadoop01 sbin]$ ./stop-all.sh
[hadoop@hadoop01 sbin]$ ./start-dfs.sh
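Before continuing, it is worth confirming that the HDFS daemons actually came up; jps should list at least NameNode, DataNode, and SecondaryNameNode:
[hadoop@hadoop01 sbin]$ jps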
- Start the HiveServer2 service
[hadoop@hadoop01 sbin]$ nohup hive --service hiveserver2 &
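HiveServer2 takes a few seconds to start and listens on port 10000 by default. One way to verify it, assuming the default host, port, and authentication settings, is to connect with beeline:
[hadoop@hadoop01 sbin]$ beeline -u jdbc:hive2://localhost:10000 -n hadoop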
- Enter the Hive client
[hadoop@hadoop01 sbin]$ hive
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/java/jdk1.8.0_311/bin:/usr/local/mysql/bin:/root/bin:/home/hadoop/hadoop-2.10.1/bin:/home/hadoop/hadoop-2.10.1/sbin:/home/hadoop/scala-2.12.2/bin:/home/hadoop/spark-3.0.3-bin-hadoop2.7/bin:/home/hadoop/hadoop-2.10.1/bin:/home/hadoop/hadoop-2.10.1/sbin:/home/hadoop/scala-2.12.2/bin:/home/hadoop/spark-3.0.3-bin-hadoop2.7/bin:/home/hadoop/apache-hive-2.3.9-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/home/hadoop/apache-hive-2.3.9-bin/lib/hive-common-2.3.9.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
- Test the Hive installation
hive> create database test;
OK
Time taken: 0.19 seconds
hive> show databases;
OK
default
test
Time taken: 0.017 seconds, Fetched: 2 row(s)
- Verify in the HDFS web UI: http://172.16.100.26:50070/explorer.html#/usr/local/warehouse
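The same check also works from the shell: after create database test above, the warehouse directory should contain a test.db subdirectory:
[hadoop@hadoop01 sbin]$ hdfs dfs -ls /usr/local/warehouse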
- Start the metastore service; the Thrift protocol is what Spark uses to communicate with Hive
hive> exit;
[hadoop@hadoop01 sbin]$ nohup hive --service metastore &
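The metastore should now be listening on the port configured in hive.metastore.uris; one way to confirm:
[hadoop@hadoop01 sbin]$ ss -tlnp | grep 9083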
- Configure Spark so it can read the Hive databases: copy hive-site.xml into Spark's conf directory
[hadoop@hadoop01 sbin]$ cd $HIVE_HOME/conf
[hadoop@hadoop01 conf]$ cp hive-site.xml $SPARK_HOME/conf
- Use the spark-sql interactive shell to check that the Hive databases and tables are accessible
[hadoop@hadoop01 conf]$ cd $SPARK_HOME/bin
[hadoop@hadoop01 bin]$ ./spark-sql
...
spark-sql> show databases;
21/12/06 10:03:38 INFO CodeGenerator: Code generated in 154.101485 ms
21/12/06 10:03:38 INFO CodeGenerator: Code generated in 5.977354 ms
default
test
Time taken: 1.688 seconds, Fetched 2 row(s)
21/12/06 10:03:38 INFO SparkSQLCLIDriver: Time taken: 1.688 seconds, Fetched 2 row(s)
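spark-sql can also be run non-interactively with -e, which is convenient for scripting the same check:
[hadoop@hadoop01 bin]$ ./spark-sql -e "show databases;"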
5. Problems I Ran Into
- While configuring hive-site.xml, I had already initialized the metastore database in MySQL, but then wanted to change the database connection settings. After modifying them and running ./schematool -initSchema -dbType mysql again, the tool reported that schema initialization failed.
Solution: as the MySQL root user, drop the database that was already initialized, then rerun the initialization, as sketched below.
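A minimal sketch of that recovery, assuming the metastore database is named hive as in the JDBC URL above:
[hadoop@hadoop01 bin]$ mysql -uroot -p -e "DROP DATABASE hive;"
[hadoop@hadoop01 bin]$ ./schematool -initSchema -dbType mysql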
- Logging in to MySQL with the password in plain text, i.e. written directly on the command line, produces a warning; this security notice can be ignored:
mysql: [Warning] Using a password on the command line interface can be insecure.
Strictly speaking, the correct way is to let mysql prompt for the password:
[hadoop@hadoop01 ~]$ mysql -uhive -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 31
Server version: 5.7.36 MySQL Community Server (GPL)
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Still, no great harm done; this is a development environment.