我們使用mysql來存儲hive的元數(shù)據(jù)(metadata)乔询,這里關(guān)于元數(shù)據(jù)以及元數(shù)據(jù)的存儲方式摘錄了這篇文章里的部分內(nèi)容
Hive安裝配置指北(含Hive Metastore詳解)
1.1 Metadata白粉、Metastore作用
metadata即元數(shù)據(jù)澈圈。元數(shù)據(jù)包含用Hive創(chuàng)建的database臭脓、table等的元信息。
元數(shù)據(jù)存儲在關(guān)系型數(shù)據(jù)庫中。如Derby、MySQL等盹兢。
Metastore的作用是:客戶端連接metastore服務(wù),metastore再去連接MySQL數(shù)據(jù)庫來存取元數(shù)據(jù)守伸。有了metastore服務(wù)绎秒,就可以有多個客戶端同時連接,而且這些客戶端不需要知道MySQL數(shù)據(jù)庫的用戶名和密碼尼摹,只需要連接metastore 服務(wù)即可见芹。
1.2 三種配置方式區(qū)別
內(nèi)嵌模式使用的是內(nèi)嵌的Derby數(shù)據(jù)庫來存儲元數(shù)據(jù)剂娄,也不需要額外起Metastore服務(wù)寸谜。這個是默認的畅形,配置簡單,但是一次只能一個客戶端連接鸟悴,適用于用來實驗徘铝,不適用于生產(chǎn)環(huán)境耳胎。
本地元存儲和遠程元存儲都采用外部數(shù)據(jù)庫來存儲元數(shù)據(jù),目前支持的數(shù)據(jù)庫有:MySQL惕它、Postgres怕午、Oracle、MS SQL Server.在這里我們使用MySQL淹魄。
本地元存儲和遠程元存儲的區(qū)別是:本地元存儲不需要單獨起metastore服務(wù)郁惜,用的是跟hive在同一個進程里的metastore服務(wù)。遠程元存儲需要單獨起metastore服務(wù)揭北,然后每個客戶端都在配置文件里配置連接到該metastore服務(wù)扳炬。遠程元存儲的metastore服務(wù)和hive運行在不同的進程里。
在生產(chǎn)環(huán)境中搔体,建議用遠程元存儲來配置Hive Metastore恨樟。
下面是安裝過程:
復制、解壓疚俱、改名
sudo cp apache-hive-1.1.0-bin.tar.gz /usr/local
cd /usr/local
sudo tar zxvf ./apache-hive-1.1.0-bin.tar.gz
sudo mv apache-hive-1.1.0-bin hive
修改環(huán)境變量
sudo nano /etc/profile
在下面加上兩行:
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH
進入MySQL
mysql –u root –p
創(chuàng)建用戶hive劝术,密碼hive
GRANT USAGE ON *.* TO 'hive'@'%' IDENTIFIED BY 'hive' WITH GRANT OPTION;
create database hive;
grant all on hive.* to hive@'%' identified by 'hive';
grant all on hive.* to hive@'localhost' identified by 'hive';
flush privileges;
exit;
驗證hive用戶
mysql -uhive -phive
show databases;
看到如下反饋信息,則說明創(chuàng)建成功
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| test |
+--------------------+
3 rows in set (0.00 sec)
退出mysql
Exit
修改hive-site.xml
sudo cp hive/conf/hive-default.xml.template hive/conf/hive-site.xml
sudo nano hive/conf/hive-site.xml
添加以下屬性
<property>
<name>javax.jdo.option.ConnectionURL </name>
<value>jdbc:mysql://master:3306/hive </value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName </name>
<value>com.mysql.jdbc.Driver </value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword </name>
<value>hive </value>
</property>
<property>
<name>hive.hwi.listen.port </name>
<value>9999 </value>
<description>This is the port the Hive Web Interface will listen on </description>
</property>
<property>
<name>datanucleus.autoCreateSchema </name>
<value>true</value>
</property>
<property>
<name>datanucleus.fixedDatastore </name>
<value>false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/hive/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/hive/iotmp</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/usr/local/hive/iotmp</value>
<description>Location of Hive run time structured log file</description>
</property>
拷貝mysql-connector-java-5.1.43-bin.jar到hive的lib下面
mv /home/hadoop-sna/Downloads/mysql-connector-java-5.1.43-bin.jar /usr/local/hive/lib/
把jline-2.12.jar拷貝到hadoop相應(yīng)的目錄下呆奕,替代jline-0.9.94.jar养晋,否則啟動會報錯
cp /usr/local/hive/lib/jline-2.12.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/
mv /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar.bak
創(chuàng)建hive臨時文件夾
mkdir /usr/local/hive/iotmp
遇到的問題
- 運行hive,報錯:Unable to instantiate org.apache.hadoop.hive.
嘗試修改了一下MySQL的日志格式:
我第一次安裝的時候改成這樣就可以了梁钾,但是在后續(xù)安裝sqoop的時候绳泉,這里改成"ROW"并不可以,當時出現(xiàn)了問題2姆泻。mysq -u root -p mysql> set global binlog_format='ROW';
- 當啟動Hive的時候報錯:
Caused by: javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
NestedThrowables: java.sql.SQLException: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
這個問題是由于hive的元數(shù)據(jù)存儲MySQL配置不當引起的零酪,可以這樣解決:
mysql> set global binlog_format='MIXED';
- 再次啟動hive,然后會報錯java.lang.RuntimeException: java.io.IOException: 權(quán)限不夠
改完權(quán)限后就能成功啟動了sudo chown -R hadoop-sna hive sudo chgrp -R hadoop-sna hive