Overview
Our Hive is version 1.2.1 as shipped by Hortonworks. This document records the problems we ran into while operating it and how we resolved them.
Problem
Under high concurrency, requests fail with: Timed out waiting for a free available connection.
Cause
Looking through the HiveServer2 logs, we found that all of these errors ultimately shared the same "Caused by":
MetaException(message:Unable to update transaction database java.sql.SQLException: Timed out waiting for a free available connection.
at com.jolbox.bonecp.DefaultConnectionStrategy.getConnectionInternal(DefaultConnectionStrategy.java:88)
at com.jolbox.bonecp.AbstractConnectionStrategy.getConnection(AbstractConnectionStrategy.java:90)
at com.jolbox.bonecp.BoneCP.getConnection(BoneCP.java:553)
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:131)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:1956)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:941)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:882)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5911)
at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at com.sun.proxy.$Proxy12.lock(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:1947)
at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:178)
at com.sun.proxy.$Proxy13.lock(Unknown Source)
at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:102)
at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:357)
at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:373)
at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:182)
at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:1079)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1281)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1153)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197)
at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:253)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The stack trace shows that the failure happens while a query is acquiring locks. When too many threads request locks at the same time, the BoneCP connection pool throws this exception. The BoneCP code that throws it is shown below.
The class is DefaultConnectionStrategy; the explanatory comments are mine.
@Override
protected Connection getConnectionInternal() throws SQLException {
    ConnectionHandle result = pollConnection();
    // we still didn't find an empty one, wait forever (or as per config) until our partition is free
    if (result == null) {
        int partition = (int) (Thread.currentThread().getId() % this.pool.partitionCount);
        ConnectionPartition connectionPartition = this.pool.partitions[partition];
        // Each partition holds its connection handles in its own LinkedBlockingQueue.
        // this.pool.nullOnConnectionTimeout below is a pool configuration option whose default is false:
        // when no connection can be obtained, the method returns null if it is true, otherwise it throws.
        try {
            result = connectionPartition.getFreeConnections().poll(this.pool.connectionTimeoutInMs, TimeUnit.MILLISECONDS);
            if (result == null) {
                if (this.pool.nullOnConnectionTimeout) {
                    return null;
                }
                // 08001 = The application requester is unable to establish the connection.
                // This is where the exception is thrown when no free connection was obtained in time.
                throw new SQLException("Timed out waiting for a free available connection.", "08001");
            }
        }
        catch (InterruptedException e) {
            if (this.pool.nullOnConnectionTimeout) {
                return null;
            }
            throw PoolUtil.generateSQLException(e.getMessage(), e);
        }
    }
    return result;
}
Solution
This code path is only exercised when transaction support is enabled, and transaction support in our Hive version is of little practical use, so setting hive.support.concurrency to false makes the problem go away.
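In hive-site.xml that amounts to the following entry (a minimal sketch; on an Ambari-managed HDP cluster the same property would normally be changed through Ambari and the affected services restarted):

<property>
  <name>hive.support.concurrency</name>
  <value>false</value>
  <!-- Disables the lock-manager/DbTxnManager path; we do not rely on Hive transactions -->
</property>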
Queries containing Chinese characters fail with an unsupported-encoding error
Cause
Hive's CBO component optimizes our SQL for us, but in this version it only supports ISO-8859-1, so Chinese characters in a query cause an error.
Solution
Set hive.cbo.enable to false to disable CBO optimization.
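As a hive-site.xml sketch, with the same caveat as above:

<property>
  <name>hive.cbo.enable</name>
  <value>false</value>
  <!-- Disables the cost-based optimizer so queries bypass the ISO-8859-1-only code path -->
</property>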
After running for a while, the metastore fails with an OutOfMemoryError
Cause
Analyzing a heap dump of the HiveMetastore with MAT showed that most of the live objects were held by the ConcurrentHashMap inside AggregateStatsCache. As the name suggests this is a cache; a quick read of the source shows it caches aggregated statistics for table partitions. The cache is toggled by hive.metastore.aggregate.stats.cache.enabled, which defaults to true. It is also initialized with a set of default parameters that bound how much it may hold, but those defaults allow 10,000 cache nodes with up to 10,000 elements each, i.e. up to 100 million cached elements in total, while our metastore was started with only 2 GB of heap. The class starts a low-priority background thread to evict entries, but by default that thread only kicks in once the cache reaches 90% of its maximum element count, so in our environment it never ran even though the heap was already full.
Solution
Our metadata is stored in a dedicated MySQL instance in the local datacenter, the MySQL host is well provisioned, and the internal network is 10 GbE, so we simply turned this feature off.
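The corresponding hive-site.xml entry (again a sketch):

<property>
  <name>hive.metastore.aggregate.stats.cache.enabled</name>
  <value>false</value>
  <!-- Turns off AggregateStatsCache; stats lookups go straight to the backing MySQL -->
</property>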
Follow-up findings
With the feature above disabled, the metastore's memory pressure eased, but after a while we still received alerts about heap usage reaching 75%. jstat -gcutil showed that the metastore had gone a very long time without a full GC even though the old generation was above 80%. That prompted a look at the garbage collector: by default the metastore runs with the Parallel Scavenge collector, which at promotion time compares the average size of objects promoted to the old generation against the space remaining there, and only performs a full GC once the old generation is about to run out. Switching the collector to G1 was enough to fix this.
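For reference, this is how the GC state can be watched (a minimal sketch; 12345 is a placeholder for the metastore JVM's real PID, and 5000 is the sampling interval in milliseconds):

# O = old-generation occupancy in percent, FGC = number of full GCs so far
jstat -gcutil 12345 5000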
The file to modify is hive-env.sh:
export HADOOP_USER_CLASSPATH_FIRST=true #this prevents old metrics libs from mapreduce lib from bringing in old jar deps overriding HIVE_LIB
if [ "$SERVICE" = "cli" ]; then
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseNUMA -XX:+UseParallelGC -XX:-UseGCOverheadLimit"
else
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
fi
fi
# The heap size of the jvm started by hive shell script can be controlled via:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_HEAPSIZE={{hive_metastore_heapsize}} # Setting for HiveMetastore
else
export HADOOP_HEAPSIZE={{hive_heapsize}} # Setting for HiveServer2 and Client
fi
# Set JVM parameters -Xms4g -Xmx4g -XX:+UseG1GC when we start the metastore, to make sure it has sufficient memory, and use the G1 garbage collector
# This is my change: when the service being started is the metastore, give it 4 GB of heap and use the G1 collector
if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xms4g -Xmx4g -XX:+UseG1GC "
else
  export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx${HADOOP_HEAPSIZE}m"
fi
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS{{heap_dump_opts}}"
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server (hwi etc).
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=${HADOOP_HOME:-{{hadoop_home}}}
export HIVE_HOME=${HIVE_HOME:-{{hive_home_dir}}}
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=${HIVE_CONF_DIR:-{{hive_config_dir}}}
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
if [ "${HIVE_AUX_JARS_PATH}" != "" ]; then
if [ -f "${HIVE_AUX_JARS_PATH}" ]; then
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
fi
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
fi
export METASTORE_PORT={{hive_metastore_port}}
{% if sqla_db_used or lib_dir_available %}
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:{{jdbc_libs_dir}}"
export JAVA_LIBRARY_PATH="$JAVA_LIBRARY_PATH:{{jdbc_libs_dir}}"
{% endif %}