Troubleshooting Notes
I. Big Data Environment Setup
1. After installing and configuring Hadoop locally on Windows, running "hadoop" in cmd fails with: ERROR: JAVA_HOME is incorrectly set.
Solution: this is caused by spaces in the JAVA_HOME path. Edit the set JAVA_HOME line in \etc\hadoop\hadoop-env.cmd under the Hadoop install directory and use the 8.3 short form of the path.
e.g. set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_161
Reference: https://www.cnblogs.com/zlslch/p/8580446.html
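If you are not sure what the 8.3 short form of a directory is, you can look it up from cmd (a minimal sketch; "C:\Program Files" is just the usual JDK parent directory, adjust to wherever your JDK actually lives):
:: Run at an interactive cmd prompt: print the 8.3 short form of the directory
for %I in ("C:\Program Files") do @echo %~sI
:: Typically prints C:\PROGRA~1, the form used in the set JAVA_HOME line above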
2. After installing and configuring Hadoop locally on Windows, running "hadoop" in cmd fails with: "The system cannot find the batch label specified - print_usage".
Solution: open every .cmd file in Hadoop's bin directory with Notepad++ and convert the line endings to Windows format.
Open the file -> Edit -> EOL Conversion -> Windows (CR LF) -> Save
II. Hive
1. A Hive job fails during execution; the log shows it failed due to insufficient virtual memory.
Solution: this is caused by insufficient virtual memory on the cluster nodes; the simplest fix is to disable the virtual memory check.
Edit the yarn-site.xml file and add the following property:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Distribute the file to the other nodes in the cluster and restart the cluster.
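A minimal sketch of the distribute-and-restart step, assuming the hostnames and install path used elsewhere in these notes (node02/node03, /kkb/install/hadoop-3.1.4):
# copy the edited yarn-site.xml to the other nodes
scp /kkb/install/hadoop-3.1.4/etc/hadoop/yarn-site.xml node02:/kkb/install/hadoop-3.1.4/etc/hadoop/
scp /kkb/install/hadoop-3.1.4/etc/hadoop/yarn-site.xml node03:/kkb/install/hadoop-3.1.4/etc/hadoop/
# restart YARN so the new setting takes effect
/kkb/install/hadoop-3.1.4/sbin/stop-yarn.sh
/kkb/install/hadoop-3.1.4/sbin/start-yarn.sh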
1处坪、通過load data local overwrite方式向分桶表加載數(shù)據(jù),overwrite未生效架专,數(shù)據(jù)會追加到目標表中同窘。需要通過insert overwrite table target_table select * from source_table的方式覆蓋目標表。
III. HBase
1. Running an HBase jar on the cluster fails with the error shown in the screenshot.
Cause: the HADOOP_CLASSPATH environment variable is not set in hadoop-env.sh, so hadoop jar cannot find the jars the program depends on.
Solution: edit hadoop-env.sh and add the HADOOP_CLASSPATH environment variable.
[hadoop@node02 hadoop]$ cd /kkb/install/hadoop-3.1.4/etc/hadoop
[hadoop@node02 hadoop]$ vim hadoop-env.sh
export HADOOP_CLASSPATH=/kkb/install/hbase-2.2.2/lib/*
# the trailing * is required, otherwise the error persists; do not substitute $HBASE_HOME for the literal path
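A quick way to confirm the HBase jars are now picked up before rerunning the job (assuming the same install paths as above):
# the hbase lib entries should now appear on the Hadoop classpath
hadoop classpath | grep hbase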
2吹零、集群Hbase正常啟動后,HRegionServer節(jié)點幾分鐘后自動斷開
通過查看日志拉庵,如圖報錯
java.lang.NoClassDefFoundError: org/apache/htrace/SpanReceiver
Solution: copy htrace-core4-4.2.0-incubating.jar from client-facing-thirdparty into the lib directory.
[hadoop@node03 client-facing-thirdparty]$ pwd
/kkb/install/hbase-2.2.2/lib/client-facing-thirdparty
[hadoop@node03 client-facing-thirdparty]$ cp /kkb/install/hbase-2.2.2/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar /kkb/install/hbase-2.2.2/lib/
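The jar has to be present on every node that runs a RegionServer; a minimal sketch of syncing it and restarting the daemon afterwards (hostnames and paths are the ones assumed above):
# copy the jar to the other nodes as well, then restart the RegionServer so it picks up the class
scp /kkb/install/hbase-2.2.2/lib/htrace-core4-4.2.0-incubating.jar node02:/kkb/install/hbase-2.2.2/lib/
/kkb/install/hbase-2.2.2/bin/hbase-daemon.sh restart regionserver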
IV. Flume
1. Running a Flume agent fails with the error shown in the screenshot.
Cause: apache-flume-1.9.0-bin and hadoop-3.1.4 both ship a guava jar, but the versions differ, which causes a conflict.
Solution: replace Flume's older guava jar with the newer one from Hadoop.
cd /kkb/install/apache-flume-1.9.0-bin/lib
rm -f guava-11.0.2.jar
cp /kkb/install/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar .
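A quick check that only the newer guava is left in Flume's lib directory (same paths as above):
# only guava-27.0-jre.jar should be listed; a leftover guava-11.0.2.jar would reintroduce the conflict
ls /kkb/install/apache-flume-1.9.0-bin/lib | grep guava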
V. MySQL
1. Exporting a MySQL table to a CSV file.
Syntax: select * from tablename into outfile "directory_path/tablename.csv" fields terminated by ',' lines terminated by '\n';
Error: ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
Solution: check the MySQL variable secure_file_priv and write the output file under the directory it points to.
mysql> select * from students into outfile "/tmp/students.csv" fields terminated by ',' lines terminated by '\n';
ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
mysql> show variables like '*file*';
Empty set (0.00 sec)
mysql> show variables like '*secure_file*';
Empty set (0.00 sec)
mysql> show variables like '%secure_file%';
+------------------+-----------------------+
| Variable_name | Value |
+------------------+-----------------------+
| secure_file_priv | /var/lib/mysql-files/ |
+------------------+-----------------------+
1 row in set (0.00 sec)
mysql> select * from students into outfile "/var/lib/mysql-files/students.csv" fields terminated by ',' lines terminated by '\n';
Query OK, 6 rows affected (0.01 sec)
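Alternatively, the restriction itself can be changed in the server configuration (a sketch, assuming /etc/my.cnf and a systemd-managed mysqld; adjust file and service names for your distribution):
# under [mysqld] in /etc/my.cnf, set either of these, then restart the server:
#   secure_file_priv=/data/export/    # restrict INTO OUTFILE to this directory
#   secure_file_priv=                 # an empty value lifts the restriction entirely
sudo systemctl restart mysqld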
VI. Spark
1. The same jar, submitted with --deploy-mode cluster, runs successfully on YARN but fails on Spark standalone.
bin/spark-submit --master spark://node01:7077 \
--deploy-mode cluster \
--class com.kkb.spark.core.SparkCountCluster \
--executor-memory 1G \
--total-executor-cores 2 \
hdfs://node01:8020/original-spark-core-1.0-SNAPSHOT.jar \
hdfs://node01:8020/word.txt hdfs://node01:8020/output
Error:
Launch Command: "/kkb/install/jdk1.8.0_141/bin/java" "-cp" "/kkb/install/spark-2.3.3-bin-hadoop2.7/conf/:/kkb/install/spark-2.3.3-bin-hadoop2.7/jars/*:/kkb/install/hadoop-3.1.4/etc/hadoop/" "-Xmx1024M" "-Dspark.eventLog.enabled=true" "-Dspark.submit.deployMode=cluster" "-Dspark.yarn.historyServer.address=node01:4000" "-Dspark.app.name=com.kkb.spark.core.SparkCountCluster" "-Dspark.driver.supervise=false" "-Dspark.executor.memory=1g" "-Dspark.eventLog.dir=hdfs://node01:8020/spark_log" "-Dspark.master=spark://node01:7077" "-Dspark.driver.extraClassPath=/kkb/install/hadoop-3.1.4/share/hadoop/common/hadoop-lzo-0.4.20.jar" "-Dspark.eventLog.compress=true" "-Dspark.cores.max=2" "-Dspark.executor.extraClassPath=/kkb/install/hadoop-3.1.4/share/hadoop/common/hadoop-lzo-0.4.20.jar" "-Dspark.history.ui.port=4000" "-Dspark.rpc.askTimeout=10s" "-Dspark.jars=hdfs://node01:8020/original-spark-core-1.0-SNAPSHOT.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@192.168.153.110:41049" "/kkb/install/spark-2.3.3-bin-hadoop2.7/work/driver-20210115100008-0000/original-spark-core-1.0-SNAPSHOT.jar" "com.kkb.spark.core.SparkCountCluster" "hdfs://node01:8020/word.txt" "hdfs://node01:8020/output"
========================================
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:187)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:78)
at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:78)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:326)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:326)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:325)
at com.kkb.spark.core.SparkCountCluster$.main(SparkCountCluster.scala:12)
at com.kkb.spark.core.SparkCountCluster.main(SparkCountCluster.scala)
... 6 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 45 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 50 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
... 52 more
Solution 1:
Point --master at the REST port 6066, i.e. the REST URL shown on the standalone Master web UI: spark://node01.kaikeba.com:6066 (cluster mode)
bin/spark-submit --master spark://node01:6066 \
--deploy-mode cluster \
--class com.kkb.spark.core.SparkCountCluster \
--executor-memory 1G \
--total-executor-cores 2 \
hdfs://node01:8020/original-spark-core-1.0-SNAPSHOT.jar \
hdfs://node01:8020/word.txt hdfs://node01:8020/output
Solution 2:
Edit the spark-defaults.conf configuration file and add: spark.master spark://node01:7077,node02:7077
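A minimal sketch of the relevant line in conf/spark-defaults.conf (the second master, node02:7077, assumes a standalone HA setup; keep only the masters your cluster actually runs):
# conf/spark-defaults.conf
spark.master    spark://node01:7077,node02:7077
With spark.master set in the defaults file, spark-submit picks it up automatically when --master is not given on the command line.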