Big Data Development: Troubleshooting Notes


1. Big Data Environment Setup

1. After installing and configuring Hadoop locally on Windows, running "hadoop" in cmd fails with: ERROR: JAVA_HOME is incorrectly set.
Fix: the JAVA_HOME path contains a space. Edit the set JAVA_HOME line in \etc\hadoop\hadoop-env.cmd under the Hadoop install directory.
e.g. set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_161
Reference: https://www.cnblogs.com/zlslch/p/8580446.html
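The 8.3 short name (PROGRA~1 above) sidesteps the space in "Program Files"; it is the usual mapping, but it can differ per machine, so treat it as an assumption and confirm it first:

:: list 8.3 short names for entries under C:\ and confirm what "Program Files" maps to
dir /x C:\
:: then verify the fix took effect
hadoop version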
2. After installing and configuring Hadoop locally on Windows, running "hadoop" in cmd reports: "The system cannot find the batch label specified - print_usage".
Fix: open every .cmd file in Hadoop's bin directory with Notepad++ and convert the line endings:
Open the file -> Edit -> EOL Conversion -> Windows (CR LF) -> Save
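If Notepad++ is unavailable, the same CRLF conversion can be scripted; a minimal sketch from a Git Bash shell, assuming the dos2unix package is installed:

cd $HADOOP_HOME/bin
# unix2dos rewrites each file in place with CRLF line endings
unix2dos *.cmd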


二撵术、hive

1. A Hive job fails, and the logs show the container ran out of virtual memory.
Fix: the cluster nodes exceed YARN's virtual memory limit; the simplest workaround is to disable the virtual memory check.
Edit yarn-site.xml and add the following property:
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
Distribute the file to the other nodes in the cluster and restart the cluster.
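Alternatively, rather than disabling the check outright, the virtual-to-physical memory ratio can be raised (the YARN default is 2.1); a sketch of that property, with the value 4 chosen only as an example:

<property>
    <!-- allow containers up to 4x their physical memory allocation in virtual memory -->
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>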
1处坪、通過load data local overwrite方式向分桶表加載數(shù)據(jù),overwrite未生效架专,數(shù)據(jù)會追加到目標表中同窘。需要通過insert overwrite table target_table select * from source_table的方式覆蓋目標表。

3. HBase

1塞椎、Hbase jar包在集群執(zhí)行報錯如圖:

image.png

報錯原因:沒有在hadoop-env.sh文件里面配置HADOOP_CLASSPATH環(huán)境變量,所以你執(zhí)行hadoop jar
命令時睛低,它找不到運行程序所依賴的jar包案狠,所以配置下就行服傍。
解決方案:修改hadoop-env.sh文件,添加HADOOP-CLASSPATH環(huán)境變量

[hadoop@node02 hadoop]$ cd /kkb/install/hadoop-3.1.4/etc/hadoop
[hadoop@node02 hadoop]$ vim hadoop-env.sh 
export HADOOP_CLASSPATH=/kkb/install/hbase-2.2.2/lib/*
# the trailing * is required, otherwise it still fails; note: do not substitute $HBASE_HOME here
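With the classpath configured, the job can be resubmitted; a hedged example in which the jar and main class names are placeholders:

# hadoop-env.sh is sourced on every hadoop invocation, so the change takes effect immediately
[hadoop@node02 hadoop]$ hadoop jar my-hbase-job.jar com.example.MyHBaseJob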

2吹零、集群Hbase正常啟動后,HRegionServer節(jié)點幾分鐘后自動斷開
通過查看日志拉庵,如圖報錯
java.lang.NoClassDefFoundError: org/apache/htrace/SpanReceiver

Solution: copy htrace-core4-4.2.0-incubating.jar into HBase's lib directory:

[hadoop@node03 client-facing-thirdparty]$ pwd
/kkb/install/hbase-2.2.2/lib/client-facing-thirdparty
[hadoop@node03 client-facing-thirdparty]$ cp /kkb/install/hbase-2.2.2/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar /kkb/install/hbase-2.2.2/lib/
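After copying the jar on every affected node, the RegionServer can be restarted; a sketch using HBase's own daemon script, with paths per this cluster's layout:

[hadoop@node03 client-facing-thirdparty]$ cd /kkb/install/hbase-2.2.2
[hadoop@node03 hbase-2.2.2]$ bin/hbase-daemon.sh restart regionserver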

4. Flume

1. The Flume agent fails at startup with a Guava-related error.

Cause: apache-flume-1.9.0-bin and hadoop-3.1.4 each ship their own guava jar, and the version mismatch causes a conflict.
Solution: replace Flume's older guava jar with Hadoop's newer one:

cd /kkb/install/apache-flume-1.9.0-bin/lib
rm -f guava-11.0.2.jar
cp /kkb/install/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar .
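To confirm that only the newer guava version remains on Flume's classpath:

ls /kkb/install/apache-flume-1.9.0-bin/lib | grep guava
# expected output: guava-27.0-jre.jar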

5. MySQL

1婴洼、導出mysql表數(shù)據(jù),指定csv格式
語法:select * from tablename into outfile "目錄路徑/tablename.csv" fields terminated by ',' lines terminated by '\n';
報錯:ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
解決方案:查看mysql變量secure_file_priv設(shè)置撼嗓,按照設(shè)置路徑修改導出目錄路徑即可柬采。

mysql> select * from students  into outfile "/tmp/students.csv" fields terminated by ',' lines terminated by '\n'; 
ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
mysql> show variables like '*file*';
Empty set (0.00 sec)

mysql> show variables like '*secure_file*';
Empty set (0.00 sec)

mysql> show variables like '%secure_file%'; 
+------------------+-----------------------+
| Variable_name    | Value                 |
+------------------+-----------------------+
| secure_file_priv | /var/lib/mysql-files/ |
+------------------+-----------------------+
1 row in set (0.00 sec)

mysql> select * from students  into outfile "/var/lib/mysql-files/students.csv" fields terminated by ',' lines terminated by '\n';           
Query OK, 6 rows affected (0.01 sec)
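If exports must land in a different directory, secure_file_priv can be changed in the server configuration; it is read-only at runtime, so a server restart is required. A sketch for my.cnf (an empty value lifts the restriction entirely, which is a security trade-off):

[mysqld]
# an empty value disables the restriction; a directory path limits INTO OUTFILE to that directory
secure_file_priv=""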

6. Spark

1. The same jar, submitted with deploy-mode cluster, runs successfully on YARN but fails under Spark standalone:

bin/spark-submit --master spark://node01:7077 \
--deploy-mode cluster \
--class com.kkb.spark.core.SparkCountCluster  \
--executor-memory 1G \
--total-executor-cores 2 \
hdfs://node01:8020/original-spark-core-1.0-SNAPSHOT.jar \
hdfs://node01:8020/word.txt hdfs://node01:8020/output

The error:

Launch Command: "/kkb/install/jdk1.8.0_141/bin/java" "-cp" "/kkb/install/spark-2.3.3-bin-hadoop2.7/conf/:/kkb/install/spark-2.3.3-bin-hadoop2.7/jars/*:/kkb/install/hadoop-3.1.4/etc/hadoop/" "-Xmx1024M" "-Dspark.eventLog.enabled=true" "-Dspark.submit.deployMode=cluster" "-Dspark.yarn.historyServer.address=node01:4000" "-Dspark.app.name=com.kkb.spark.core.SparkCountCluster" "-Dspark.driver.supervise=false" "-Dspark.executor.memory=1g" "-Dspark.eventLog.dir=hdfs://node01:8020/spark_log" "-Dspark.master=spark://node01:7077" "-Dspark.driver.extraClassPath=/kkb/install/hadoop-3.1.4/share/hadoop/common/hadoop-lzo-0.4.20.jar" "-Dspark.eventLog.compress=true" "-Dspark.cores.max=2" "-Dspark.executor.extraClassPath=/kkb/install/hadoop-3.1.4/share/hadoop/common/hadoop-lzo-0.4.20.jar" "-Dspark.history.ui.port=4000" "-Dspark.rpc.askTimeout=10s" "-Dspark.jars=hdfs://node01:8020/original-spark-core-1.0-SNAPSHOT.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@192.168.153.110:41049" "/kkb/install/spark-2.3.3-bin-hadoop2.7/work/driver-20210115100008-0000/original-spark-core-1.0-SNAPSHOT.jar" "com.kkb.spark.core.SparkCountCluster" "hdfs://node01:8020/word.txt" "hdfs://node01:8020/output"
========================================

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
    at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:187)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:78)
    at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:78)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.immutable.List.map(List.scala:285)
    at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:326)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:326)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:325)
    at com.kkb.spark.core.SparkCountCluster$.main(SparkCountCluster.scala:12)
    at com.kkb.spark.core.SparkCountCluster.main(SparkCountCluster.scala)
    ... 6 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    ... 45 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 50 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    ... 52 

Solution 1:
Point --master at the REST endpoint on port 6066, i.e. the REST URL: spark://node01.kaikeba.com:6066 (cluster mode). Note in the launch command above that the hadoop-lzo jar appears only in -Dspark.driver.extraClassPath and is missing from the actual -cp, so the driver launched through the legacy 7077 endpoint cannot load the LZO codec; this likely explains why submission through the REST endpoint succeeds.


bin/spark-submit --master spark://node01:6066 \
--deploy-mode cluster \
--class com.kkb.spark.core.SparkCountCluster  \
--executor-memory 1G \
--total-executor-cores 2 \
hdfs://node01:8020/original-spark-core-1.0-SNAPSHOT.jar \
hdfs://node01:8020/word.txt hdfs://node01:8020/output

Solution 2:
Edit spark-defaults.conf and add the setting spark.master spark://node01:7077,node02:7077, as sketched below.
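A sketch of the spark-defaults.conf entry, using this cluster's node names:

# spark-defaults.conf
spark.master    spark://node01:7077,node02:7077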


最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末肩刃,一起剝皮案震驚了整個濱河市,隨后出現(xiàn)的幾起案子杏头,更是在濱河造成了極大的恐慌树酪,老刑警劉巖,帶你破解...
    沈念sama閱讀 210,914評論 6 490
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件大州,死亡現(xiàn)場離奇詭異,居然都是意外死亡垂谢,警方通過查閱死者的電腦和手機厦画,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 89,935評論 2 383
  • 文/潘曉璐 我一進店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來滥朱,“玉大人根暑,你說我怎么就攤上這事♂懔冢” “怎么了排嫌?”我有些...
    開封第一講書人閱讀 156,531評論 0 345
  • 文/不壞的土叔 我叫張陵,是天一觀的道長缰犁。 經(jīng)常有香客問我淳地,道長怖糊,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 56,309評論 1 282
  • 正文 為了忘掉前任颇象,我火速辦了婚禮伍伤,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘遣钳。我一直安慰自己扰魂,他們只是感情好,可當我...
    茶點故事閱讀 65,381評論 5 384
  • 文/花漫 我一把揭開白布蕴茴。 她就那樣靜靜地躺著劝评,像睡著了一般。 火紅的嫁衣襯著肌膚如雪倦淀。 梳的紋絲不亂的頭發(fā)上蒋畜,一...
    開封第一講書人閱讀 49,730評論 1 289
  • 那天,我揣著相機與錄音晃听,去河邊找鬼百侧。 笑死,一個胖子當著我的面吹牛能扒,可吹牛的內(nèi)容都是我干的佣渴。 我是一名探鬼主播,決...
    沈念sama閱讀 38,882評論 3 404
  • 文/蒼蘭香墨 我猛地睜開眼初斑,長吁一口氣:“原來是場噩夢啊……” “哼辛润!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起见秤,我...
    開封第一講書人閱讀 37,643評論 0 266
  • 序言:老撾萬榮一對情侶失蹤砂竖,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后鹃答,有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體乎澄,經(jīng)...
    沈念sama閱讀 44,095評論 1 303
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 36,448評論 2 325
  • 正文 我和宋清朗相戀三年测摔,在試婚紗的時候發(fā)現(xiàn)自己被綠了置济。 大學時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 38,566評論 1 339
  • 序言:一個原本活蹦亂跳的男人離奇死亡锋八,死狀恐怖浙于,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情挟纱,我是刑警寧澤羞酗,帶...
    沈念sama閱讀 34,253評論 4 328
  • 正文 年R本政府宣布,位于F島的核電站紊服,受9級特大地震影響檀轨,放射性物質(zhì)發(fā)生泄漏胸竞。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點故事閱讀 39,829評論 3 312
  • 文/蒙蒙 一裤园、第九天 我趴在偏房一處隱蔽的房頂上張望撤师。 院中可真熱鬧,春花似錦拧揽、人聲如沸剃盾。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,715評論 0 21
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽痒谴。三九已至,卻和暖如春铡羡,著一層夾襖步出監(jiān)牢的瞬間积蔚,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 31,945評論 1 264
  • 我被黑心中介騙來泰國打工烦周, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留尽爆,地道東北人。 一個月前我還...
    沈念sama閱讀 46,248評論 2 360
  • 正文 我出身青樓读慎,卻偏偏與公主長得像漱贱,于是被迫代替她去往敵國和親。 傳聞我的和親對象是個殘疾皇子夭委,可洞房花燭夜當晚...
    茶點故事閱讀 43,440評論 2 348

推薦閱讀更多精彩內(nèi)容