I recently ran into the following exception for no obvious reason:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 271.0 failed 1 times, most recent failure: Lost task 0.0 in stage 271.0 (TID 544, localhost): java.io.IOException: Failed to create local dir in /tmp/blockmgr-4223dca8-7355-4ab2-98b9-87e763c7becd/1d.
at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:87)
at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:97)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.getIndexFile(IndexShuffleBlockResolver.scala:58)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:140)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:127)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:87)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:107)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
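Root cause analysis:
1 Failed to create local dir: when does Spark create temporary files?
During a shuffle, the map output has to be written to local storage through DiskBlockManager. It goes to the memory store first; when the memory store runs out of space, temporary files are created in a two-level directory layout (such as blockmgr-4223dca8-7355-4ab2-98b9-87e763c7becd/1d in the exception above).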
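To make the two-level layout concrete, here is a minimal sketch of how a block file name could be mapped to a subdirectory such as `1d`. It is a simplified model written from memory of Spark's `DiskBlockManager`, not the actual source, and details may differ between versions. The function names, the example file name, and the default of 64 subdirectories per local directory are assumptions made for illustration.

```python
import os


def java_string_hash(s):
    """Emulate Java's String.hashCode() as a signed 32-bit integer.

    Assumes the name contains only BMP characters (true for ASCII file names).
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def non_negative_hash(s):
    """Map the signed hash to a non-negative value (assumed behavior)."""
    h = java_string_hash(s)
    return 0 if h == -(1 << 31) else abs(h)


def block_file_path(filename, local_dirs, sub_dirs_per_local_dir=64):
    """Pick a local dir and a two-hex-digit subdirectory for a block file.

    sub_dirs_per_local_dir=64 is an assumed default, not taken from the post.
    """
    h = non_negative_hash(filename)
    dir_id = h % len(local_dirs)
    sub_dir_id = (h // len(local_dirs)) % sub_dirs_per_local_dir
    return os.path.join(local_dirs[dir_id], "%02x" % sub_dir_id, filename)


# Hypothetical local dir and shuffle index file name, for illustration only.
print(block_file_path("shuffle_0_0_0.index", ["/tmp/blockmgr-example"]))
```

Under this model the `1d` in the exception is simply subdirectory number 29 written in hexadecimal, and the error is raised when that subdirectory cannot be created inside the `blockmgr-...` directory.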
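2 So what is a shuffle?
Spark is a parallel computing framework: a single job is split into multiple tasks that run on multiple nodes. The input of a reduce step may therefore be spread across several nodes, so a shuffle is needed to gather all of that reduce step's input in one place.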
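The idea can be sketched with a toy word count in plain Python. This is not Spark code: the "nodes" are just lists, and `partition_for`, `map_task`, and `reduce_task` are hypothetical names used only to show how each map task splits its output by key so that every reducer can collect its share from all map outputs.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Deterministically assign a key to one reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(words, num_reducers):
    """Emit (word, 1) pairs, bucketed by the reducer that owns each word."""
    buckets = defaultdict(list)
    for word in words:
        buckets[partition_for(word, num_reducers)].append((word, 1))
    return buckets


def reduce_task(reducer_id, map_outputs):
    """Fetch this reducer's bucket from every map output and sum the counts."""
    totals = defaultdict(int)
    for buckets in map_outputs:
        for word, count in buckets.get(reducer_id, []):
            totals[word] += count
    return dict(totals)


# Two "nodes", each holding part of the input.
node_inputs = [["a", "b", "a"], ["b", "c", "a"]]
num_reducers = 2

map_outputs = [map_task(words, num_reducers) for words in node_inputs]
results = [reduce_task(r, map_outputs) for r in range(num_reducers)]
print(results)
```

Each word ends up in exactly one reducer, even though its occurrences started out on different nodes. In Spark, the per-reducer buckets are the map output files that get written under the local directories.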
3 How large is the memory store, and when does data overflow to the disk store?
The size of the memory store depends on spark.executor.memory; by default it is spark.executor.memory * 0.6.
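As a rough worked example of that rule, the snippet below applies the 0.6 factor quoted above to a few executor sizes. The function name is hypothetical, and the real fraction and any reserved overhead depend on the Spark version and its memory settings, so treat the numbers as an estimate only.

```python
def memory_store_mb(executor_memory_mb, fraction=0.6):
    """Estimate the memory store size in MB as executor memory * fraction."""
    if executor_memory_mb < 0 or not 0 <= fraction <= 1:
        raise ValueError("executor memory must be >= 0 and fraction within [0, 1]")
    return int(executor_memory_mb * fraction)


for executor_mb in (1024, 4096, 8192):
    print(executor_mb, "MB executor ->", memory_store_mb(executor_mb), "MB memory store")
```

Once the data to be held exceeds this estimate, it has to go to disk.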
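4 Temporary files are created under /tmp by default. How can this be changed?
Add SPARK_LOCAL_DIRS to spark-env.sh, or set it in the program (the spark.local.dir property). Multiple paths can be given as a comma-separated list, which improves I/O throughput when they are on different disks.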
SPARK_LOCAL_DIRS:
Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks.
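A small helper like the following could prepare such a comma-separated value. It is only a sketch: `build_local_dirs` and the example paths are hypothetical. The resulting string is what would be exported as `SPARK_LOCAL_DIRS` in `spark-env.sh` or passed as `spark.local.dir` when building the Spark configuration.

```python
import os


def build_local_dirs(paths):
    """Create each scratch directory if missing and join them with commas."""
    unique = []
    for path in paths:
        path = os.path.abspath(path)
        if path not in unique:
            os.makedirs(path, exist_ok=True)  # raises OSError if not permitted
            unique.append(path)
    return ",".join(unique)


# Example usage with hypothetical paths on two different disks:
#   value = build_local_dirs(["/data1/spark-tmp", "/data2/spark-tmp"])
#   spark-env.sh:  export SPARK_LOCAL_DIRS=<value>
#   in a program:  SparkConf().set("spark.local.dir", value)
```

Note that on a cluster the value set by the cluster manager may take precedence over a setting made in the program, so it is worth checking which one actually applies.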
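5 Make sure the disk has enough free space and that the Spark process has read and write permission on it. Provision disk space according to your workload.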
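Both conditions can be checked up front with a short script. The sketch below uses only the Python standard library. The function name and the 1 GiB default threshold are assumptions for illustration, and the check should be run as the same user that runs Spark.

```python
import os
import shutil
import tempfile


def check_scratch_dir(path, min_free_bytes=1 << 30):
    """Return a list of problems that would prevent using path as scratch space."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]

    problems = []
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")

    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path} has only {free} bytes free, need {min_free_bytes}")

    # Try what Spark needs to do: create a subdirectory, then clean it up.
    try:
        probe = tempfile.mkdtemp(prefix="probe-", dir=path)
        os.rmdir(probe)
    except OSError as exc:
        problems.append(f"cannot create a subdirectory in {path}: {exc}")
    return problems


if __name__ == "__main__":
    for issue in check_scratch_dir(tempfile.gettempdir()):
        print("WARNING:", issue)
```

An empty result means the directory passed all three checks at the time of the call. A full disk or a directory owned by another user would each show up as a separate message.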