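We have noticed that while a Hive on Spark job is running, its RunningTasksCount sometimes drops steadily,
which lowers the job's throughput. Checking resources turned up no shortage, and HDFS RPC showed nothing abnormal either.
What could be the cause?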
Abnormal case
2023-09-10 02:12:14,765 Stage-10_0: 0/1099 Stage-9_0: 930(+85)/9467
2023-09-10 02:12:15,767 Stage-10_0: 0/1099 Stage-9_0: 932(+85)/9467
2023-09-10 02:12:16,769 Stage-10_0: 0/1099 Stage-9_0: 934(+84)/9467
2023-09-10 02:12:17,772 Stage-10_0: 0/1099 Stage-9_0: 939(+83)/9467
2023-09-10 02:12:18,774 Stage-10_0: 0/1099 Stage-9_0: 941(+83)/9467
2023-09-10 02:12:19,776 Stage-10_0: 0/1099 Stage-9_0: 949(+83)/9467
2023-09-10 02:12:20,778 Stage-10_0: 0/1099 Stage-9_0: 951(+82)/9467
2023-09-10 02:12:21,780 Stage-10_0: 0/1099 Stage-9_0: 957(+82)/9467
2023-09-10 02:12:22,782 Stage-10_0: 0/1099 Stage-9_0: 959(+81)/9467
2023-09-10 02:12:23,783 Stage-10_0: 0/1099 Stage-9_0: 968(+79)/9467
2023-09-10 02:12:24,785 Stage-10_0: 0/1099 Stage-9_0: 972(+79)/9467
2023-09-10 02:12:25,787 Stage-10_0: 0/1099 Stage-9_0: 975(+77)/9467
2023-09-10 02:12:26,792 Stage-10_0: 0/1099 Stage-9_0: 978(+76)/9467
2023-09-10 02:12:27,795 Stage-10_0: 0/1099 Stage-9_0: 980(+76)/9467
2023-09-10 02:12:28,797 Stage-10_0: 0/1099 Stage-9_0: 981(+75)/9467
2023-09-10 02:12:30,800 Stage-10_0: 0/1099 Stage-9_0: 984(+73)/9467
2023-09-10 02:12:31,802 Stage-10_0: 0/1099 Stage-9_0: 988(+71)/9467
2023-09-10 02:12:32,804 Stage-10_0: 0/1099 Stage-9_0: 993(+68)/9467
2023-09-10 02:12:33,806 Stage-10_0: 0/1099 Stage-9_0: 998(+65)/9467
2023-09-10 02:12:34,808 Stage-10_0: 0/1099 Stage-9_0: 1006(+61)/9467
2023-09-10 02:12:35,810 Stage-10_0: 0/1099 Stage-9_0: 1009(+61)/9467
2023-09-10 02:12:36,812 Stage-10_0: 0/1099 Stage-9_0: 1011(+61)/9467
2023-09-10 02:12:37,814 Stage-10_0: 0/1099 Stage-9_0: 1014(+61)/9467
2023-09-10 02:12:38,816 Stage-10_0: 0/1099 Stage-9_0: 1019(+58)/9467
2023-09-10 02:12:39,818 Stage-10_0: 0/1099 Stage-9_0: 1022(+57)/9467
2023-09-10 02:12:40,820 Stage-10_0: 0/1099 Stage-9_0: 1025(+54)/9467
2023-09-10 02:12:41,822 Stage-10_0: 0/1099 Stage-9_0: 1028(+54)/9467
2023-09-10 02:12:42,824 Stage-10_0: 0/1099 Stage-9_0: 1030(+53)/9467
2023-09-10 02:12:43,826 Stage-10_0: 0/1099 Stage-9_0: 1036(+50)/9467
2023-09-10 02:12:44,828 Stage-10_0: 0/1099 Stage-9_0: 1038(+50)/9467
2023-09-10 02:12:45,830 Stage-10_0: 0/1099 Stage-9_0: 1040(+50)/9467
2023-09-10 02:12:46,832 Stage-10_0: 0/1099 Stage-9_0: 1042(+49)/9467
2023-09-10 02:12:47,834 Stage-10_0: 0/1099 Stage-9_0: 1043(+49)/9467
2023-09-10 02:12:48,836 Stage-10_0: 0/1099 Stage-9_0: 1048(+47)/9467
Normal case
2023-09-13 02:12:16,887 Stage-10_0: 0/1099 Stage-9_0: 472(+480)/9478
2023-09-13 02:12:17,892 Stage-10_0: 0/1099 Stage-9_0: 474(+478)/9478
2023-09-13 02:12:18,895 Stage-10_0: 0/1099 Stage-9_0: 477(+478)/9478
2023-09-13 02:12:19,907 Stage-10_0: 0/1099 Stage-9_0: 486(+478)/9478
2023-09-13 02:12:20,908 Stage-10_0: 0/1099 Stage-9_0: 491(+476)/9478
2023-09-13 02:12:21,910 Stage-10_0: 0/1099 Stage-9_0: 494(+475)/9478
2023-09-13 02:12:22,912 Stage-10_0: 0/1099 Stage-9_0: 498(+474)/9478
2023-09-13 02:12:23,914 Stage-10_0: 0/1099 Stage-9_0: 505(+469)/9478
2023-09-13 02:12:24,915 Stage-10_0: 0/1099 Stage-9_0: 507(+467)/9478
2023-09-13 02:12:25,917 Stage-10_0: 0/1099 Stage-9_0: 511(+465)/9478
2023-09-13 02:12:26,919 Stage-10_0: 0/1099 Stage-9_0: 515(+464)/9478
2023-09-13 02:12:27,922 Stage-10_0: 0/1099 Stage-9_0: 522(+461)/9478
2023-09-13 02:12:28,924 Stage-10_0: 0/1099 Stage-9_0: 527(+458)/9478
2023-09-13 02:12:29,925 Stage-10_0: 0/1099 Stage-9_0: 550(+452)/9478
2023-09-13 02:12:30,928 Stage-10_0: 0/1099 Stage-9_0: 561(+446)/9478
2023-09-13 02:12:31,930 Stage-10_0: 0/1099 Stage-9_0: 568(+444)/9478
2023-09-13 02:12:32,932 Stage-10_0: 0/1099 Stage-9_0: 576(+442)/9478
2023-09-13 02:12:33,933 Stage-10_0: 0/1099 Stage-9_0: 587(+439)/9478
2023-09-13 02:12:34,935 Stage-10_0: 0/1099 Stage-9_0: 597(+436)/9478
2023-09-13 02:12:35,937 Stage-10_0: 0/1099 Stage-9_0: 605(+431)/9478
2023-09-13 02:12:36,939 Stage-10_0: 0/1099 Stage-9_0: 612(+429)/9478
2023-09-13 02:12:37,941 Stage-10_0: 0/1099 Stage-9_0: 621(+425)/9478
2023-09-13 02:12:38,942 Stage-10_0: 0/1099 Stage-9_0: 633(+418)/9478
2023-09-13 02:12:39,944 Stage-10_0: 0/1099 Stage-9_0: 639(+414)/9478
2023-09-13 02:12:40,946 Stage-10_0: 0/1099 Stage-9_0: 647(+406)/9478
2023-09-13 02:12:41,948 Stage-10_0: 0/1099 Stage-9_0: 652(+403)/9478
2023-09-13 02:12:42,950 Stage-10_0: 0/1099 Stage-9_0: 660(+398)/9478
2023-09-13 02:12:43,952 Stage-10_0: 0/1099 Stage-9_0: 671(+391)/9478
2023-09-13 02:12:44,954 Stage-10_0: 0/1099 Stage-9_0: 682(+383)/9478
2023-09-13 02:12:45,956 Stage-10_0: 0/1099 Stage-9_0: 692(+378)/9478
2023-09-13 02:12:46,959 Stage-10_0: 0/1099 Stage-9_0: 699(+375)/9478
2023-09-13 02:12:47,962 Stage-10_0: 0/1099 Stage-9_0: 705(+371)/9478
2023-09-13 02:12:48,964 Stage-10_0: 0/1099 Stage-9_0: 721(+363)/9478
2023-09-13 02:12:49,966 Stage-10_0: 0/1099 Stage-9_0: 731(+358)/9478
2023-09-13 02:12:50,968 Stage-10_0: 0/1099 Stage-9_0: 741(+351)/9478
2023-09-13 02:12:51,970 Stage-10_0: 0/1099 Stage-9_0: 754(+346)/9478
This article covers Hive on Spark only.
Compared with earlier runs, the log shows the number of running tasks suddenly dropping, which makes the SQL run slowly.
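The drop is easier to see once the progress lines are parsed into numbers. The Python sketch below extracts completed, running, and total task counts for one stage from lines like the ones above. It assumes the `done(+running)/total` layout shown in these logs, and it also assumes failures may be printed as `(+running,-failed)`; the helper names are illustrative, not part of Hive.

```python
import re
from typing import List, Tuple

# Matches per-stage progress such as "Stage-9_0: 930(+85)/9467".
# Assumption: "(+N)" is the running-task count and failures may appear as
# "(+N,-M)"; a stage with no running tasks prints just "done/total".
STAGE_RE = re.compile(r"(Stage-\d+_\d+): (\d+)(?:\(\+(\d+)(?:,-\d+)?\))?/(\d+)")


def running_trend(lines: List[str], stage: str) -> List[Tuple[str, int, int, int]]:
    """Return (timestamp, completed, running, total) for every line of `stage`."""
    rows = []
    for line in lines:
        timestamp = line[:23]  # e.g. "2023-09-10 02:12:14,765"
        for name, done, running, total in STAGE_RE.findall(line):
            if name == stage:
                # findall returns "" for an optional group that did not match.
                rows.append((timestamp, int(done), int(running or 0), int(total)))
    return rows


# First and last progress lines of each run, copied from the logs above.
abnormal = [
    "2023-09-10 02:12:14,765 Stage-10_0: 0/1099 Stage-9_0: 930(+85)/9467",
    "2023-09-10 02:12:48,836 Stage-10_0: 0/1099 Stage-9_0: 1048(+47)/9467",
]
normal = [
    "2023-09-13 02:12:16,887 Stage-10_0: 0/1099 Stage-9_0: 472(+480)/9478",
    "2023-09-13 02:12:51,970 Stage-10_0: 0/1099 Stage-9_0: 754(+346)/9478",
]

for label, log in (("abnormal", abnormal), ("normal", normal)):
    rows = running_trend(log, "Stage-9_0")
    first, last = rows[0], rows[-1]
    print(f"{label}: running {first[2]} -> {last[2]}, "
          f"completed {last[1] - first[1]} tasks")
```

For the two runs above this prints running 85 -> 47 with 118 tasks completed in the abnormal run, against 480 -> 346 with 282 tasks completed in the normal run over a similar time window.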
Compare the execution plans. If the number of tasks has dropped, the table statistics are most likely wrong, and re-analyzing them usually fixes the problem, for example:
ANALYZE TABLE ods_fact_sale_partion PARTITION(sale_date='2010-04-12') COMPUTE STATISTICS;
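Size-based parallelism estimates show how statistics can shrink the task count. In Hive, the reducer count of a stage is typically derived from the estimated data size together with settings such as `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max`. The Python function below is a simplified model of that idea, not Hive's actual code. The sizes and limits are hypothetical. If stale statistics report a tenth of the real size, the planned parallelism shrinks by roughly the same factor.

```python
import math


def estimate_reducers(total_bytes: int, bytes_per_reducer: int, max_reducers: int) -> int:
    """Simplified size-based estimate: one reducer per `bytes_per_reducer`
    of estimated input, never fewer than 1 and never more than `max_reducers`."""
    reducers = math.ceil(total_bytes / bytes_per_reducer)
    return min(max_reducers, max(1, reducers))


# Hypothetical settings: 256 MiB per reducer, at most 1009 reducers.
BYTES_PER_REDUCER = 256 * 1024 ** 2
MAX_REDUCERS = 1009

fresh = estimate_reducers(200 * 1024 ** 3, BYTES_PER_REDUCER, MAX_REDUCERS)  # stats say 200 GiB
stale = estimate_reducers(20 * 1024 ** 3, BYTES_PER_REDUCER, MAX_REDUCERS)   # stats say 20 GiB
print(fresh, stale)  # 800 80
```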
If there is a long tail, consider data skew first.
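One quick way to tell skew apart from a general slowdown is to compare each task's input with the median. Under skew, a few tasks read far more records than the rest. The sketch assumes the per-task record counts have already been collected, for example from the task table of the stage in the Spark UI. The task ids, the counts, and the 5x threshold are hypothetical.

```python
from statistics import median
from typing import Dict, List


def skewed_tasks(records_per_task: Dict[str, int], factor: float = 5.0) -> List[str]:
    """Return ids of tasks that read more than `factor` times the median
    number of records; `factor` is an arbitrary starting threshold."""
    typical = median(records_per_task.values())
    return sorted(t for t, n in records_per_task.items() if n > factor * typical)


# Hypothetical per-task input record counts for one stage.
stage_input = {"task-1": 1000, "task-2": 1200, "task-3": 900, "task-4": 50000}
print(skewed_tasks(stage_input))  # ['task-4']
```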
Once data skew has been ruled out, check in the logs which nodes the slow tasks ran on. If the slow tasks are concentrated on a few nodes, those machines are most likely unhealthy.
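Checking whether slow tasks cluster on a few nodes amounts to grouping the slow tasks by executor host. The sketch assumes (host, duration) pairs exported from the task list of the slow stage (the Spark UI shows both columns). The host names, durations, and the 2x-median cutoff are hypothetical. If one or two hosts account for most of the slow tasks, as `node04` does here, those machines are the ones to inspect.

```python
from collections import Counter
from statistics import median
from typing import List, Tuple


def slow_task_hosts(tasks: List[Tuple[str, float]], factor: float = 2.0) -> Counter:
    """Count, per host, the tasks that ran longer than `factor` x the median
    duration of the stage; `factor` is an arbitrary starting threshold."""
    typical = median(duration for _, duration in tasks)
    return Counter(host for host, duration in tasks if duration > factor * typical)


# Hypothetical (host, duration in seconds) pairs for one stage.
tasks = [
    ("node01", 30), ("node02", 32), ("node03", 29), ("node04", 120),
    ("node04", 110), ("node02", 31), ("node04", 125), ("node01", 28),
]
print(slow_task_hosts(tasks).most_common())  # [('node04', 3)]
```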
On those nodes, start with metrics such as CPU, memory, I/O (CPU iowait), and disk read/write throughput.