Elasticsearch Troubleshooting Cases

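Allocation explanation for shard 0 of index app-log-2024.05.20.18 (the primary is already started on elk-02.zgzf.com; what the output shows is that rebalancing is blocked because the cluster still has unassigned shards):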
{"index":"app-log-2024.05.20.18","shard":0,"primary":true,"current_state":"started","current_node":{"id":"lkdWWSzjS9iGFcvZaaVvrA","name":"elk-02.zgzf.com","transport_address":"172.19.70.11:9800","attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"weight_ranking":2},"can_remain_on_current_node":"yes","can_rebalance_cluster":"no","can_rebalance_cluster_decisions":[{"decider":"rebalance_only_when_active","decision":"NO","explanation":"rebalancing is not allowed until all replicas in the cluster are active"},{"decider":"cluster_rebalance","decision":"NO","explanation":"the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"}],"can_rebalance_to_other_node":"no","rebalance_explanation":"rebalancing is not allowed, even though there is at least one node on which the shard can be allocated","node_allocation_decisions":[{"node_id":"Fyk0dv4hSUaqHVMqsAqXdg","node_name":"elk-01.zgzf.com","transport_address":"172.19.70.10:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"yes","weight_ranking":1},{"node_id":"NwmszklQTqqAQ1p9xzYXZw","node_name":"elk-03.zgzf.com","transport_address":"172.19.70.12:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"worse_balance","weight_ranking":3}]}
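
The output above looks like a response from the cluster allocation explain API (the request itself is not shown in this post, so that is an assumption). As a minimal sketch, the blocking deciders can be pulled out of such a response as follows. The `sample` dictionary is an abridged copy of the output above, and `blocking_deciders` is a hypothetical helper name.

```python
def blocking_deciders(explain: dict) -> list:
    """Return (scope, decider, explanation) for every decider that answered NO."""
    found = []
    # Cluster-level rebalance decisions, reported for shards that are already started.
    for d in explain.get("can_rebalance_cluster_decisions", []):
        if d.get("decision") == "NO":
            found.append(("cluster", d["decider"], d["explanation"]))
    # Per-node deciders, reported when a node refuses the shard.
    for node in explain.get("node_allocation_decisions", []):
        for d in node.get("deciders", []):
            if d.get("decision") == "NO":
                found.append((node.get("node_name", "?"), d["decider"], d["explanation"]))
    return found


# Abridged from the explain output shown above.
sample = {
    "index": "app-log-2024.05.20.18",
    "shard": 0,
    "primary": True,
    "current_state": "started",
    "can_rebalance_cluster": "no",
    "can_rebalance_cluster_decisions": [
        {"decider": "rebalance_only_when_active", "decision": "NO",
         "explanation": "rebalancing is not allowed until all replicas in the cluster are active"},
        {"decider": "cluster_rebalance", "decision": "NO",
         "explanation": "the cluster has unassigned shards and cluster setting "
                        "[cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"},
    ],
    "node_allocation_decisions": [
        {"node_name": "elk-01.zgzf.com", "node_decision": "yes", "weight_ranking": 1},
        {"node_name": "elk-03.zgzf.com", "node_decision": "worse_balance", "weight_ranking": 3},
    ],
}

blockers = blocking_deciders(sample)
for scope, decider, explanation in blockers:
    print(f"{scope}: {decider}: {explanation}")
```

Both blockers here are cluster-level and both point at unassigned shards elsewhere in the cluster, so the shard itself is healthy.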

Adjust the cluster settings

If you are sure that all nodes are healthy and ready to accept shards, you can adjust the cluster settings to allow rebalancing. The cluster.routing.allocation.allow_rebalance setting can be changed temporarily:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.allow_rebalance": "always"
  }
}

Restore the default setting

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.allow_rebalance": "indices_all_active"
  }
}
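
Writing the old value back works, but Elasticsearch also lets you clear a transient setting by assigning it null, which falls back to whatever the default (or persistent value) is on your version. A small sketch of building both request bodies is below; `transient_settings_body` is a hypothetical helper and only produces JSON, it does not send anything.

```python
import json

SETTING = "cluster.routing.allocation.allow_rebalance"


def transient_settings_body(value):
    """Build the body for PUT /_cluster/settings; value=None serializes to null."""
    return json.dumps({"transient": {SETTING: value}}, indent=2)


enable_body = transient_settings_body("always")  # temporary override
reset_body = transient_settings_body(None)       # null clears the transient override

print(enable_body)
print(reset_body)
```

Using null avoids hard-coding a default that may differ between Elasticsearch versions.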

Explanation of why shard 1 of index app-log-2024.05.19.09 is unassigned:
{"index":"app-log-2024.05.19.09","shard":1,"primary":false,"current_state":"unassigned","unassigned_info":{"reason":"ALLOCATION_FAILED","at":"2024-06-03T13:48:36.982Z","failed_allocation_attempts":5,"details":"failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed from {elk-02.zgzf.com}{lkdWWSzjS9iGFcvZaaVvrA}{VcX4s-mXTt65du9F6NI5XA}{172.19.70.11}{172.19.70.11:9800}{dilm}{ml.machine_memory=66854977536, ml.max_open_jobs=20, xpack.installed=true} into {elk-01.zgzf.com}{Fyk0dv4hSUaqHVMqsAqXdg}{QvoljIiTSOaGvOvzXuObkw}{172.19.70.10}{172.19.70.10:9800}{dilm}{ml.machine_memory=66854977536, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk-02.zgzf.com][172.19.70.11:9800][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk-01.zgzf.com][172.19.70.10:9800][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; ","last_allocation_status":"no_attempt"},"can_allocate":"no","allocate_explanation":"cannot allocate because allocation is not permitted to any of the nodes","node_allocation_decisions":[{"node_id":"Fyk0dv4hSUaqHVMqsAqXdg","node_name":"elk-01.zgzf.com","transport_address":"172.19.70.10:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"no","deciders":[{"decider":"max_retry","decision":"NO","explanation":"shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-06-03T13:48:36.982Z], failed_attempts[5], 
failed_nodes[[Fyk0dv4hSUaqHVMqsAqXdg]], delayed=false, details[failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed from {elk-02.zgzf.com}{lkdWWSzjS9iGFcvZaaVvrA}{VcX4s-mXTt65du9F6NI5XA}{172.19.70.11}{172.19.70.11:9800}{dilm}{ml.machine_memory=66854977536, ml.max_open_jobs=20, xpack.installed=true} into {elk-01.zgzf.com}{Fyk0dv4hSUaqHVMqsAqXdg}{QvoljIiTSOaGvOvzXuObkw}{172.19.70.10}{172.19.70.10:9800}{dilm}{ml.machine_memory=66854977536, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk-02.zgzf.com][172.19.70.11:9800][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk-01.zgzf.com][172.19.70.10:9800][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; ], allocation_status[no_attempt]]]"}]},{"node_id":"NwmszklQTqqAQ1p9xzYXZw","node_name":"elk-03.zgzf.com","transport_address":"172.19.70.12:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"no","deciders":[{"decider":"max_retry","decision":"NO","explanation":"shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-06-03T13:48:36.982Z], failed_attempts[5], failed_nodes[[Fyk0dv4hSUaqHVMqsAqXdg]], delayed=false, details[failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed from 
{elk-02.zgzf.com}{lkdWWSzjS9iGFcvZaaVvrA}{VcX4s-mXTt65du9F6NI5XA}{172.19.70.11}{172.19.70.11:9800}{dilm}{ml.machine_memory=66854977536, ml.max_open_jobs=20, xpack.installed=true} into {elk-01.zgzf.com}{Fyk0dv4hSUaqHVMqsAqXdg}{QvoljIiTSOaGvOvzXuObkw}{172.19.70.10}{172.19.70.10:9800}{dilm}{ml.machine_memory=66854977536, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk-02.zgzf.com][172.19.70.11:9800][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk-01.zgzf.com][172.19.70.10:9800][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; ], allocation_status[no_attempt]]]"}]},{"node_id":"lkdWWSzjS9iGFcvZaaVvrA","node_name":"elk-02.zgzf.com","transport_address":"172.19.70.11:9800","node_attributes":{"ml.machine_memory":"66854977536","ml.max_open_jobs":"20","xpack.installed":"true"},"node_decision":"no","deciders":[{"decider":"max_retry","decision":"NO","explanation":"shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-06-03T13:48:36.982Z], failed_attempts[5], failed_nodes[[Fyk0dv4hSUaqHVMqsAqXdg]], delayed=false, details[failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery, failure RecoveryFailedException[[app-log-2024.05.19.09][1]: Recovery failed from {elk-02.zgzf.com}{lkdWWSzjS9iGFcvZaaVvrA}{VcX4s-mXTt65du9F6NI5XA}{172.19.70.11}{172.19.70.11:9800}{dilm}{ml.machine_memory=66854977536, ml.max_open_jobs=20, xpack.installed=true} into 
{elk-01.zgzf.com}{Fyk0dv4hSUaqHVMqsAqXdg}{QvoljIiTSOaGvOvzXuObkw}{172.19.70.10}{172.19.70.10:9800}{dilm}{ml.machine_memory=66854977536, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[elk-02.zgzf.com][172.19.70.11:9800][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[elk-01.zgzf.com][172.19.70.10:9800][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; ], allocation_status[no_attempt]]]"},{"decider":"same_shard","decision":"NO","explanation":"the shard cannot be allocated to the same node on which a copy of the shard already exists [[app-log-2024.05.19.09][1], node[lkdWWSzjS9iGFcvZaaVvrA], [P], s[STARTED], a[id=luKIA-plTle91E77OPn1GA]]"}]}]}

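The main problem is that too many files were open during recovery (Too many open files), so the replica shard could not be allocated. After 5 failed attempts the max_retry decider stops further automatic retries.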
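
The root cause sits at the end of a long chain of "nested:" exceptions in unassigned_info.details, which is easy to miss by eye. A rough sketch of extracting the innermost exception is shown here; `root_cause` is a hypothetical helper, and the `details` string in `sample` is abridged from the output above (the middle of the exception chain is shortened).

```python
def root_cause(explain: dict) -> str:
    """Return the innermost 'nested:' exception from unassigned_info.details."""
    details = explain.get("unassigned_info", {}).get("details", "")
    # Each wrapped exception is introduced by "nested: "; the last one is the cause.
    parts = [p.strip(" ;") for p in details.split("nested: ")]
    parts = [p for p in parts if p]
    return parts[-1] if parts else ""


# Abridged from the explain output shown above.
sample = {
    "index": "app-log-2024.05.19.09",
    "shard": 1,
    "primary": False,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "failed_allocation_attempts": 5,
        "details": (
            "failed shard on node [Fyk0dv4hSUaqHVMqsAqXdg]: failed recovery; "
            "nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; "
            "nested: EngineCreationFailureException[failed to open reader on writer]; "
            "nested: FileSystemException[/zgapp/data/elasticsearch7/nodes/0/indices/"
            "T8UUt8CWRd6XZUyHJu8skA/1/index/_1bo_Lucene84_0.tim: Too many open files]; "
        ),
    },
}

cause = root_cause(sample)
print(cause)
```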
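
Check and raise the file descriptor limit

ulimit -n          # check the current limit
ulimit -n 65536    # temporary change (current shell session only)
# persistent change: add the following lines to /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536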
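
ulimit -n reports the limit of the current shell, which is not necessarily the limit of the running Elasticsearch process. The node stats API (GET /_nodes/stats/process) reports open_file_descriptors and max_file_descriptors per node, so it is one way to confirm what the process actually sees. The sketch below summarizes such a response; `fd_usage` is a hypothetical helper and the numbers in `sample_stats` are made up for illustration, not taken from this cluster.

```python
def fd_usage(stats: dict) -> dict:
    """Map node name -> (open, max, percent used) from a _nodes/stats/process response."""
    usage = {}
    for node in stats.get("nodes", {}).values():
        proc = node.get("process", {})
        open_fds = proc.get("open_file_descriptors")
        max_fds = proc.get("max_file_descriptors")
        # Skip nodes that do not report usable values.
        if open_fds is None or max_fds is None or max_fds <= 0:
            continue
        usage[node.get("name", "?")] = (open_fds, max_fds, open_fds * 100 // max_fds)
    return usage


# Hypothetical response fragment; the descriptor counts are illustrative only.
sample_stats = {
    "nodes": {
        "Fyk0dv4hSUaqHVMqsAqXdg": {
            "name": "elk-01.zgzf.com",
            "process": {"open_file_descriptors": 65000, "max_file_descriptors": 65535},
        },
        "lkdWWSzjS9iGFcvZaaVvrA": {
            "name": "elk-02.zgzf.com",
            "process": {"open_file_descriptors": 12000, "max_file_descriptors": 65535},
        },
    }
}

usage = fd_usage(sample_stats)
for name, (open_fds, max_fds, pct) in usage.items():
    print(f"{name}: {open_fds}/{max_fds} ({pct}%)")
```

A new limit only applies to processes started after the change, so Elasticsearch has to be restarted to pick it up. If the node runs as a systemd service, limits.conf is not consulted and the limit is set with LimitNOFILE in the unit configuration instead.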

After raising the file descriptor limit, manually retry the shard allocation:

POST /_cluster/reroute?retry_failed=true

After retrying the allocation, monitor the cluster health to make sure all shards are assigned:

GET /_cluster/health
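
Recovery takes a while, so the health check usually has to be repeated. A minimal polling sketch is given below. `wait_for_green` and its parameters are hypothetical names, and `fetch_health` is any callable that returns the parsed JSON of GET /_cluster/health; here it is fed canned responses so the example runs without a cluster.

```python
import time


def wait_for_green(fetch_health, attempts=30, delay=10.0):
    """Poll until status is green and no shards are unassigned, or give up."""
    for i in range(attempts):
        health = fetch_health()
        if health.get("status") == "green" and health.get("unassigned_shards", 0) == 0:
            return health
        if i < attempts - 1:
            time.sleep(delay)
    raise TimeoutError(f"cluster not green after {attempts} checks")


# Canned responses standing in for real health calls (illustrative values).
responses = iter([
    {"status": "yellow", "unassigned_shards": 1, "initializing_shards": 0},
    {"status": "yellow", "unassigned_shards": 0, "initializing_shards": 1},
    {"status": "green", "unassigned_shards": 0, "initializing_shards": 0},
])

result = wait_for_green(lambda: next(responses), attempts=5, delay=0)
print(result["status"])
```

Against a real cluster, `fetch_health` would wrap an HTTP GET to the node's HTTP port, which this post does not state.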