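Background
While using one of our online services (one that writes to an existing ES index), a user reported that the service always failed. Other services (those that read from ES or write to newly created indices) all worked normally.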
Troubleshooting
Analyzing the backend logs turned up the following error:
{'index': {'_index': 'bi_4769dcd50ba048cd163bfd48e2f500f7', '_type': 'doc', '_id': 'evoOy24BgwL6ozngNCWG', 'status': 403, 'error': {'type':'cluster_block_exception', 'reason': 'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];'}}
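An error item like the one above can also be recognised in code rather than by reading logs. The following is a minimal sketch, assuming the item has the same shape as the logged entry (one action key wrapping `status` and `error`); the helper name `is_read_only_block` is hypothetical and not part of any ES client.

```python
def is_read_only_block(item):
    """Return True if a bulk response item failed with a read-only index block."""
    # Assumption: each item is keyed by its action name, e.g. "index".
    for result in item.values():
        error = result.get("error") or {}
        if result.get("status") == 403 and error.get("type") == "cluster_block_exception":
            # The reason text carries the "index read-only / allow delete" marker.
            return "read-only" in error.get("reason", "")
    return False


# Same fields as the logged error above.
item = {
    "index": {
        "_index": "bi_4769dcd50ba048cd163bfd48e2f500f7",
        "_type": "doc",
        "_id": "evoOy24BgwL6ozngNCWG",
        "status": 403,
        "error": {
            "type": "cluster_block_exception",
            "reason": "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];",
        },
    }
}
print(is_read_only_block(item))
```

A check like this could be used to raise an alert on the first blocked write instead of waiting for a user report.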
A Google search showed that this problem is caused by too little free space on the disk that stores the ES data. The official documentation describes it as follows:
cluster.routing.allocation.disk.watermark.flood_stage
Controls the flood stage watermark. It defaults to 95%, meaning that Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block must be released manually once there is enough disk space available to allow indexing operations to continue.
In other words, ES has a flood_stage mechanism. The threshold defaults to 95% disk usage. When disk usage exceeds this value, the flood_stage mechanism is triggered and ES forcibly sets index.blocks.read_only_allow_delete to true on the affected indices, which allows only reads and deletes and rejects new writes.
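The threshold check described above can be sketched as follows. This is only an approximation of the idea, not how ES computes it internally: it assumes the 95% default quoted from the documentation, and the path passed to `shutil.disk_usage` is a placeholder for wherever the ES data actually lives. The function name `exceeds_flood_stage` is hypothetical.

```python
import shutil


def exceeds_flood_stage(used_bytes, total_bytes, flood_stage_pct=95.0):
    """Return True when disk usage is above the flood stage watermark."""
    used_pct = 100.0 * used_bytes / total_bytes
    return used_pct > flood_stage_pct


# Assumption: "." stands in for the ES data directory on this machine.
usage = shutil.disk_usage(".")
print(exceeds_flood_stage(usage.used, usage.total))
```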
These findings show that the online service failed because the ES indices had all been put into read-only / allow-delete mode, so writes to those indices failed.
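Root cause
Two weeks earlier, a large MySQL read/write job (on the order of 100 million rows) was run on the machine that hosts the online service, and it exhausted the disk space at the time. That triggered ES's flood_stage, and all indices were set to read-only / allow-delete. But because no user had used the service that writes data to existing ES indices recently (for about a month), the problem went unnoticed.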
Solution
The fix is simple. Once enough disk space has been freed, apply the following setting to the indices on the affected ES node:
PUT _settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
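For environments without a console, the same request can be built from a script. The sketch below uses only the Python standard library and constructs the request without sending it. The base URL `http://localhost:9200` and the helper name `build_unblock_request` are assumptions for illustration; authentication, if the cluster requires it, is not handled.

```python
import json
import urllib.request


def build_unblock_request(base_url="http://localhost:9200"):
    """Build the PUT _settings request that clears the read-only block."""
    # Same body as the request above.
    body = {"index": {"blocks": {"read_only_allow_delete": "false"}}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_unblock_request()
print(req.get_method(), req.full_url)
# To actually send it against a live cluster:
# urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a production cluster.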
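Lessons learned
Before running an operation in production, make a rough estimate of the server resources it will consume, or monitor the server's resources while it runs.
This avoids exhausting server resources during the operation and breaking other services as a result.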
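A rough pre-flight estimate of this kind might look like the sketch below. It assumes the extra disk space an operation will need can be estimated in advance. The 85% limit is an arbitrary safety margin chosen to stay well below the 95% flood stage, and both the function names and the 50 GiB figure are hypothetical.

```python
import shutil


def projected_used_percent(used_bytes, total_bytes, extra_bytes):
    """Disk usage in percent after writing extra_bytes more data."""
    return 100.0 * (used_bytes + extra_bytes) / total_bytes


def safe_to_run(used_bytes, total_bytes, extra_bytes, limit_pct=85.0):
    """Return True if the projected usage stays below the chosen limit."""
    return projected_used_percent(used_bytes, total_bytes, extra_bytes) < limit_pct


# Assumption: "." is on the disk the operation writes to, and the job
# is expected to need roughly 50 GiB (a made-up figure).
usage = shutil.disk_usage(".")
estimated_extra = 50 * 1024**3
print(safe_to_run(usage.used, usage.total, estimated_extra))
```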
References
ES error [FORBIDDEN/12/index read-only / allow delete (api)] - read only elasticsearch indices