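Minimum required role: Full Administrator (the super administrator, i.e. admin)
There are two ways to remove a host from a cluster:
- Delete the host from Cloudera Manager entirely;
- Remove the host from cluster A so that it becomes available to other clusters managed by Cloudera Manager.
Both methods involve decommissioning the host, deleting its roles, and removing the managed service software, but both leave the data directories in place.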
Deleting a host with Cloudera Manager
- In the Cloudera Manager Admin Console, click the Hosts tab, then in All Hosts:
- Select the hosts to delete;
- Select Actions for Selected > Decommission;
Three points to note here:
- Decommissioning time grows roughly in proportion to the amount of data on the node, which matters most for data nodes such as those running an HDFS DataNode or a Kafka broker;
- Do not decommission two or more data nodes at the same time, as this can cause data loss;
- Wait for this step to finish before moving on to the next one;
- Stop cloudera-scm-agent:
pssh -h list_cm_agent "sudo systemctl stop cloudera-scm-agent"
- Confirm that the process has stopped (no output means no agent process is running):
pssh -h list_cm_agent -P "ps aux | grep 'cloudera-scm-agent.pid' | grep -v grep"
- In the Cloudera Manager Admin Console, click the Hosts tab;
- Reselect the hosts chosen in step 2;
- Select Actions for Selected > Delete;
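The two waits in the steps above (decommission finished, agent stopped) can also be checked from a script. The following is a minimal sketch, not part of the original procedure: `agent_stopped` and `commission_state` are hypothetical helper names, and the `commissionState` field is assumed to be present in the host JSON returned by the Cloudera Manager REST API, so verify the field name and endpoint against your CM version before relying on it.

```shell
# Hypothetical helper: succeeds (exit 0) when the given `ps aux` text
# contains no cloudera-scm-agent process line. It applies the same
# grep filter as the pssh check in the steps above.
agent_stopped() {
  if printf '%s\n' "$1" | grep 'cloudera-scm-agent.pid' | grep -v grep >/dev/null; then
    return 1   # at least one agent process is still listed
  fi
  return 0
}

# Hypothetical helper: reads host JSON on stdin and prints the value of
# its "commissionState" field (assumed field name). Prints nothing if
# the field is absent.
commission_state() {
  sed -n 's/.*"commissionState"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Example, run on the host itself after stopping the agent:
#   agent_stopped "$(ps aux)" && echo "agent is down"
```

Both helpers work on text passed in by the caller rather than calling pssh or the API themselves, so they can be tried against saved output before being wired into a real workflow.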
Removing a host from a cluster
This operation moves the node out of the cluster but keeps its Cloudera Management Service roles (such as Service Monitor).
- In the Cloudera Manager Admin Console, click the Hosts tab;
- Select the hosts to remove;
- Select Actions for Selected > Remove From Cluster. The "Remove Hosts From Cluster" dialog appears;
- Skip the step that removes the Cloudera Management Service roles, then click "Confirm" to finish removing the selected hosts;
FAQ
Q: If a host was not deleted by following the steps above, it may remain in the All Hosts list even though it is actually offline. Monitoring scripts that query the 7180 endpoint then raise false alarms similar to the following:
HOST_AGENT_LOG_DIRECTORY_FREE_SPACE=NOT_AVAILABLE, HOST_AGENT_PARCEL_DIRECTORY_FREE_SPACE=NOT_AVAILABLE, HOST_AGENT_PROCESS_DIRECTORY_FREE_SPACE=NOT_AVAILABLE, HOST_CLOCK_OFFSET=NOT_AVAILABLE, HOST_DNS_RESOLUTION=NOT_AVAILABLE, HOST_MEMORY_SWAPPING=NOT_AVAILABLE, HOST_NETWORK_FRAME_ERRORS=NOT_AVAILABLE, HOST_NETWORK_INTERFACES_SLOW_MODE=NOT_AVAILABLE, HOST_SCM_HEALTH=BAD
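A monitoring script could use the shape of this output to tell a stale record apart from a host that is genuinely unhealthy. The sketch below is only a heuristic inferred from the sample line above, not documented Cloudera behavior: it assumes that a stale host reports HOST_SCM_HEALTH=BAD while every other check is NOT_AVAILABLE. The function name `is_stale_host` is hypothetical.

```shell
# Hypothetical helper: succeeds when a health summary line looks like a
# stale host record, i.e. HOST_SCM_HEALTH is BAD and every other check
# is NOT_AVAILABLE. Expected input format: "KEY=VALUE, KEY=VALUE, ...".
is_stale_host() {
  local rest pair key value scm_health
  rest="$1,"
  scm_health=""
  while [ -n "$rest" ]; do
    pair="${rest%%,*}"                        # text before the first comma
    rest="${rest#*,}"                         # text after the first comma
    pair="${pair#"${pair%%[![:space:]]*}"}"   # strip leading whitespace
    [ -z "$pair" ] && continue
    key="${pair%%=*}"
    value="${pair#*=}"
    if [ "$key" = "HOST_SCM_HEALTH" ]; then
      scm_health="$value"
    elif [ "$value" != "NOT_AVAILABLE" ]; then
      return 1   # a real measurement exists, so the agent is still reporting
    fi
  done
  [ "$scm_health" = "BAD" ]
}
```

A script might call `is_stale_host "$line"` and downgrade or suppress the alert when it succeeds, leaving the record itself to be cleaned up by hand.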
A: Cloudera Manager stores all host metadata in its database, in the table ${DATABASE}.HOSTS, so deleting the record of the stale ("zombie") host is enough. The database name is usually the default, cm:
DELETE FROM cm.HOSTS WHERE HOST_ID='${HOST_ID}';
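The delete may fail with a foreign-key constraint error, because commands executed earlier left rows that reference this host record. In that case, first delete the dependent rows from the COMMANDS table and the other referencing tables: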
DELETE FROM cm.AUDITS WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.CLIENT_CONFIGS_TO_HOSTS WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.COMMANDS WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.COMMANDS_SCHEDULES WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.CONFIGS WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.CONFIGS_AUD WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.HOSTS_AUD WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.PROCESSES WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.ROLES WHERE HOST_ID='${HOST_ID}';
DELETE FROM cm.ROLES_AUD WHERE HOST_ID='${HOST_ID}';
Then run the delete again:
SET FOREIGN_KEY_CHECKS = 0;
DELETE FROM cm.HOSTS WHERE HOST_ID='${HOST_ID}';
SET FOREIGN_KEY_CHECKS = 1;
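Typing the same HOST_ID into eleven statements by hand is error-prone, so the DELETE statements can be generated instead. The following is a minimal sketch under stated assumptions: `host_cleanup_sql` is a hypothetical helper, the table list is copied from the statements above and may differ between Cloudera Manager versions, and the ID check is an added safety assumption rather than anything Cloudera requires.

```shell
# Hypothetical helper: prints the cleanup statements for one host ID,
# referencing tables first and HOSTS last. The second argument is the
# database name and defaults to cm.
host_cleanup_sql() {
  local host_id db table
  host_id="$1"
  db="${2:-cm}"
  # Assumption: IDs contain only letters, digits, '-' and '_'. Anything
  # else is rejected so it cannot break out of the quoted SQL literal.
  case "$host_id" in
    ''|*[!A-Za-z0-9_-]*) echo "invalid host id" >&2; return 1 ;;
  esac
  for table in AUDITS CLIENT_CONFIGS_TO_HOSTS COMMANDS COMMANDS_SCHEDULES \
               CONFIGS CONFIGS_AUD HOSTS_AUD PROCESSES ROLES ROLES_AUD HOSTS; do
    printf "DELETE FROM %s.%s WHERE HOST_ID='%s';\n" "$db" "$table" "$host_id"
  done
}

# Example: write the statements to a file and review them before running.
#   host_cleanup_sql 42 > cleanup.sql
```

The helper only prints DELETE statements and does not toggle FOREIGN_KEY_CHECKS. Back up the Cloudera Manager database and review the generated file before executing it.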