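As we all know, cluster nodes lists the nodes in a cluster. When a machine is taken offline or suffers a hardware failure, it has to be replaced, but cluster nodes still shows the old, invalid IP afterwards. Fortunately, Redis provides the cluster forget xx command for this.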
On one occasion, a few seconds after I ran cluster forget, the invalid IP showed up in the list again. This time the node's state was "handshake", and its nodeId kept changing.
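After some investigation, the cause turned out to be that all nodes in the cluster hold this node's information and keep trying to reconnect to it. The author of Redis has also given a conclusion for this situation: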
There are only two ways this can happen:
1. You fail to send CLUSTER FORGET to all the nodes in the cluster. So eventually there are nodes that still has a clue about this other node, and it will inform the other nodes via gossip. Make sure to send CLUSTER FORGET to every single node in the cluster.
2. Or alternatively, there is an instance running in 10.15.107.150 but you said there is not.
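In other words, cluster forget xx has to be run on every node in the Redis cluster (replicas included) before the invalid node is completely removed from the node list. Doing that solved the problem.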
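
Running the command on every node by hand is tedious, so here is a rough sketch of how it could be scripted. It assumes redis-cli is on the PATH, that the cluster needs no password or TLS, and that one healthy seed node is reachable. The function names (parse_cluster_nodes, forget_commands, forget_everywhere) are made up for this example. Note that the Redis documentation says a forgotten node is put on a ban list for 60 seconds, so the command should reach all nodes within that window or gossip may bring the node back.

```python
import subprocess


def parse_cluster_nodes(text):
    """Parse the text output of CLUSTER NODES into a list of dicts."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        node_id, addr, flags = fields[0], fields[1], fields[2].split(",")
        # The address looks like ip:port@cport (optionally followed by ,hostname).
        host, _, port = addr.split("@")[0].rpartition(":")
        nodes.append({"id": node_id, "host": host, "port": port, "flags": flags})
    return nodes


def forget_commands(nodes, dead_id):
    """Build one redis-cli command per node that should forget dead_id."""
    unreachable = {"fail", "handshake", "noaddr"}
    commands = []
    for node in nodes:
        if node["id"] == dead_id:
            continue  # never send the command to the node being removed
        if unreachable & set(node["flags"]):
            continue  # skip nodes that are themselves not usable
        commands.append(["redis-cli", "-h", node["host"], "-p", node["port"],
                         "cluster", "forget", dead_id])
    return commands


def forget_everywhere(seed_host, seed_port, dead_id):
    """Ask a seed node for the node list, then send CLUSTER FORGET to every node."""
    listing = subprocess.run(
        ["redis-cli", "-h", seed_host, "-p", str(seed_port), "cluster", "nodes"],
        capture_output=True, text=True, check=True,
    ).stdout
    for command in forget_commands(parse_cluster_nodes(listing), dead_id):
        # check=False: a single node returning an error should not stop the loop.
        subprocess.run(command, check=False)
```

Calling forget_everywhere("10.0.0.1", 6379, "<node id>") with a real seed address and the invalid node's id would cover masters and replicas in one pass. The addresses here are placeholders. Afterwards it is worth running cluster nodes on a few nodes to confirm the entry is really gone.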