Original article: https://redis.io/topics/cluster-spec
Translation on what is (reportedly) the official Chinese Redis site: http://www.redis.cn/topics/cluster-spec.html
A translation that I personally consider better: https://www.cnblogs.com/kaleidoscope/p/9635163.html
Redis Cluster Specification
Welcome to the Redis Cluster Specification. Here you'll find information about algorithms and design rationales of Redis Cluster. This document is a work in progress as it is continuously synchronized with the actual implementation of Redis.
Main properties and rationales of the design
Redis Cluster goals
Redis Cluster is a distributed implementation of Redis with the following goals, in order of importance in the design:
Redis 集群是 Redis 的一種分布式實(shí)現(xiàn),它主要是為了實(shí)現(xiàn)以下這些目標(biāo)(按照在設(shè)計(jì)中的重要性排序):
- High performance and linear scalability up to 1000 nodes. There are no proxies, asynchronous replication is used, and no merge operations are performed on values.
高性能和線性可擴(kuò)展性,支持高達(dá)1000個(gè)節(jié)點(diǎn)。沒有代理雀彼、使用異步復(fù)制系忙,并且不對(duì)值執(zhí)行合并操作台颠。
- Acceptable degree of write safety: the system tries (in a best-effort way) to retain all the writes originating from clients connected with the majority of the master nodes. Usually there are small windows where acknowledged writes can be lost. Windows to lose acknowledged writes are larger when clients are in a minority partition.
一定程度上的寫入操作的安全性:系統(tǒng)最大限度嘗試保留來自于與多數(shù)主節(jié)點(diǎn)連接的客戶端的所有寫操作瞎嬉。通常姐赡,會(huì)存在一些小的時(shí)間窗口唐片,在這些窗口中丙猬,可能會(huì)丟失已確認(rèn)的寫入。當(dāng)出現(xiàn)網(wǎng)絡(luò)分區(qū)時(shí)费韭,若客戶端連接的是少數(shù)分區(qū)茧球,則寫操作丟失的窗口會(huì)更大。
Translator's note: this presumably means that under a network partition (P) Redis chooses AP rather than CP. The "small windows" presumably refer to the duration of the partition.
- Availability: Redis Cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable. Moreover using replicas migration, masters no longer replicated by any slave will receive one from a master which is covered by multiple slaves.
可用性:如果發(fā)生了網(wǎng)絡(luò)分區(qū)揪垄,但是大多數(shù)主節(jié)點(diǎn)是可訪問的,同時(shí)那些不可訪問的主節(jié)點(diǎn)至少還有一個(gè)從節(jié)點(diǎn)是可以提供訪問的逻翁,那么福侈,Redis集群在這種場景下依然可以提供服務(wù)。此外卢未,當(dāng)使用“副本遷移”特性時(shí)肪凛,還可以從擁有多個(gè)從節(jié)點(diǎn)的主節(jié)點(diǎn)那里,遷移一個(gè)從節(jié)點(diǎn)給沒有從節(jié)點(diǎn)的主節(jié)點(diǎn)辽社。
Translator's note: replica migration tries to keep at least one slave attached to every master, so that the failure of a single master does not leave some slots unserved.
What is described in this document is implemented in Redis 3.0 or greater.
這篇文檔中所描述的內(nèi)容滴铅,在Redis 3.0或者更高的版本中已實(shí)現(xiàn)戳葵。
Implemented subset
Redis Cluster implements all the single key commands available in the non-distributed version of Redis. Commands performing complex multi-key operations like Set type unions or intersections are implemented as well as long as the keys all hash to the same slot.
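The cluster version of Redis implements all the single-key commands of the non-distributed (standalone) version. Complex multi-key operations, such as set unions or intersections (and simpler ones such as MGET), are also supported, as long as all the keys hash to the same slot.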
Redis Cluster implements a concept called hash tags that can be used in order to force certain keys to be stored in the same hash slot. However during manual resharding, multi-key operations may become unavailable for some time while single key operations are always available.
Redis集群實(shí)現(xiàn)了一個(gè)叫hash tags的概念戏自。它可以用來強(qiáng)制讓一些key存儲(chǔ)在相同的哈希槽當(dāng)中邦投。但是,如果存在人工重新分片的場景擅笔,multi-key操作可能就不可用了志衣。但是single-key的操作總是可用的。
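Translator's note: normally the slot is found by computing CRC16 of the key and taking the result modulo 16384; hash tags are a way around that rule. According to the paragraph above, what may become temporarily unavailable during a manual resharding is multi-key operations, while single-key operations stay available.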
Redis Cluster does not support multiple databases like the stand alone version of Redis. There is just database 0 and the SELECT command is not allowed.
和單機(jī)版本不同绿店,Redis集群不支持多數(shù)據(jù)庫。只有0號(hào)數(shù)據(jù)庫是可用的庐橙,同時(shí)SELECT命令是禁止的假勿。
Clients and Servers roles in the Redis Cluster protocol
In Redis Cluster nodes are responsible for holding the data, and taking the state of the cluster, including mapping keys to the right nodes. Cluster nodes are also able to auto-discover other nodes, detect non-working nodes, and promote slave nodes to master when needed in order to continue to operate when a failure occurs.
在Redis集群中,節(jié)點(diǎn)的主要作用是用于存儲(chǔ)數(shù)據(jù)以及獲取集群狀態(tài)怕午,包括把key映射到正確的節(jié)點(diǎn)上废登。集群節(jié)點(diǎn)同樣也能夠自動(dòng)發(fā)現(xiàn)其他節(jié)點(diǎn)淹魄、探查非工作節(jié)點(diǎn)郁惜,以及,為了是集群能夠持續(xù)提供服務(wù)甲锡,當(dāng)必要時(shí)(對(duì)應(yīng)的主節(jié)點(diǎn)掛掉)兆蕉,把從節(jié)點(diǎn)提升為主節(jié)點(diǎn)。
To perform their tasks all the cluster nodes are connected using a TCP bus and a binary protocol, called the Redis Cluster Bus. Every node is connected to every other node in the cluster using the cluster bus. Nodes use a gossip protocol to propagate information about the cluster in order to discover new nodes, to send ping packets to make sure all the other nodes are working properly, and to send cluster messages needed to signal specific conditions. The cluster bus is also used in order to propagate Pub/Sub messages across the cluster and to orchestrate manual failovers when requested by users (manual failovers are failovers which are not initiated by the Redis Cluster failure detector, but by the system administrator directly).
為了執(zhí)行上述的那些任務(wù)缤沦,所有節(jié)點(diǎn)之間通過TCP總線和二進(jìn)制協(xié)議相連虎韵,稱為Redis集群總線。所有節(jié)點(diǎn)間通過集群總線相連缸废。為了發(fā)現(xiàn)新的節(jié)點(diǎn)包蓝,節(jié)點(diǎn)通過gossip協(xié)議傳播集群信息。通過發(fā)送ping包來確保所有其他節(jié)點(diǎn)處于正常工作的狀態(tài)企量。還可以在特定情況下测萎,發(fā)送集群消息。集群總線可以用來在集群中傳播發(fā)布/訂閱消息届巩。它還可以用于編排人工故障切換硅瞧。人工故障切換指的是,非Redis集群故障探測機(jī)制發(fā)起的恕汇,由系統(tǒng)管理員直接人工執(zhí)行的故障切換腕唧。
Translator's note: the Chinese term chosen for "bus" in this translation is not an especially good fit, but there is no better option for now.
Since cluster nodes are not able to proxy requests, clients may be redirected to other nodes using redirection errors -MOVED and -ASK. The client is in theory free to send requests to all the nodes in the cluster, getting redirected if needed, so the client is not required to hold the state of the cluster. However clients that are able to cache the map between keys and nodes can improve the performance in a sensible way.
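Because cluster nodes cannot proxy requests, clients may be redirected to other nodes by redirection errors such as MOVED and ASK. In theory a client can send requests to any node in the cluster and be redirected when necessary (to the node that actually serves the request), so the client does not need to hold the cluster state. However, a client that caches the mapping between keys and nodes can improve performance noticeably.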
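Translator's note: it is not clear to me what the author means by "the map between keys and nodes". It seems more appropriate for the client to compute CRC16(key) mod 16384 to find the slot, and then find the node through a <slot, node> mapping.
The number of keys is in theory unbounded, so caching all of them might not fit in memory.
This does suggest one possible implementation, though: if a <key, node> mapping really were cached, the map could be given a maximum size and an expiry/eviction policy (Guava has a ready-made implementation). On a hit the cached node is used directly; otherwise the slot is recomputed.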
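The <slot, node> cache suggested in the note above could look roughly like the following sketch. The class and method names are hypothetical and not part of any Redis client library. The error layout `MOVED <slot> <host:port>` is assumed from the redirection section of the specification, which lies outside this part of the document.

```ruby
# Hypothetical client-side slot cache; names are illustrative only.
class SlotCache
  SLOT_COUNT = 16384

  def initialize
    @slot_to_node = {} # slot number => "host:port"
  end

  # Returns the cached node address for a slot, or nil when unknown.
  def node_for(slot)
    @slot_to_node[slot]
  end

  # Parses an error such as "-MOVED 3999 127.0.0.1:6381" and records the
  # new owner of the slot. Returns [slot, address], or nil for any other
  # message (for example an ASK redirection, which is not cached here).
  def learn_from_error(message)
    m = /\A-?MOVED (\d+) (\S+)\z/.match(message.strip)
    return nil unless m
    slot = m[1].to_i
    return nil unless slot < SLOT_COUNT
    @slot_to_node[slot] = m[2]
    [slot, m[2]]
  end
end

cache = SlotCache.new
cache.learn_from_error("-MOVED 3999 127.0.0.1:6381")
puts cache.node_for(3999)
```

Because there are only 16384 slots, a table keyed by slot stays small no matter how many keys exist, which is the memory concern raised in the note.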
Write safety
Redis Cluster uses asynchronous replication between nodes, and last failover wins implicit merge function. This means that the last elected master dataset eventually replaces all the other replicas. There is always a window of time when it is possible to lose writes during partitions. However these windows are very different in the case of a client that is connected to the majority of masters, and a client that is connected to the minority of masters.
Redis集群在(主從)節(jié)點(diǎn)間通過異步的方式同步數(shù)據(jù)陆蟆,并采用last failover wins的機(jī)制。它是指最新選舉出來的主節(jié)點(diǎn)惋增,最終會(huì)代替(這個(gè)分片的)其他從節(jié)點(diǎn)叠殷。在出現(xiàn)網(wǎng)絡(luò)分區(qū)是,存在一個(gè)時(shí)間窗口诈皿,可能會(huì)導(dǎo)致寫丟失林束。這個(gè)窗口時(shí)間的大小,完全取決于稽亏,在分區(qū)時(shí)壶冒,客戶端連接的是多數(shù)主節(jié)點(diǎn)(所在的分區(qū)),還是少數(shù)主節(jié)點(diǎn)(所在的分區(qū))措左。
Translator's note: a "lost write" here means that the client received a success reply from the server, yet the data cannot be found when it is queried later.
Redis Cluster tries harder to retain writes that are performed by clients connected to the majority of masters, compared to writes performed in the minority side. The following are examples of scenarios that lead to loss of acknowledged writes received in the majority partitions during failures:
相對(duì)于客戶端連接到少數(shù)(主)節(jié)點(diǎn)所在分區(qū)的情況胸嘁,當(dāng)客戶端連接的是多數(shù)(主)節(jié)點(diǎn)所在的分區(qū)瓶摆,Redis集群會(huì)盡更大的努力去嘗試保留已經(jīng)被(服務(wù)端)確認(rèn)的寫操作。以下的例子性宏,列舉了一些寫入操作被多數(shù)節(jié)點(diǎn)確認(rèn)但仍出現(xiàn)數(shù)據(jù)丟失的場景:
- A write may reach a master, but while the master may be able to reply to the client, the write may not be propagated to slaves via the asynchronous replication used between master and slave nodes. If the master dies without the write reaching the slaves, the write is lost forever if the master is unreachable for a long enough period that one of its slaves is promoted. This is usually hard to observe in the case of a total, sudden failure of a master node since masters try to reply to clients (with the acknowledge of the write) and slaves (propagating the write) at about the same time. However it is a real world failure mode.
寫操作發(fā)送給主節(jié)點(diǎn)群井,但由于主從節(jié)點(diǎn)之間的數(shù)據(jù)同步是異步的,此時(shí)可能主節(jié)點(diǎn)已經(jīng)給客戶端響應(yīng)毫胜,但數(shù)據(jù)還沒有同步給從節(jié)點(diǎn)书斜。在這種情況下,如果主節(jié)點(diǎn)不可用的持續(xù)時(shí)間太長酵使,以至于在此期間荐吉,從節(jié)點(diǎn)提升為新的主節(jié)點(diǎn),則數(shù)據(jù)可能永久丟失口渔。主節(jié)點(diǎn)突然掛掉(宕機(jī)样屠、網(wǎng)絡(luò)不可達(dá)等),是一種很難被觀察到的寫失敗缺脉,因?yàn)橹鞴?jié)點(diǎn)給客戶端發(fā)送寫入成功響應(yīng)痪欲,以及給從節(jié)點(diǎn)發(fā)送同步數(shù)據(jù)的,幾乎是在同一時(shí)間進(jìn)行的攻礼,然而卻這卻是真實(shí)存在的一種場景业踢。
- Another theoretically possible failure mode where writes are lost is the following:
- A master is unreachable because of a partition.
- It gets failed over by one of its slaves.
- After some time it may be reachable again.
- A client with an out-of-date routing table may write to the old master before it is converted into a slave (of the new master) by the cluster.
Another theoretically possible way to lose writes is this: a master becomes unreachable because of a network partition, and one of its slaves is elected master (which implies that the slave is on the side holding the majority of nodes). After some time the old master becomes reachable again (the partition heals). A client whose routing table is out of date may still write to the old master before the cluster converts it into a slave of the new master; once that conversion happens, those writes are lost.
Translator's note: if the old master stays offline long enough for its slave to be promoted, the old master is demoted to a slave (of the new master) when it comes back online.
The second failure mode is unlikely to happen because master nodes unable to communicate with the majority of the other masters for enough time to be failed over will no longer accept writes, and when the partition is fixed writes are still refused for a small amount of time to allow other nodes to inform about configuration changes. This failure mode also requires that the client's routing table has not yet been updated.
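The second failure mode is unlikely in practice. A master that cannot communicate with the majority of the other masters for long enough to be failed over stops accepting writes, and once the partition heals it still refuses writes for a short time so that other nodes can inform it of configuration changes. On top of that, the client's routing table must also still be out of date.
Translator's note: the point is that the conditions for this kind of write loss are very strict.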
Writes targeting the minority side of a partition have a larger window in which to get lost. For example, Redis Cluster loses a non-trivial number of writes on partitions where there is a minority of masters and at least one or more clients, since all the writes sent to the masters may potentially get lost if the masters are failed over in the majority side.
Writes sent to the minority side of a partition have a larger window in which they can be lost. If the masters that accepted those writes are failed over on the majority side, every write sent to them (including acknowledged ones) may be lost, so Redis Cluster can lose a non-trivial number of writes.
Translator's note: this paragraph is translated freely rather than literally.
Specifically, for a master to be failed over it must be unreachable by the majority of masters for at least NODE_TIMEOUT, so if the partition is fixed before that time, no writes are lost. When the partition lasts for more than NODE_TIMEOUT, all the writes performed in the minority side up to that point may be lost. However the minority side of a Redis Cluster will start refusing writes as soon as NODE_TIMEOUT time has elapsed without contact with the majority, so there is a maximum window after which the minority becomes no longer available. Hence, no writes are accepted or lost after that time.
Specifically, for a master to be failed over it must have been unreachable by the majority of masters (on the other side of the partition) for at least NODE_TIMEOUT. If the partition heals within that time, no writes are lost. If it lasts longer than NODE_TIMEOUT, all writes sent to the minority side up to that point may be lost. The minority side starts refusing writes once NODE_TIMEOUT has elapsed without contact with the majority, so there is a maximum (write-loss) window after a partition, beyond which no writes are accepted.
Availability
Redis Cluster is not available in the minority side of the partition. In the majority side of the partition assuming that there are at least the majority of masters and a slave for every unreachable master, the cluster becomes available again after NODE_TIMEOUT time plus a few more seconds required for a slave to get elected and failover its master (failovers are usually executed in a matter of 1 or 2 seconds).
During a partition, the majority side is the side on which most masters are reachable and every unreachable master has at least one reachable slave. The other side is the minority side, which is unavailable during the partition. The majority side is briefly unavailable as well: for NODE_TIMEOUT plus the short time a slave needs to get elected and fail over its master, usually 1 to 2 seconds. After that the majority side is available again.
This means that Redis Cluster is designed to survive failures of a few nodes in the cluster, but it is not a suitable solution for applications that require availability in the event of large net splits.
This means that Redis Cluster is designed to survive the failure of a few nodes, but it is not suited to applications that must stay available during large network splits.
In the example of a cluster composed of N master nodes where every node has a single slave, the majority side of the cluster will remain available as long as a single node is partitioned away, and will remain available with a probability of 1-(1/(N*2-1)) when two nodes are partitioned away (after the first node fails we are left with N*2-1 nodes in total, and the probability of the only master without a replica to fail is 1/(N*2-1)).
Take a cluster with N masters, each with one slave (N slaves in total). The majority side stays available when one node is lost, and stays available with probability 1-(1/(N*2-1)) when two nodes are lost (as long as the master left without a slave is not the second node to fail, the cluster keeps serving).
For example, in a cluster with 5 nodes and a single slave per node, there is a 1/(5*2-1) = 11.11% probability that after two nodes are partitioned away from the majority, the cluster will no longer be available.
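For instance, in a cluster of 5 masters and 5 slaves, when two nodes can no longer reach the majority side, there is an 11.11% chance that the cluster is no longer available.
Translator's note: in plain terms, take a cluster of 3 masters and 3 slaves, A/A', B/B', C/C'. If one node fails at random (say A), the cluster keeps serving. Five nodes remain, each equally likely (1/5) to be the next to fail. As long as A' is not the one, the cluster is unaffected, so it has a 4/5 chance of tolerating a second node failure.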
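The arithmetic above can be checked with a few lines of Ruby. This is only a sketch of the formula in the text; the function name is made up, and it assumes, as the text does, that the second failing node is equally likely to be any of the remaining ones.

```ruby
# Probability that a cluster of n masters, each with one slave, is still
# available after a second node fails, per the formula 1-(1/(N*2-1)).
def second_failure_survival(n_masters)
  remaining = n_masters * 2 - 1 # nodes left after the first failure
  1.0 - 1.0 / remaining         # only losing the replica-less master is fatal
end

[3, 5, 10].each do |n|
  p_ok = second_failure_survival(n)
  printf("N=%d: available %.2f%%, unavailable %.2f%%\n",
         n, 100.0 * p_ok, 100.0 * (1.0 - p_ok))
end
```

For N=5 this gives the 11.11% figure quoted above, and for N=3 the 4/5 survival chance from the translator's note.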
Thanks to a Redis Cluster feature called replicas migration the Cluster availability is improved in many real world scenarios by the fact that replicas migrate to orphaned masters (masters no longer having replicas). So at every successful failure event, the cluster may reconfigure the slaves layout in order to better resist the next failure.
Redis集群提供了一個(gè)叫副本遷移的特性,它可以提升集群的可用性菱阵,來適應(yīng)很多在真實(shí)世界可能發(fā)生的場景踢俄。副本遷移會(huì)將富余的從節(jié)點(diǎn)遷移給孤兒主節(jié)點(diǎn)(即沒有從節(jié)點(diǎn)的主節(jié)點(diǎn))。因此晴及,每次成功處理集群故障的時(shí)候都办,集群可能會(huì)重新分配從節(jié)點(diǎn)(和主節(jié)點(diǎn)的拓?fù)潢P(guān)系),來更好的應(yīng)對(duì)下一次可能的故障抗俄。
Performance
In Redis Cluster nodes don't proxy commands to the right node in charge for a given key, but instead they redirect clients to the right nodes serving a given portion of the key space.
In Redis Cluster, nodes do not proxy a request for a key to the node in charge of it; instead they redirect the client. (Proxying would be invisible to the client, whereas redirection is visible to it.)
Eventually clients obtain an up-to-date representation of the cluster and which node serves which subset of keys, so during normal operations clients directly contact the right nodes in order to send a given command.
Eventually the client holds an up-to-date picture of the cluster (presumably the mapping between nodes and slots), that is, which node serves which subset of keys. During normal operation the client therefore contacts the right node directly.
Because of the use of asynchronous replication, nodes do not wait for other nodes' acknowledgment of writes (if not explicitly requested using the WAIT command).
因?yàn)橹鲝闹g使用的是異步的備份機(jī)制躲庄,因此,主節(jié)點(diǎn)不會(huì)等待從節(jié)點(diǎn)寫入成功之后再給客戶端發(fā)送回執(zhí)(除非客戶端顯式的發(fā)送WAIT指令)钾虐。
Also, because multi-key commands are only limited to near keys, data is never moved between nodes except when resharding.
Likewise, because multi-key commands are limited to near keys ("near" here meaning keys in the same hash slot), data never needs to move between nodes except during resharding.
Normal operations are handled exactly as in the case of a single Redis instance. This means that in a Redis Cluster with N master nodes you can expect the same performance as a single Redis instance multiplied by N as the design scales linearly. At the same time the query is usually performed in a single round trip, since clients usually retain persistent connections with the nodes, so latency figures are also the same as the single standalone Redis node case.
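Normal operations are handled the same way in cluster mode as on a single instance. This means that a cluster with N master nodes can deliver close to N times the performance of a single instance. At the same time, because clients usually keep persistent connections to the nodes, a query usually completes in a single round trip, so latency is generally also the same as with a single instance.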
Very high performance and scalability while preserving weak but reasonable forms of data safety and availability is the main goal of Redis Cluster.
追求極致的高性能和高拓展性疟赊,同時(shí)保證相對(duì)較弱但合理的數(shù)據(jù)安全性和可用性時(shí)Redis集群的首要設(shè)計(jì)目標(biāo)。
Why merge operations are avoided
Redis Cluster design avoids conflicting versions of the same key-value pair in multiple nodes as in the case of the Redis data model this is not always desirable. Values in Redis are often very large; it is common to see lists or sorted sets with millions of elements. Also data types are semantically complex. Transferring and merging these kind of values can be a major bottleneck and/or may require the non-trivial involvement of application-side logic, additional memory to store meta-data, and so forth.
Redis集群在設(shè)計(jì)時(shí)避免了相同的鍵值對(duì)在多個(gè)節(jié)點(diǎn)中存在版本沖突(的可能)峡碉,Redis的數(shù)據(jù)模型不提倡這么做听绳。Redis中存儲(chǔ)的值通常非常大,擁有百萬級(jí)別元素的列表或者有序集合非常常見异赫。數(shù)據(jù)類型在語義上也很復(fù)雜椅挣。傳輸和合并這些類型的值,可能成為一個(gè)主要的性能瓶頸塔拳,并且/或者鼠证,可能需要應(yīng)用程序側(cè)邏輯參與其中來做一些專門的適配,比如增加額外的存儲(chǔ)來存儲(chǔ)元數(shù)據(jù)信息等靠抑。
There are no strict technological limits here. CRDTs or synchronously replicated state machines can model complex data types similar to Redis. However, the actual run time behavior of such systems would not be similar to Redis Cluster. Redis Cluster was designed in order to cover the exact use cases of the non-clustered Redis version.
以上沒有嚴(yán)格的技術(shù)限制量九。 CRDT或同步復(fù)制狀態(tài)機(jī)可以對(duì)復(fù)雜數(shù)據(jù)類型進(jìn)行建模,達(dá)到和Redis近似的效果颂碧。但是荠列,這類系統(tǒng)實(shí)際的運(yùn)行時(shí)行為將與Redis集群不同。 而Redis Cluster的設(shè)計(jì)载城,旨在保證和單機(jī)版的Redis行為一致肌似。
Overview of Redis Cluster main components
Keys distribution model
The key space is split into 16384 slots, effectively setting an upper limit for the cluster size of 16384 master nodes (however the suggested max size of nodes is in the order of ~ 1000 nodes).
Redis集群的鍵空間分散在16384個(gè)槽位中。這些槽位最大可以分配給16384個(gè)節(jié)點(diǎn)诉瓦。(不過川队,建議的最大節(jié)點(diǎn)數(shù)大約在1000左右)
Each master node in a cluster handles a subset of the 16384 hash slots. The cluster is stable when there is no cluster reconfiguration in progress (i.e. where hash slots are being moved from one node to another). When the cluster is stable, a single hash slot will be served by a single node (however the serving node can have one or more slaves that will replace it in the case of net splits or failures, and that can be used in order to scale read operations where reading stale data is acceptable).
集群中的每一個(gè)節(jié)點(diǎn)處理16384個(gè)哈希槽中的一部分力细。如果沒有進(jìn)行中的集群再分配行為,比如哈希槽從某個(gè)節(jié)點(diǎn)挪到另一個(gè)節(jié)點(diǎn)固额,那么集群的拓?fù)浣Y(jié)構(gòu)是穩(wěn)定的眠蚂。此時(shí),任意一個(gè)哈希槽斗躏,只會(huì)有一個(gè)節(jié)點(diǎn)來承載并對(duì)外提供服務(wù)逝慧。一種例外的情況是,主節(jié)點(diǎn)可能有一個(gè)或多個(gè)從節(jié)點(diǎn)啄糙,以應(yīng)對(duì)網(wǎng)絡(luò)分區(qū)或故障的發(fā)生馋艺,如果可以接受讀到的數(shù)據(jù)不是最新的,那么迈套,這些從節(jié)點(diǎn)也可以被用來拓展讀操作的性能捐祠。
The base algorithm used to map keys to hash slots is the following (read the next paragraph for the hash tag exception to this rule):
HASH_SLOT = CRC16(key) mod 16384
把key映射到具體的哈希槽的基礎(chǔ)算法為:對(duì)key執(zhí)行CRC16后對(duì)16384取模。(后面的段落介紹了通過hash tag的方式繞過這個(gè)算法)
The CRC16 is specified as follows:
- Name: XMODEM (also known as ZMODEM or CRC-16/ACORN)
- Width: 16 bit
- Poly: 1021 (That is actually x16 + x12 + x5 + 1)
- Initialization: 0000
- Reflect Input byte: False
- Reflect Output CRC: False
- Xor constant to output CRC: 0000
- Output for "123456789": 31C3
14 out of 16 CRC16 output bits are used (this is why there is a modulo 16384 operation in the formula above).
In our tests CRC16 behaved remarkably well in distributing different kinds of keys evenly across the 16384 slots.
Note: A reference implementation of the CRC16 algorithm used is available in the Appendix A of this document.
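Translator's note: the CRC16 details are not translated here. The author states that CRC16 distributes keys very evenly across the slots and provides a reference implementation in Appendix A of the document.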
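As an illustration of the parameters listed above (polynomial 1021, initial value 0000, no reflection, no final XOR), here is a minimal bit-by-bit sketch in Ruby. It is not the table-driven reference implementation from Appendix A, and the function names are chosen for this example only.

```ruby
# Bitwise CRC16 with the XMODEM parameters from the list above.
def crc16_xmodem(data)
  crc = 0x0000
  data.each_byte do |byte|
    crc ^= byte << 8                      # feed the next byte into the top bits
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF                       # keep the register 16 bits wide
    end
  end
  crc
end

# Base slot computation, ignoring the hash tag exception.
def base_hash_slot(key)
  crc16_xmodem(key) % 16384
end

puts format("%04X", crc16_xmodem("123456789"))
```

The check string "123456789" should yield 31C3, matching the "Output" line of the specification.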
Keys hash tags
There is an exception for the computation of the hash slot that is used in order to implement hash tags. Hash tags are a way to ensure that multiple keys are allocated in the same hash slot. This is used in order to implement multi-key operations in Redis Cluster.
計(jì)算哈希槽的時(shí)候贵白,有一種例外率拒,被稱為哈希標(biāo)簽。它能確保多個(gè)key被分配到同一個(gè)哈希槽上禁荒。這是為了實(shí)現(xiàn)Redis集群的多key操作猬膨。
In order to implement hash tags, the hash slot for a key is computed in a slightly different way in certain conditions. If the key contains a "{...}" pattern only the substring between { and } is hashed in order to obtain the hash slot. However since it is possible that there are multiple occurrences of { or } the algorithm is well specified by the following rules:
- IF the key contains a { character.
- AND IF there is a } character to the right of {
- AND IF there are one or more characters between the first occurrence of { and the first occurrence of }.
Then instead of hashing the key, only what is between the first occurrence of { and the following first occurrence of } is hashed.
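To implement hash tags, the hash slot of a key is computed slightly differently under certain conditions. If the key contains {...}, only the content between the braces is used to compute the slot. Since a key may contain several { or } characters, the rule is: take the first matching pair of braces, provided there is at least one character between them, and hash the substring inside the braces instead of the whole key. (The translator's guess is that the "at least one character" condition prevents a large number of keys from hashing to a single slot and skewing the data.)
Translator's note: the "first pair" is not necessarily the pair the eye picks out first. For {{...}} the extracted substring is {...
The examples below are fairly long-winded; personally I find the Ruby code more direct. In short: find the index of the first left brace, then the index of the first right brace after it, and if there is anything in between, extract it.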
Examples:
- The two keys {user1000}.following and {user1000}.followers will hash to the same hash slot since only the substring user1000 will be hashed in order to compute the hash slot.
- For the key foo{}{bar} the whole key will be hashed as usually since the first occurrence of { is followed by } on the right without characters in the middle.
- For the key foo{{bar}}zap the substring {bar will be hashed, because it is the substring between the first occurrence of { and the first occurrence of } on its right.
- For the key foo{bar}{zap} the substring bar will be hashed, since the algorithm stops at the first valid or invalid (without bytes inside) match of { and }.
- What follows from the algorithm is that if the key starts with {}, it is guaranteed to be hashed as a whole. This is useful when using binary data as key names.
Adding the hash tags exception, the following is an implementation of the HASH_SLOT function in Ruby and C language.
Ruby example code:
def HASH_SLOT(key)
    s = key.index "{"
    if s
        e = key.index "}",s+1
        if e && e != s+1
            key = key[s+1..e-1]
        end
    end
    crc16(key) % 16384
end
C example code:
unsigned int HASH_SLOT(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    /* Search the first occurrence of '{'. */
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. This is the base case. */
    if (s == keylen) return crc16(key,keylen) & 16383;

    /* '{' found? Check if we have the corresponding '}'. */
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or nothing between {} ? Hash the whole key. */
    if (e == keylen || e == s+1) return crc16(key,keylen) & 16383;

    /* If we are here there is both a { and a } on its right. Hash
     * what is in the middle between { and }. */
    return crc16(key+s+1,e-s-1) & 16383;
}
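To see the hash tag rules at work without involving CRC16, the sketch below isolates only the substring-selection step and runs it over the example keys from the list above. The function name `hashed_part` is invented for this illustration.

```ruby
# Returns the part of the key that would be fed to CRC16 under the
# hash tag rules: the text between the first "{" and the first "}" after
# it, or the whole key when there is no such non-empty pair.
def hashed_part(key)
  s = key.index("{")
  return key unless s
  e = key.index("}", s + 1)
  return key if e.nil? || e == s + 1
  key[(s + 1)...e]
end

["{user1000}.following", "{user1000}.followers", "foo{}{bar}",
 "foo{{bar}}zap", "foo{bar}{zap}"].each do |key|
  puts "#{key} => #{hashed_part(key)}"
end
```

Keys that produce the same hashed part always land in the same slot, which is what makes multi-key operations on them possible.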
Cluster nodes attributes
Every node has a unique name in the cluster. The node name is the hex representation of a 160 bit random number, obtained the first time a node is started (usually using /dev/urandom). The node will save its ID in the node configuration file, and will use the same ID forever, or at least as long as the node configuration file is not deleted by the system administrator, or a hard reset is requested via the CLUSTER RESET command.
每個(gè)節(jié)點(diǎn)都有一個(gè)獨(dú)一無二的名字。這個(gè)名字是用16進(jìn)制表示的一個(gè)160bit的隨機(jī)字符串村砂。它通常是節(jié)點(diǎn)啟動(dòng)的時(shí)候(從/dev/urandom中)獲得的烂斋。節(jié)點(diǎn)會(huì)把這個(gè)名稱寫進(jìn)配置文件中,并且在除配置文件被刪除或執(zhí)行了集群的RESET指令以外的場景下础废,節(jié)點(diǎn)永遠(yuǎn)不會(huì)主動(dòng)改變這個(gè)名稱汛骂。
The node ID is used to identify every node across the whole cluster. It is possible for a given node to change its IP address without any need to also change the node ID. The cluster is also able to detect the change in IP/port and reconfigure using the gossip protocol running over the cluster bus.
節(jié)點(diǎn)ID(就是上一段的那個(gè)unique name)被用來唯一標(biāo)識(shí)集群中的每一個(gè)節(jié)點(diǎn)。節(jié)點(diǎn)可能改變它的IP评腺,但是不需要改變這個(gè)ID帘瞭。集群應(yīng)該有能力感知IP和端口的變化,并且通過gossip協(xié)議來重配置集群(的信息)蒿讥。
The node ID is not the only information associated with each node, but is the only one that is always globally consistent. Every node has also the following set of information associated. Some information is about the cluster configuration detail of this specific node, and is eventually consistent across the cluster. Some other information, like the last time a node was pinged, is instead local to each node.
和每個(gè)節(jié)點(diǎn)相關(guān)的信息蝶念,不只是有節(jié)點(diǎn)ID信息,但節(jié)點(diǎn)ID總是全局一致的芋绸。每個(gè)節(jié)點(diǎn)還會(huì)關(guān)聯(lián)以下(兩類)信息媒殉。一些信息與某個(gè)特定節(jié)點(diǎn)的集群配置的詳細(xì)信息有關(guān),并且最終在整個(gè)集群中保持一致摔敛。 相反廷蓉,另一些其他信息(例如上次對(duì)節(jié)點(diǎn)執(zhí)行ping操作)則位于每個(gè)節(jié)點(diǎn)本地。
Every node maintains the following information about other nodes that it is aware of in the cluster: The node ID, IP and port of the node, a set of flags, what is the master of the node if it is flagged as slave, last time the node was pinged and the last time the pong was received, the current configuration epoch of the node (explained later in this specification), the link state and finally the set of hash slots served.
每個(gè)節(jié)點(diǎn)马昙,都維護(hù)有關(guān)集群中其他節(jié)點(diǎn)的以下信息:節(jié)點(diǎn)ID桃犬、節(jié)點(diǎn)的IP和端口號(hào)、一系列的標(biāo)識(shí)位行楞、如果是從節(jié)點(diǎn)的話疫萤,那主節(jié)點(diǎn)是誰、最近一次心跳(ping/pong)的情況敢伸、節(jié)點(diǎn)配置的當(dāng)前epoch(文檔的后續(xù)部分會(huì)解釋)扯饶、連接的狀態(tài)以及哈希槽的分配情況。
A detailed explanation of all the node fields is described in the CLUSTER NODES documentation.
關(guān)于所有節(jié)點(diǎn)字段更詳細(xì)的解釋參考CLUSTER NODES相關(guān)的文檔池颈。
The CLUSTER NODES command can be sent to any node in the cluster and provides the state of the cluster and the information for each node according to the local view the queried node has of the cluster.
CLUSTER NODES命令可以發(fā)送到群集中的任何節(jié)點(diǎn)尾序,并根據(jù)執(zhí)行查詢的節(jié)點(diǎn)的本地視圖,提供群集的狀態(tài)以及每個(gè)節(jié)點(diǎn)的信息躯砰。
The following is sample output of the CLUSTER NODES command sent to a master node in a small cluster of three nodes.
以下是一個(gè)包含三個(gè)主節(jié)點(diǎn)的小集群的示例輸出每币。
$ redis-cli cluster nodes
d1861060fe6a534d42d8a19aeb36600e18785e04 127.0.0.1:6379 myself - 0 1318428930 1 connected 0-1364
3886e65cc906bfd9b1f7e7bde468726a052d1dae 127.0.0.1:6380 master - 1318428930 1318428931 2 connected 1365-2729
d289c575dcbc4bdd2931585fd4339089e461a27d 127.0.0.1:6381 master - 1318428931 1318428931 3 connected 2730-4095
In the above listing the different fields are in order: node id, address:port, flags, last ping sent, last pong received, configuration epoch, link state, slots. Details about the above fields will be covered as soon as we talk of specific parts of Redis Cluster.
The fields in the listing above are, in order: node ID, IP:port, flags (master or slave, and for a slave, whose slave it is), the time the last ping was sent (in milliseconds since 1970-01-01 00:00:00, or 0 when no ping is pending), the time the last pong was received, configuration epoch, link state, and assigned slots. The details are covered when the relevant parts of Redis Cluster are discussed.
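A line of the sample output can be split into named fields as sketched below. The struct and function names are hypothetical. The sketch assumes the column layout visible in the sample, in which a master-ID column (shown as "-" for masters) sits between the flags and the ping timestamp.

```ruby
# Hypothetical container for one line of CLUSTER NODES output.
NodeInfo = Struct.new(:id, :address, :flags, :master_id, :ping_sent,
                      :pong_received, :config_epoch, :link_state, :slots)

def parse_cluster_nodes_line(line)
  id, address, flags, master, ping, pong, epoch, link, *slots = line.split
  NodeInfo.new(id, address, flags.split(","),
               (master == "-" ? nil : master), # "-" means no master
               ping.to_i, pong.to_i, epoch.to_i, link, slots)
end

line = "3886e65cc906bfd9b1f7e7bde468726a052d1dae 127.0.0.1:6380 master - " \
       "1318428930 1318428931 2 connected 1365-2729"
node = parse_cluster_nodes_line(line)
puts "#{node.address} serves #{node.slots.join(' ')} (epoch #{node.config_epoch})"
```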
The Cluster bus
Every Redis Cluster node has an additional TCP port for receiving incoming connections from other Redis Cluster nodes. This port is at a fixed offset from the normal TCP port used to receive incoming connections from clients. To obtain the Redis Cluster port, 10000 should be added to the normal commands port. For example, if a Redis node is listening for client connections on port 6379, the Cluster bus port 16379 will also be opened.
Redis集群中的每一個(gè)節(jié)點(diǎn)椭坚,都有一個(gè)額外的TCP端口,用來接收其他節(jié)點(diǎn)發(fā)來的連接搏色。這個(gè)端口和用于接收客戶端連接的端口之間有一個(gè)固定大小的偏移量善茎,通常是10000。比如频轿,如果客戶端通過6379端口連接服務(wù)端垂涯,則16379會(huì)被用來作為額外的端口。
Node-to-node communication happens exclusively using the Cluster bus and the Cluster bus protocol: a binary protocol composed of frames of different types and sizes. The Cluster bus binary protocol is not publicly documented since it is not intended for external software devices to talk with Redis Cluster nodes using this protocol. However you can obtain more details about the Cluster bus protocol by reading the cluster.h and cluster.c files in the Redis Cluster source code.
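Node-to-node communication happens only over the cluster bus and the cluster bus protocol (a binary protocol made up of frames of different types and sizes). The protocol is not publicly documented, because it is not meant for external software to talk to cluster nodes with. You can, however, learn more from the cluster.h and cluster.c files in the Redis source code.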
Cluster topology
Redis Cluster is a full mesh where every node is connected with every other node using a TCP connection.
Redis集群是一個(gè)完全的網(wǎng)格:每一個(gè)節(jié)點(diǎn)之間互相通過TCP連接秽之。
In a cluster of N nodes, every node has N-1 outgoing TCP connections, and N-1 incoming connections.
In a cluster of N nodes, every node has N-1 outgoing and N-1 incoming connections.
These TCP connections are kept alive all the time and are not created on demand. When a node expects a pong reply in response to a ping in the cluster bus, before waiting long enough to mark the node as unreachable, it will try to refresh the connection with the node by reconnecting from scratch.
These TCP connections are kept alive all the time rather than created on demand. When a node is expecting a pong reply from another node, it tries to refresh the connection by reconnecting from scratch before it has waited long enough to mark that node as unreachable.
While Redis Cluster nodes form a full mesh, nodes use a gossip protocol and a configuration update mechanism in order to avoid exchanging too many messages between nodes during normal conditions, so the number of messages exchanged is not exponential.
盡管Redis集群會(huì)組成一個(gè)完全的網(wǎng)格,但是在正常情況下,節(jié)點(diǎn)之間是通過gossip協(xié)議以及某些配置更新機(jī)制馏予,來防止節(jié)點(diǎn)間有過多的消息(這里指兩兩發(fā)送消息),因此(當(dāng)節(jié)點(diǎn)增加時(shí))消息交換并不是指數(shù)級(jí)(增加的)戚丸。
Nodes handshake
Nodes always accept connections on the cluster bus port, and even reply to pings when received, even if the pinging node is not trusted. However, all other packets will be discarded by the receiving node if the sending node is not considered part of the cluster.
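Nodes always accept connections on the cluster bus port (the translator's guess is that this hints at the lack of Byzantine fault tolerance).
They even reply to pings from nodes that are not trusted. However, every other kind of packet (anything that is not a ping) is discarded by the receiving node if the sender is not considered part of the cluster.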
A node will accept another node as part of the cluster only in two ways:
- If a node presents itself with a MEET message. A meet message is exactly like a PING message, but forces the receiver to accept the node as part of the cluster. Nodes will send MEET messages to other nodes only if the system administrator requests this via the following command: CLUSTER MEET ip port
- A node will also register another node as part of the cluster if a node that is already trusted will gossip about this other node. So if A knows B, and B knows C, eventually B will send gossip messages to A about C. When this happens, A will register C as part of the network, and will try to connect with C.
一個(gè)節(jié)點(diǎn)只會(huì)在以下兩種情況下世澜,接受另一個(gè)節(jié)點(diǎn)作為集群的一部分:
一是,如果一個(gè)節(jié)點(diǎn)給另一個(gè)節(jié)點(diǎn)發(fā)送MEET消息姻几。MEET消息和PING消息類似宜狐,但是強(qiáng)迫接收消息的節(jié)點(diǎn)把發(fā)送消息的節(jié)點(diǎn)作為集群的一部分势告。同時(shí)蛇捌,只有系統(tǒng)管理員通過cluster meet命令才能讓節(jié)點(diǎn)發(fā)送MEET消息。
二是咱台,一個(gè)節(jié)點(diǎn)同樣會(huì)把另一個(gè)節(jié)點(diǎn)注冊(cè)為集群的一部分络拌,如果那個(gè)節(jié)點(diǎn)已經(jīng)被某個(gè)網(wǎng)絡(luò)中的節(jié)點(diǎn)所信任。因此回溺,如果A知道B春贸,B知道C,最后B會(huì)給A發(fā)送關(guān)于C的gossip消息遗遵。這時(shí)萍恕,A會(huì)把C注冊(cè)為網(wǎng)絡(luò)的一部分,并嘗試和C建立連接车要。
This means that as long as we join nodes in any connected graph, they'll eventually form a fully connected graph automatically. This means that the cluster is able to auto-discover other nodes, but only if there is a trusted relationship that was forced by the system administrator.
這意味著允粤,只要我們把節(jié)點(diǎn)以圖的形式相連,他們最終會(huì)自動(dòng)的互相完全連接(指的是借助于gossip協(xié)議翼岁,不需要人為的把所有節(jié)點(diǎn)信息配置起來)类垫。這也意味著,如果有一個(gè)系統(tǒng)管理員手動(dòng)指定的可信關(guān)系(指的是meet命令)琅坡,集群有能力自動(dòng)發(fā)現(xiàn)其他節(jié)點(diǎn)(并把它們加入集群中)悉患。
This mechanism makes the cluster more robust but prevents different Redis clusters from accidentally mixing after change of IP addresses or other network related events.
這一機(jī)制讓集群更加健壯,同時(shí)防止了不同的Redis集群因?yàn)榕既坏腎P地址變更或其他網(wǎng)絡(luò)相關(guān)事件混在在一起榆俺。
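The discovery rule above can be sketched as a small simulation. This is a simplified model, not the actual cluster bus protocol: it assumes every node gossips about all the nodes it knows in every round (real gossip sections carry only a sample), and the names `simulate_discovery` and `meet_pairs` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def simulate_discovery(nodes: List[str],
                       meet_pairs: Iterable[Tuple[str, str]],
                       max_rounds: int = 100) -> Dict[str, Set[str]]:
    """Return, for each node, the set of nodes it ends up trusting."""
    known: Dict[str, Set[str]] = {n: set() for n in nodes}
    # CLUSTER MEET: the administrator forces a trust relationship.
    for a, b in meet_pairs:
        known[a].add(b)
        known[b].add(a)
    for _ in range(max_rounds):
        changed = False
        for sender in nodes:
            for receiver in list(known[sender]):
                # The receiver trusts the sender, so it registers every
                # node the sender gossips about (except itself).
                new = known[sender] - known[receiver] - {receiver}
                if new:
                    known[receiver] |= new
                    changed = True
        if not changed:
            break  # no node learned anything new: the view is stable
    return known


if __name__ == "__main__":
    chain = simulate_discovery(["A", "B", "C", "D"],
                               [("A", "B"), ("B", "C"), ("C", "D")])
    print(chain)
```

In this model a chain of MEET relationships converges to a full mesh, while two groups that were never joined by a MEET stay separate, which matches the isolation property described above.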
Redirection and resharding
MOVED Redirection
A Redis client is free to send queries to every node in the cluster, including slave nodes. The node will analyze the query, and if it is acceptable (that is, only a single key is mentioned in the query, or the multiple keys mentioned all belong to the same hash slot) it will look up which node is responsible for the hash slot where the key or keys belong.
If the hash slot is served by the node, the query is simply processed, otherwise the node will check its internal hash slot to node map, and will reply to the client with a MOVED error, like in the following example:
GET x
-MOVED 3999 127.0.0.1:6381
The error includes the hash slot of the key (3999) and the ip:port of the instance that can serve the query. The client needs to reissue the query to the specified node's IP address and port. Note that even if the client waits a long time before reissuing the query, and in the meantime the cluster configuration changed, the destination node will reply again with a MOVED error if the hash slot 3999 is now served by another node. The same happens if the contacted node had no updated information.
Translator's note: for example, if the contacted node believes slot 3999 belongs to node A while it is actually served by node B, it sends the client a MOVED error pointing to A. When A receives the query and finds it cannot serve it, A redirects the client again, this time to B.
So while from the point of view of the cluster nodes are identified by IDs, we try to simplify our interface with the client by exposing only a map between hash slots and Redis nodes identified by IP:port pairs.
The client is not required to, but should try to memorize that hash slot 3999 is served by 127.0.0.1:6381. This way, once a new command needs to be issued, it can compute the hash slot of the target key and have a greater chance of choosing the right node.
An alternative is to just refresh the whole client-side cluster layout using the CLUSTER NODES or CLUSTER SLOTS commands when a MOVED redirection is received. When a redirection is encountered, it is likely that multiple slots were reconfigured rather than just one, so updating the client configuration as soon as possible is often the best strategy.
Note that when the Cluster is stable (no ongoing changes in the configuration), eventually all the clients will obtain a map of hash slots -> nodes, making the cluster efficient, with clients directly addressing the right nodes without redirections, proxies or other single point of failure entities.
A client must also be able to handle -ASK redirections that are described later in this document, otherwise it is not a complete Redis Cluster client.
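A minimal sketch of the client-side bookkeeping described above: parse a MOVED error and remember the new owner of the slot. The helper names `parse_moved` and `remember_moved` and the dictionary-based slot table are assumptions made for illustration, not part of any particular client library.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]


def parse_moved(error: str) -> Optional[Tuple[int, Address]]:
    """Parse '-MOVED 3999 127.0.0.1:6381' into (3999, ('127.0.0.1', 6381))."""
    parts = error.lstrip("-").split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    # Split on the last ':' so that hosts containing ':' still parse.
    host, _, port = parts[2].rpartition(":")
    return int(parts[1]), (host, int(port))


def remember_moved(slot_table: Dict[int, Address], error: str) -> Optional[Address]:
    """Update the local slot -> node map and return where to retry."""
    parsed = parse_moved(error)
    if parsed is None:
        return None
    slot, address = parsed
    slot_table[slot] = address
    return address


if __name__ == "__main__":
    table: Dict[int, Address] = {}
    print(remember_moved(table, "-MOVED 3999 127.0.0.1:6381"), table)
```

Updating a single slot is the simplest reaction; as the text notes, refreshing the whole layout on MOVED is often the better strategy.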
Cluster live reconfiguration
Redis Cluster supports the ability to add and remove nodes while the cluster is running. Adding or removing a node is abstracted into the same operation: moving a hash slot from one node to another. This means that the same basic mechanism can be used in order to rebalance the cluster, add or remove nodes, and so forth.
- To add a new node to the cluster an empty node is added to the cluster and some set of hash slots are moved from existing nodes to the new node.
- To remove a node from the cluster the hash slots assigned to that node are moved to other existing nodes.
- To rebalance the cluster a given set of hash slots are moved between nodes.
The core of the implementation is the ability to move hash slots around. From a practical point of view a hash slot is just a set of keys, so what Redis Cluster really does during resharding is to move keys from one instance to another. Moving a hash slot means moving all the keys that happen to hash into this hash slot.
To understand how this works we need to show the CLUSTER subcommands that are used to manipulate the slots translation table in a Redis Cluster node.
The following subcommands are available (among others not useful in this case):
- CLUSTER ADDSLOTS slot1 [slot2] ... [slotN]
- CLUSTER DELSLOTS slot1 [slot2] ... [slotN]
- CLUSTER SETSLOT slot NODE node
- CLUSTER SETSLOT slot MIGRATING node
- CLUSTER SETSLOT slot IMPORTING node
The first two commands, ADDSLOTS and DELSLOTS, are simply used to assign slots to (or remove slots from) a Redis node. Assigning a slot means telling a given master node that it will be in charge of storing and serving content for the specified hash slot.
After the hash slots are assigned they will propagate across the cluster using the gossip protocol, as specified later in the configuration propagation section.
The ADDSLOTS command is usually used when a new cluster is created from scratch to assign each master node a subset of all the 16384 hash slots available.
The DELSLOTS command is mainly used for manual modification of a cluster configuration or for debugging tasks: in practice it is rarely used.
The SETSLOT subcommand is used to assign a slot to a specific node ID if the SETSLOT <slot> NODE form is used. Otherwise the slot can be set in the two special states MIGRATING and IMPORTING. Those two special states are used in order to migrate a hash slot from one node to another.
- When a slot is set as MIGRATING, the node will accept all queries that are about this hash slot, but only if the key in question exists, otherwise the query is forwarded using a -ASK redirection to the node that is the target of the migration.
- When a slot is set as IMPORTING, the node will accept all queries that are about this hash slot, but only if the request is preceded by an ASKING command. If the ASKING command was not given by the client, the query is redirected to the real hash slot owner via a -MOVED redirection error, as would happen normally.
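The two special states can be summarized as a node-side decision function. This is only a sketch of the rules stated above; the function `route_query`, its parameters and the string replies are hypothetical, and the real server handles more cases (multi-key commands, for instance).

```python
from typing import Optional


def route_query(slot_state: Optional[str], owns_slot: bool, key_exists: bool,
                asking: bool, slot: int, peer: str, owner: str) -> str:
    """Decide how a node answers a single-key query for `slot`.

    slot_state: None, "MIGRATING" or "IMPORTING" (this node's view of the slot)
    peer:       the other node involved in the migration
    owner:      the node this node believes owns the slot
    """
    if slot_state == "MIGRATING":
        # Still the owner: serve keys that exist, send the rest to the target.
        return "SERVE" if key_exists else f"-ASK {slot} {peer}"
    if slot_state == "IMPORTING":
        # Not yet the owner: serve only queries preceded by ASKING.
        return "SERVE" if asking else f"-MOVED {slot} {owner}"
    # Normal state: serve owned slots, redirect everything else.
    return "SERVE" if owns_slot else f"-MOVED {slot} {owner}"


if __name__ == "__main__":
    print(route_query("MIGRATING", True, False, False, 8, "B:6379", "A:6379"))
    print(route_query("IMPORTING", False, False, False, 8, "A:6379", "A:6379"))
```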
Let's make this clearer with an example of hash slot migration. Assume that we have two Redis master nodes, called A and B. We want to move hash slot 8 from A to B, so we issue commands like this:
- We send B: CLUSTER SETSLOT 8 IMPORTING A
- We send A: CLUSTER SETSLOT 8 MIGRATING B
All the other nodes will continue to point clients to node "A" every time they are queried with a key that belongs to hash slot 8, so what happens is that:
- All queries about existing keys are processed by "A".
- All queries about non-existing keys in A are processed by "B", because "A" will redirect clients to "B".
Translator's note: "existing keys" here presumably covers reads such as GET on keys still held by A, while "non-existing keys" covers operations such as SET that would create a new key.
This way we no longer create new keys in "A". In the meantime, a special program called redis-trib, used during reshardings and Redis Cluster configuration, will migrate existing keys in hash slot 8 from A to B. This is performed using the following command:
CLUSTER GETKEYSINSLOT slot count
The above command will return count keys in the specified hash slot. For every key returned, redis-trib sends node "A" a MIGRATE command, which will migrate the specified key from A to B in an atomic way (both instances are locked for the usually very short time needed to migrate a key, so there are no race conditions). This is how MIGRATE works:
MIGRATE target_host target_port key target_database_id timeout
MIGRATE will connect to the target instance, send a serialized version of the key, and once an OK code is received, the old key from its own dataset will be deleted. From the point of view of an external client a key exists either in A or B at any given time.
In Redis Cluster there is no need to specify a database other than 0, but MIGRATE is a general command that can be used for other tasks not involving Redis Cluster. MIGRATE is optimized to be as fast as possible even when moving complex keys such as long lists, but in Redis Cluster reconfiguring the cluster where big keys are present is not considered a wise procedure if there are latency constraints in the application using the database.
When the migration process is finally finished, the SETSLOT <slot> NODE <node-id> command is sent to the two nodes involved in the migration in order to set the slots to their normal state again. The same command is usually sent to all other nodes to avoid waiting for the natural propagation of the new configuration across the cluster.
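The whole procedure can be sketched with an in-memory toy model. Nothing here talks to a real server: `ToyNode`, `cluster_getkeysinslot`, `migrate` and `reshard_slot` are hypothetical stand-ins that mirror the order of the steps above for a single slot.

```python
class ToyNode:
    """In-memory stand-in for a master holding the keys of one hash slot."""

    def __init__(self, name):
        self.name = name
        self.keys = {}          # key -> value
        self.slot_state = None  # None, ("MIGRATING", peer) or ("IMPORTING", peer)
        self.slot_owner = None  # name of the node believed to own the slot


def cluster_getkeysinslot(node, count):
    """Return up to `count` keys still stored on `node`."""
    return list(node.keys)[:count]


def migrate(src, dst, key):
    """Copy the key to the target, then delete it from the source."""
    dst.keys[key] = src.keys[key]
    del src.keys[key]


def reshard_slot(src, dst, batch_size=10):
    """Move every key of the slot from src to dst; return how many moved."""
    dst.slot_state = ("IMPORTING", src.name)   # first the target...
    src.slot_state = ("MIGRATING", dst.name)   # ...then the source
    moved = 0
    while True:
        batch = cluster_getkeysinslot(src, batch_size)
        if not batch:
            break
        for key in batch:
            migrate(src, dst, key)
            moved += 1
    # SETSLOT <slot> NODE <node-id> on both nodes clears the special states.
    for node in (src, dst):
        node.slot_state = None
        node.slot_owner = dst.name
    return moved


if __name__ == "__main__":
    a, b = ToyNode("A"), ToyNode("B")
    a.keys = {f"key{i}": i for i in range(5)}
    print(reshard_slot(a, b, batch_size=2), sorted(b.keys))
```

In the toy model, as in the text, each key lives on exactly one of the two nodes at every step, because a key is removed from the source in the same operation that stores it on the target.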
ASK redirection
In the previous section we briefly talked about ASK redirection. Why can't we simply use MOVED redirection? Because while MOVED means that we think the hash slot is permanently served by a different node and the next queries should be tried against the specified node, ASK means to send only the next query to the specified node.
This is needed because the next query about hash slot 8 can be about a key that is still in A, so we always want the client to try A and then B if needed. Since this happens only for one hash slot out of 16384 available, the performance hit on the cluster is acceptable.
We need to enforce that client behavior: to make sure that clients only try node B after A was tried, node B will only accept queries for a slot that is set as IMPORTING if the client sends the ASKING command before sending the query.
Basically the ASKING command sets a one-time flag on the client that forces a node to serve a query about an IMPORTING slot.
The full semantics of ASK redirection from the point of view of the client is as follows:
- If ASK redirection is received, send only the query that was redirected to the specified node but continue sending subsequent queries to the old node.
- Start the redirected query with the ASKING command.
- Don't yet update local client tables to map hash slot 8 to B.
Once hash slot 8 migration is completed, A will send a MOVED message and the client may permanently map hash slot 8 to the new IP and port pair. Note that if a buggy client performs the map earlier this is not a problem since it will not send the ASKING command before issuing the query, so B will redirect the client to A using a MOVED redirection error.
Slots migration is explained in similar terms but with different wording (for the sake of redundancy in the documentation) in the CLUSTER SETSLOT command documentation.
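The client-side ASK semantics listed above can be sketched as follows. The transport is abstracted as a hypothetical `send(node, command)` callable returning the raw reply string, and `query_with_ask` is an illustrative name; a real client would also handle MOVED, connection errors and pipelining.

```python
from typing import Callable, Dict


def query_with_ask(send: Callable[[str, str], str],
                   slot_table: Dict[int, str],
                   slot: int,
                   command: str) -> str:
    """Run `command` against the node mapped to `slot`, following one ASK."""
    node = slot_table[slot]
    reply = send(node, command)
    if reply.startswith("-ASK "):
        target = reply.split()[2]      # "-ASK <slot> <ip:port>"
        send(target, "ASKING")         # one-time flag on the target node
        reply = send(target, command)  # only this query goes to the target
        # Deliberately no slot_table update: later queries still start at
        # the old node until it answers with MOVED.
    return reply
```

Leaving `slot_table` untouched is the important design point: only a later MOVED reply should change the permanent mapping.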
Clients first connection and handling of redirections
While it is possible to have a Redis Cluster client implementation that does not remember the slots configuration (the map between slot numbers and addresses of nodes serving it) in memory and only works by contacting random nodes waiting to be redirected, such a client would be very inefficient.
Translator's note: such a client would spend roughly twice the normal time per request.
Redis Cluster clients should try to be smart enough to memorize the slots configuration. However, this configuration is not required to be up to date: contacting the wrong node will simply result in a redirection, which should trigger an update of the client view.
Clients usually need to fetch a complete list of slots and mapped node addresses in two different situations:
- At startup in order to populate the initial slots configuration.
- When a MOVED redirection is received.
Note that a client may handle the MOVED redirection by updating just the moved slot in its table, however this is usually not efficient since often the configuration of multiple slots is modified at once (for example if a slave is promoted to master, all the slots served by the old master will be remapped). It is much simpler to react to a MOVED redirection by fetching the full map of slots to nodes from scratch.
In order to retrieve the slots configuration Redis Cluster offers an alternative to the CLUSTER NODES command that does not require parsing, and only provides the information strictly needed by clients.
The new command is called CLUSTER SLOTS and provides an array of slot ranges, and the associated master and slave nodes serving the specified range.
The following is an example of output of CLUSTER SLOTS:
127.0.0.1:7000> cluster slots
1) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 7001
   4) 1) "127.0.0.1"
      2) (integer) 7004
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7000
   4) 1) "127.0.0.1"
      2) (integer) 7003
3) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7002
   4) 1) "127.0.0.1"
      2) (integer) 7005
The first two sub-elements of every element of the returned array are the start-end slots of the range. The additional elements represent address-port pairs. The first address-port pair is the master serving the slot, and the additional address-port pairs are all the slaves serving the same slot that are not in an error condition (i.e. the FAIL flag is not set).
For example the first element of the output says that slots from 5461 to 10922 (start and end included) are served by 127.0.0.1:7001, and it is possible to scale read-only load by contacting the slave at 127.0.0.1:7004.
CLUSTER SLOTS is not guaranteed to return ranges that cover the full 16384 slots if the cluster is misconfigured, so clients should initialize the slots configuration map filling the target nodes with NULL objects, and report an error if the user tries to execute commands about keys that belong to unassigned slots.
Before returning an error to the caller when a slot is found to be unassigned, the client should try to fetch the slots configuration again to check if the cluster is now configured properly.
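A sketch of how a client might turn a CLUSTER SLOTS reply into the slot table described above, with unassigned slots left as `None` and one refetch before reporting an error. The reply is assumed to be already decoded into nested Python lists shaped like the example output; `build_slot_table`, `master_for_slot` and the `refetch` callable are hypothetical names. Only the first two fields of each node entry are read, on the assumption that other versions may append extra fields.

```python
from typing import Callable, List, Optional, Tuple

TOTAL_SLOTS = 16384
Address = Tuple[str, int]


def build_slot_table(reply: list) -> List[Optional[dict]]:
    """Map every slot to its master/replicas, or None when unassigned."""
    table: List[Optional[dict]] = [None] * TOTAL_SLOTS
    for start, end, master, *replicas in reply:
        entry = {
            "master": tuple(master[:2]),
            "replicas": [tuple(r[:2]) for r in replicas],
        }
        for slot in range(start, end + 1):  # ranges are inclusive
            table[slot] = entry
    return table


def master_for_slot(table: List[Optional[dict]], slot: int,
                    refetch: Callable[[], list]) -> Address:
    """Return the master for `slot`, refetching the layout once if needed."""
    if table[slot] is None:
        table[:] = build_slot_table(refetch())
    if table[slot] is None:
        raise LookupError(f"hash slot {slot} is not assigned to any node")
    return table[slot]["master"]


if __name__ == "__main__":
    reply = [
        [5461, 10922, ["127.0.0.1", 7001], ["127.0.0.1", 7004]],
        [0, 5460, ["127.0.0.1", 7000], ["127.0.0.1", 7003]],
        [10923, 16383, ["127.0.0.1", 7002], ["127.0.0.1", 7005]],
    ]
    table = build_slot_table(reply)
    print(master_for_slot(table, 5461, lambda: reply))
```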
Multiple keys operations
Using hash tags, clients are free to use multi-key operations. For example the following operation is valid:
MSET {user:1000}.name Angela {user:1000}.surname White
Both keys share the hash tag {user:1000}, so they are guaranteed to hash to the same slot.
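To see why keys sharing a hash tag land in the same slot, here is a sketch of the key-to-slot mapping, assuming the rule defined elsewhere in this specification: CRC16 (XMODEM variant) of the key modulo 16384, where only the text between the first `{` and the next `}` is hashed if that text is non-empty. The function names are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Hash slot of a key, honoring a non-empty {hash tag} if present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]  # hash only the tag content
    return crc16_xmodem(key) % 16384


if __name__ == "__main__":
    for k in (b"{user:1000}.name", b"{user:1000}.surname", b"user:1000"):
        print(k, key_hash_slot(k))
```

All three keys printed by the usage example map to the same slot, because only `user:1000` is hashed in each case.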
Multi-key operations may become unavailable when a resharding of the hash slot the keys belong to is in progress.
More specifically, even during a resharding, multi-key operations targeting keys that all exist and are all still located on the same node (either the source or the destination node) remain available.
Operations on keys that don't exist or are, during the resharding, split between the source and destination nodes, will generate a -TRYAGAIN error. The client can try the operation after some time, or report back the error.
As soon as migration of the specified hash slot has terminated, all multi-key operations are available again for that hash slot.
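One way a client could implement "try the operation after some time" is a bounded retry loop. This is only a sketch: the `operation` callable returning a raw reply string, the attempt limit and the exponential delay are all assumptions, since the text does not prescribe a retry policy.

```python
import time
from typing import Callable


def retry_on_tryagain(operation: Callable[[], str],
                      attempts: int = 5,
                      base_delay: float = 0.05,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `operation` until it stops answering -TRYAGAIN or attempts run out."""
    reply = ""
    for attempt in range(attempts):
        reply = operation()
        if not reply.startswith("-TRYAGAIN"):
            return reply
        if attempt < attempts - 1:
            # Wait a little longer after each failure (assumed policy).
            sleep(base_delay * (2 ** attempt))
    return reply  # still -TRYAGAIN: let the caller report the error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.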
Scaling reads using slave nodes
Normally slave nodes will redirect clients to the authoritative master for the hash slot involved in a given command, however clients can use slaves in order to scale reads using the READONLY command.
READONLY tells a Redis Cluster slave node that the client is ok reading possibly stale data and is not interested in running write queries.
When the connection is in readonly mode, the cluster will send a redirection to the client only if the operation involves keys not served by the slave's master node. This may happen because:
- The client sent a command about hash slots never served by the master of this slave.
- The cluster was reconfigured (for example resharded) and the slave is no longer able to serve commands for a given hash slot.
When this happens the client should update its hashslot map as explained in the previous sections.
The readonly state of the connection can be cleared using the READWRITE command.
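The slave-side rule for read queries can be sketched as a single decision. The function `replica_reply` and its arguments are hypothetical, and the sketch covers read commands only; how write commands are treated on a readonly connection is not modeled here.

```python
from typing import Dict, Set


def replica_reply(readonly: bool, slot: int,
                  master_slots: Set[int],
                  slot_owner: Dict[int, str]) -> str:
    """How a slave answers a read query for `slot`.

    readonly:     True after the client sent READONLY on this connection
    master_slots: slots served by this slave's master
    slot_owner:   slot -> "ip:port" of the master believed to own it
    """
    if readonly and slot in master_slots:
        return "SERVE"  # data may be stale, which the client accepted
    return f"-MOVED {slot} {slot_owner[slot]}"


if __name__ == "__main__":
    owners = {100: "127.0.0.1:7000", 6000: "127.0.0.1:7001"}
    print(replica_reply(True, 100, set(range(0, 5461)), owners))
    print(replica_reply(True, 6000, set(range(0, 5461)), owners))
```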