I spent two days organizing my earlier notes on setting up and using Redis in standalone and sentinel modes, filled in my experience building and using cluster mode, and worked through some of the cluster's underlying principles.
1. Installing Redis
$ wget http://download.redis.io/releases/redis-6.0.3.tar.gz
$ tar -xzf redis-6.0.3.tar.gz
$ cd redis-6.0.3
$ make
$ make install
Some problems I ran into during installation:
If make fails, the gcc/g++ compilers are probably missing; install them with yum -y install gcc gcc-c++ kernel-devel.
It may still complain that some C files fail to compile. Check the compiler version with gcc -v; if it is below 5.3, upgrade gcc:
yum -y install centos-release-scl
yum -y install devtoolset-9-gcc devtoolset-9-gcc-c++ devtoolset-9-binutils
Append the line source /opt/rh/devtoolset-9/enable to /etc/profile, then run:
scl enable devtoolset-9 bash
Run make clean and make again.
This time the build goes through, and the output suggests running make test.
Run make test. If it reports You need tcl 8.5 or newer in order to run the Redis test, install tcl: yum install tcl.
Run make test again; if errors remain, delete the directory, re-extract the tarball, and redo make and make test.
\o/ All tests passed without errors! indicates the build succeeded.
Then run make install.
2. Starting Redis
Run the command directly: ./redis-server /usr/redis-6.0.3/redis.conf &
[root@VM_0_11_centos src]# ./redis-server /usr/redis-6.0.3/redis.conf &
[1] 4588
[root@VM_0_11_centos src]# 4588:C 22 May 2020 19:45:15.179 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
4588:C 22 May 2020 19:45:15.179 # Redis version=6.0.3, bits=64, commit=00000000, modified=0, pid=4588, just started
4588:C 22 May 2020 19:45:15.179 # Configuration loaded
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 6.0.3 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 4588
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'
4588:M 22 May 2020 19:45:15.180 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
4588:M 22 May 2020 19:45:15.180 # Server initialized
4588:M 22 May 2020 19:45:15.180 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
4588:M 22 May 2020 19:45:15.180 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
4588:M 22 May 2020 19:45:15.180 * Loading RDB produced by version 6.0.3
4588:M 22 May 2020 19:45:15.180 * RDB age 44 seconds
4588:M 22 May 2020 19:45:15.180 * RDB memory usage when created 0.77 Mb
4588:M 22 May 2020 19:45:15.180 * DB loaded from disk: 0.000 seconds
4588:M 22 May 2020 19:45:15.180 * Ready to accept connections
In redis.conf, set bind 0.0.0.0 to allow external access, and requirepass xxxx to set a password.
3. Redis High Availability
There are two Redis high-availability schemes:
Replication-Sentinel: master-replica replication plus sentinels
Cluster: cluster mode
Common deployments are 1 master with 1 or 2 replicas plus 3 sentinels monitoring the master, or a 6-node cluster of 3 masters and 3 replicas.
(1) Sentinel
/usr/redis-6.0.3/src/redis-sentinel /usr/redis-6.0.3/sentinel2.conf &
sentinel2.conf:
port 26380  # this sentinel's port
daemonize yes
pidfile "/var/run/redis-sentinel2.pid"  # pid file needed in daemonize mode
logfile ""
dir "/tmp"
sentinel myid 5736b9ca22cf0899276316e71810566044d75d14
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 122.xx.xxx.xxx 6379 2  # at least 2 sentinels must vote that the master is down
sentinel auth-pass mymaster xxxxxxx  # password the sentinel uses to connect to the master
sentinel down-after-milliseconds mymaster 30000  # consider the master down after 30s without a reply
sentinel failover-timeout mymaster 30000  # the failover is considered failed if not finished within 30s
sentinel config-epoch mymaster 0
protected-mode no
user default on nopass ~* +@all
sentinel leader-epoch mymaster 0
sentinel current-epoch 0
Pitfall 1: after a failover the old master becomes a replica, so it also needs masterauth
After killing the master process, the sentinels elected a slave as the new master. Restarting the original master then produced this error:
692:S 06 Jun 2020 13:19:35.280 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * (Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * Partial resynchronization not possible (no cached master)
692:S 06 Jun 2020 13:19:35.280 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * Retrying with SYNC...
692:S 06 Jun 2020 13:19:35.280 # MASTER aborted replication with an error: NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:36.282 * Connecting to MASTER 127.0.0.1:7001
692:S 06 Jun 2020 13:19:36.282 * MASTER <-> REPLICA sync started
692:S 06 Jun 2020 13:19:36.282 * Non blocking connect for SYNC fired the event.
692:S 06 Jun 2020 13:19:36.282 * Master replied to PING, replication can continue...
The reason is that the restarted master is now a slave and must authenticate against the new master, so master.conf needs:
masterauth xxxx  # xxxx is the requirepass password set in slave.conf (the node that is now master)
Pitfall 2: the sentinel config must expose a master address that clients can reach
The line sentinel monitor mymaster 122.xx.xxx.xxx 6379 2 in sentinel.conf configures the monitored master's name, address, and port, plus how many sentinels must agree before the master is considered down. The master address must be written from the perspective of the Redis clients, i.e. an address they can actually reach. For example, if the sentinel and the master run on the same server (122.xx.xxx.xxx), then relative to the sentinel the master is local, and sentinel monitor mymaster 127.0.0.1 6379 2 is logically correct; but when a Spring Boot app on another server connects through Lettuce, the master address it receives is 127.0.0.1, that is, the Spring Boot server itself, which is obviously wrong.
For reference, the Spring Boot 2.1 Redis sentinel configuration:
spring.redis.sentinel.master=mymaster
spring.redis.sentinel.nodes=122.xx.xxx.xxx:26379,122.xx.xxx.xxx:26380,122.xx.xxx.xxx:26381
#spring.redis.host=122.xx.xxx.xxx #standalone mode
#spring.redis.port=6379 #standalone mode
spring.redis.timeout=6000
spring.redis.password=xxxxxx
#spring.redis.lettuce.pool.max-active=16 #Lettuce is built on Netty and shares one connection; a pool is usually unnecessary
#spring.redis.lettuce.pool.max-wait=3000
#spring.redis.lettuce.pool.max-idle=12
#spring.redis.lettuce.pool.min-idle=4
Pitfall 3: beware that the .conf files can be modified by the sentinels
Log in to a sentinel with redis-cli -h localhost -p 26379 and inspect it with the info command.
I once hit a problem where the info output looked roughly like this:
master0:name=mymaster,status=down,address=127.0.0.1:7001,slaves=2,sentinels=3
An extra slave had appeared out of nowhere, and the master address, which I had definitely changed to the real public address, had turned back into 127.0.0.1!
In the end I stopped all five Redis processes and checked the config files one by one. It turns out the config files do get modified in master-replica + sentinel mode: the master's config file had an inexplicable replicaof 127.0.0.1 7001 appended at the end, presumably added dynamically by a sentinel back when the configuration was wrong (see pitfall 2). In short, keep an eye on config file changes in practice.
(2) Cluster
Once the data grows to a certain size, say tens or hundreds of GB, sentinel mode is no longer enough and the data has to be sharded horizontally. In earlier years this was done with third-party middleware such as codis or twemproxy, in the pattern client -> middleware -> Redis server, where the middleware used consistent hashing to decide which shard holds a key. After Redis shipped an official solution, everyone moved to Redis Cluster.
Redis Cluster logically divides the keyspace into 16384 hash slots. The sharding algorithm is CRC16(key) mod 16384, which yields the key's slot; the slot then determines which node owns the key.
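To make the slot computation concrete, here is a small sketch of mine (not code from this post): Redis uses the CRC16-CCITT (XMODEM) variant, and a key containing a {...} hash tag is hashed only on the tag, so related keys can be forced into the same slot.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    """Compute the hash slot, honoring non-empty {...} hash tags."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # at least one char between the braces
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(keyslot("name"))  # some slot in 0..16383
print(keyslot("{user1}.following") == keyslot("{user1}.followers"))  # True: same tag, same slot
```

You can cross-check any key against a live cluster with CLUSTER KEYSLOT.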
Each master can have one or more replicas; the common deployment is 3 masters and 3 replicas.
Resharding moves hash slots from one set of nodes to another. The process is transparent to clients and does not affect live traffic.
Building a Redis cluster
The key settings in each redis.conf:
port 7001 # port; differs per config file, 7001-7006
cluster-enabled yes # enable cluster mode
cluster-config-file nodes-7001.conf # node config file
cluster-node-timeout 15000 # node timeout
appendonly yes # enable AOF persistence
daemonize yes # run in the background
pidfile /var/run/redis_7001.pid # adjust per port
dir /usr/redis-6.0.3/cluster-data/7001 # data directory for this instance
Start the six cluster nodes:
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7001/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7002/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7003/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7004/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7005/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7006/redis.conf &
[root@VM_0_11_centos redis-6.0.3]# ps -ef|grep redis
root 5508 1 0 21:25 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7001 [cluster]
root 6903 1 0 21:32 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7002 [cluster]
root 6939 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7003 [cluster]
root 6966 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7004 [cluster]
root 6993 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7005 [cluster]
root 7015 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7006 [cluster]
At this point the six nodes are still independent; wire them into a cluster:
redis-cli -a xxxx --cluster create --cluster-replicas 1 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 127.0.0.1:7006
Note: -a xxxx is needed because I set requirepass xxxx in redis.conf, and the 1 in --cluster-replicas 1 means each master gets one replica.
The command then asks: Can I set the above configuration? Answer yes to accept the automatically generated sharding.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 127.0.0.1:7005 to 127.0.0.1:7001
Adding replica 127.0.0.1:7006 to 127.0.0.1:7002
Adding replica 127.0.0.1:7004 to 127.0.0.1:7003
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: 91143d1715cb3d5234f1ab67559b621ec51475c9 127.0.0.1:7001
slots:[0-5460] (5461 slots) master
M: 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494 127.0.0.1:7002
slots:[5461-10922] (5462 slots) master
M: 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf 127.0.0.1:7003
slots:[10923-16383] (5461 slots) master
S: d8df06b10d75613328b30f38d458b0ca094fc997 127.0.0.1:7004
replicates 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494
S: 45f8a0253fc5653fdab27dc4a0455e7a92dae88d 127.0.0.1:7005
replicates 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf
S: 80350bc7927c943ffd4afdc014448be5407c258b 127.0.0.1:7006
replicates 91143d1715cb3d5234f1ab67559b621ec51475c9
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.......
>>> Performing Cluster Check (using node 127.0.0.1:7001)
M: 91143d1715cb3d5234f1ab67559b621ec51475c9 127.0.0.1:7001
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf 127.0.0.1:7003
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: d8df06b10d75613328b30f38d458b0ca094fc997 127.0.0.1:7004
slots: (0 slots) slave
replicates 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494
S: 45f8a0253fc5653fdab27dc4a0455e7a92dae88d 127.0.0.1:7005
slots: (0 slots) slave
replicates 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf
M: 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494 127.0.0.1:7002
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 80350bc7927c943ffd4afdc014448be5407c258b 127.0.0.1:7006
slots: (0 slots) slave
replicates 91143d1715cb3d5234f1ab67559b621ec51475c9
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
The final All 16384 slots covered. means every one of the 16384 slots is served by at least one master; the cluster is up.
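The slot ranges in the output above (0-5460, 5461-10922, 10923-16383) come from dividing the 16384 slots evenly among the masters. The following is my own rough approximation of that allocation, not redis-cli's actual code, though it reproduces the ranges shown here:

```python
def allocate_slots(masters: int) -> list[tuple[int, int]]:
    """Split the 16384 hash slots into contiguous, nearly equal ranges."""
    per_node = 16384 / masters
    ranges = []
    for i in range(masters):
        first = round(i * per_node)        # first slot of this master's range
        last = round((i + 1) * per_node) - 1  # last slot, just before the next range
        ranges.append((first, last))
    return ranges

print(allocate_slots(3))  # [(0, 5460), (5461, 10922), (10923, 16383)]
```

Note the middle master ends up with 5462 slots and the other two with 5461, exactly as in the log.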
Check the cluster status:
[root@VM_0_11_centos redis-6.0.3]# redis-cli -c -p 7001
127.0.0.1:7001> auth xxxx
OK
127.0.0.1:7001> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:859
cluster_stats_messages_pong_sent:893
cluster_stats_messages_sent:1752
cluster_stats_messages_ping_received:888
cluster_stats_messages_pong_received:859
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:1752
Pitfall 1: the node addresses exposed to clients were wrong
Lettuce could not connect; the log showed Connection refused: no further information: /127.0.0.1:7002. This is the same mistake as with the master address in sentinel.conf: the addresses used when creating the cluster must be the ones clients will connect to.
So the cluster had to be rebuilt: stop the six Redis processes, delete the node config files (nodes-7001.conf and so on) and the persistence files dump.rdb and appendonly.aof, restart the six processes, and create the cluster again:
redis-cli -a xxpwdxx --cluster create --cluster-replicas 1 122.xx.xxx.xxx:7001 122.xx.xxx.xxx:7002 122.xx.xxx.xxx:7003 122.xx.xxx.xxx:7004 122.xx.xxx.xxx:7005 122.xx.xxx.xxx:7006
Still no luck; this time the error was connection timed out: /172.xx.0.xx:7004. It was connecting to the Tencent Cloud server's private address!
解決辦法姨拥,修改每個節(jié)點的redis.conf配置文件绅喉,找到如下說明:
# In certain deployments, Redis Cluster nodes address discovery fails, because
# addresses are NAT-ted or because ports are forwarded (the typical case is
# Docker and other containers).
#
# In order to make Redis Cluster working in such environments, a static
# configuration where each node knows its public address is needed. The
# following two options are used for this scope, and are:
#
# * cluster-announce-ip
# * cluster-announce-port
# * cluster-announce-bus-port
So add:
cluster-announce-ip 122.xx.xxx.xxx
cluster-announce-port 7001
cluster-announce-bus-port 17001
Then rebuild the cluster once more: stop the processes, change the configs, delete the node and persistence files, start the processes, create the cluster. The whole routine again (exhausting).
Testing with Lettuce again: this time it finally connected!
Pitfall 2: Lettuce did not switch to the replica when a master failed
The key name lived on 7002. I killed that process to simulate the master going down, and Lettuce just kept reconnecting; the expected behavior was an automatic switch to its slave on 7006.
After restarting the 7002 process:
127.0.0.1:7001> cluster nodes
4b66a7d28ebbfa149f7f2ad1dd1d3cbbc1e79659 122.51.112.187:7003@17003 master - 0 1638243049258 3 connected 10923-16383
16a3da4143ee873b9ed82d217db9819c8d945d30 122.51.112.187:7005@17005 slave bfdb90a0b0e3217fad5e5eb44ec253531930a418 0 1638243052264 5 connected
110d047d5b6c827c018dbebf83d9db350f12b931 122.51.112.187:7004@17004 slave 4b66a7d28ebbfa149f7f2ad1dd1d3cbbc1e79659 0 1638243051000 4 connected
e4406cfeb0e6944bfd6c5af82ba5f4f1ab38190d 122.51.112.187:7002@17002 slave c7cd30d3843f9f4b113614672cabce193b1bc7b9 0 1638243054267 7 connected
c7cd30d3843f9f4b113614672cabce193b1bc7b9 122.51.112.187:7006@17006 master - 0 1638243053266 7 connected 5461-10922
bfdb90a0b0e3217fad5e5eb44ec253531930a418 122.51.112.187:7001@17001 myself,master - 0 1638243052000 1 connected 0-5460
7006 has become the new master with 7002 as its slave, and Lettuce can connect again.
The fix is to adjust the Lettuce configuration:
import java.time.Duration;
import java.util.Arrays;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.RedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;

@Configuration
public class RedisConfig {

    @Value("${spring.redis.cluster.nodes:nocluster}")
    private String clusterNodes;

    @Value("${spring.redis.password:123456}")
    private String password;

    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<>();
        redisTemplate.setConnectionFactory(factory);
        RedisSerializer<String> stringRedisSerializer = new StringRedisSerializer();
        redisTemplate.setKeySerializer(stringRedisSerializer);
        redisTemplate.setValueSerializer(stringRedisSerializer);
        return redisTemplate;
    }

    /**
     * In Lettuce cluster mode, rebuild the RedisConnectionFactory and register it with Spring.
     */
    @Bean
    @ConditionalOnProperty(prefix = "spring.redis.cluster", name = "nodes")
    public RedisConnectionFactory redisConnectionFactory() {
        ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
                .enableAllAdaptiveRefreshTriggers() // enable adaptive refresh; without it, cluster topology changes cause connection errors
                .adaptiveRefreshTriggersTimeout(Duration.ofSeconds(30)) // adaptive refresh timeout (default 30s)
                .enablePeriodicRefresh(Duration.ofSeconds(60)) // periodic refresh, off by default; 60s once enabled
                .build();
        ClusterClientOptions clientOptions = ClusterClientOptions.builder()
                .topologyRefreshOptions(clusterTopologyRefreshOptions)
                .build();
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .clientOptions(clientOptions)
                .build();
        String[] nodes = clusterNodes.split(",");
        RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(nodes));
        redisClusterConfiguration.setPassword(password);
        return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);
    }
}
I'm using Spring Boot 2.1 with spring-boot-starter-data-redis and its default Lettuce client. In Redis cluster mode the RedisConnectionFactory needs adaptive topology refresh enabled so that the client switches over to a replica during failover.
Retest: stop master 7006, and this time Lettuce correctly switches over to the 7002 slave. (The log still reports connection errors continuously, because it keeps trying to reconnect to 7006; but since the 7002 replica has stepped up, the application keeps working.)
Redis does not guarantee strong consistency
Redis does not guarantee strong consistency; in CAP terms, it picks AP.
- Master-replica replication is asynchronous. For performance, the master acknowledges the client immediately instead of waiting for the replicas to confirm. If the master dies after acknowledging a write but before the replicas have caught up, that write is lost.
- Writes can also be lost during a network partition. Suppose the cluster splits into two partitions: the small one contains master A0, and in the large one A0's replicas elect a new master A1. A0 has not actually crashed and keeps accepting requests from clients inside the small partition, so for that shard there are temporarily two masters, the classic "split-brain". When the partition heals, A0 becomes a replica of A1, wipes its own data, and resyncs from A1, so everything written to A0 during the partition is lost.
Redis cluster core principles
1. Redis Cluster does not use consistent hashing, and adding or removing nodes requires manually migrating slots. So how does it hot-migrate without affecting live traffic?
For background on consistent hashing, see 一致性Hash算法 - 簡書 (jianshu.com).
Redis Cluster uses a hash-slot scheme, which is not quite consistent hashing. There is a fixed set of 16384 slots, and a key's slot is computed by taking CRC16 of the key modulo 16384. Keys address slots, not nodes; the slot-to-node mapping can be changed by resharding and is spread to every node via the gossip protocol, so clients can learn it too. Key addressing therefore no longer depends on the number of nodes, which also avoids the problem of plain hash-mod-N schemes, where any change in node count remaps most keys and invalidates the cache.
For example, when a node is added to the cluster, it initially owns no slots: all slots stay on the original nodes with the mapping unchanged, so nothing is invalidated by the node count changing. After resharding some slots to the new node, clients learn the new mapping for just those slots and address them on the new node, while the unmoved slots still resolve to their original nodes.
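This behavior can be illustrated with a toy slot-to-node map (the node names are hypothetical, and this is my simplification, not Redis internals): resharding rewrites the mapping only for the migrated slots, so keys in untouched slots keep resolving to the same node.

```python
# Toy model: which node owns each of the 16384 slots (initial 3-master layout).
slot_to_node = {}
for slot in range(16384):
    if slot <= 5460:
        slot_to_node[slot] = "node-7001"
    elif slot <= 10922:
        slot_to_node[slot] = "node-7002"
    else:
        slot_to_node[slot] = "node-7003"

def reshard(mapping, slots, target):
    """Move the given slots to the target node; every other entry is untouched."""
    for slot in slots:
        mapping[slot] = target

# A new node joins owning zero slots; nothing changes until we reshard.
reshard(slot_to_node, range(0, 1000), "node-7007")

print(slot_to_node[500])   # migrated slot now resolves to the new node
print(slot_to_node[5000])  # unmoved slot still resolves to its original node
```

In a real cluster the same idea is propagated by gossip instead of a shared dict, but the key point is identical: only the migrated slots change owners.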
As for hot migration, my guess is that internally the slot's data is first copied to the target node, and only once the copy completes is the slot-to-node mapping switched; until then, requests still go to the original node under the old mapping. After the copy finishes, the keys of the migrated slots are deleted from the original node.
2. When a master fails, how does the cluster detect it and elect a replacement?
Much like sentinel mode: when a node finds some master unresponsive, it marks it pfail (subjectively down) and spreads that verdict through gossip; other nodes run the same check and gossip their own verdicts. Once more than half of the nodes report pfail, the node is marked fail (objectively down). Then the master nodes vote among the failed node's replicas, and the replica that wins more than half of the master votes becomes the new master.
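A minimal sketch of the majority rule just described (my simplification; the real protocol also weighs report freshness, epochs, and which nodes hold slots): pfail escalates to fail only once a strict majority agrees.

```python
def is_objectively_down(pfail_reports: int, cluster_nodes: int) -> bool:
    """pfail escalates to fail once more than half of the nodes agree."""
    return pfail_reports > cluster_nodes // 2

# In a 6-node cluster, 3 reports are not a strict majority, 4 are.
print(is_objectively_down(3, 6))  # False
print(is_objectively_down(4, 6))  # True
```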
3躏碳、去中心化設(shè)計與gossip協(xié)議
所有節(jié)點都持有元數(shù)據(jù)搞旭,節(jié)點之間通過gossip這種二進(jìn)制協(xié)議進(jìn)行通信、發(fā)送自己的元數(shù)據(jù)信息給其他節(jié)點、故障檢測肄渗、集群配置更新镇眷、故障轉(zhuǎn)移授權(quán)等等。
這種去中心化的分布式節(jié)點之間內(nèi)部協(xié)調(diào)翎嫡,包括故障識別欠动、故障轉(zhuǎn)移、選主等等惑申,核心在于gossip擴(kuò)散協(xié)議具伍,能夠支撐這樣的廣播協(xié)議在于所有的節(jié)點都持有一份完整的集群元數(shù)據(jù),即所有的節(jié)點都知悉當(dāng)前集群全局的情況圈驼。
References:
面試題:Redis 集群模式的工作原理能說一下么 - 云+社區(qū) - 騰訊云 (tencent.com)
深度圖解Redis Cluster原理 - detectiveHLH - 博客園 (cnblogs.com)