I spent two days organizing my earlier notes on setting up and using Redis in standalone and sentinel modes, filled in my experience building and using cluster mode, and worked through some of the cluster's underlying principles.
1. Installing Redis
$ wget http://download.redis.io/releases/redis-6.0.3.tar.gz
$ tar -xzf redis-6.0.3.tar.gz
$ cd redis-6.0.3
$ make
$ make install
Some problems I ran into during installation:
If make fails, the gcc/g++ compilers are probably missing; install them with yum -y install gcc gcc-c++ kernel-devel.
It may still complain that some C files fail to compile. Check the compiler version with gcc -v; if it is below 5.3, upgrade gcc:
yum -y install centos-release-scl
yum -y install devtoolset-9-gcc devtoolset-9-gcc-c++ devtoolset-9-binutils
Append the line source /opt/rh/devtoolset-9/enable to /etc/profile, then run:
scl enable devtoolset-9 bash
Run make clean and make again.
This time the build goes through, and the output suggests running make test.
Run make test. If it reports You need tcl 8.5 or newer in order to run the Redis test, install tcl: yum install tcl.
Run make test again; if errors remain, delete the directory, re-extract the tarball, and redo make and make test.
\o/ All tests passed without errors! indicates the build succeeded.
Then run make install.
2. Starting Redis
Run the command directly: ./redis-server /usr/redis-6.0.3/redis.conf &
[root@VM_0_11_centos src]# ./redis-server /usr/redis-6.0.3/redis.conf &
[1] 4588
[root@VM_0_11_centos src]# 4588:C 22 May 2020 19:45:15.179 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
4588:C 22 May 2020 19:45:15.179 # Redis version=6.0.3, bits=64, commit=00000000, modified=0, pid=4588, just started
4588:C 22 May 2020 19:45:15.179 # Configuration loaded
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 6.0.3 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 4588
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'
4588:M 22 May 2020 19:45:15.180 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
4588:M 22 May 2020 19:45:15.180 # Server initialized
4588:M 22 May 2020 19:45:15.180 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
4588:M 22 May 2020 19:45:15.180 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
4588:M 22 May 2020 19:45:15.180 * Loading RDB produced by version 6.0.3
4588:M 22 May 2020 19:45:15.180 * RDB age 44 seconds
4588:M 22 May 2020 19:45:15.180 * RDB memory usage when created 0.77 Mb
4588:M 22 May 2020 19:45:15.180 * DB loaded from disk: 0.000 seconds
4588:M 22 May 2020 19:45:15.180 * Ready to accept connections
In redis.conf, set bind 0.0.0.0 to allow external access, and requirepass xxxx to set a password.
3. Redis High Availability
There are two Redis high-availability schemes:
Replication-Sentinel: master-replica replication plus sentinels
Cluster: cluster mode
Common deployments are 1 master with 1 or 2 replicas plus 3 sentinels monitoring the master, or a 6-node cluster of 3 masters and 3 replicas.
(1) Sentinel
/usr/redis-6.0.3/src/redis-sentinel /usr/redis-6.0.3/sentinel2.conf &
sentinel2.conf:
port 26380  # this sentinel's port
daemonize yes
pidfile "/var/run/redis-sentinel2.pid"  # pid file needed in daemonize mode
logfile ""
dir "/tmp"
sentinel myid 5736b9ca22cf0899276316e71810566044d75d14
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 122.xx.xxx.xxx 6379 2  # at least 2 sentinels must vote that the master is down
sentinel auth-pass mymaster xxxxxxx  # password the sentinel uses to connect to the master
sentinel down-after-milliseconds mymaster 30000  # consider the master down after 30s without a reply
sentinel failover-timeout mymaster 30000  # the failover is considered failed if not finished within 30s
sentinel config-epoch mymaster 0
protected-mode no
user default on nopass ~* +@all
sentinel leader-epoch mymaster 0
sentinel current-epoch 0
Pitfall 1: after a failover the old master becomes a replica, so it also needs masterauth
After killing the master process, the sentinels elected a slave as the new master. Restarting the original master then produced this error:
692:S 06 Jun 2020 13:19:35.280 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * (Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * Partial resynchronization not possible (no cached master)
692:S 06 Jun 2020 13:19:35.280 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:35.280 * Retrying with SYNC...
692:S 06 Jun 2020 13:19:35.280 # MASTER aborted replication with an error: NOAUTH Authentication required.
692:S 06 Jun 2020 13:19:36.282 * Connecting to MASTER 127.0.0.1:7001
692:S 06 Jun 2020 13:19:36.282 * MASTER <-> REPLICA sync started
692:S 06 Jun 2020 13:19:36.282 * Non blocking connect for SYNC fired the event.
692:S 06 Jun 2020 13:19:36.282 * Master replied to PING, replication can continue...
The reason is that the restarted master is now a slave and must authenticate against the new master, so master.conf needs:
masterauth xxxx  # xxxx is the requirepass password set in slave.conf (the node that is now master)
Pitfall 2: the sentinel config must expose a master address that clients can reach
The line sentinel monitor mymaster 122.xx.xxx.xxx 6379 2 in sentinel.conf configures the monitored master's name, address, and port, plus how many sentinels must agree before the master is considered down. The master address must be written from the perspective of the Redis clients, i.e. an address they can actually reach. For example, if the sentinel and the master run on the same server (122.xx.xxx.xxx), then relative to the sentinel the master is local, and sentinel monitor mymaster 127.0.0.1 6379 2 is logically correct; but when a Spring Boot app on another server connects through Lettuce, the master address it receives is 127.0.0.1, that is, the Spring Boot server itself, which is obviously wrong.
For reference, the Spring Boot 2.1 Redis sentinel configuration:
spring.redis.sentinel.master=mymaster
spring.redis.sentinel.nodes=122.xx.xxx.xxx:26379,122.xx.xxx.xxx:26380,122.xx.xxx.xxx:26381
#spring.redis.host=122.xx.xxx.xxx #standalone mode
#spring.redis.port=6379 #standalone mode
spring.redis.timeout=6000
spring.redis.password=xxxxxx
#spring.redis.lettuce.pool.max-active=16 #Lettuce is built on Netty and shares one connection; a pool is usually unnecessary
#spring.redis.lettuce.pool.max-wait=3000
#spring.redis.lettuce.pool.max-idle=12
#spring.redis.lettuce.pool.min-idle=4
Pitfall 3: beware that the .conf files can be modified by the sentinels
Log in to a sentinel with redis-cli -h localhost -p 26379 and inspect it with the info command.
I once hit a problem where the info output looked roughly like this:
master0:name=mymaster,status=down,address=127.0.0.1:7001,slaves=2,sentinels=3
An extra slave had appeared out of nowhere, and the master address, which I had definitely changed to the real public address, had turned back into 127.0.0.1!
In the end I stopped all five Redis processes and checked the config files one by one. It turns out the config files do get modified in master-replica + sentinel mode: the master's config file had an inexplicable replicaof 127.0.0.1 7001 appended at the end, presumably added dynamically by a sentinel back when the configuration was wrong (see pitfall 2). In short, keep an eye on config file changes in practice.
(2) Cluster
Once the data grows to a certain size, say tens or hundreds of GB, sentinel mode is no longer enough and the data has to be sharded horizontally. In earlier years this was done with third-party middleware such as codis or twemproxy, in the pattern client -> middleware -> Redis server, where the middleware used consistent hashing to decide which shard holds a key. After Redis shipped an official solution, everyone moved to Redis Cluster.
Redis Cluster logically divides the keyspace into 16384 hash slots. The sharding algorithm is CRC16(key) mod 16384, which yields the key's slot; the slot then determines which node owns the key.
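To make the slot computation concrete, here is a small sketch of mine (not code from this post): Redis uses the CRC16-CCITT (XMODEM) variant, and a key containing a {...} hash tag is hashed only on the tag, so related keys can be forced into the same slot.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    """Compute the hash slot, honoring non-empty {...} hash tags."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # at least one char between the braces
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(keyslot("name"))  # some slot in 0..16383
print(keyslot("{user1}.following") == keyslot("{user1}.followers"))  # True: same tag, same slot
```

You can cross-check any key against a live cluster with CLUSTER KEYSLOT.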
Each master can have one or more replicas; the common deployment is 3 masters and 3 replicas.
Resharding moves hash slots from one set of nodes to another. The process is transparent to clients and does not affect live traffic.
Building a Redis cluster
The key settings in each redis.conf:
port 7001 # port; differs per config file, 7001-7006
cluster-enabled yes # enable cluster mode
cluster-config-file nodes-7001.conf # node config file
cluster-node-timeout 15000 # node timeout
appendonly yes # enable AOF persistence
daemonize yes # run in the background
pidfile /var/run/redis_7001.pid # adjust per port
dir /usr/redis-6.0.3/cluster-data/7001 # data directory for this instance
Start the six cluster nodes:
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7001/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7002/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7003/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7004/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7005/redis.conf &
/usr/redis-6.0.3/src/redis-server /usr/redis-6.0.3/cluster-conf/7006/redis.conf &
[root@VM_0_11_centos redis-6.0.3]# ps -ef|grep redis
root 5508 1 0 21:25 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7001 [cluster]
root 6903 1 0 21:32 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7002 [cluster]
root 6939 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7003 [cluster]
root 6966 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7004 [cluster]
root 6993 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7005 [cluster]
root 7015 1 0 21:33 ? 00:00:00 /usr/redis-6.0.3/src/redis-server 0.0.0.0:7006 [cluster]
At this point the six nodes are still independent; wire them into a cluster:
redis-cli -a xxxx --cluster create --cluster-replicas 1 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 127.0.0.1:7006
Note: -a xxxx is needed because I set requirepass xxxx in redis.conf, and the 1 in --cluster-replicas 1 means each master gets one replica.
The command then asks: Can I set the above configuration? Answer yes to accept the automatically generated sharding.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 127.0.0.1:7005 to 127.0.0.1:7001
Adding replica 127.0.0.1:7006 to 127.0.0.1:7002
Adding replica 127.0.0.1:7004 to 127.0.0.1:7003
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: 91143d1715cb3d5234f1ab67559b621ec51475c9 127.0.0.1:7001
slots:[0-5460] (5461 slots) master
M: 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494 127.0.0.1:7002
slots:[5461-10922] (5462 slots) master
M: 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf 127.0.0.1:7003
slots:[10923-16383] (5461 slots) master
S: d8df06b10d75613328b30f38d458b0ca094fc997 127.0.0.1:7004
replicates 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494
S: 45f8a0253fc5653fdab27dc4a0455e7a92dae88d 127.0.0.1:7005
replicates 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf
S: 80350bc7927c943ffd4afdc014448be5407c258b 127.0.0.1:7006
replicates 91143d1715cb3d5234f1ab67559b621ec51475c9
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.......
>>> Performing Cluster Check (using node 127.0.0.1:7001)
M: 91143d1715cb3d5234f1ab67559b621ec51475c9 127.0.0.1:7001
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf 127.0.0.1:7003
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: d8df06b10d75613328b30f38d458b0ca094fc997 127.0.0.1:7004
slots: (0 slots) slave
replicates 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494
S: 45f8a0253fc5653fdab27dc4a0455e7a92dae88d 127.0.0.1:7005
slots: (0 slots) slave
replicates 3adde17eca1e7e7baf03fd2220565c0f2b6c7bbf
M: 04acd6b82ec44ff9ff738a1dd0c77a2efc29f494 127.0.0.1:7002
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 80350bc7927c943ffd4afdc014448be5407c258b 127.0.0.1:7006
slots: (0 slots) slave
replicates 91143d1715cb3d5234f1ab67559b621ec51475c9
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
The final All 16384 slots covered. means every one of the 16384 slots is served by at least one master; the cluster is up.
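The slot ranges in the output above (0-5460, 5461-10922, 10923-16383) come from dividing the 16384 slots evenly among the masters. The following is my own rough approximation of that allocation, not redis-cli's actual code, though it reproduces the ranges shown here:

```python
def allocate_slots(masters: int) -> list[tuple[int, int]]:
    """Split the 16384 hash slots into contiguous, nearly equal ranges."""
    per_node = 16384 / masters
    ranges = []
    for i in range(masters):
        first = round(i * per_node)        # first slot of this master's range
        last = round((i + 1) * per_node) - 1  # last slot, just before the next range
        ranges.append((first, last))
    return ranges

print(allocate_slots(3))  # [(0, 5460), (5461, 10922), (10923, 16383)]
```

Note the middle master ends up with 5462 slots and the other two with 5461, exactly as in the log.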
Check the cluster status:
[root@VM_0_11_centos redis-6.0.3]# redis-cli -c -p 7001
127.0.0.1:7001> auth xxxx
OK
127.0.0.1:7001> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:859
cluster_stats_messages_pong_sent:893
cluster_stats_messages_sent:1752
cluster_stats_messages_ping_received:888
cluster_stats_messages_pong_received:859
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:1752
Pitfall 1: the node addresses exposed to clients were wrong
Lettuce could not connect; the log showed Connection refused: no further information: /127.0.0.1:7002. This is the same mistake as with the master address in sentinel.conf: the addresses used when creating the cluster must be the ones clients will connect to.
So the cluster had to be rebuilt: stop the six Redis processes, delete the node config files (nodes-7001.conf and so on) and the persistence files dump.rdb and appendonly.aof, restart the six processes, and create the cluster again:
redis-cli -a xxpwdxx --cluster create --cluster-replicas 1 122.xx.xxx.xxx:7001 122.xx.xxx.xxx:7002 122.xx.xxx.xxx:7003 122.xx.xxx.xxx:7004 122.xx.xxx.xxx:7005 122.xx.xxx.xxx:7006
Still no luck; this time the error was connection timed out: /172.xx.0.xx:7004. It was connecting to the Tencent Cloud server's private address!
解決辦法姨拥,修改每個節(jié)點的redis.conf配置文件绅喉,找到如下說明:
# In certain deployments, Redis Cluster nodes address discovery fails, because
# addresses are NAT-ted or because ports are forwarded (the typical case is
# Docker and other containers).
#
# In order to make Redis Cluster working in such environments, a static
# configuration where each node knows its public address is needed. The
# following two options are used for this scope, and are:
#
# * cluster-announce-ip
# * cluster-announce-port
# * cluster-announce-bus-port
So add:
cluster-announce-ip 122.xx.xxx.xxx
cluster-announce-port 7001
cluster-announce-bus-port 17001
Then rebuild the cluster once more: stop the processes, change the configs, delete the node and persistence files, start the processes, create the cluster. The whole routine again (exhausting).
Testing with Lettuce again: this time it finally connected!
Pitfall 2: Lettuce did not switch to the replica when a master failed
The key name lived on 7002. I killed that process to simulate the master going down, and Lettuce just kept reconnecting; the expected behavior was an automatic switch to its slave on 7006.
After restarting the 7002 process:
127.0.0.1:7001> cluster nodes
4b66a7d28ebbfa149f7f2ad1dd1d3cbbc1e79659 122.51.112.187:7003@17003 master - 0 1638243049258 3 connected 10923-16383
16a3da4143ee873b9ed82d217db9819c8d945d30 122.51.112.187:7005@17005 slave bfdb90a0b0e3217fad5e5eb44ec253531930a418 0 1638243052264 5 connected
110d047d5b6c827c018dbebf83d9db350f12b931 122.51.112.187:7004@17004 slave 4b66a7d28ebbfa149f7f2ad1dd1d3cbbc1e79659 0 1638243051000 4 connected
e4406cfeb0e6944bfd6c5af82ba5f4f1ab38190d 122.51.112.187:7002@17002 slave c7cd30d3843f9f4b113614672cabce193b1bc7b9 0 1638243054267 7 connected
c7cd30d3843f9f4b113614672cabce193b1bc7b9 122.51.112.187:7006@17006 master - 0 1638243053266 7 connected 5461-10922
bfdb90a0b0e3217fad5e5eb44ec253531930a418 122.51.112.187:7001@17001 myself,master - 0 1638243052000 1 connected 0-5460
7006 has become the new master with 7002 as its slave, and Lettuce can connect again.
The fix is to adjust the Lettuce configuration:
import java.time.Duration;
import java.util.Arrays;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.RedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;

@Configuration
public class RedisConfig {

    @Value("${spring.redis.cluster.nodes:nocluster}")
    private String clusterNodes;

    @Value("${spring.redis.password:123456}")
    private String password;

    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<>();
        redisTemplate.setConnectionFactory(factory);
        RedisSerializer<String> stringRedisSerializer = new StringRedisSerializer();
        redisTemplate.setKeySerializer(stringRedisSerializer);
        redisTemplate.setValueSerializer(stringRedisSerializer);
        return redisTemplate;
    }

    /**
     * In Lettuce cluster mode, rebuild the RedisConnectionFactory and register it with Spring.
     */
    @Bean
    @ConditionalOnProperty(prefix = "spring.redis.cluster", name = "nodes")
    public RedisConnectionFactory redisConnectionFactory() {
        ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
                .enableAllAdaptiveRefreshTriggers() // enable adaptive refresh; without it, cluster topology changes cause connection errors
                .adaptiveRefreshTriggersTimeout(Duration.ofSeconds(30)) // adaptive refresh timeout (default 30s)
                .enablePeriodicRefresh(Duration.ofSeconds(60)) // periodic refresh, off by default; 60s once enabled
                .build();
        ClusterClientOptions clientOptions = ClusterClientOptions.builder()
                .topologyRefreshOptions(clusterTopologyRefreshOptions)
                .build();
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .clientOptions(clientOptions)
                .build();
        String[] nodes = clusterNodes.split(",");
        RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(nodes));
        redisClusterConfiguration.setPassword(password);
        return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);
    }
}
I'm using Spring Boot 2.1 with spring-boot-starter-data-redis and its default Lettuce client. In Redis cluster mode the RedisConnectionFactory needs adaptive topology refresh enabled so that the client switches over to a replica during failover.
Retest: stop master 7006, and this time Lettuce correctly switches over to the 7002 slave. (The log still reports connection errors continuously, because it keeps trying to reconnect to 7006; but since the 7002 replica has stepped up, the application keeps working.)
Redis does not guarantee strong consistency
Redis does not guarantee strong consistency; in CAP terms, it picks AP.
- Master-replica replication is asynchronous. For performance, the master acknowledges the client immediately instead of waiting for the replicas to confirm. If the master dies after acknowledging a write but before the replicas have caught up, that write is lost.
- Writes can also be lost during a network partition. Suppose the cluster splits into two partitions: the small one contains master A0, and in the large one A0's replicas elect a new master A1. A0 has not actually crashed and keeps accepting requests from clients inside the small partition, so for that shard there are temporarily two masters, the classic "split-brain". When the partition heals, A0 becomes a replica of A1, wipes its own data, and resyncs from A1, so everything written to A0 during the partition is lost.
Redis cluster core principles
1. Redis Cluster does not use consistent hashing, and adding or removing nodes requires manually migrating slots. So how does it hot-migrate without affecting live traffic?
For background on consistent hashing, see 一致性Hash算法 - 簡書 (jianshu.com).
Redis Cluster uses a hash-slot scheme, which is not quite consistent hashing. There is a fixed set of 16384 slots, and a key's slot is computed by taking CRC16 of the key modulo 16384. Keys address slots, not nodes; the slot-to-node mapping can be changed by resharding and is spread to every node via the gossip protocol, so clients can learn it too. Key addressing therefore no longer depends on the number of nodes, which also avoids the problem of plain hash-mod-N schemes, where any change in node count remaps most keys and invalidates the cache.
For example, when a node is added to the cluster, it initially owns no slots: all slots stay on the original nodes with the mapping unchanged, so nothing is invalidated by the node count changing. After resharding some slots to the new node, clients learn the new mapping for just those slots and address them on the new node, while the unmoved slots still resolve to their original nodes.
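This behavior can be illustrated with a toy slot-to-node map (the node names are hypothetical, and this is my simplification, not Redis internals): resharding rewrites the mapping only for the migrated slots, so keys in untouched slots keep resolving to the same node.

```python
# Toy model: which node owns each of the 16384 slots (initial 3-master layout).
slot_to_node = {}
for slot in range(16384):
    if slot <= 5460:
        slot_to_node[slot] = "node-7001"
    elif slot <= 10922:
        slot_to_node[slot] = "node-7002"
    else:
        slot_to_node[slot] = "node-7003"

def reshard(mapping, slots, target):
    """Move the given slots to the target node; every other entry is untouched."""
    for slot in slots:
        mapping[slot] = target

# A new node joins owning zero slots; nothing changes until we reshard.
reshard(slot_to_node, range(0, 1000), "node-7007")

print(slot_to_node[500])   # migrated slot now resolves to the new node
print(slot_to_node[5000])  # unmoved slot still resolves to its original node
```

In a real cluster the same idea is propagated by gossip instead of a shared dict, but the key point is identical: only the migrated slots change owners.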
As for hot migration, my guess is that internally the slot's data is first copied to the target node, and only once the copy completes is the slot-to-node mapping switched; until then, requests still go to the original node under the old mapping. After the copy finishes, the keys of the migrated slots are deleted from the original node.
2. When a master fails, how does the cluster detect it and elect a replacement?
Much like sentinel mode: when a node finds some master unresponsive, it marks it pfail (subjectively down) and spreads that verdict through gossip; other nodes run the same check and gossip their own verdicts. Once more than half of the nodes report pfail, the node is marked fail (objectively down). Then the master nodes vote among the failed node's replicas, and the replica that wins more than half of the master votes becomes the new master.
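A minimal sketch of the majority rule just described (my simplification; the real protocol also weighs report freshness, epochs, and which nodes hold slots): pfail escalates to fail only once a strict majority agrees.

```python
def is_objectively_down(pfail_reports: int, cluster_nodes: int) -> bool:
    """pfail escalates to fail once more than half of the nodes agree."""
    return pfail_reports > cluster_nodes // 2

# In a 6-node cluster, 3 reports are not a strict majority, 4 are.
print(is_objectively_down(3, 6))  # False
print(is_objectively_down(4, 6))  # True
```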
3躏碳、去中心化設(shè)計與gossip協(xié)議
所有節(jié)點都持有元數(shù)據(jù)搞旭,節(jié)點之間通過gossip這種二進(jìn)制協(xié)議進(jìn)行通信、發(fā)送自己的元數(shù)據(jù)信息給其他節(jié)點、故障檢測肄渗、集群配置更新镇眷、故障轉(zhuǎn)移授權(quán)等等。
這種去中心化的分布式節(jié)點之間內(nèi)部協(xié)調(diào)翎嫡,包括故障識別欠动、故障轉(zhuǎn)移、選主等等惑申,核心在于gossip擴(kuò)散協(xié)議具伍,能夠支撐這樣的廣播協(xié)議在于所有的節(jié)點都持有一份完整的集群元數(shù)據(jù),即所有的節(jié)點都知悉當(dāng)前集群全局的情況圈驼。
References:
面試題:Redis 集群模式的工作原理能說一下么 - 云+社區(qū) - 騰訊云 (tencent.com)
深度圖解Redis Cluster原理 - detectiveHLH - 博客園 (cnblogs.com)