14 June 2017

Setting up a local redis replica set

For work, I recently needed to connect to redis to store hot data. I developed the code against a simple local redis server, and it then had to go to production. But when I asked our ops team for redis machines, I found that redis actually has two common approaches to availability:
1 redis master/slave (replica set) mode.
This is the long-standing way redis provides availability.
2 redis cluster.
This is the newer cluster approach, covered in a later post.

Early on, redis only had option 1, and at first only in a one-master-one-slave form. Later came one master with multiple slaves, and later still one master with multiple slaves plus multiple sentinels. Cluster mode arrived last.
This post walks through the one-master, multiple-slave, multiple-sentinel HA setup; cluster will get its own blog post later.

The difference is that in cluster mode all keys are hashed and spread across multiple masters, each with its own slaves. This improves overall HA and keeps queries fast. The downside is that operations involving multiple keys can run into odd problems when those keys live on different servers; I haven't studied this in detail yet and leave it for later. The biggest benefit of cluster mode is that it shards the keyspace, removing the single-machine memory limit on the number of keys, i.e. horizontal scaling.

The focus here is the one-master, n-slaves, n-sentinels deployment. There is exactly one master and multiple slaves. The master generally handles reads and writes; slaves continuously replicate the master and are normally read-only (they can be made writable), mainly to spread read load. When the master goes down, one slave is elected as the new master and the remaining slaves automatically follow it. This monitoring is the sentinels' job: they watch each server and trigger an election when they find the master is down. To keep sentinel from becoming a single point of failure itself, sentinel also supports HA; running an odd number of sentinels is recommended, e.g. 3, so that if one sentinel dies the remaining two can still hold a vote.

Enough theory; on to practice:

1, Prerequisites

A linux machine; I'm using centos, but any machine that can compile redis will do. You also need a terminal that can open multiple windows (opening several terminals works too); since I access the linux box remotely, I recommend tmux. The redis version here is 3.2.8.
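
If you go the tmux route, a minimal sketch (default key bindings assumed):

tmux new -s redis    # start a named session
# Ctrl-b c opens a new window; Ctrl-b n / Ctrl-b p switch between windows
# one window per redis-server / redis-sentinel process works well here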

2, Download redis and extract it

3, Build redis with make

If you hit minor build issues, a quick web search should sort them out.
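
For reference, a minimal sketch of steps 2 and 3, assuming the 3.2.8 tarball from redis.io and a working gcc toolchain:

cd /home/hunter/sources
wget http://download.redis.io/releases/redis-3.2.8.tar.gz    # download
tar xzf redis-3.2.8.tar.gz                                   # extract
cd redis-3.2.8
make                                                         # binaries end up in src/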

4, Prepare the configuration

The directory where I extracted and built is
/home/hunter/sources/redis-3.2.8

After the build, the executables appear in the src directory; the main ones are redis-server, redis-sentinel, and redis-cli.

cd /home/hunter/sources/redis-3.2.8
export REDIS_HOME=/home/hunter/sources/redis-3.2.8
mkdir -p hun_replication/node1
mkdir -p hun_replication/node2
mkdir -p hun_replication/node3
mkdir -p hun_replication/sentinel
touch $REDIS_HOME/hun_replication/node1/redis_5379.conf
touch $REDIS_HOME/hun_replication/node2/redis_5380.conf
touch $REDIS_HOME/hun_replication/node3/redis_5381.conf
touch $REDIS_HOME/hun_replication/sentinel/sentinel1.conf
touch $REDIS_HOME/hun_replication/sentinel/sentinel2.conf
touch $REDIS_HOME/hun_replication/sentinel/sentinel3.conf

Here, the files starting with redis are config files for the redis servers, and the files starting with sentinel are config files for the sentinels.
There are three nodes: node1 is the master, node2 and node3 are slaves, and three sentinels run alongside them: sentinel1, sentinel2, sentinel3.

Next, edit each of these conf files:
vi $REDIS_HOME/hun_replication/node1/redis_5379.conf

bind 127.0.0.1
port 5379
dir "/home/hunter/sources/redis-3.2.8/hun_replication/node1"

vi $REDIS_HOME/hun_replication/node2/redis_5380.conf

bind 127.0.0.1
port 5380
dir "/home/hunter/sources/redis-3.2.8/hun_replication/node2"
slaveof 127.0.0.1 5379

vi $REDIS_HOME/hun_replication/node3/redis_5381.conf

bind 127.0.0.1
port 5381
dir "/home/hunter/sources/redis-3.2.8/hun_replication/node3"
slaveof 127.0.0.1 5379

vi $REDIS_HOME/hun_replication/sentinel/sentinel1.conf

# Host and port we will listen for requests on
bind 127.0.0.1
port 25379

#
# "mymaster" is the name of this replication group
#
# every sentinel must monitor the same master under the same name
#
sentinel monitor mymaster 127.0.0.1 5379 2
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 10000

sentinel monitor mymaster 127.0.0.1 5379 2
"mymaster" (the third token) is the name of our replication group.

Each sentinel config needs to use the same name and point at the same master node. The final argument (2 here) is how many sentinels must agree before a new master is voted in (the quorum). Since we have 3 sentinels, requiring a quorum of 2 lets us lose one of them. With 5 sentinels this should be 3, which would let us lose 2 machines while still keeping a majority of nodes participating in the quorum.

sentinel down-after-milliseconds mymaster 5000
A machine must be unresponsive for 5 seconds before it is classified as down, which triggers the vote to elect a new master node.

sentinel parallel-syncs mymaster 1
Only one slave at a time re-syncs with the new master during a failover.
sentinel failover-timeout mymaster 10000
If a failover takes longer than this (10s here), the failover attempt is considered failed.

vi $REDIS_HOME/hun_replication/sentinel/sentinel2.conf

# Host and port we will listen for requests on
bind 127.0.0.1
port 25380

#
# "mymaster" is the name of this replication group
#
# every sentinel must monitor the same master under the same name
#
sentinel monitor mymaster 127.0.0.1 5379 2
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 10000

vi $REDIS_HOME/hun_replication/sentinel/sentinel3.conf

# Host and port we will listen for requests on
bind 127.0.0.1
port 25381

#
# "mymaster" is the name of this replication group
#
# every sentinel must monitor the same master under the same name
#
sentinel monitor mymaster 127.0.0.1 5379 2
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 10000

Start the instances (one terminal or tmux window per process, since they run in the foreground):

cd $REDIS_HOME/hun_replication/node1/
$REDIS_HOME/src/redis-server redis_5379.conf

cd $REDIS_HOME/hun_replication/node2/
$REDIS_HOME/src/redis-server redis_5380.conf

cd $REDIS_HOME/hun_replication/node3/
$REDIS_HOME/src/redis-server redis_5381.conf

cd $REDIS_HOME/hun_replication/sentinel
$REDIS_HOME/src/redis-sentinel  sentinel1.conf
$REDIS_HOME/src/redis-sentinel  sentinel2.conf
$REDIS_HOME/src/redis-sentinel  sentinel3.conf

That completes a redis master/slave/sentinel replica set.
You can now kill the master node (node1) and watch redis automatically switch to a new master.
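
Before pulling the trigger, you can sanity-check the topology from any window (a quick check, assuming the redis-cli built above):

$REDIS_HOME/src/redis-cli -p 5379 info replication                              # role:master, connected_slaves:2
$REDIS_HOME/src/redis-cli -p 25379 sentinel get-master-addr-by-name mymaster    # should print 127.0.0.1 / 5379
$REDIS_HOME/src/redis-cli -p 25379 sentinel slaves mymaster                     # state of both slaves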

$REDIS_HOME/src/redis-cli -p 5379 debug segfault   # crash the master node of the replication

Then start redis-server again:

$REDIS_HOME/src/redis-server $REDIS_HOME/hun_replication/node1/redis_5379.conf

At this point you should see that the master is now node2 or node3, and node1 has automatically become a slave. Also, if you open redis_5379.conf and the other files, you'll find that redis has rewritten the configuration automatically.
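
To confirm the new topology (assuming the failover has finished, i.e. the 5s down-after window plus the election have passed):

$REDIS_HOME/src/redis-cli -p 25379 sentinel get-master-addr-by-name mymaster    # now 5380 or 5381
$REDIS_HOME/src/redis-cli -p 5379 info replication                              # role:slave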

Next, let's connect to this replication with Spring Data Redis.

package example.springdata.redis.sentinel;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.RedisSentinelConfiguration;
import org.springframework.data.redis.connection.jedis.JedisConnectionFactory;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.util.StopWatch;

import javax.annotation.PreDestroy;

/**
 * @author hunter.xue
 */
@Configuration
public class RedisSentinelApplication {
    static String HOST = null;
    static RedisSentinelConfiguration SENTINEL_CONFIG = null;
    // must be true for this demo, otherwise SENTINEL_CONFIG stays null
    // and the sentinelConfig() bean below breaks
    static boolean selfReplication = true;

    static {
        if (selfReplication) {
            HOST = "127.0.0.1"; // this is my server address
            SENTINEL_CONFIG = new RedisSentinelConfiguration().master("mymaster") //
                    .sentinel(HOST, 25379) //
                    .sentinel(HOST, 25380) //
                    .sentinel(HOST, 25381);
        }
    }

    @Bean
    public RedisSentinelConfiguration sentinelConfig() {
        return SENTINEL_CONFIG;
    }

    @Bean
    public RedisConnectionFactory connectionFactory() {
        JedisConnectionFactory jedisConnectionFactory = new JedisConnectionFactory(sentinelConfig());
        // our local nodes have no requirepass configured, so sending AUTH would fail;
        // set a password here only if your servers require one
        // jedisConnectionFactory.setPassword("NSzTQdollgGQeBsd");
        return jedisConnectionFactory;
    }


    @Autowired
    RedisConnectionFactory factory;

    public static void main(String[] args) throws Exception {

        ApplicationContext context = SpringApplication.run(RedisSentinelApplication.class, args);

        StringRedisTemplate template = context.getBean(StringRedisTemplate.class);
        template.opsForValue().set("loop-forever", "0");

        StopWatch stopWatch = new StopWatch();

        while (true) {

            try {

                String value = "IT:= " + template.opsForValue().increment("loop-forever", 1);
                printBackFromErrorStateInfoIfStopWatchIsRunning(stopWatch);
                System.out.println(value);

            } catch (RuntimeException e) {

                System.err.println(e.getCause() != null ? e.getCause().getMessage() : e.getMessage());
                startStopWatchIfNotRunning(stopWatch);
            }

            Thread.sleep(1000);
        }
    }

    @Bean
    public StringRedisTemplate redisTemplate() {
        return new StringRedisTemplate(connectionFactory());
    }

    /**
     * Clear database before shut down (demo convenience; this wipes the db).
     */
    @PreDestroy
    public void flushTestDb() {
        factory.getConnection().flushDb();
    }

    private static void startStopWatchIfNotRunning(StopWatch stopWatch) {

        if (!stopWatch.isRunning()) {
            stopWatch.start();
        }
    }

    private static void printBackFromErrorStateInfoIfStopWatchIsRunning(StopWatch stopWatch) {

        if (stopWatch.isRunning()) {
            stopWatch.stop();
            System.err.println("INFO: Recovered after: " + stopWatch.getLastTaskInfo().getTimeSeconds());
        }
    }
}
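
To watch the failover from the client side, run the app and crash the current master in another window. A hypothetical run, assuming this class sits in a standard Spring Boot project with spring-data-redis and jedis on the classpath:

mvn spring-boot:run                                  # or run RedisSentinelApplication from your IDE
$REDIS_HOME/src/redis-cli -p 5379 debug segfault     # in another window: kill whichever node is currently master

The counter output should stall with connection errors for a few seconds, then print the "INFO: Recovered after: ..." line once the sentinels promote a new master and the client reconnects.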