
redis operations and maintenance manual

Popularity:397 ℃/2024-10-11 21:13:17

Contents
  • redis cluster resource allocation recommendations
    • Production environment
  • basic replication
    • Configuration
    • Features of replication
    • Network connections in replication
    • replication process
    • replication ID
    • Partial synchronization under reboot and failover
    • Read-only replica
    • Replication reliability
    • replication expire keys
    • Authentication of replica and master
  • Redis Configuration
    • Static configuration
    • dynamic configuration
    • Using redis as a cache
  • Redis Sentinel
    • Startup
    • Configuration
      • Main Configurations
      • Optional Configurations
        • Using IP addresses and DNS names
    • Authentication
      • ACL authentication
      • Configuring password-only authentication for redis
      • Sentinel ACL Certification
      • Sentinel only configures password authentication
    • Runtime reconfiguration of sentinel
    • Add or delete nodes
      • Additions and deletions of sentinels
      • Remove replicas
    • Autodiscovery of Sentinels and replicas
    • SDOWN and ODOWN Fault States
    • Replica selection and prioritization
    • Configuration epochs and propagation
    • TILT mode
  • Redis Cluster
    • Configuration
      • Startup
      • boot port
      • Configuration file parameters
    • node operation
      • Add Node
        • Adding a master node
        • Add replica
      • Remove node
      • reset node
      • upgrade node
        • Upgrade replica
        • Upgrade Master
    • data slice
      • conceptual
      • Hash-tags
      • Implementation of reshard
    • master-replica model
      • Data loss scenarios
      • usability
      • Replica migration
      • Reading data from replica
    • cluster topology
      • node connection
      • node handshake
      • Heartbeat Detection
    • Cluster current epoch and configuration epoch
      • currentEpoch
      • configEpoch
      • ConfigEpoch Conflict Resolution Algorithm
    • troubleshooting
      • Conceptual understanding
        • Point in time in troubleshooting
        • A few Epoch uses
      • fault detection
      • failover
        • replica elections
        • Replica Rankings
        • master's voting process
    • UPDATE message
      • Passing of hash slot configurations
      • Nodes rejoin the cluster
    • Client Connection
      • client redirection
        • MOVED redirection
        • ASK Redirection
  • Persistence
    • RDB
      • Advantages
      • Disadvantages
      • Disable RDB
    • AOF
      • Advantages
      • Disadvantages
      • Log rewriting
    • Interaction between AOF and RDB persistence
    • Backup and recovery
      • Backing up RDB data
      • Backup AOF data
  • Security
    • ACL
      • Command categories
    • TLS
      • Certificate Configuration
      • Port Configuration
      • Client Authentication
    • Replication
      • Cluster
      • Sentinel
  • Cli
    • General
    • Configuration changes
    • DB
      • Switching db
        • select
      • Exchanging data between two DBs
      • Get the remaining ttl of a key
        • TTL
      • Use the scan command
      • Set expiration time for hash
    • replication
    • persistent
    • Sentinel
    • ACL
    • Cluster
      • A few useful commands in cluster
      • Other commands
    • diagnostic command
      • Slow log
      • Latency monitor
        • Events and time series
        • Enable latency monitoring
  • Monitoring
    • Server
      • Sentinel && Cluster
    • Client
    • Mem
      • Defragmentation
    • CPU
    • Persistence
      • rdb
      • aof
    • Stats
    • replication
      • replica
  • performance optimization
    • Benchmark
    • Performance Parameters
    • Diagnosing delayed problems
      • checklist
      • Test the memory latency of the system
      • The Single-Threaded Nature of Redis
      • Delays caused by slow commands
      • Delays caused by forks
      • Delays caused by transparent huge pages
      • Delays caused by swapping
      • Latency due to AOF and disk I/O
      • Memory for Redis
  • Tips
    • redis startup failure
    • Redis Upgrade
    • How to promote a replica to master
    • AOF file truncation error
    • Backup and Recovery with RDB
    • How to enable AOF while using snapshot
    • Network isolation in sentinel
    • Redis migration to k8s
    • Python client

redis cluster resource allocation recommendations

Production environment

Category | Description | Minimum requirement | Recommendation
Nodes per cluster | At least 3 nodes are required for reliability and high availability in the event of node failures and network partitions. | 3 nodes | >= 3 nodes (the number of nodes must be odd)
Cores per node | - | 4 cores | >= 8 cores
Memory per node | RAM size should take into account the planned Redis storage capacity | 15GB | >= 30GB
Temporary storage | For saving RDB and cluster log files | RAM x 2 | >= RAM x 4
Persistent storage | For saving RDB and AOF files | RAM x 3 | In-memory: >= RAM x 6 (except in extreme write-heavy scenarios)
Network | Nodes with multiple NICs are recommended, each NIC > 100Mbps | 1G | >= 10G

basic replication

This section describes the basic replication process and mechanism of Redis master-replica, which is the basis for Sentinel and Cluster. Because this model has no automatic failover, when a master fails the failover must be performed manually (or by a script) by running the FAILOVER command.

Redis achieves high availability through leader-follower (master-replica) replication. A replica instance is a copy of the master instance.

Configuration

You can enable replication by adding the following line to the replica's configuration file, or by running the REPLICAOF command on the replica (here 192.168.1.1 6379 is the master's address and port):

replicaof 192.168.1.1 6379

Features of replication

Redis uses asynchronous replication by default: the replica periodically acknowledges the data it has received, and the master does not wait for that acknowledgement after sending data. An acknowledgement from the replica may therefore be lost, but throughput is higher.

It is highly recommended to enable persistence on both masters and replicas when using replication. If the master does not have persistence enabled, its dataset will be empty after a restart, and the replicas will wipe their own copy of the dataset when they resynchronize with it.

Clients can use the WAIT command to request synchronous replication for specific data.

redis replication has the following features:

  • redis uses asynchronous replication, i.e., replica asynchronously acknowledges the data sent by master.
  • A master can have multiple replicas.
  • Replicas can connect to other replicas in addition to master, in a cascading pattern (A -> B -> C). Starting with redis 4.0, all sub-replicas receive the same replication stream from master.
  • replication does not block the master, meaning that the master can continue to process requests while processing an initial synchronization or partial resynchronization of one or more replicas.
  • Replication does not block the replica either: while the initial synchronization from the master is in progress, the replica can keep serving requests from its old dataset (when replica-serve-stale-data is yes; otherwise it returns an error while the replication stream is down). However, once the initial synchronization finishes, the old dataset must be deleted and the new one loaded, and during that time the replica blocks incoming connections (for larger datasets this may last several seconds).
  • Scalability can be achieved using replication, such as using multiple replicas to handle read-only requests.

Network connections in replication

The connection between master and replica is as follows:

  • If the link between master and replica is working, master sends changes to its dataset (client writes, key expirations and evictions, and master dataset changes, etc.) to replica through a command stream
  • If the link between master and replica fails, replica attempts to perform a partial synchronization by reconnecting, i.e., fetching commands that were lost during the broken link.
  • If a partial synchronization cannot be performed, the replica requests a full synchronization, which is a more complex process involving the master creating a full snapshot of the data and sending it to the replica, after which it continues to send dataset changes through the command stream.

replication process

Each Redis master has a replication ID that identifies a given history of the dataset. Both master and replica keep an offset (a byte offset): the master's offset is the latest byte offset of its replication stream, and the replica's offset is the byte offset up to which it has copied data from the master.

When replicas connect to the master, they use the PSYNC (partial resynchronization) command to send the master replication ID they have saved and the offset they have processed so far. This way the master can send only the incremental part the replica needs (partial resynchronization). If the master's backlog (repl-backlog-size) does not hold enough data, or the replica provides a replication ID that the master does not know, a full resynchronization is performed.

The backlog is a buffer that accumulates replica data when replicas are disconnected for some time, so that when a replica wants to reconnect again, often a full resync is not needed, but a partial resync is enough, just passing the portion of data the replica missed while disconnected.

The bigger the replication backlog, the longer the replica can endure the disconnect and later be able to perform a partial resynchronization. The backlog is only allocated if there is at least one replica connected.

An example of replication is shown below:

  • Redis-1: replid and offset are at their default values, meaning it has never synchronized with the master, so it performs a full synchronization;
  • Redis-2: replid matches the master's, and replica_offset >= backlog_off and replica_offset < offset, meaning the data this replica is missing can still be fetched from the backlog, so a partial synchronization is possible;
  • Redis-3: replid matches the master's, but replica_offset < backlog_off, meaning the replica is missing too much data to recover it from the replication backlog, so it performs a full synchronization;
  • Redis-4: replid does not match the master's, because it was not previously replicating from the current node, so it performs a full synchronization;
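
The four cases above can be sketched as a small decision function. This is only an illustration of the rules as listed, not the actual Redis implementation: the Master class, the sync_mode function and the EMPTY_REPLID marker are hypothetical names, and the secondary replication ID is ignored.

```python
from dataclasses import dataclass

EMPTY_REPLID = "?"  # hypothetical marker for "never synchronized"


@dataclass
class Master:
    replid: str       # master's current replication ID
    offset: int       # latest byte offset of the replication stream
    backlog_off: int  # first byte offset still held in the backlog


def sync_mode(master: Master, replica_replid: str, replica_offset: int) -> str:
    """Return "partial" or "full", following the four cases listed above."""
    if replica_replid != master.replid:
        # Never synchronized, or previously followed a different master.
        return "full"
    if replica_offset < master.backlog_off:
        # The missing bytes were already dropped from the backlog.
        return "full"
    if replica_offset <= master.offset:
        # Everything the replica missed is still in the backlog
        # (equal offsets simply mean nothing is missing).
        return "partial"
    # The replica claims data this master never produced.
    return "full"


master = Master(replid="abc", offset=1000, backlog_off=400)
print(sync_mode(master, EMPTY_REPLID, -1))  # Redis-1: full
print(sync_mode(master, "abc", 700))        # Redis-2: partial
print(sync_mode(master, "abc", 100))        # Redis-3: full
print(sync_mode(master, "other", 700))      # Redis-4: full
```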

For a full synchronization, the master starts a background process to create an RDB file, while buffering all new write commands received from clients. When the background process finishes, the master transfers the RDB file to the replica, which saves it to disk and loads it into memory. After that the master sends all buffered commands to the replica through the command stream.

Normally a full resynchronization requires the master to create an RDB file on disk, then load the file from disk and send it to the replicas. If the disk is slow, this can affect the master's operation. Starting with Redis 2.8.18, you can set repl-diskless-sync yes to enable diskless replication, in which case the master sends the RDB directly to the replicas over the network without creating an RDB file on disk.

replication ID


From the above, if two instances have the same replication ID and offset, they hold the same data. In practice, however, an instance has two replication IDs: a main ID and a secondary ID, corresponding to the ID of the current master and the ID of the previous master.

A replication ID represents a specific history of the dataset. A new replication ID is generated when an instance starts from scratch as a master or when a replica is promoted to master. Replicas connected to a master inherit the master's replication ID after the handshake. Thus instances with the same ID hold the same data (possibly at different points in time).

Redis instances keep two replication IDs because of failover. When a replica is promoted to master, it still remembers the previous master's replication ID: it stores its old main ID as the secondary ID, records the offset at which the switch happened, and then generates a new main ID. When the other replicas synchronize with the new master, they can still use the old master's replication ID to attempt a partial resynchronization. The new master uses both the main ID and the secondary ID to handle replica connections, and the replicas switch to the new main ID once they have caught up with the data (offset) of the old master. In this way Redis reduces the probability of a full synchronization after a failover.

The new master needs to switch its replication ID because, during a network partition, the old master may still be working, and using the same replication ID would violate the rule that the same ID and the same offset indicate the same dataset.

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.157.4.110,port=6379,state=online,offset=552352655438,lag=0
master_replid:f50c0dadc1005893c39bd8f3a482aa992df82786 # ID of this node (master): main ID
master_replid2:ddcc55eeeca1a1e2e26f858524413bea798b4190 # ID of the previously synced master: secondary ID
master_repl_offset:552352659332 # data offset of this node (master)
second_repl_offset:329184800015 # offset of the previously synced master
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:552351610757
repl_backlog_histlen:1048576

The replication information for the replica corresponding to the above master is as follows:

# Replication
role:slave
master_host:10.157.4.40
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:552357245144
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:f50c0dadc1005893c39bd8f3a482aa992df82786 # ID of the current master
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:552357245144 # This node copies to the data offset
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:552356196569
repl_backlog_histlen:1048576
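
As a rough illustration of how the two outputs above can be compared, the sketch below parses INFO-style text and checks whether a replica shares the master's replication history and how many bytes it is behind. The helper names (parse_info, same_history, lag_bytes) and the sample values are made up for this example. The annotations after " #" are stripped because they are notes added in this article, not part of real INFO output.

```python
def parse_info(text: str) -> dict:
    """Parse "key:value" lines, skipping blank lines and "# Section" headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop trailing " # ..." annotations such as the ones shown above.
        fields[key] = value.split(" #", 1)[0].strip()
    return fields


def same_history(master: dict, replica: dict) -> bool:
    """True when the replica follows this master's current replication ID."""
    return master["master_replid"] == replica["master_replid"]


def lag_bytes(master: dict, replica: dict) -> int:
    """Bytes the replica is behind, clamped at 0 (samples are not simultaneous)."""
    behind = int(master["master_repl_offset"]) - int(replica["slave_repl_offset"])
    return max(0, behind)


MASTER_SAMPLE = """\
# Replication
role:master
master_replid:f50c0dadc1005893c39bd8f3a482aa992df82786 # main ID
master_repl_offset:1500
"""

REPLICA_SAMPLE = """\
# Replication
role:slave
master_replid:f50c0dadc1005893c39bd8f3a482aa992df82786
slave_repl_offset:1200
"""

m, r = parse_info(MASTER_SAMPLE), parse_info(REPLICA_SAMPLE)
print(same_history(m, r), lag_bytes(m, r))  # True 300
```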

Partial synchronization under reboot and failover

  • As described above, in the event of a failover, the new master can still be partially (re)synchronized with the existing replicas.
  • For scenarios such as Redis upgrades, you can use SHUTDOWN with the SAVE option so that the replica's RDB file is saved on exit, which makes a partial resynchronization possible after the restart. Note that a replica restarted from an AOF file cannot perform a partial synchronization, so switch to RDB persistence before restarting and switch back to AOF afterwards.

Read-only replica

Starting with Redis 2.6, replicas are read-only by default; this is controlled by the replica-read-only parameter. In this mode, a replica rejects all write commands. The parameter can also be turned off to make a replica writable, but this may lead to inconsistencies between replica and master data, so writable replicas are not recommended.

Also note that since Redis 4.0, writes made on a replica are local only and are not propagated to its sub-replicas, which receive the same replication stream that the master sends to its directly connected replicas. For example, in the following setup, when B accepts writes, C will not see them: its dataset is the same as that of master A.

A ---> B ---> C

Replication reliability

Redis improves the reliability of replication with the following two parameters: the master accepts write requests only when at least min-replicas-to-write replicas are connected with a lag of no more than min-replicas-max-lag seconds. By default min-replicas-to-write is 0 (the feature is disabled) and min-replicas-max-lag is 10.

min-replicas-to-write 3
min-replicas-max-lag 10

Since replication is asynchronous, data loss is still possible; this feature can only shrink the time window in which data loss can occur.
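
The write check described above can be sketched as follows. This is a simplified model rather than Redis source code: master_accepts_writes is a hypothetical function, and replica_lags stands for the lag in seconds that the master reports for each connected replica (the lag field in INFO replication).

```python
def master_accepts_writes(replica_lags, min_replicas_to_write=0, min_replicas_max_lag=10):
    """Decide whether the master should accept a write.

    replica_lags: lag in seconds of each connected replica.
    Setting either parameter to 0 disables the check.
    """
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True
    # A replica counts as "good" when its lag is within the allowed maximum.
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


# With min-replicas-to-write 3 and min-replicas-max-lag 10:
print(master_accepts_writes([0, 1, 2], 3, 10))   # True: three replicas within 10s
print(master_accepts_writes([0, 1, 25], 3, 10))  # False: only two are within 10s
print(master_accepts_writes([], 0, 10))          # True: feature disabled
```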

replication expire keys

redis uses the following to handle expired keys in replication:

  1. Replicas do not expire keys themselves; they wait for the master to do so. When the master expires a key, it sends a DEL command to all replicas.
  2. A replica may still hold keys in memory that have logically expired because the master has not yet sent the DEL command. In that case the replica uses its logical clock to report to read operations that the key has expired.
  3. Expire keys operations are not performed during lua script execution.
  4. After a replica is promoted to master, it expires keys independently and does not rely on the old master.

Maxmemory on replicas

By default replicas ignore maxmemory (replica-ignore-maxmemory yes): they do not evict keys themselves but rely on the DEL commands sent by the master. As a result, a replica may use more memory than its configured maxmemory, so make sure the replica has enough memory to avoid OOM.

Authentication of replica and master

In the replica's configuration file, set the password used to authenticate with the master:

masterauth <password>

Redis Configuration

Static configuration

There are two ways to statically configure Redis:

  1. One is through the configuration file. This approach is recommended for production.

  2. The other is to pass parameters directly on the command line, which is recommended for test environments. The parameter format is the same as in the configuration file, except that each parameter name is prefixed with --.

    ./redis-server --port 6380 --replicaof 127.0.0.1 6379
    

dynamic configuration

Use CONFIG SET and CONFIG GET to modify and inspect the Redis configuration at runtime, and CONFIG REWRITE to persist the runtime changes to the configuration file.

Using redis as a cache

If you need to use Redis as a cache, you can do the following

maxmemory 2mb
maxmemory-policy allkeys-lru

With this configuration there is no need to set a TTL on keys with the EXPIRE command: when the amount of data reaches 2 MB, keys are evicted using the LRU algorithm.

maxmemory-policy supports the following policies:

  • volatile-lru: performs LRU evictions only on keys that have an expiration time set
  • allkeys-lru: use LRU to evict all keys
  • volatile-lfu: perform LFU evictions only on keys that have an expiration time set
  • allkeys-lfu: evict all keys using LFU
  • volatile-random: randomly evicts a key with an expiration time.
  • allkeys-random: randomly evicts a key
  • volatile-ttl: evicts the key with an expiration time that is closest to expiring (smallest TTL)
  • noeviction: does not evict any key, only throws an error on writes
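
To make the allkeys-lru idea concrete, here is a minimal sketch of an exact LRU cache. It is an illustration only: the LRUCache class is hypothetical, it limits the number of keys rather than bytes of memory, and Redis itself uses an approximated LRU based on sampling rather than an exact ordering like this one.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU cache bounded by key count (a stand-in for maxmemory)."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys
        self._data = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # reading a key makes it most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_keys:
            self._data.popitem(last=False)  # evict the least recently used key

    def keys(self):
        return list(self._data)


cache = LRUCache(max_keys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # "a" is now more recently used than "b"
cache.set("c", 3)    # exceeds the limit, so "b" is evicted
print(cache.keys())  # ['a', 'c']
```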

Redis Sentinel

Sentinel continuously monitors to check if the master and replicas are working properly. If a master fails, Sentinel can initiate a failover process to promote a replica to master, then change the master field of the replicas of the original master to the new master, and inform the application of the new master address when it connects to Sentinel.

Startup

There are two ways to start Sentinel, shown below. The default Sentinel port is 26379. In addition, Sentinel saves its state in a configuration file, so a configuration file must be specified at startup.

redis-sentinel /path/to/
redis-server /path/to/ --sentinel

Configuration

Sentinel's state is persisted in its configuration file (in the part under # Generated by CONFIG REWRITE), so it is safe to stop and restart the Sentinel process.

Main Configurations

Since a Sentinel can itself fail, at least 3 Sentinel instances should be deployed to keep the system robust. When starting Sentinel you need to specify a configuration file; a typical configuration looks like this:

# Sentinel port (default 26379)
port 26379
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1

sentinel monitor resque 192.168.1.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5

When configuring Sentinel you only need to specify the masters to monitor, not the replicas or the other Sentinels (they are auto-discovered). The Sentinel configured above monitors two groups of Redis instances, each made up of a master and an undefined number of replicas. One group is named mymaster and the other resque. When a failover promotes a replica to master, or when a new Sentinel is discovered, Sentinel automatically rewrites the configuration.

The format of the sentinel monitor directive is:

sentinel monitor <master-name> <ip> <port> <quorum>

quorum specifies the number of Sentinels that must agree that a master is unreachable before it is considered failed (its status is marked as ODOWN). The quorum is only used to detect the failure. To actually perform a failover, one of the Sentinels must be elected leader, and a majority of the Sentinels must authorize that leader to perform the failover.

For example, with 5 Sentinels and quorum set to 2:

  • If 2 Sentinels consider a master unreachable, one of the Sentinels will attempt to initiate a failover
  • Failover can be performed if there is authorization for at least 3 Sentinels
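
The difference between the quorum and the majority can be sketched with a few lines of arithmetic. The function names below are hypothetical and the model only covers the two rules stated above (quorum for detection, majority for authorization).

```python
def is_odown(sdown_reports: int, quorum: int) -> bool:
    """The master is marked ODOWN once at least `quorum` Sentinels report it down."""
    return sdown_reports >= quorum


def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that forms a majority."""
    return total_sentinels // 2 + 1


def can_failover(total_sentinels: int, sdown_reports: int, authorizations: int, quorum: int) -> bool:
    """Failover needs ODOWN plus authorization from a majority of Sentinels."""
    return is_odown(sdown_reports, quorum) and authorizations >= majority(total_sentinels)


# 5 Sentinels, quorum 2, as in the example above.
print(majority(5))               # 3
print(is_odown(2, quorum=2))     # True: a failover attempt can start
print(can_failover(5, 2, 2, 2))  # False: only 2 of 5 authorize
print(can_failover(5, 2, 3, 2))  # True: 3 of 5 authorize
```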

Optional Configurations

The format of the optional configuration is:

sentinel <option_name> <master_name> <option_value>

The main options are:

  • down-after-milliseconds: Defines how long a master must be unreachable (not replying to PINGs, or replying with an error) before it is considered down.
  • parallel-syncs: Defines the number of replicas that can be reconfigured to use the new master at the same time after a failover, similar to a rolling upgrade. The smaller the number, the longer the failover takes. However, if replicas are configured to serve read requests with old data, you may not want all replicas to resynchronize with the new master at the same time.

Using IP addresses and DNS names

Starting with Redis 6.2, Sentinel optionally supports hostnames. This feature is disabled by default. When it is used, take care not to mix IP addresses and hostnames.

  • replica-announce-ip <hostname>: Configure the hostname of the redis instance
  • sentinel announce-ip <hostname>: Configure the hostname of the sentinel instance

sentinel resolves the hostname and converts it to an IP address when declaring an instance and when updating the configuration.

When the client connects to the instance using TLS, it may require a hostname (rather than an IP address) to match the ASN of the certificate.

This feature may be incompatible with some Sentinel clients.

Authentication


ACL authentication

Starting with Redis 6, user authentication and permissions are managed through ACLs. Once ACLs are configured, Sentinel needs the following directives in order to connect to the Redis instances:

# If auth-user is not set, the "default" user is used
sentinel auth-user <master-name> <username>
sentinel auth-pass <master-name> <password>

<username> and <password> are the username and password used to access the group of instances. A user with the minimum privileges Sentinel requires should be created on all Redis instances:

127.0.0.1:6379> ACL SETUSER sentinel-user ON >somepassword allchannels +multi +slaveof +ping +exec +subscribe +config|rewrite +role +publish +info +client|setname +client|kill +script|kill

Configuring password-only authentication for redis

  • On the master, set requirepass
  • On the replicas, set masterauth to the password required to authenticate with the master

Sentinel then only needs the password for connecting to the Redis instances:

sentinel auth-pass <master-name> <password>

Sentinel ACL Certification

First, disable unauthorized access: disable the default user (or give it a strong password) and create a new user that has access to the Pub/Sub channels:

127.0.0.1:5000> ACL SETUSER admin ON >admin-password allchannels +@all
OK
127.0.0.1:5000> ACL SETUSER default off
OK

By default Sentinel uses the default user to connect to other instances. To use the new superuser instead, provide its credentials as follows:

sentinel sentinel-user <username>
sentinel sentinel-pass <password>

Use ACLs to restrict client access:

127.0.0.1:5000> ACL SETUSER sentinel-user ON >user-password -@all +auth +client|getname +client|id +client|setname +command +hello +ping +role +sentinel|get-master-addr-by-name +sentinel|master +sentinel|myid +sentinel|replicas +sentinel|sentinels +sentinel|masters

Sentinel only configures password authentication

Password-only authentication can be enabled by adding the following parameter to the configuration of all Sentinels:

requirepass "your_password_here"

Runtime reconfiguration of sentinel

The following commands are executed on a Sentinel instance. Note that configuration changes made to one Sentinel are not propagated to the other Sentinels, so each change must be applied on every Sentinel.

  • SENTINEL MONITOR <name> <ip> <port> <quorum>: Similar to the sentinel monitor directive in the configuration file.

  • SENTINEL REMOVE <name>: Remove the specified master.

  • SENTINEL SET <name> [<option> <value> ...]: Similar to the Redis CONFIG SET command; it modifies the configuration of a specific master, for example:

    SENTINEL SET objects-cache-master down-after-milliseconds 1000
    

    The quorum can also be modified directly:

    SENTINEL SET objects-cache-master quorum 5
    

Add or delete nodes

Additions and deletions of sentinels

Adding a sentinel is relatively simple. Due to sentinel's auto-discovery mechanism, you only need to start a new sentinel and monitor the currently active master. Within 10s, sentinel will get a list of other sentinels and replicas of the monitored master.

If more than one Sentinel needs to be added, it is recommended to add them one at a time. You can use SENTINEL MASTER mastername (the num-other-sentinels field) or SENTINEL SENTINELS mastername to verify that the added Sentinels are ready.

Removing a Sentinel is more involved, because Sentinels never forget the Sentinels they have already seen. A Sentinel can be removed by following these steps:

  1. Stop the sentinel process that needs to be removed
  2. Send SENTINEL RESET * to the other Sentinel instances (to reset a single master only, replace * with the name of that master), one after the other, waiting 30 seconds before moving on to the next instance.
  3. Use SENTINEL MASTER mastername to check the currently active Sentinels.

Remove replicas

To remove a replica, stop the replica and send the SENTINEL RESET mastername command, so that the Sentinels refresh their list of replicas within the next 10 seconds.

Autodiscovery of Sentinels and replicas

Sentinels use the Pub/Sub capability of Redis instances to discover the other Sentinels that monitor the same masters and replicas, to check each other's availability, and to exchange messages. The channel is named __sentinel__:hello.

The auto-discovery features are listed below:

  • Every 2 seconds, each Sentinel publishes a message to the Pub/Sub channel __sentinel__:hello of every monitored master and replica, announcing its ip, port, and runid.
  • Each Sentinel subscribes to the Pub/Sub channel __sentinel__:hello of every master and replica, looking for unknown Sentinels. When a new Sentinel is detected, it is added as a Sentinel of that master.
  • The Hello message contains the complete Master configuration. If the sentinel receives a configuration version number higher than the existing configuration version, the configuration is updated immediately.
  • Before adding a sentinel to a master, you need to check if there is already a sentinel with the same runid or address (IP and port pair), and if so, remove the matching sentinel and add a new one.

SDOWN and ODOWN Fault States

Redis Sentinel has two down concepts:

  • One is the Subjectively Down condition (SDOWN), the down state as seen by a single Sentinel instance. It is reached when a master gives no valid reply to PING requests for down-after-milliseconds. A valid reply is one of the following:

    • PING response +PONG.
    • PING response -LOADING error.
    • PING response -MASTERDOWN error.

    SDOWN is not sufficient to trigger failover; to do so, the ODOWN state needs to be reached.

  • The other is the Objectively Down condition (ODOWN). If, within a given time range, enough Sentinels (at least quorum) report the master as unavailable (this feedback is collected from the other Sentinels through SENTINEL is-master-down-by-addr), SDOWN is promoted to ODOWN. The ODOWN state applies only to masters; for other instances (replicas and other Sentinels), which require no agreement between Sentinels, there is only the SDOWN state.

Replica selection and prioritization

During the failover process, the replica to be used as the new master is selected based on the following information:

  1. Time disconnected from the master
  2. Replica Priority
  3. Replication offset processed
  4. Run ID

A replica is considered unsuitable for failover if the duration of the broken link between the replica and the master exceeds the sum of 10 times the configured master timeout plus the amount of time that the master is considered unavailable by the Sentinel.

(down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state

Select from the remaining replicas in the following order:

  1. Sort the replicas by the replica-priority set in their configuration; the lower the value, the better.
  2. If the priority is the same, compare the replication offsets processed by the replicas and select the one that has received the most data from the master.
  3. If multiple replicas have the same priority and replication offset, select the one with the lexicographically smallest run ID.

In most cases there is no need to set replica-priority. If replica-priority is 0, the replica will never be selected as a master by Sentinel.
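
The filtering and ordering rules above can be sketched as a single function. This is a simplified model, not Sentinel's code: the Replica class and select_replica function are hypothetical names, and all times are in milliseconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    run_id: str
    priority: int         # replica-priority; 0 means "never promote"
    repl_offset: int      # replication offset processed so far
    disconnected_ms: int  # time since the link with the master broke


def select_replica(replicas: List[Replica], down_after_ms: int, master_sdown_ms: int) -> Optional[Replica]:
    """Pick the replica to promote, or None when no replica is suitable."""
    # (down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state
    max_disconnect = down_after_ms * 10 + master_sdown_ms
    candidates = [
        r for r in replicas
        if r.priority != 0 and r.disconnected_ms <= max_disconnect
    ]
    if not candidates:
        return None
    # Lower priority first, then higher offset, then smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("bbb", priority=100, repl_offset=900, disconnected_ms=1_000),
    Replica("aaa", priority=100, repl_offset=900, disconnected_ms=1_000),
    Replica("ccc", priority=100, repl_offset=500, disconnected_ms=1_000),
    Replica("ddd", priority=0, repl_offset=999, disconnected_ms=1_000),
    Replica("eee", priority=1, repl_offset=999, disconnected_ms=900_000),
]
best = select_replica(replicas, down_after_ms=60_000, master_sdown_ms=5_000)
print(best.run_id)  # aaa
```

Here "ddd" is skipped because its priority is 0, "eee" because it has been disconnected for longer than 605000 ms, and "aaa" wins the tie with "bbb" on run ID.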

Configuration epochs and propagation

When a Sentinel is authorized to perform a failover, it obtains a unique configuration epoch for the master being failed over. This value is used to version the new configuration once the failover completes.

There is a rule in failover: if a Sentinel has authorized another Sentinel to fail over a given master, it waits some time before trying to fail over the same master itself. This delay is 2 * failover-timeout, where failover-timeout is set in the configuration. This means Sentinels will not try to fail over the same master at the same time: if the first authorized Sentinel fails, another Sentinel will try again after some time, and so on.

Once a sentinel successfully completes the failover of a master, it broadcasts the new configuration so that other sentinels can update the configuration associated with this master once they receive it.

A failover is considered successful when the Sentinel has sent the REPLICAOF NO ONE command to the selected replica and the switch to master is then observed in the replica's INFO output.

Each Sentinel continuously broadcasts its version of the master configuration over Redis Pub/Sub (the __sentinel__:hello channel), while all Sentinels listen for the configurations announced by the other Sentinels. Because different configurations have different version numbers (configuration epochs), the configuration with the higher version takes precedence, so all Sentinels eventually converge on the highest version.

TILT mode

TILT is a protection mode. It works as follows: Sentinel calls its timer interrupt 10 times per second, so the time between two consecutive calls should be around 100 ms. Sentinel records the time of the previous call and compares it with the current one; if the difference is negative or unexpectedly large (2 seconds or more), Sentinel enters TILT mode.

In TILT mode, Sentinel keeps monitoring everything but stops acting, and it replies negatively to SENTINEL is-master-down-by-addr requests, because its ability to detect failures can no longer be trusted.

If everything appears normal for 30 seconds, Sentinel exits TILT mode.
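
As a rough illustration of this check, here is a minimal Python sketch. The TiltDetector class and its constants are hypothetical names chosen for the example; the thresholds (2 seconds to enter, 30 seconds to leave) are taken from the description above, and the exact comparisons are an assumption rather than Sentinel's actual code.

```python
from typing import Optional

TILT_TRIGGER_MS = 2000   # a gap this large (or a negative one) triggers TILT
TILT_PERIOD_MS = 30000   # time that must pass normally before leaving TILT


class TiltDetector:
    """Hypothetical helper that mimics the timer-based TILT check."""

    def __init__(self, now_ms: int) -> None:
        self.previous_ms = now_ms
        self.tilt_since_ms: Optional[int] = None  # None: not in TILT mode

    def on_timer(self, now_ms: int) -> bool:
        """Record one timer call and return True while in TILT mode."""
        delta = now_ms - self.previous_ms
        self.previous_ms = now_ms
        if delta < 0 or delta >= TILT_TRIGGER_MS:
            # The clock jumped or the process stalled: enter (or restart) TILT.
            self.tilt_since_ms = now_ms
        elif (self.tilt_since_ms is not None
              and now_ms - self.tilt_since_ms >= TILT_PERIOD_MS):
            # Timer calls have looked normal for long enough: leave TILT.
            self.tilt_since_ms = None
        return self.tilt_since_ms is not None
```

Feeding the detector timestamps spaced about 100 ms apart keeps it out of TILT; a single gap of several seconds, or a clock that moves backwards, puts it into TILT until 30 seconds of regular calls have passed.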

The sentinel_tilt_since_seconds field of the INFO output shows how many seconds the Sentinel has been in TILT mode, or -1 if it is not in TILT mode.

$ redis-cli -p 26379
127.0.0.1:26379> info
(Other information from Sentinel server skipped.)

# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=0,sentinels=1

Redis Cluster

Redis Cluster provides automatic data sharding and horizontal scaling across multiple nodes. It offers:

  • Automatic splitting of the dataset across multiple nodes
  • The ability to keep operating when a subset of the nodes fail or cannot communicate with the rest of the cluster

There are a few things to keep in mind when using Redis Cluster:

  • Redis Cluster does not support multiple databases the way standalone Redis does; only DB 0 is available:

    127.0.0.1:6379> info keyspace
    # Keyspace
    db0:keys=33880932,expires=24332094,avg_ttl=0
    
  • Cluster nodes do not forward commands to the node that owns the key. Instead, the client must follow the redirection errors it receives (MOVED and ASK) and resend the command to the correct node, so clients need to be able to connect to every node.

  • The recommended maximum cluster size is 1000 nodes.

  • When a single command needs to operate on multiple keys, use hash tags to ensure that all keys of the operation fall into the same hash slot.
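
To illustrate the client-side redirection mentioned above, the following Python sketch parses a redirection error such as "MOVED 3999 127.0.0.1:6381". The Redirect type and the parse_redirect function are hypothetical helpers written for this example; real client libraries handle this internally.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str   # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>'."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None  # not a redirection error
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


print(parse_redirect("MOVED 3999 127.0.0.1:6381"))
```

On MOVED, a client would normally update its slot map and retry against the new node. On ASK, it retries only the current command against the given node, preceded by an ASKING command.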

configure

Redis Cluster requires a minimum of 3 master nodes and recommends 6 nodes: 3 masters and 3 replicas.

Startup

Start a cluster node with the following command; more nodes can be added later:

redis-server ./

cluster-config-file: This parameter specifies the location of the cluster configuration file. Each node maintains its own cluster configuration file while running. Whenever the cluster information changes (for example, when nodes are added or removed), every node writes the latest information to this file; when a node restarts, it re-reads the file to recover the cluster information, so it can easily rejoin the cluster.

That is, when a Redis node starts in cluster mode, it first checks whether a cluster configuration file exists. If one exists, the node starts with the configuration in that file; if not, it initializes a new configuration and saves it to the file. Cluster configuration files are maintained by the Redis nodes themselves and do not need to be modified manually.

Once the configuration file has been prepared, start the node with the redis-server command:

redis-server <cluster-config-file>

boot port

Each Redis Cluster node needs two open TCP ports: a client port (e.g., 6379) used to serve clients, and a second port known as the cluster bus port. By default, the cluster bus port is the data port plus 10000 (e.g., 16379); the default can be overridden with the cluster-port setting.

The cluster bus is a node-to-node communication channel used for node discovery, failure detection, configuration updates, failover authorization, and so on. For a Redis Cluster to work properly, you must:

  • Open the client port (usually 6379) to all clients, and to all other cluster nodes, which use the client port for key migration.
  • Open the cluster bus port to all other cluster nodes.

Configuration file parameters

The main Redis Cluster parameters in the configuration file are as follows:

  • cluster-enabled <yes/no>: Whether to enable Redis Cluster. If no, the instance runs as a standalone Redis instance.
  • cluster-config-file <filename>: The file in which a node automatically persists the cluster configuration (state) whenever the cluster changes. It is not meant to be edited by users.
  • cluster-node-timeout <milliseconds>: If a master cannot reach the majority of masters for this amount of time, it is considered unavailable and stops accepting client requests.
  • cluster-slave-validity-factor <factor>: If 0, a replica always tries to fail over its master, no matter how long the link between the master and the replica has been down. If positive, a replica does not attempt a failover when its link to the master has been down for more than cluster-node-timeout * factor. With a non-zero value, if a master fails and no replica is able to fail over it, the Redis Cluster becomes unavailable until the master rejoins the cluster. This parameter controls whether a replica may initiate a failover.
  • cluster-migration-barrier <count>: The minimum number of replicas a master must keep; only replicas beyond this number can migrate to a master that has no working replica. See the Replica migration section.
  • cluster-require-full-coverage <yes/no>: Default yes. If yes, the cluster stops accepting requests when it detects at least one uncovered hash slot (a slot that no node serves). If no, the cluster keeps serving requests for keys in the slots that are still covered.
  • cluster-allow-reads-when-down <yes/no>: Default no. If no, a node stops serving all traffic when it cannot reach the majority of masters or when there are uncovered hash slots. If yes, the node keeps serving reads in that state.

node operation

When adding a node, part of the existing slots must be redistributed to the new node. This can be done with redis-cli --cluster rebalance <ip:port> --cluster-use-empty-masters.

When deleting a node, first reshard the slots of the node to be deleted to other nodes, then rebalance the slots with redis-cli --cluster rebalance <ip:port>.

See also: Hash Slot Resharding and Rebalancing for Redis Cluster

Add Node

Adding a master node

To add a master, first add an empty node and then transfer the data to that node.

  1. Assuming a new node 127.0.0.1:7006 is added, first create a directory for it

  2. Use add-node to add the node. Here 127.0.0.1:7000 is a node that already exists in the cluster; redis-cli uses the CLUSTER MEET command to add the new node to the cluster:

    redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000
    
  3. Use reshard to migrate data to this node.

Add replica

There are three ways to add a replica:

  • With the following command, no master is specified, so redis-cli picks at random one of the masters with the fewest replicas:

    redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-slave
    
  • You can add a replica for a specific master using the following:

    redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-slave --cluster-master-id 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e
    
  • It is also possible to start the node as an empty master and then use the CLUSTER REPLICATE command to turn it into a replica of a given master. This is also the usual way to move an existing replica to a different master:

    CLUSTER REPLICATE node-id
    

Remove node

  • To remove a replica node, use the del-node command. Here 127.0.0.1:7000 is any known node of the cluster, and <node-id> is the ID of the replica to be deleted:

    redis-cli --cluster del-node 127.0.0.1:7000 <node-id>
    
  • To remove a master node, you must first make sure it is empty: reshard all of its data to other nodes, then run the same command as above.

  • To remove a failed node, you cannot use del-node, because del-node tries to connect to all nodes and will get a connection refused error. Call CLUSTER FORGET instead to remove the failed node:

    redis-cli --cluster call 127.0.0.1:7000 cluster forget <node-id>
    

    The CLUSTER FORGET command performs the following operations:

    1. Remove the node with the specified ID from the node table
    2. Re-adding nodes with the same ID is prohibited for 60s. This is to prevent redis from rediscovering nodes that need to be removed again via gossip

reset node

When an existing node needs to be added to a different cluster, it can first be reset with the CLUSTER RESET command, which has the following two forms:

  • CLUSTER RESET SOFT
  • CLUSTER RESET HARD

There are the following rules:

  1. Soft and hard: if the node is a replica, turn it into a master and discard its dataset. If the node is a master and contains keys, abort the reset.
  2. Soft and hard: release all slots and reset the manual failover state.
  3. Soft and hard: remove all other nodes from the node table, so that the node no longer knows any other node.
  4. Hard only: set currentEpoch, configEpoch, and lastVoteEpoch to 0.
  5. Hard only: change the node ID to a new random ID.

A master that holds a non-empty dataset cannot be reset. Clear its data with the FLUSHALL command first.

upgrade node

Upgrade replica

Upgrading a replica node is as simple as stopping the node and restarting it with the new version. If a client is performing a read operation through the replica, it will reconnect to another replica when the node becomes unavailable.

Upgrade Master

Upgrading a master is more involved. The recommended process is as follows:

  1. Use CLUSTER FAILOVER (executed on one of the master's replicas) to manually trigger a failover that turns the master to be upgraded into a replica. This operation is safe for the data.
  2. Wait for the master to become a replica.
  3. Upgrade that node.
  4. After the upgrade is complete, you can manually trigger another failover to turn the upgraded node back into a master.

Data sharding

Concepts

Instead of consistent hashing, Redis Cluster uses a form of sharding based on hash slots. A Redis Cluster has 16384 hash slots, and the slot of a key is computed as HASH_SLOT = CRC16(key) mod 16384.

Each node in a Redis Cluster is responsible for a subset of the hash slots. For example, with 3 nodes:

  • Node A holds hash slots 0 to 5500
  • Node B holds hash slots 5501 to 11000
  • Node C holds hash slots 11001 to 16383

When a node D is added, some hash slots must be moved from A, B, and C to D. Similarly, to remove node A, its hash slots must be moved to B and C; A can be deleted once it no longer holds any hash slot. Note: A, B, C, and D here denote master nodes.
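
To make the mapping concrete, here is a small Python sketch that looks up which master owns a slot, using the example ranges above. The SLOT_RANGES table and the node_for_slot function are illustrative names, not part of Redis.

```python
# Example layout from the text: inclusive slot ranges per master.
SLOT_RANGES = {
    "A": (0, 5500),
    "B": (5501, 11000),
    "C": (11001, 16383),
}


def node_for_slot(slot: int) -> str:
    """Return the master responsible for the given hash slot."""
    for node, (first, last) in SLOT_RANGES.items():
        if first <= slot <= last:
            return node
    raise ValueError(f"slot {slot} is outside 0..16383")


print(node_for_slot(866), node_for_slot(12182))  # -> A C
```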

Moving hash slots from one node to another doesn't require stopping any operations, so there is no downtime required for either adding or removing nodes or changing a node's hash slot percentage.

Hash-tags

In Redis Cluster, different keys may fall into different hash slots. If a single command operates on keys that belong to different hash slots (a multi-key operation), the command is rejected with a CROSSSLOT error instead of being executed. For more, see client redirection.

Figure: [Redis] Multi-key command in cluster mode (feat. CROSS-SLOT)

Hash tags make it possible to force multiple keys into the same hash slot. If a key contains a "{...}" pattern, only the string inside the braces is hashed to compute the slot. If a key contains several braces, only the substring between the first "{" and the first "}" after it is used.

  • The keys {user1000}.following and {user1000}.followers are both hashed using user1000
  • The key foo{bar}{zap} is hashed using bar
  • The key foo{{bar}} is hashed using {bar, the substring between the first "{" and the first "}" that follows it
  • The key foo{}{bar} is hashed as a whole, since there are no characters inside the first pair of braces
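
The slot computation, including the hash tag rule, can be sketched in Python as follows. The function name key_slot is hypothetical. The sketch relies on binascii.crc_hqx with an initial value of 0, which computes the CRC-CCITT (XMODEM) checksum, the CRC16 variant used by Redis Cluster.

```python
import binascii


def key_slot(key: str) -> int:
    """Compute HASH_SLOT = CRC16(key) mod 16384, honoring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Use the tag only if a closing brace exists and the tag is non-empty.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Keys sharing a hash tag land in the same slot.
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # -> True
```

On a live cluster, the result for any key can be cross-checked with the CLUSTER KEYSLOT command.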

Implementation of reshard

  • Execute the following command to start the interactive reshard. Just specify a node in the command, and redis-cli will automatically discover other nodes:

    redis-cli --cluster reshard 127.0.0.1:<cport>
    
  • A reshard can also be run non-interactively, as follows; --cluster-yes makes the command non-interactive:

    redis-cli --cluster reshard <host>:<port> --cluster-from <node-id> --cluster-to <node-id> --cluster-slots <number of slots> --cluster-yes
    

    --cluster-from can be set to all, meaning that all other nodes are used as sources of hash slots.

After the reshard finishes, use cluster nodes to view the hash slot assignments.

master-replica model

In Redis Cluster, every hash slot has from 1 (the master itself) to N copies (the master plus N-1 replica nodes). In the example with three nodes A, B, and C and no replicas, if B fails, the cluster can no longer serve hash slots 5501 to 11000.

If you add a replica to each master when creating the cluster, the cluster contains A, B, and C plus their replicas A1, B1, and C1. If B fails, B1 is promoted to become the new master. However, if B and B1 fail at the same time, the Redis Cluster cannot continue to operate.

Data loss scenarios

Redis Cluster does not guarantee strong consistency because it uses asynchronous replication; in some cases, writes acknowledged to the client may be lost. There are two data loss scenarios:

  1. Suppose a client writes to master B. B replies OK to the client and then propagates the write to its replicas B1, B2, and B3. Because B does not wait for acknowledgements from the replicas before replying, the write is lost if B fails after confirming it to the client but before sending it to the replicas: after the node timeout, one of the replicas, which never received the write, is promoted to master.
  2. During a network partition, a client may write to a master that sits on the minority side of the partition. Writes accepted by that master during this period are lost once the partition heals.

Redis Cluster has an important setting, cluster-node-timeout: if a master cannot reach the majority of masters (via Ping/Pong) for this amount of time, it enters an error state and stops accepting writes, which bounds the window in which data can be lost.

Clients can use the WAIT command to make losing a write much less likely, although this still does not provide strong consistency.

Availability

During a network partition, the minority side of the cluster is not available. On the majority side, provided it contains the majority of masters and at least one replica of every unreachable master, the cluster becomes available again after NODE_TIMEOUT plus the time a replica needs to be promoted to master (1 to 2 seconds).

Replica migration automatically migrates redundant replicas to a master with no replicas, thus improving cluster stability.

Replica migration

A Redis Cluster will become unavailable if both the master and its replicas fail at the same time. The simplest way to improve cluster availability is to add replicas for each master in the cluster, but this approach also has cost issues.

An alternative is to create an asymmetric layout of masters and replicas and let the cluster change the layout automatically over time. If a master has no working replica, replica migration automatically reassigns a replica from another master to it.

For example, a cluster has 3 masters A, B, and C, where A and B each have 1 replica and C has 2 replicas: C1 and C2.

  • Master A fails and A1 is promoted to master
  • Since A1 has no replica, C2 migrates and becomes a replica of A1
  • Three hours later, A1 also fails and C2 is promoted to master to replace it
  • The cluster keeps running

Algorithm

Because the migration algorithm only affects replicas and does not change any configEpoch, it does not require any agreement from the cluster.

When replicas detect that at least one master in the cluster has no working replica, each of them runs the migration algorithm, but in general only one replica actually migrates: among the replicas of the master with the most replicas, the one that is not in the FAIL state and has the smallest node ID.

While the cluster is unstable, a race is possible in which more than one replica believes it has the smallest node ID (this is unlikely in practice), so that several replicas migrate to the same master at the same time. This is harmless: if the race leaves some master without replicas, the migration algorithm runs again once the cluster is stable and migrates a replica back.

The cluster-migration-barrier parameter controls the minimum number of replicas a master must keep before its remaining replicas can migrate.
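
The selection rule can be sketched as follows. This is a simplified Python illustration, assuming a global view of the cluster in which replicas_by_master maps each master ID to the IDs of its working (non-FAIL) replicas. The function name and data layout are hypothetical; in a real cluster each replica evaluates the rule locally.

```python
from typing import Dict, List, Optional


def pick_migrating_replica(replicas_by_master: Dict[str, List[str]],
                           migration_barrier: int = 1) -> Optional[str]:
    """Return the ID of the replica that should migrate, or None."""
    counts = [len(replicas) for replicas in replicas_by_master.values()]
    if not counts or 0 not in counts:
        return None  # no master is left without a working replica
    max_count = max(counts)
    if max_count <= migration_barrier:
        return None  # a donor must still keep `migration_barrier` replicas
    # Candidates: replicas of the master(s) with the most replicas.
    candidates = [
        replica_id
        for replicas in replicas_by_master.values()
        if len(replicas) == max_count
        for replica_id in replicas
    ]
    return min(candidates)  # smallest node ID wins


layout = {"A1": [], "B": ["b-9f"], "C": ["c-7a", "c-2e"]}
print(pick_migrating_replica(layout))  # -> c-2e
```

With migration_barrier set to 2, the same layout yields no migration, because master C would be left with fewer than two replicas.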

Reading data from replica

Normally, when a client connects to a replica node, the replica redirects the client to its master. A client can issue the READONLY command to be allowed to read data from the replica.

If a command involves a hash slot that is not served by the replica's master, the replica still sends a redirection to the client.

The READWRITE command cancels read-only mode.

cluster topology

node connection

A Redis Cluster is a full mesh of interconnected nodes: in a cluster of N nodes, every node has N-1 outgoing and N-1 incoming TCP connections on the cluster bus port, and these connections are kept open permanently. Thanks to the gossip protocol and a configuration update mechanism, the nodes avoid exchanging too many messages with each other.

node handshake

A node accepts another node as part of the cluster in only two ways:

  • The other node announces itself with a MEET message. A MEET message is like a PING message, but it forces the receiver to accept the sender as part of the cluster. A node sends MEET messages only when the system administrator requests it with the CLUSTER MEET ip port command.
  • A node also accepts a node as part of the cluster if a node it already trusts has accepted it.

Heartbeat Detection

Redis Cluster nodes use Ping and Pong messages as heartbeats. Typically a node pings a fixed number of randomly chosen nodes, and every node also makes sure to ping any node to which it has not sent a Ping, or from which it has not received a Pong, for longer than NODE_TIMEOUT/2.

Ping and Pong messages share a common header:

  • Node ID
  • The sending node's currentEpoch and configEpoch
  • Node flags, i.e. whether the node is a replica or a master, plus other node information
  • The bitmap of the hash slots served by the sending node
  • The client port of the sending node
  • The cluster bus port of the sending node
  • The cluster state from the sending node's point of view, down or ok
  • The master node ID of the sending node (if the sending node is a replica)

Ping and Pong messages also contain a gossip section, which describes how the sending node sees a few other nodes and is used for failure detection and node discovery:

  • Node ID.
  • IP and port of the node
  • Node flags.

Cluster current epoch and configuration epoch

The cluster current epoch is also known as currentEpoch, and the configuration epoch is also known as configEpoch.

currentEpoch

The newly created node (master/replica) has a currentEpoch of 0.

When a node receives a message from another node, it updates its currentEpoch to the sender's epoch if the epoch in the header of the cluster bus message is higher than its own. Eventually all nodes in the cluster converge on the largest currentEpoch.

This value is currently used only when a replica is promoted to master (described in the next section).

configEpoch

Each master always advertises its configEpoch and the bitmap of the hash slots it serves in Ping and Pong messages. A replica also advertises a configEpoch in Ping and Pong messages, but the value is its master's configEpoch as of the last time they exchanged messages. A newly created master node has a configEpoch of 0.

A new configEpoch is created during replica election. A replica that tries to replace a failed master increments its epoch and asks the majority of masters for authorization. Once authorized, a new unique configEpoch is created, and the replica turns into a master using the new configEpoch.

configEpoch can also be used to solve problems caused by multiple nodes declaring divergent configurations (network isolation case).

ConfigEpoch Conflict Resolution Algorithm

During failover, a replica that is promoted to master must obtain a unique configEpoch. However, in the following two scenarios a configEpoch is created in an unsafe way, that is, simply by incrementing the local node's currentEpoch, which can lead to configEpoch conflicts. Both are triggered by the system administrator:

  1. Using CLUSTER FAILOVER TAKEOVER to manually promote a replica to master. This does not require authorization from the majority of masters and is useful in multi-data-center setups.
  2. Migrating slots also generates a new configEpoch in the local node; for performance reasons, this does not require agreement from other nodes either.

When a hash slot is manually resharded from node A to node B, B is forced to update its configEpoch to the greatest value found in the cluster plus 1 (unless it already has the greatest configEpoch), without the agreement of the other nodes. In practice, a reshard usually involves hundreds of hash slots, and generating a new configEpoch for each of them would hurt performance; for this reason, a new configEpoch is generated only when the first hash slot is moved.

Besides these two scenarios, software bugs or file system problems may also cause multiple nodes to end up with the same configEpoch. To solve this, a conflict resolution algorithm ensures that different nodes have different configEpochs:

  1. If a master node detects that another master node advertises the same configEpoch
  2. and its own node ID is lexicographically smaller than that of the other node with the same configEpoch
  3. then it increments its currentEpoch by 1 and uses the result as its new configEpoch

If several nodes share the same configEpoch, all of them except the one with the largest ID perform the steps above, which eventually ensures that every master has a unique configEpoch (a replica's configEpoch is the same as its master's).
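
The following Python sketch simulates the conflict resolution rule. It is a simplification under stated assumptions: a single shared current_epoch stands in for the per-node currentEpoch values that real nodes synchronize via heartbeats, it is assumed to be at least as large as every configEpoch, and the function name resolve_conflicts is hypothetical.

```python
from typing import Dict, Tuple


def resolve_conflicts(config_epochs: Dict[str, int],
                      current_epoch: int) -> Tuple[Dict[str, int], int]:
    """Bump colliding configEpochs until every master has a unique one.

    config_epochs maps master node ID -> configEpoch.
    """
    epochs = dict(config_epochs)
    changed = True
    while changed:
        changed = False
        for node_id in sorted(epochs):
            colliding = [other for other in epochs
                         if other != node_id and epochs[other] == epochs[node_id]]
            # Only a node with a smaller ID than a colliding peer bumps itself.
            if colliding and node_id < max(colliding):
                current_epoch += 1
                epochs[node_id] = current_epoch
                changed = True
    return epochs, current_epoch


print(resolve_conflicts({"a": 5, "b": 5, "c": 5, "d": 3}, current_epoch=5))
# -> ({'a': 6, 'b': 7, 'c': 5, 'd': 3}, 7)
```

In the example, node c has the largest ID among the colliding nodes and keeps epoch 5, while a and b move to fresh epochs.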

When a cluster is created, redis-cli uses the CLUSTER SET-CONFIG-EPOCH command to give each node a different configEpoch before the nodes join the cluster.

troubleshooting

Conceptual understanding

Failover in Redis Cluster involves two points that are harder to understand: the different time periods that the replica and the masters go through during a failover, and the concepts of currentEpoch, configEpoch, and lastVoteEpoch.

Time periods in failover

During a failover, a replica goes through an election delay followed by one or more voting periods, where Election Delay = 500 milliseconds + random delay between 0 and 500 milliseconds + REPLICA_RANK * 1000 milliseconds. There is more than one voting period when the replica's vote request fails.

For a master, there is only one period, of length NODE_TIMEOUT * 2: within one such voting period, a master votes for at most one replica of the same master.

Uses of the epochs
  • lastVoteEpoch: The epoch of the last voting round, recorded on the master side.
    • Purpose: Used only by masters to check the validity of vote requests from replicas.
      • If the currentEpoch in the request is not greater than the lastVoteEpoch recorded by the master, the request belongs to an old voting round and does not start a new one, so it is ignored.
    • Update timing: After a master has voted for a replica, it sets lastVoteEpoch to the currentEpoch of the request.
  • currentEpoch: All nodes of the cluster should converge on the same currentEpoch.
    • Purpose: Identifies the failover voting round currently in progress. It prevents a replica from miscounting the votes it receives from masters.
      • A replica includes its currentEpoch in the vote request. If the master finds that the request's currentEpoch is less than its own currentEpoch, it ignores the request.
      • If the currentEpoch in a vote received from a master is less than the replica's currentEpoch, the replica ignores that vote.
    • Update timing:
      • Nodes propagate currentEpoch through heartbeat messages. If a node's currentEpoch is lower than the currentEpoch received in a heartbeat message, the node updates its local currentEpoch; eventually all nodes hold the same currentEpoch.
      • A replica increments its currentEpoch by 1 when it issues a vote request.
  • configEpoch: Different masters should have different values.
    • Purpose:
      • Identifies the configuration version of a master. When multiple masters claim ownership of the same hash slots, the conflict is settled by configEpoch.
      • During voting, if the configEpoch in the request is lower than the configEpoch the master has recorded locally for the hash slots concerned, the request is ignored.
    • Update timing:
      • A replica's configEpoch is updated via heartbeat messages
      • The configEpoch of a rejoining node is updated with an UPDATE message
      • A replica generates a new configEpoch after winning an election
      • The configEpoch conflict resolution algorithm automatically resolves configEpoch conflicts between nodes
      • Executing CLUSTER FAILOVER TAKEOVER
      • Resharding hash slots

fault detection

When Redis Cluster detects that a master or replica node can no longer be reached by the majority of masters, it tries to promote a replica to master. If the promotion is not possible, the cluster enters an error state and stops accepting requests from clients.

Redis Cluster has two failure states, PFAIL and FAIL. PFAIL means that no Pong was received from the peer within NODE_TIMEOUT, so the peer is suspected to have failed but the failure has not been confirmed. FAIL means that the majority of masters have confirmed the node's failure within a fixed time window (derived from NODE_TIMEOUT).

A node (master or replica) is flagged as PFAIL when it cannot be reached within NODE_TIMEOUT. PFAIL is only each node's local view of another node; to trigger a failover it must be escalated to the FAIL state.

In a Redis Cluster, every node sends node state information to the other nodes in heartbeat messages (the gossip section), so each node receives the node flags that every other node holds and learns about the failures the others have detected. The escalation works as follows:

  • Node A flags node B as PFAIL
  • Node states are exchanged across the cluster via gossip, so node A learns how the other masters see node B
  • Within NODE_TIMEOUT * FAIL_REPORT_VALIDITY_MULT (the validity factor is 2 in the current implementation), the majority of masters flag node B as PFAIL or FAIL

Node A then flags node B as FAIL and sends a FAIL message to all reachable nodes. Every node that receives the FAIL message is forced to flag node B as FAIL as well.
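
The majority check within the validity window can be sketched in Python as follows. The function name should_mark_fail and the reports structure are hypothetical, and it is assumed that node A's own PFAIL observation is included in the reports when A is a master.

```python
from typing import Dict

FAIL_REPORT_VALIDITY_MULT = 2  # validity factor mentioned in the text


def should_mark_fail(reports: Dict[str, int],
                     now_ms: int,
                     node_timeout_ms: int,
                     num_masters: int) -> bool:
    """Decide whether a PFAIL node can be escalated to FAIL.

    reports maps the ID of a reporting master to the time (ms) at which its
    PFAIL/FAIL report about the suspect node was received.
    """
    window_ms = node_timeout_ms * FAIL_REPORT_VALIDITY_MULT
    # Reports older than the validity window are discarded.
    fresh = sum(1 for received in reports.values() if now_ms - received <= window_ms)
    return fresh >= num_masters // 2 + 1


reports = {"A": 100_000, "C": 95_000, "D": 60_000}   # D's report is stale
print(should_mark_fail(reports, now_ms=100_000, node_timeout_ms=15_000, num_masters=5))
# -> False
```

With five masters, three fresh reports are needed; the stale report from D does not count, so the node stays in PFAIL until another master confirms the failure.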

PFAIL-> FAIL is unidirectional, but the FAIL flag can be cleared in the following scenario (node recovery):

  • The node is reachable again and is a replica. Replicas are not failed over, so the flag can simply be cleared.
  • The node is reachable again and is a master that serves no hash slots. An empty master does not really participate in the cluster and is waiting to be configured to join it.
  • The node is reachable again and is a master, but a long time (N times NODE_TIMEOUT) has passed without any replica promotion being detected.

Note that the PFAIL -> FAIL transition relies on a form of agreement, but that agreement is weak:

  1. Nodes collect reports from other nodes over a period of time, and the reports arrive from different nodes at different moments, so some of them may be outdated. Reports that fall outside the time window must therefore be discarded.
  2. Each node that detects the FAIL condition forces the other nodes to apply it with a FAIL message, but there is no guarantee that the message reaches all nodes.

In a split-brain situation (network partition), two scenarios are possible: the majority of nodes consider a node to be in the FAIL state, or only a minority do:

  • Scenario 1: If the majority of masters flag a node as FAIL, eventually all other nodes flag that node as FAIL.
  • Scenario 2: If only a minority of masters flag a node as FAIL, no replica is promoted to master, and each node clears the FAIL flag according to the rules above.

The FAIL flag serves only as a trigger for failover.

failover

A failover can be performed manually with the CLUSTER FAILOVER command. In this case no data is lost, because the replica waits until its data matches the master's before switching over.

replica elections

To be promoted to master, a replica must first start an election and win it. If a master is in the FAIL state, all of its replicas start an election, but in the end only one of them wins and is promoted to master.

A replica starts an election when the following conditions are met:

  • The replica's master is in the FAIL state.
  • The master was serving a non-zero number of slots.
  • The replication link between the replica and the master has not been disconnected for longer than a given amount of time, which ensures that the data of the promoted replica is reasonably fresh.

To be elected, a replica first increments its currentEpoch counter and then requests votes by broadcasting a FAILOVER_AUTH_REQUEST message to all masters. It then waits for the masters' replies for NODE_TIMEOUT * 2 (at least 2 seconds).

Once a master has voted for a replica by replying with FAILOVER_AUTH_ACK, it will not vote for another replica of the same master for NODE_TIMEOUT * 2, nor will it reply to other authorization requests for that master during this period.

A replica discards any AUTH_ACK reply whose epoch is less than its currentEpoch, so that votes from a previous election are not counted.

If the replica receives ACKs from the majority of masters, it wins the election. Otherwise, it aborts the election and retries after NODE_TIMEOUT * 4 (at least 4 seconds).
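
The timing rules and the vote count can be sketched in Python. Both function names are hypothetical, and acks is assumed to be a list of (master ID, epoch) pairs taken from the FAILOVER_AUTH_ACK replies.

```python
from typing import Iterable, Tuple


def election_timeouts_ms(node_timeout_ms: int) -> Tuple[int, int]:
    """Return (time to wait for votes, delay before retrying the election)."""
    wait_ms = max(node_timeout_ms * 2, 2000)    # NODE_TIMEOUT * 2, at least 2s
    retry_ms = max(node_timeout_ms * 4, 4000)   # NODE_TIMEOUT * 4, at least 4s
    return wait_ms, retry_ms


def wins_election(acks: Iterable[Tuple[str, int]],
                  request_epoch: int,
                  num_masters: int) -> bool:
    """Count valid votes and check for a majority of masters."""
    # Replies carrying an epoch older than the request are discarded.
    voters = {master_id for master_id, epoch in acks if epoch >= request_epoch}
    return len(voters) >= num_masters // 2 + 1


print(election_timeouts_ms(500))                             # -> (2000, 4000)
print(wins_election([("m1", 7), ("m2", 6)], 7, 3))           # -> False
print(wins_election([("m1", 7), ("m3", 7)], 7, 3))           # -> True
```

Collecting voters in a set means a duplicated reply from the same master is counted only once.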

Replica Rankings

Once the master changes to the FAIL state, the replica waits for a period of time before the election and this delay is calculated as follows:

DELAY = 500 milliseconds + random delay between 0 and 500 milliseconds +
        REPLICA_RANK * 1000 milliseconds.

The delay gives the FAIL state time to propagate across the cluster; otherwise a master that is not yet aware of the FAIL state may refuse to vote. In addition, because replicas have different REPLICA_RANK values, the delay prevents all replicas from starting their elections at the same time.

REPLICA_RANK ranks the replicas by the amount of data they have replicated from the master: the replica with the most recent replication offset has rank 0, the second most recent has rank 1, and so on. This way the replica with the most recent data gets to start its election first. If the election of a higher-ranked replica fails, the other replicas try shortly afterwards.
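
As a rough sketch of the delay formula and the ranking described above (the function names and the optional `jitter_ms` parameter are assumptions made for illustration; they are not Redis identifiers):

```python
import random
from typing import List, Optional


def replica_rank(my_offset: int, other_offsets: List[int]) -> int:
    """Rank = number of sibling replicas with a more recent replication offset."""
    return sum(1 for offset in other_offsets if offset > my_offset)


def election_delay_ms(rank: int, jitter_ms: Optional[int] = None) -> int:
    """DELAY = 500 ms + random delay between 0 and 500 ms + REPLICA_RANK * 1000 ms."""
    if jitter_ms is None:
        jitter_ms = random.randint(0, 500)  # random component of the delay
    return 500 + jitter_ms + rank * 1000
```

A replica of rank 0 therefore starts its election after 500 to 1000 ms, while a replica of rank 2 waits 2500 to 3000 ms, which leaves the most up-to-date replica a head start.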

Once a replica wins the election, it obtains a new, unique, and incremental configEpoch that is greater than that of any existing master, and announces its master role and the hash slots it owns via Ping and Pong messages. If the old master rejoins the cluster, it detects that a higher configEpoch exists, updates its configuration, and starts replicating from the new master.

master's voting process

When a master receives a FAILOVER_AUTH_REQUEST from a replica, it votes for the replica only if the following conditions are met:

  1. A master votes only once per epoch and does not process old epochs. Each master has a lastVoteEpoch field; if the currentEpoch in the request is not greater than lastVoteEpoch, the vote is refused. After voting for a request, the master updates lastVoteEpoch accordingly

  2. A master votes for a replica only if the replica's master is flagged as FAIL

  3. A request whose currentEpoch is lower than the master's currentEpoch is ignored. As a result, the master's reply always carries the same currentEpoch as the request. If the same replica asks for votes again, it increments currentEpoch, which guarantees that an old, delayed reply from the master cannot be accepted for the new vote.

    Without rule 3, the following could happen:

    • The master's currentEpoch is 5 and its lastVoteEpoch is 1 (which can happen after a few failed elections)
    • The replica's currentEpoch is 3
    • The replica tries to get elected with epoch 4 (3+1). The master replies OK with currentEpoch 5, but the reply is delayed
    • The replica, having received no reply, tries again later with epoch 5 (4+1). The delayed reply from the master (currentEpoch 5) now arrives and the replica accepts it as valid, so the number of votes is miscounted.

  4. After voting for a replica of a given master, a master does not vote for any replica of the same master again until NODE_TIMEOUT * 2 has elapsed. Since two replicas cannot win an election in the same epoch, this is not strictly required; in practice it gives the elected replica enough time to inform the other replicas, so that they do not win another election and start an unnecessary second failover.

  5. Masters make no effort to select the best replica. Because a master does not vote twice in the same epoch for replicas of the same master, the replica that starts its election first, which is most likely the best one, tends to win.

  6. If a master refuses to vote for a replica, it simply ignores the request; no negative response is sent

  7. A master refuses to vote for a replica whose configEpoch is lower than any configEpoch recorded in the master's hash slot table for the slots claimed by the replica. This means the replica's configuration is out of date; it will be updated later by an UPDATE message. See also Passing of hash slot configurations
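
The voting rules can be condensed into a small decision function. The following Python sketch only mirrors rules 1, 2, 3, 4 and 7 as listed above; the class and field names are hypothetical and the real logic in Redis handles more details.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MasterState:
    current_epoch: int
    last_vote_epoch: int = 0
    # slot -> configEpoch recorded for the slot's current owner
    slot_config_epoch: Dict[int, int] = field(default_factory=dict)
    # failed master id -> time (ms) of the last vote given to one of its replicas
    last_vote_time_ms: Dict[str, int] = field(default_factory=dict)


@dataclass
class AuthRequest:
    failed_master_id: str
    current_epoch: int
    config_epoch: int
    claimed_slots: List[int]


def try_vote(state: MasterState, req: AuthRequest, master_is_fail: bool,
             now_ms: int, node_timeout_ms: int) -> bool:
    """Return True and record the vote if every rule allows it."""
    if req.current_epoch < state.current_epoch:    # rule 3: stale request
        return False
    if req.current_epoch <= state.last_vote_epoch:  # rule 1: one vote per epoch
        return False
    if not master_is_fail:                          # rule 2: master must be FAIL
        return False
    last = state.last_vote_time_ms.get(req.failed_master_id)
    if last is not None and now_ms - last < node_timeout_ms * 2:  # rule 4
        return False
    for slot in req.claimed_slots:                  # rule 7: outdated configEpoch
        if state.slot_config_epoch.get(slot, 0) > req.config_epoch:
            return False
    state.last_vote_epoch = req.current_epoch
    state.last_vote_time_ms[req.failed_master_id] = now_ms
    return True
```

A refused request simply returns False here, which corresponds to rule 6: the master stays silent instead of sending a negative reply.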

UPDATE message

Passing of hash slot configurations

One of the key features of a Redis Cluster is how the hash slot information of the nodes is passed. This is critical for scenarios such as starting a new cluster, failover, and nodes rejoining the cluster.

There are two ways to pass the slot configuration:

  • Heartbeat message: the sender's Ping or Pong message will contain information about the hash slots it owns
  • UPDATE message: since the heartbeat message contains the sender's configEpoch and hash slots, if the heartbeat message receiver realizes that the sender's information is not up-to-date, it will send it a message containing the latest information to force it to update its information.

When a new Redis Cluster node is created, its local hash slot table is empty: every entry is NULL, meaning that the hash slot is not bound to any node:

0 -> NULL
1 -> NULL
2 -> NULL
...
16383 -> NULL

The passing of hash slots needs to satisfy the following rules:

Rule 1: If a hash slot is unassigned (NULL) and a known node claims it, the receiving node updates its local hash slot table and associates the slot with the claiming node. The following shows that node A serves hash slots 1 and 2 with configEpoch 3:

0 -> NULL
1 -> A [3]
2 -> A [3]
...
16383 -> NULL

However, this mapping is not fixed, and the mapping of hash slots is changed during both failover and manual reshard.

Rule 2: If a hash slot is already assigned and a known node claims it with a configEpoch greater than the configEpoch of the master currently associated with the slot, the slot is rebound to the new node. Because of this rule, all nodes in the cluster eventually agree that a hash slot belongs to the node with the greatest configEpoch among those claiming it.

Suppose B is a replica of A. After a failover, B broadcasts its configuration in heartbeat messages. When the receivers see node B claiming hash slots 1 and 2 with configEpoch 4, they update their local hash slot tables:

0 -> NULL
1 -> B [4]
2 -> B [4]
...
16383 -> NULL
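
Rules 1 and 2 amount to a very small update routine on the receiver's local table. The sketch below replays the A/B example from the text; `apply_claim` and the table layout are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# slot -> (owner node, configEpoch), or None when the slot is unassigned
SlotTable = Dict[int, Optional[Tuple[str, int]]]


def apply_claim(table: SlotTable, node: str, config_epoch: int,
                slots: Iterable[int]) -> None:
    """Update the local table when `node` claims `slots` in a heartbeat."""
    for slot in slots:
        owner = table.get(slot)
        if owner is None:
            # Rule 1: an unassigned slot is bound to the claiming node.
            table[slot] = (node, config_epoch)
        elif config_epoch > owner[1]:
            # Rule 2: a greater configEpoch wins and rebinds the slot.
            table[slot] = (node, config_epoch)


table: SlotTable = {0: None, 1: None, 2: None}
apply_claim(table, "A", 3, [1, 2])   # A announces slots 1 and 2 with epoch 3
apply_claim(table, "B", 4, [1, 2])   # after failover, B claims them with epoch 4
apply_claim(table, "A", 3, [1, 2])   # a stale claim from A changes nothing
```

After these three calls slots 1 and 2 stay bound to B with configEpoch 4, which is why a rejoining node with an old epoch cannot take its slots back.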

Nodes rejoin the cluster

When node A rejoins the cluster, it announces the hash slots it owns in heartbeat messages, using its old configEpoch. The receivers find that those hash slots are now associated with a higher configEpoch, so they send node A an UPDATE message containing the latest configuration of these slots. Node A then updates its configuration according to rule 2 above.

In practice it may be more complicated: if node A rejoins after a long time, the hash slots it originally owned may now be served by several nodes. After rejoining, node A switches roles according to the following rule:

The master becomes a replica of the node that stole its last hash slot.

In most cases, the rejoining node becomes a replica of the new master that replaced it in the failover.

Client Connection

client redirection

A Redis client can send requests to any node in the cluster, including replicas. The node analyzes the request and looks up which node serves the hash slot of the key or keys involved. If the slot is served by the node itself, it processes the request; otherwise it checks its internal hash-slot-to-node map and replies to the client with a MOVED error:

GET x
-MOVED 3999 127.0.0.1:6381

The error contains the hash slot of the key (3999) and the endpoint:port of the node that serves it. The client must then reissue the request to that endpoint:port.

A good Redis client should remember the mapping between hash slots and nodes. The cluster layout can be obtained with CLUSTER SHARDS or CLUSTER SLOTS.
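
To make the client side more concrete, here is a minimal Python sketch of the two pieces a cluster-aware client needs: computing the hash slot of a key (CRC16 modulo 16384, honoring hash tags) and parsing a MOVED error. It relies on the standard library's `binascii.crc_hqx`, which with an initial value of 0 computes the XMODEM variant of CRC16; the function names are hypothetical and real client libraries do considerably more (slot cache, ASK handling, retries).

```python
import binascii
from typing import Tuple


def key_slot(key: bytes) -> int:
    """Hash slot of a key: CRC16(key) mod 16384, using the hash tag if present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end > start + 1:  # only a non-empty {tag} counts
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % 16384


def parse_moved(error: str) -> Tuple[int, str, int]:
    """Parse '-MOVED <slot> <host>:<port>' into (slot, host, port)."""
    kind, slot, endpoint = error.lstrip("-").split()
    if kind != "MOVED":
        raise ValueError("not a MOVED redirection: " + error)
    host, _, port = endpoint.rpartition(":")
    return int(slot), host, int(port)


slot, host, port = parse_moved("-MOVED 3999 127.0.0.1:6381")
```

On a MOVED reply the client would update its slot cache with `(slot, host, port)` and resend the command to that node.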

MOVED redirection
ASK Redirection

TODO

Persistence

Redis supports the following persistence modes:

  • RDB(redis database): used to periodically persist a point-in-time snapshot of the dataset
  • AOF(append only file): AOF persists all write operations received by the server. The original dataset can be rebuilt by replaying these operations when the server is started.
  • No persistence: Disable persistence.
  • RDB+AOF: Enabling both AOF and RDB on the same instance

RDB

RDB is a very compact binary snapshot file that represents the Redis data at a point in time. Its file name is set by dbfilename. A snapshot can be created manually with SAVE or BGSAVE, and the save directive sets the conditions for automatic snapshotting. For example, the following makes Redis dump the dataset every 60 seconds if at least 1000 keys have changed:

save 60 1000

BGSAVE is recommended because it does not block. SAVE runs synchronously and blocks Redis.

When creating an RDB file, Redis first writes the dataset to a temporary RDB file and, once the write is finished, replaces the old RDB file with it (a rename operation).

The location of the file can be obtained in two ways:

  • From the dir parameter in the configuration
  • With the command redis-cli -p 6379 CONFIG GET dir

Advantages

  • RDB makes periodic backups easy
  • Great for disaster recovery: RDB files can be transferred to a remote data center or to storage such as Amazon S3.
  • The RDB file is written by a child process, so the main process performs no disk I/O for it.
  • For larger datasets, restarting from RDB is faster than from AOF
  • On replicas, RDB supports partial resynchronization after restarts and failovers

Disadvantages

  • RDB is not a good choice if you need to minimize data loss when Redis stops unexpectedly: everything written since the last snapshot is lost.
  • RDB persists to disk from a fork()ed child process. If the dataset is large, fork() may take a long time, and Redis may stop serving clients for some milliseconds, or even for a second. AOF also needs fork(), but less frequently than RDB.

Disable RDB

config set save ""

AOF

Enable AOF with the following configuration so that you can restore the state via AOF after a redis reboot:

appendonly yes

Advantages

  • AOF offers several fsync policies: no fsync at all (appendfsync no, flushing is left to the OS), fsync once per second (appendfsync everysec), and fsync on every write (appendfsync always). The default is once per second, which means at most about one second of writes can be lost
  • The AOF is an append-only log, so a power failure does not corrupt it. If the last command in the log is incomplete for some reason, the redis-check-aof tool can repair it
  • When the AOF grows too large, Redis rewrites it in the background.
  • The AOF format is easy to understand and parse

Disadvantages

  • For the same dataset, the AOF file is usually larger than the RDB file
  • Depending on the fsync policy in use (e.g. appendfsync always), AOF may be slower than RDB.

Log rewriting

As writes are performed, the AOF grows larger and larger. For example, if you increment a counter 100 times, the dataset holds a single final value, but the AOF holds 100 entries, 99 of which are not needed to restore the state. The purpose of a rewrite is to remove these redundant entries and keep only what is needed to produce the final value.

The rewrite process is safe. While a rewrite runs, Redis keeps appending to the old file. The rewrite produces a completely new file containing the minimal set of operations needed to create the current dataset; afterwards Redis switches to the new file and starts appending to it.

Starting with Redis 7.0.0, when an AOF rewrite is performed, the parent process opens a new incremental AOF file to continue writing, while the child process executes the rewrite logic and generates a new base AOF. Redis uses a temporary manifest file to track the newly generated base file and incremental file. When both are ready, Redis makes the temporary manifest take effect through an atomic replacement. After the rewrite takes effect, Redis deletes the old base file and any unused incremental files.

If the rewrite fails, the old base and incremental files (if any) together with the newly opened incremental file still represent the complete, up-to-date dataset.

The rewrite policy can be tuned with the following two parameters:

  • auto-aof-rewrite-percentage: A rewrite is triggered when the current AOF file has grown by at least this percentage compared with its size after the last rewrite, provided the AOF file is not smaller than auto-aof-rewrite-min-size.
  • auto-aof-rewrite-min-size: The minimum AOF size at which a rewrite triggered by auto-aof-rewrite-percentage is allowed. This prevents unnecessary rewrites of small files.
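
How the two parameters combine can be shown with a short Python sketch. It follows the description above rather than the exact server code, and the function name is hypothetical; a percentage of 0 is treated as "automatic rewrite disabled".

```python
def should_auto_rewrite(current_size: int, base_size: int,
                        percentage: int, min_size: int) -> bool:
    """Decide whether an automatic AOF rewrite would be triggered.

    current_size: current AOF size in bytes
    base_size:    AOF size right after the last rewrite, in bytes
    percentage:   auto-aof-rewrite-percentage (0 disables automatic rewrites)
    min_size:     auto-aof-rewrite-min-size in bytes
    """
    if percentage == 0:
        return False
    if current_size < min_size:
        return False
    base = base_size if base_size > 0 else 1  # avoid division by zero
    growth = current_size * 100 // base - 100  # growth in percent
    return growth >= percentage


MB = 1024 * 1024
grown_enough = should_auto_rewrite(128 * MB, 64 * MB, 100, 64 * MB)
```

With a percentage of 100 and a minimum size of 64 MB, a file that was 64 MB after the last rewrite is rewritten again once it reaches 128 MB.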

Interaction between AOF and RDB persistence

In Redis >= 2.4, AOF rewrites and RDB snapshots are never run at the same time, to avoid heavy disk I/O. If a snapshot is in progress and the user requests an AOF rewrite with BGREWRITEAOF, the server replies with an OK status to indicate that the operation has been scheduled, and the rewrite starts once the snapshot is complete.

If both AOF and RDB are enabled, Redis uses the AOF file to rebuild the dataset after a restart, since the AOF holds the most complete data.

Backup and recovery

Backing up RDB data

RDB files do not change after they are generated, so you can copy the RDB file directly to a safe place.

Backup AOF data

Starting with Redis 7.0.0, the AOF is split into multiple files stored in the appenddirname directory, which can be backed up with copy/tar. However, if a rewrite runs during the backup, the backup may be invalid, so AOF rewrites must be disabled while backing up:

  1. Disable automatic AOF rewrites: CONFIG SET auto-aof-rewrite-percentage 0
  2. Make sure no rewrite is currently running: aof_rewrite_in_progress must be 0 in the output of INFO persistence.
  3. Copy the appenddirname directory
  4. When the backup is finished, re-enable AOF rewrites: CONFIG SET auto-aof-rewrite-percentage <prev-value>
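
Step 2 is easy to automate in a backup script. The sketch below parses the text returned by INFO persistence and checks the flag; how the text is obtained (redis-cli, a client library) is left open, and both function names are hypothetical.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Turn 'key:value' lines of INFO output into a dict, skipping '# Section' lines."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def safe_to_copy_aof_dir(info_text: str) -> bool:
    """True when no AOF rewrite is running, i.e. aof_rewrite_in_progress is 0."""
    return parse_info(info_text).get("aof_rewrite_in_progress") == "0"


sample = "# Persistence\naof_enabled:1\naof_rewrite_in_progress:0\n"
ok_to_copy = safe_to_copy_aof_dir(sample)
```

A script would poll this check after disabling automatic rewrites and only start copying the directory once it returns True.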

In Redis versions earlier than 7.0.0, the AOF file can be copied directly.

Security

ACL

ACLs have a number of configuration rules; see acl-rules. There are two ways to create ACLs: with the ACL SETUSER command, or by loading an ACL file with ACL LOAD.

The following creates a user named alice, without passing any rules to SETUSER:

> ACL SETUSER alice
OK

At this point alice has the following permissions:

  1. The user is in the off state, so AUTH cannot be used to authenticate as alice
  2. The user has no password set
  3. The user cannot run any command. A newly created user has command permissions of -@all by default
  4. The user cannot access any key
  5. The user cannot access any Pub/Sub channel

The following enables alice, sets a password, and only allows the GET command on keys that start with cached:.

> ACL SETUSER alice on >p1pp0 ~cached:* +get
OK

Use ACL GETUSER to view the user's permissions:

> ACL GETUSER alice
1) "flags"
2) 1) "on"
3) "passwords"
4) 1) "2d9c75..."
5) "commands"
6) "-@all +get"
7) "keys"
8) "~cached:*"
9) "channels"
10) ""
11) "selectors"
12) (empty array)

SETUSER can be called again to add permissions to the user; new rules are appended to the existing ones:

> ACL SETUSER alice ~objects:* ~items:* ~public:*
OK
> ACL LIST
1) "user alice on #2d9c75... ~cached:* ~objects:* ~items:* ~public:* resetchannels -@all +get"
2) "user default on nopass ~* &* +@all"

Command categories

Adding commands to a user one by one is tedious. Categories let you grant permissions on many commands at once.

Use the following commands to view the supported categories, and the commands contained under each category. Note that a command may be in more than one category:

ACL CAT -- Will just list all the categories available
ACL CAT <category-name> -- Will list all the commands inside the category

An example of how to add an ACL using categories is as follows:

> ACL SETUSER antirez on +@all -@dangerous >42a979... ~*

TLS

Certificate Configuration

tls-cert-file /path/to/
tls-key-file /path/to/
tls-ca-cert-file /path/to/
tls-dh-params-file /path/to/

Port Configuration

port 0 disables the non-TLS port:

port 0
tls-port 6379

Client Authentication

Client authentication can be disabled with tls-auth-clients no.

Replication

The master needs tls-port and tls-auth-clients to be configured.

The replica needs tls-replication yes to enable TLS for its connection to the master.

Cluster

Set tls-cluster yes to enable TLS for the cluster bus.

Sentinel

Sentinel also uses tls-replication to decide whether connections to the master use TLS. The same parameter determines whether connections accepted from other Sentinels use TLS: Sentinel is configured with tls-port only if tls-replication is enabled.

Cli

General

  • info memory
    • used_memory_human: Human-readable form of used_memory, the total number of bytes Redis has allocated through its allocator.
  • memory stats: show the memory usage of the service in an array format
  • memory doctor: Memory Diagnostic Tool
  • FAILOVER: Similar to CLUSTER FAILOVER, but for non-Cluster Redis

Configuration changes

  • CONFIG GET: CONFIG GET parameter [parameter ...]. Reads the Redis configuration. CONFIG GET * shows all current configuration parameters
  • CONFIG SET: Changes the Redis configuration at runtime, without restarting Redis
  • CONFIG REWRITE: Persists the configuration currently in use to the configuration file the server was started with. If CONFIG SET has been used, the rewritten configuration file may differ from the original one.

DB

  • CONFIG GET databases: Check the number of databases in redis, there are 16 by default.

  • INFO keyspace: Shows the main dictionary statistics of each db in the following format. keys is the total number of keys in the db, expires is the number of keys that have an expiration set, and avg_ttl is the average TTL in milliseconds estimated from random sampling; 0 means that none of the sampled keys has an expiration time.

    dbXXX: keys=XXX,expires=XXX,avg_ttl=XXX,subexpiry=XXX
    
  • FLUSHALL [ASYNC | SYNC]: Delete all DB keys.
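
For monitoring purposes, a line of INFO keyspace output in the format shown above can be split into numbers with a few lines of Python. This is only a parsing sketch; the function name and the sample values are made up for illustration.

```python
from typing import Dict, Tuple


def parse_keyspace_line(line: str) -> Tuple[str, Dict[str, int]]:
    """Parse 'dbN:keys=..,expires=..,avg_ttl=..' into (db name, stats dict)."""
    db, _, stats = line.strip().partition(":")
    values: Dict[str, int] = {}
    for pair in stats.split(","):
        name, _, number = pair.partition("=")
        values[name] = int(number)
    return db, values


# Hypothetical sample line, not taken from a real server.
db, stats = parse_keyspace_line("db0:keys=100,expires=10,avg_ttl=5000,subexpiry=0")
share_with_ttl = stats["expires"] / stats["keys"]
```

Here `share_with_ttl` is the fraction of keys in db0 that carry an expiration.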

Switching db

  • select <db_num>

Exchanging data between two DBs

  • SWAPDB index1 index2: Swaps two databases, so that clients connected to one database immediately see the data of the other. For example, SWAPDB 0 1 swaps db0 and db1: all clients connected to db0 immediately see the data that belonged to db1, and clients connected to db1 see the data that belonged to db0.

Get the remaining ttl of a key

  • TTL <key_name>

Using the SCAN command

  • Iterate over a portion of the keys in the database: SCAN cursor [MATCH pattern] [COUNT count] [TYPE type]. For example, SCAN 0 COUNT 100 MATCH NEW_MD* starts a new iteration at cursor 0, examines roughly 100 keys in this call, and returns those whose names start with NEW_MD.
  • Get keys of a specific type: SCAN 0 TYPE <type>, e.g. scan 0 type hash. The TYPE command shows the type of a key

SCAN can also be used without COUNT. It returns two values: the first (17 in the example below) is the cursor to pass to the next SCAN call, and the second is the list of matching keys. When the returned cursor is 0, the iteration is complete.

redis 127.0.0.1:6379> scan 0
1) "17"
2)  1) "key:12"
    2) "key:8"
    3) "key:4"
    4) "key:14"
    5) "key:16"
    6) "key:17"
    7) "key:15"
    8) "key:10"
    9) "key:3"
   10) "key:7"
   11) "key:1"
redis 127.0.0.1:6379> scan 17
1) "0"
2) 1) "key:5"
   2) "key:18"
   3) "key:0"
   4) "key:2"
   5) "key:19"
   6) "key:13"
   7) "key:6"
   8) "key:9"
   9) "key:11"
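
The cursor protocol shown above translates into a simple loop: keep calling SCAN with the cursor returned by the previous call until the cursor comes back as 0. The Python sketch below takes the SCAN call as a function argument so that it stays independent of any client library; `scan_all` and the fake two-page result are assumptions for illustration.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# A function that performs one SCAN call: cursor in, (next cursor, keys) out.
ScanFn = Callable[[int], Tuple[int, List[str]]]


def scan_all(scan: ScanFn) -> Iterator[str]:
    """Yield every key by following SCAN cursors until the cursor is 0 again."""
    cursor = 0
    while True:
        cursor, keys = scan(cursor)
        yield from keys
        if cursor == 0:  # a returned cursor of 0 means the iteration is complete
            break


# Fake server with two pages, mimicking the session above (cursor 0 -> 17 -> 0).
pages: Dict[int, Tuple[int, List[str]]] = {
    0: (17, ["key:12", "key:8"]),
    17: (0, ["key:5", "key:18"]),
}
all_keys = list(scan_all(lambda cursor: pages[cursor]))
```

In a real program `scan` would wrap the SCAN call of the client library in use. Note that SCAN may return the same key more than once, so callers that need uniqueness should deduplicate.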

Set expiration time for hash

HSET has no parameter for setting an expiration time. There are two ways to expire a hash:

  • Store the expiration time inside the hash value and let the application delete it. This depends on the application actually running the cleanup, so stale hashes may pile up in Redis memory

  • After writing the hash, call EXPIRE to set a TTL on the key, as shown below. This is the recommended approach:

    redis> HSET myhash field1 "helloworld"
    (integer) 0
    
    redis> EXPIRE myhash 60
    (integer) 1
    

replication

  • info replication: Shows replication information. On the master it shows the status of its replicas: connected_slaves is the number of connected replicas, and each replica has a line such as slave0:ip=10.150.208.73,port=6380,state=online,offset=455191,lag=0. Pay attention to the state and lag fields.

    # Replication
    role:master
    connected_slaves:1
    slave0:ip=10.150.208.73,port=6380,state=online,offset=455191,lag=0
    master_replid:d550cc0b17bba2e43d5fe6caa57bb74ea23dc755
    master_replid2:8c561bcd6cf98616cf802f646cbb5d87dba0a452
    master_repl_offset:455191
    second_repl_offset:2792
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:660
    repl_backlog_histlen:454532
    
  • role: Displays the role of the current instance (master, slave, or sentinel) together with replication offsets. On a master it shows the master's own replication offset and the offset of each replica.

    127.0.0.1:6379> role
    1) "master"
    2) (integer) 11483751172473
    3) 1) 1) "10.157.4.26"
          2) "6379"
          3) "11483751172187"
    
  • REPLICAOF: REPLICAOF <host port | NO ONE>

    • If the instance is already a replica of some master, REPLICAOF host port stops replicating from the old master, discards the old dataset, and starts replicating from the new master.
    • REPLICAOF NO ONE stops replication and turns the instance into a master, without discarding the dataset.
  • PSYNC: PSYNC replicationid offset. Run by a replica to request a replication stream from the master.

    • replicationid is the replication ID of the master; offset is the last offset received by the replica. Both are used to perform a partial (re)synchronization
    • PSYNC ? -1 performs a full (re)synchronization.
  • WAIT: WAIT numreplicas timeout. Blocks the current client until at least numreplicas replicas have acknowledged the previous writes, or until timeout (in milliseconds) expires. A timeout of 0 means block forever.

    WAIT returns the number of replicas that acknowledged the writes, both when the requested number is reached and when the timeout expires. Acknowledgements are based on the offsets that replicas report to the master, which they normally do about once per second.

    Note that WAIT does not make Redis a strongly consistent store; data can still be lost. For example, if the master fails after sending a write to its replicas but, for some reason (such as a network problem), the replicas never received it, the write is lost once one of the replicas is promoted to master. Still, WAIT improves data safety to some extent.

  • SHUTDOWN: SHUTDOWN [NOSAVE | SAVE] [NOW] [FORCE] [ABORT]. Shuts down the Redis server with the following procedure:

    1. If any replica is lagging behind in replication:
      • Pause clients attempting to write, using CLIENT PAUSE
      • Wait up to the shutdown timeout (10s by default) for the replicas to catch up with the replication offset
    2. Stop all clients
    3. If a save point is configured, perform one blocking SAVE
    4. Flush the AOF file if AOF is enabled
    5. Quit the server

    The parameters are as follows:

    • SAVE: Forces a DB save operation (RDB) even if no save points are configured
    • NOSAVE: The DB save operation is not performed even if the save point is configured
    • NOW: no need to wait for delayed replicas, i.e. skip the first step of shutdown
    • FORCE: Ignore all errors that prevent the service from exiting, such as failing to save the RDB file.
    • ABORT: Cancel the ongoing shutdown operation.

    If you need to shut down the Redis instance as quickly as possible, use SHUTDOWN NOW NOSAVE FORCE. Prior to 7.0, you can use CONFIG SET appendonly no followed by SHUTDOWN NOSAVE to turn off AOF and exit Redis.

    Since Redis 7.0, the server waits up to 10s by default during shutdown so that replicas can catch up with the master's offset as far as possible, which reduces the amount of data lost when no save points are configured and AOF is disabled. Prior to Redis 7.0, before shutting down a master node you could trigger FAILOVER (or CLUSTER FAILOVER) to demote the current master and promote one of its replicas to master.
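
Before a planned shutdown or failover it is useful to know how far each replica is behind. Using the slaveN line format from the INFO replication output shown earlier in this list, the number of bytes a replica still has to receive is the master's master_repl_offset minus the replica's offset. A minimal Python sketch, with hypothetical function names:

```python
from typing import Dict


def parse_replica_line(value: str) -> Dict[str, str]:
    """Parse the value of a 'slaveN:' line, e.g. 'ip=...,port=...,state=online,...'."""
    return dict(pair.split("=", 1) for pair in value.split(","))


def replica_backlog_bytes(master_repl_offset: int, replica_value: str) -> int:
    """Bytes of replication stream the replica has not acknowledged yet."""
    replica = parse_replica_line(replica_value)
    return master_repl_offset - int(replica["offset"])


line = "ip=10.150.208.73,port=6380,state=online,offset=455191,lag=0"
behind = replica_backlog_bytes(455191, line)
```

For the sample output above the result is 0, meaning the replica is fully caught up. The same subtraction applied to the ROLE example (11483751172473 and 11483751172187) gives 286 bytes.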

Persistence

  • info persistence: Shows RDB and AOF information

    127.0.0.1:6379> info persistence
    # Persistence
    rdb_changes_since_last_save:58287
    rdb_bgsave_in_progress:1 #bgsave, saving file in background
    rdb_last_save_time:1723595190
    rdb_last_bgsave_status:ok #status of last bgsave execution
    rdb_last_bgsave_time_sec:215
    rdb_current_bgsave_time_sec:137
    rdb_last_cow_size:56147968
    aof_enabled:0 #Mark whether AOF is enabled or not
    aof_rewrite_in_progress:0 #Marks whether AOF rewrite is in progress or not
    aof_rewrite_scheduled:0 #Marks whether an AOF rewrite is scheduled; a scheduled rewrite does not run immediately, e.g. it waits for a running RDB dump to finish.
    aof_last_rewrite_time_sec:-1
    aof_current_rewrite_time_sec:-1
    aof_last_bgrewrite_status:ok #Status of the last AOF rewrite
    aof_last_write_status:ok #Status of the last write to the AOF
    aof_last_cow_size:0
    module_fork_in_progress:0
    module_fork_last_cow_size:0
    
  • BGREWRITEAOF : start the AOF rewrite process in the background

  • BGSAVE: Generate rdb dump files in the background.

Sentinel

  • info sentinel: Displays information about Redis Sentinel mode:

    • sentinel_masters: The number of masters monitored by this Sentinel instance.
    • sentinel_tilt: A value of 1 indicates that the Sentinel is in TILT mode.
    • sentinel_tilt_since_seconds: Duration of the current TILT, in seconds
    • sentinel_running_scripts: The number of scripts this Sentinel is currently executing
    • sentinel_scripts_queue_length: Length of the queue of user scripts waiting to be executed
    • sentinel_simulate_failure_flags: Flags set by the SENTINEL SIMULATE-FAILURE command
  • SENTINEL MASTER <master name>: Displays the status of the specified master. The main points to note are:

    • num-other-sentinels: The number of other Sentinel instances monitoring this master
    • flags: Normally only master is shown. If the master is down, s_down or o_down may be displayed (see SDOWN and ODOWN Fault States)
    • num-slaves: The number of replicas belonging to the master.

    More details can be viewed with the following two commands:

    • SENTINEL REPLICAS <master name> (>= 5.0): Shows the replicas of the specified master; their current status is visible in the flags field
    • SENTINEL SENTINELS <master name>: Shows the Sentinels monitoring the specified master; their current status is visible in the flags field
  • SENTINEL MASTERS: Shows the status of all masters

  • SENTINEL FAILOVER <master name>: Forces a failover as if the master were unreachable, without asking for agreement from the other Sentinels.

  • Runtime reconfiguration of sentinel

  • SENTINEL RESET <pattern>: Resets all masters whose name matches the pattern. The reset clears any previous state of the master (including a failover in progress) and removes the replicas and Sentinels associated with it. It is used when removing a Sentinel and when removing a replica

ACL

  • ACL LIST
  • ACL GETUSER
  • ACL CAT : View ACL category

Cluster

A few useful cluster commands

  • redis-cli --cluster check <ip:port> -a <password>: Checks the cluster status:
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
localhost:6379 (891a4f01...) -> 33773876 keys | 5462 slots | 1 slaves.
10.157.4.111:6379 (66fb9464...) -> 33783136 keys | 5461 slots | 1 slaves.
10.157.4.41:6379 (0107b5fb...) -> 33775733 keys | 5461 slots | 1 slaves.
[OK] 101332745 keys in 3 masters.
6184.86 keys per slot on average.
>>> Performing Cluster Check (using node localhost:6379)
M: 891a4f01875f855e0a3b13ce2121128270ef4e38 localhost:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 66fb9464ff92d2373e72655da4fa54c96487f120 10.157.4.111:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 0107b5fb8b5e1325aec42404753b7884061cb496 10.157.4.41:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 825faec7fc7c6bdc0ef6b7ede81477ea58d0e1c9 10.157.4.109:6379
   slots: (0 slots) slave
   replicates 0107b5fb8b5e1325aec42404753b7884061cb496
S: 463d0bda350bb5b1ceddb2318aed5e06e3162ff5 10.157.4.42:6379
   slots: (0 slots) slave
   replicates 66fb9464ff92d2373e72655da4fa54c96487f120
S: 6f0e1ad522a0bb9021b4ee6be2acb7e04e767b45 10.157.4.110:6379
   slots: (0 slots) slave
   replicates 891a4f01875f855e0a3b13ce2121128270ef4e38
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
  • redis-cli --cluster reshard <host>:<port> --cluster-from <node-id> --cluster-to <node-id> --cluster-slots <number of slots> --cluster-yes: Redistribute slots

  • redis-cli --cluster rebalance <ip:port>: Balances slots across the non-empty masters. To also balance slots onto empty nodes (e.g., a newly added node), use the --cluster-use-empty-masters parameter

  • redis-cli --cluster fix <ip:port>: Repairs the cluster. The following two options cover additional cases:

    • --cluster-search-multiple-owners: Fixes slots that are assigned to more than one node. When a slot ends up with multiple owners during migration (which redis-cli --cluster check can detect), this parameter fixes the problem
    • --cluster-fix-with-unreachable-masters: Repairs the slots of unreachable masters. For example, if a master has failed and failover did not succeed, this parameter reassigns all slots of that master to the surviving masters (the data previously stored in them is lost; only the slot assignment is restored).

Other commands

  • info cluster: Displays information about Redis Cluster mode

  • CLUSTER SET-CONFIG-EPOCH: When initializing a cluster, use this command to assign a different configEpoch to each node before it joins the cluster.

  • CLUSTER NODES : An administrator's view showing the cluster status of the Redis Cluster in the following format:

    <id> <ip:port@cport[,hostname]> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> <slot> <slot> ... <slot>
    
    • id: Node ID
    • ip:port@cport: Node IP:client_port@cluster_port
    • hostname: The hostname set via cluster-announce-hostname
    • flags
      • myself: The node you are connected to
      • master: Master node
      • slave: Replica node
      • fail?: Node in the PFAIL state, i.e. a node that the current node cannot reach
      • fail: Node in the FAIL state. PFAIL is promoted to FAIL when multiple nodes cannot reach the node
      • handshake: Untrusted node, still in the handshake phase
      • noaddr: Node whose address is unknown
      • nofailover: Replica that will not attempt a failover
      • noflags: No flags at all
    • master: If the node is a replica and its master is known, the master's node ID; otherwise "-"
    • ping-sent: Unix time, in milliseconds, at which the currently active ping was sent; 0 means there is no pending ping
    • pong-recv: Unix time, in milliseconds, at which the last pong was received
    • config-epoch: The configuration epoch (configEpoch) of the node
    • link-state: State of the link used for the node-to-node cluster bus, either connected or disconnected
    • slot: Hash slot number or range indicating the hash slot a node is responsible for.
  • CLUSTER SLOTS: The client's view of the cluster, a subset of the information returned by CLUSTER NODES. Deprecated since Redis 7.0.0. The first two entries of each element are the start and end of the slot range:

    127.0.0.1:6379> cluster slots
    1) 1) (integer) 10923  # slots 10923-16383
       2) (integer) 16383
       3) 1) "10.157.4.111"
          2) (integer) 6379
          3) "66fb9464ff92d2373e72655da4fa54c96487f120"
       4) 1) "10.157.4.42"
          2) (integer) 6379
          3) "463d0bda350bb5b1ceddb2318aed5e06e3162ff5"
    2) 1) (integer) 0      # slots 0-5460
       2) (integer) 5460
       3) 1) "10.157.4.41"
          2) (integer) 6379
          3) "0107b5fb8b5e1325aec42404753b7884061cb496"
       4) 1) "10.157.4.109"
          2) (integer) 6379
          3) "825faec7fc7c6bdc0ef6b7ede81477ea58d0e1c9"
    
  • CLUSTER SHARDS: Introduced in Redis 7.0.0 as a replacement for the CLUSTER SLOTS command. A shard is a set of master and replica nodes that serve the same hash slots; each shard has exactly one master and zero or more replicas. The command returns two parts, "slots" and "nodes": the former lists the slots served by the shard, the latter lists the nodes in the shard.

    The endpoint field of each entry in "nodes" indicates where a client should connect when requesting a specific slot.

  • CLUSTER FAILOVER: CLUSTER FAILOVER [FORCE | TAKEOVER]. A manual failover must be executed on one of the replicas of the master that is to be failed over. The process is as follows:

    1. The replica notifies the master to stop processing requests for clients.
    2. The master responds to this replica with the current replication offset.
    3. The replica waits for the local replication offset to match the master to ensure data synchronization.
    4. The replica starts the failover, obtains a new configuration epoch from the majority of masters, and broadcasts the new configuration.
    5. The old master unblocks its clients after receiving the configuration update and starts replying to them with redirection messages.

    FORCE option: there is no handshake with the master (which may already be unreachable), and the failover starts directly from step 4. FORCE still requires authorization from the majority of masters in order to fail over and generate a new configuration epoch for the replica.

    TAKEOVER option: sometimes a replica has to be promoted to master without any cluster authorization (e.g. during a data center switchover). The replica unilaterally generates a new configuration epoch; if its configuration epoch is not already the greatest within the master's group of instances, it increments it. It then assigns all of the master's hash slots to itself and propagates the new configuration to the other nodes. Because TAKEOVER cannot guarantee that the replica's configuration is up to date, use it with care.

  • CLUSTER REPLICATE: CLUSTER REPLICATE node-id. Turns the node into a replica of the specified master. If the node is not already a replica, the following conditions must be met:

    • The node does not have any hash slots
    • The node is empty and does not have any keys in its keyspace
  • CLUSTER FORGET: used to remove a failed node, see Remove node

  • READONLY: enables a client to read data from replicas. Use READWRITE to cancel this mode.
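As an illustration of the CLUSTER NODES line format listed above, the following is a minimal Python sketch that splits one line into its fields. The class and function names are hypothetical, and the sample line in the usage note is made up for illustration (its cluster port 16379 is an assumption). Migrating or importing slot markers (tokens in square brackets) are skipped, and any auxiliary fields after the hostname are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ClusterNode:
    node_id: str
    host: str
    port: int
    cport: int
    hostname: Optional[str]
    flags: List[str]
    master_id: Optional[str]
    ping_sent_ms: int
    pong_recv_ms: int
    config_epoch: int
    link_state: str
    slots: List[Tuple[int, int]]


def parse_cluster_node_line(line: str) -> ClusterNode:
    """Parse one line of CLUSTER NODES output into a ClusterNode."""
    parts = line.split()
    node_id, addr, flags, master, ping, pong, epoch, link = parts[:8]

    # addr has the form ip:port@cport[,hostname]
    endpoint, *extra = addr.split(",")
    hostname = extra[0] if extra and extra[0] else None
    hostport, _, cport = endpoint.partition("@")
    host, _, port = hostport.rpartition(":")

    slots: List[Tuple[int, int]] = []
    for token in parts[8:]:
        if token.startswith("["):
            continue  # migrating/importing marker, not an owned slot
        start, _, end = token.partition("-")
        slots.append((int(start), int(end or start)))

    return ClusterNode(
        node_id=node_id,
        host=host,
        port=int(port),
        cport=int(cport),
        hostname=hostname,
        flags=flags.split(","),
        master_id=None if master == "-" else master,
        ping_sent_ms=int(ping),
        pong_recv_ms=int(pong),
        config_epoch=int(epoch),
        link_state=link,
        slots=slots,
    )
```

For example, parsing `0107b5fb8b5e1325aec42404753b7884061cb496 10.157.4.41:6379@16379 myself,master - 0 1426238316232 2 connected 0-5460` yields a node with flags `["myself", "master"]`, no master ID, and the single slot range `(0, 5460)`.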

diagnostic command

Slow log

The Slow log allows you to view commands that have exceeded a certain execution time. Note that this execution time does not include I/O operations, such as interaction with the client, but only command execution time.

  • slowlog-log-slower-than: When the command execution time exceeds this value, log it to the slow log
  • slowlog-max-len: Specifies the maximum number of records in the slow log.

The command to view the slow log is as follows:

  • SLOWLOG GET: SLOWLOG GET [count]. By default the 10 most recent entries are returned; use count to change the number of entries displayed. Each entry contains the following fields:
    • Identifier for each log (+1 each time)
    • Recorded unix timestamps of executed commands
    • Time spent executing commands, in microseconds
    • Parameters of the command
    • Client IP address and port
    • Client name, if one was set via CLIENT SETNAME
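The field order above can be turned into named values with a few lines of Python. This is a minimal sketch that assumes each raw entry arrives as a six-element sequence in the order listed (older servers that return only the first four fields are tolerated); the function names and the dictionary keys are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Sequence


def decode_slowlog_entry(entry: Sequence[Any]) -> Dict[str, Any]:
    """Map one raw SLOWLOG GET entry to a dict with named fields."""
    entry_id, timestamp, duration_us, args = entry[:4]
    decoded: Dict[str, Any] = {
        "id": int(entry_id),
        "timestamp": int(timestamp),       # Unix time of execution
        "duration_us": int(duration_us),   # execution time in microseconds
        "command": " ".join(str(a) for a in args),
    }
    if len(entry) >= 6:
        decoded["client_addr"] = entry[4]  # "ip:port"
        decoded["client_name"] = entry[5]  # empty if CLIENT SETNAME was not used
    return decoded


def slowest(entries: Iterable[Sequence[Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the n slowest entries, slowest first."""
    decoded = [decode_slowlog_entry(e) for e in entries]
    return sorted(decoded, key=lambda e: e["duration_us"], reverse=True)[:n]
```

Sorting by `duration_us` rather than by `id` is a design choice for triage: the log itself is ordered by recency, while an operator usually wants the worst offenders first.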

Latency monitor

Latency monitoring encompasses the following concepts:

  • Latency hooks for capturing different latency-sensitive code paths
  • Time series that record the latency spikes, split by event
  • Reporting engine for pulling raw data from time series
  • Analytics engine that provides readable reports and prompts based on measurements
Events and time series

A latency spike is when the runtime of an event exceeds the configured latency threshold (latency-monitor-threshold).

The different monitored code paths have different names and are called events. For example, command is an event that measures latency spikes of possibly slow command executions, while fast-command is the event name for monitoring the O(1) and O(log N) commands. Other events monitor specific operations performed by Redis, such as the fork event.

Each monitored event is associated with a separate time series. Time series work in the following way:

  • Whenever a delay spike is generated, it is recorded into the appropriate time series
  • Each time series contains 160 elements
  • Each element is a pair: the Unix timestamp at which the latency spike was measured, and the event's execution time in milliseconds.
  • If several latency spikes of the same event occur within the same second, they are merged by keeping the maximum latency.
  • The maximum latency is recorded for each element.
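The time-series rules above can be modelled in a few lines of Python. This is an illustrative sketch, not the Redis implementation: the class names are hypothetical, and treating a latency equal to the threshold as a spike is an assumption of the sketch.

```python
from collections import deque
from typing import Deque, Dict, List

SERIES_LEN = 160  # elements kept per time series, as stated above


class LatencySeries:
    """Fixed-size history of [timestamp, latency_ms] pairs for one event."""

    def __init__(self) -> None:
        self.samples: Deque[List[int]] = deque(maxlen=SERIES_LEN)
        self.max_ms = 0  # largest latency seen so far for this event

    def add(self, timestamp: int, latency_ms: int) -> None:
        self.max_ms = max(self.max_ms, latency_ms)
        if self.samples and self.samples[-1][0] == timestamp:
            # Same second: merge by keeping the maximum latency.
            self.samples[-1][1] = max(self.samples[-1][1], latency_ms)
        else:
            # New second: append; the oldest element drops out when full.
            self.samples.append([timestamp, latency_ms])


class LatencyMonitor:
    """One time series per event name, filtered by a threshold in ms."""

    def __init__(self, threshold_ms: int) -> None:
        self.threshold_ms = threshold_ms
        self.series: Dict[str, LatencySeries] = {}

    def record(self, event: str, timestamp: int, latency_ms: int) -> None:
        # A threshold of 0 means monitoring is turned off.
        if self.threshold_ms <= 0 or latency_ms < self.threshold_ms:
            return
        self.series.setdefault(event, LatencySeries()).add(timestamp, latency_ms)
```

A `deque` with `maxlen` gives the bounded history for free: once 160 elements are stored, appending a new one silently discards the oldest.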

The supported events are as follows:

  • command: regular commands.
  • fast-command: O(1) and O(log N) commands.
  • fork: the fork(2) system call.
  • rdb-unlink-temp-file: the unlink(2) system call.
  • aof-fsync-always: the fsync(2) system call when invoked by the appendfsync always policy.
  • aof-write: writing to the AOF - a catchall event for write(2) system calls.
  • aof-write-pending-fsync: the write(2) system call when there is a pending fsync.
  • aof-write-active-child: the write(2) system call when there are active child processes.
  • aof-write-alone: the write(2) system call when no pending fsync and no active child process.
  • aof-fstat: the fstat(2) system call.
  • aof-rename: the rename(2) system call for renaming the temporary file after completing BGREWRITEAOF.
  • aof-rewrite-diff-write: writing the differences accumulated while performing BGREWRITEAOF.
  • active-defrag-cycle: the active defragmentation cycle.
  • expire-cycle: the expiration cycle.
  • eviction-cycle: the eviction cycle.
  • eviction-del: deletes during the eviction cycle.
Enable latency monitoring

Use the following command to enable latency monitoring with a latency threshold of 100 ms. The default value is 0, which means that latency monitoring is turned off.

CONFIG SET latency-monitor-threshold 100

View latency monitoring:

  • LATENCY LATEST: returns the latest latency samples for all events. The report contains:
    • Event name
    • Unix timestamp of the latest latency spike for the event
    • Latest event latency in milliseconds
    • All-time maximum latency for the event since the Redis instance started
  • LATENCY HISTORY: returns the latency time series for a given event
  • LATENCY RESET: resets the latency time series of one or more events
  • LATENCY GRAPH: renders the latency samples of an event as an ASCII graph
  • LATENCY DOCTOR: return a human-readable latency analysis report

Monitoring

The monitoring data exposed by redis exporter comes from the Redis INFO command; most metric names are simply the original Redis field names with a redis_ prefix added. The main categories are as follows:

  • server: General information about the redis server
  • clients: Client connection information
  • memory: Information about memory usage
  • persistence: RDB and AOF information
  • stats: General statistics
  • replication: Master/replica replication information
  • cpu: CPU usage information
  • commandstats: Redis command statistics
  • latencystats: Redis command latency percentile distribution statistics
  • sentinel: Redis Sentinel information
  • cluster: Redis Cluster Information
  • modules: Modules Information
  • keyspace: Database-related statistics
  • errorstats: Redis error statistics
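To make the section layout concrete, here is a small Python sketch that splits raw INFO text into the sections listed above. It assumes the usual INFO layout of `# Section` header lines followed by `field:value` lines; the function names are hypothetical, and the `redis_` prefixing helper only mirrors the general naming pattern described above, since not every exporter metric follows it.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, Dict[str, str]]:
    """Group the field:value lines of INFO output by lower-cased section name."""
    sections: Dict[str, Dict[str, str]] = {}
    current = "default"  # used only if fields appear before any header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip().lower()
            sections.setdefault(current, {})
            continue
        key, _, value = line.partition(":")
        sections.setdefault(current, {})[key] = value
    return sections


def prefixed_name(field: str) -> str:
    """Apply the redis_ prefix pattern (an approximation, see lead-in)."""
    return "redis_" + field
```

Values are kept as strings on purpose: INFO mixes integers, floats, and composite values such as `db0:keys=1,expires=0,avg_ttl=0`, so type conversion is better done per field by the caller.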

The important indicators for each section are given below:

Server

  • redis_uptime_in_seconds: The number of seconds since the Redis server started
  • io_threads_active: Indicates whether I/O threads are active. The feature is enabled through the io-threads parameter and is disabled by default.

Sentinel && Cluster

  • redis_sentinel_masters : Number of redis masters monitored by this Sentinel instance
  • redis_cluster_enabled: Whether to enable redis Cluster

Client

  • redis_connected_clients: Number of clients connected (excluding connections from replicas)
  • redis_blocked_clients: Number of clients pending on a blocking call (BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, BZPOPMIN, BZPOPMAX)
  • redis_clients_in_timeout_table: number of clients in the client timeout table

Mem

  • redis_memory_max_bytes: The value of the maxmemory configuration (typically aligned with the memory limit set for the container), or 0 if it is not set.

  • redis_memory_used_bytes: Total memory used by redis

  • redis_memory_used_peak_bytes: peak memory usage in redis

  • redis_memory_used_rss_bytes: physical memory (resident set size) allocated to Redis as seen by the operating system

  • redis_memory_used_startup_bytes: Memory consumed at redis startup

  • redis_memory_used_overhead_bytes: memory that Redis uses to manage its internal data structures

  • redis_memory_used_dataset_bytes: The size of the dataset, equal to redis_memory_used_bytes - redis_memory_used_overhead_bytes

  • used_memory_dataset_perc: used_memory_dataset as a percentage of net memory usage.

Defragmentation

Ideally, used_memory_rss should be only slightly larger than used_memory. If rss is much larger than used, there is probably (external) memory fragmentation, which can be assessed with allocator_frag_ratio and allocator_frag_bytes. If used is much larger than rss, part of the Redis memory has been swapped out, which may cause severe latency. When Redis frees memory, the memory is returned to the allocator, and the allocator may or may not return it to the system. As a result, the memory usage reported by the operating system can differ from the memory actually used by Redis; used_memory_peak helps to check whether this is the case.

  • redis_active_defrag_running: Whether memory defragmentation is in progress.

  • redis_mem_fragmentation_ratio: equal to redis_memory_used_rss_bytes / redis_memory_used_bytes. A value between 1 and 1.5 is healthy; a larger value indicates more memory fragmentation and a need to defragment. The value also includes the memory covered by the allocator_* metrics.

  • redis_mem_fragmentation_bytes: the difference between redis_memory_used_rss_bytes and redis_memory_used_bytes. If redis_mem_fragmentation_bytes is small (a few MB), a redis_mem_fragmentation_ratio greater than 1.5 is not a problem.

  • redis_allocator_frag_ratio: equal to allocator_active / allocator_allocated. An indicator of external memory fragmentation.

  • redis_allocator_frag_bytes: equal to allocator_active - allocator_allocated

  • redis_defrag_hits: Number of value reallocations performed by the defragmentation process

  • redis_defrag_misses: Number of value reallocations started by the defragmentation process but aborted

  • redis_defrag_key_hits: Number of keys successfully defragmented

  • redis_defrag_key_misses: Number of keys skipped by the defragmentation process
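The rules of thumb above for the fragmentation ratio and fragmentation bytes can be combined into a small check. The following Python sketch is one possible reading of them; the function name is hypothetical, and the 10 MB cut-off standing in for "a few MB" is an assumption.

```python
from typing import Tuple

SMALL_FRAG_BYTES = 10 * 1024 * 1024  # assumed stand-in for "a few MB"


def assess_fragmentation(
    used_memory_rss: int,
    used_memory: int,
    small_bytes: int = SMALL_FRAG_BYTES,
) -> Tuple[float, int, str]:
    """Return (ratio, fragmentation_bytes, verdict) from two INFO memory fields."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    ratio = used_memory_rss / used_memory
    frag_bytes = used_memory_rss - used_memory
    if ratio < 1.0:
        # used > rss: part of the memory is probably swapped out.
        verdict = "possible-swapping"
    elif ratio <= 1.5 or frag_bytes <= small_bytes:
        # Either a healthy ratio, or a high ratio on a tiny absolute amount.
        verdict = "healthy"
    else:
        verdict = "fragmented"
    return ratio, frag_bytes, verdict
```

Checking the absolute byte count alongside the ratio avoids false alarms on nearly empty instances, where a few megabytes of overhead can push the ratio well above 1.5.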

CPU

  • redis_cpu_sys_seconds_total: System CPU consumed by the Redis server, i.e. the total system CPU consumed by all threads of the server process (main thread and background threads).
  • redis_cpu_user_seconds_total: User CPU consumed by the Redis server, i.e. the total user CPU consumed by all threads of the server process (main thread and background threads).
  • redis_cpu_sys_children_seconds_total: System CPU consumed by the background processes
  • redis_cpu_user_children_seconds_total: User CPU consumed by the background processes
  • redis_cpu_sys_main_thread_seconds_total: System CPU consumed by the main thread of the Redis server
  • redis_cpu_user_main_thread_seconds_total: User CPU consumed by the main thread of the Redis server

Persistence

  • redis_loading_dump_file: Indicates if a dump file is being loaded.

If a file is being loaded, the following fields are also available (redis-cli only):

  • loading_start_time: Start time of the file load
  • loading_total_bytes: Total size of the file being loaded

rdb

  • redis_rdb_bgsave_in_progress: Whether an RDB save operation is in progress
  • redis_rdb_last_bgsave_status: Status of last bgsave
  • redis_rdb_last_bgsave_duration_sec: Duration of the last bgsave
  • redis_rdb_current_bgsave_duration_sec: the duration of the ongoing RDB save operation
  • redis_rdb_last_cow_size_bytes: the size of memory used by the COW in the last RDB save operation
  • redis_rdb_changes_since_last_save: Number of changes since the last dump

aof

  • redis_aof_enabled: Whether to enable AOF
  • redis_aof_rewrite_in_progress: whether an AOF rewrite operation is in progress
  • redis_aof_last_rewrite_duration_sec: Duration of the last AOF rewrite operation
  • redis_aof_current_rewrite_duration_sec: Duration of the AOF rewrite operation currently in progress
  • redis_aof_last_write_status: the status of the last write operation to the AOF
  • redis_aof_last_cow_size_bytes: the size of the memory used by the COW in the last AOF rewrite operation

Stats

  • redis_connections_received_total: Total number of connections received by the Server

  • redis_commands_processed_total: Total number of commands processed by Server

  • redis_net_input_bytes_total: Total number of bytes read from the network

  • redis_net_output_bytes_total: Total number of bytes sent to the network

  • redis_rejected_connections_total: Total number of connections rejected because the maxclients limit was reached

  • redis_total_reads_processed: Total number of read events processed

  • redis_total_writes_processed: Total number of write events processed

  • redis_replica_resyncs_full: Number of full resyncs with replicas

  • redis_replica_partial_resync_accepted: Number of partial resynchronization requests accepted

  • redis_replica_partial_resync_denied: Number of requests for partial resynchronization rejected

  • redis_expired_keys_total: Total number of key expiration events

  • redis_expired_stale_percentage: Percentage of keys that have probably already expired

  • redis_evicted_keys_total: Total number of keys that reached maxmemory and were evicted

  • evicted_clients: Introduced in Redis 7.0. Number of clients evicted because the maxmemory-clients limit was reached

  • redis_keyspace_hits_total: Number of successful key lookups in the main dictionary

  • redis_keyspace_misses_total: Number of failed key lookups in the main dictionary

  • redis_latest_fork_usec: time spent in the last fork operation, in microseconds

  • total_forks: Total number of forks

  • acl_access_denied_auth: Total number of authentication failures

replication

  • redis_instance_info: Display redis instance status, such as redis version, mode, role, etc.
  • master_failover_state: Whether failover is in progress

replica

  • redis_master_sync_in_progress: Indicates that the master is synchronizing data to the replica.
  • master_link_status: Link status to master
  • redis_connected_slaves: Number of connected replicas

If a SYNC operation is in progress, the following fields also appear (redis-cli only):

  • master_sync_total_bytes: Total size of the data the master needs to transfer; may be 0 if the size is unknown
  • master_sync_read_bytes: Size of the data already transferred
  • master_sync_left_bytes: Size of the data left before the synchronization completes (may be negative when master_sync_total_bytes is 0)

If the link between master and replica is down, the following field also appears (redis-cli only):

  • master_link_down_since_seconds: Number of seconds since the link went down

performance optimization

Benchmark

Use redis-benchmark to benchmark Redis performance.

Performance Parameters

  • io-threads: Redis is mostly single-threaded, but certain operations, such as UNLINK and slow I/O accesses, are performed on other threads. I/O threading is disabled by default. Enable it only on machines with at least 4 cores, and only if you are actually experiencing performance problems with an instance that consumes a large share of CPU time. The feature can increase Redis performance by up to 2x. On a 4-core machine, set this parameter to 2 or 3; on an 8-core machine, set it to 6.
  • io-threads-do-reads: When I/O threads are enabled, only writes are handled by the threads; this parameter makes the threads handle reads as well.
    Note: threaded reads usually do not provide much of a performance boost. This parameter cannot be changed at runtime via CONFIG SET, and the feature does not take effect when SSL is enabled.

Diagnosing latency problems

checklist

If you are experiencing delays, try the following first:

  1. Check for slow commands with the slow log
  2. Disable transparent huge pages: echo never > /sys/kernel/mm/transparent_hugepage/enabled
  3. If Redis runs in a virtual machine, the environment may have intrinsic latency; measure it with ./redis-cli --intrinsic-latency 100
  4. Enable the latency monitor

Test the intrinsic latency of the system

When Redis runs on a virtual machine, the virtualization itself may add latency. The "100" argument in the following command is the test duration in seconds. In this example the intrinsic latency is 0.115 milliseconds (115 microseconds), which is good.

$ ./redis-cli --intrinsic-latency 100
Max latency so far: 1 microseconds.
Max latency so far: 16 microseconds.
Max latency so far: 50 microseconds.
Max latency so far: 53 microseconds.
Max latency so far: 83 microseconds.
Max latency so far: 115 microseconds.

The Single-Threaded Nature of Redis

Redis uses a mostly single-threaded design. A single thread handles all client requests, i.e. Redis serves one request at a time and the other requests are served in turn.

It is "mostly" single-threaded because, starting with Redis 2.4, background threads handle some of the slow I/O operations (mostly related to disk I/O). This does not change the fact that Redis uses a single thread to serve all requests.

Delays caused by slow commands

Because of the single-threaded design, a request that Redis cannot serve quickly makes all other clients wait until it completes. Commands such as GET, SET, and LPUSH execute quickly, but commands such as SORT, LREM, and SUNION may operate on many elements and therefore take a long time.

Slow commands can be monitored with the slow log (see Slow log).

Important: a common cause of slow commands is the use of the KEYS command.
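A commonly used alternative to KEYS is incremental iteration with SCAN, which the text above does not cover; the sketch below is therefore an assumption about how one might avoid KEYS, not part of the original procedure. It expects a client object that offers a `scan_iter(match=..., count=...)` method, as redis-py clients do; the function name and batch size are hypothetical.

```python
from typing import Any, Iterator, List


def iter_keys_in_batches(
    client: Any, pattern: str, batch_size: int = 500
) -> Iterator[List[Any]]:
    """Yield keys matching pattern in batches, using SCAN instead of KEYS.

    client is assumed to expose scan_iter(match=..., count=...).
    """
    batch: List[Any] = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch
```

Because SCAN walks the keyspace in small steps, each individual call stays short and other clients are served in between, at the cost of a weaker consistency guarantee than a single KEYS call.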

Delays caused by forks

To generate RDB files in the background, or to rewrite the AOF file when AOF is enabled, Redis has to fork a background process. The fork operation runs in the main thread and can itself cause latency. The latest_fork_usec field of the INFO command shows how long the most recent fork took.

Delays caused by transparent huge pages

When transparent huge pages are enabled in the Linux kernel, Redis incurs a large latency penalty after the fork call (used for persistence to disk). Huge pages cause the following problems:

  1. The call to fork creates two processes that share huge pages
  2. In a busy instance, a few event loop runs cause commands to touch a few thousand pages, triggering copy-on-write of almost the whole process memory
  3. This leads to higher latency and higher memory usage

Disable transparent huge pages with the following command:

echo never > /sys/kernel/mm/transparent_hugepage/enabled

Delays caused by swapping

The way to view swap information for a redis process is as follows:

  1. Get the PID of the Redis instance: redis-cli info | grep process_id
  2. Enter the process's directory in the /proc file system: $ cd /proc/<redis_pid>
  3. Check the Swap fields in the smaps file: $ cat smaps | grep 'Swap:'. If all values are 0 kB, with an occasional 4 kB, everything is fine.
  4. If some Swap values are greater than 4 kB, compare them with the size of the corresponding mappings: $ cat smaps | egrep '^(Swap|Size)'. It is fine if the mapping sizes are much larger than the swapped amounts.
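Steps 3 and 4 above can be automated by pairing each Swap value with the Size of the mapping it belongs to. The following Python sketch assumes the usual smaps layout, in which every mapping lists `Size:` before `Swap:` and both are expressed in kB; the function name and the sample text in the usage note are illustrative.

```python
import re
from typing import List, Tuple

_FIELD = re.compile(r"^(Size|Swap):\s+(\d+)\s+kB$")


def swapped_mappings(smaps_text: str, min_swap_kb: int = 4) -> List[Tuple[int, int]]:
    """Return (size_kb, swap_kb) for every mapping with more than min_swap_kb swapped."""
    result: List[Tuple[int, int]] = []
    size_kb = 0
    for line in smaps_text.splitlines():
        match = _FIELD.match(line.strip())
        if not match:
            continue  # header lines and other fields (Rss, Pss, SwapPss, ...)
        name, value = match.group(1), int(match.group(2))
        if name == "Size":
            size_kb = value  # remember the size of the current mapping
        elif value > min_swap_kb:
            result.append((size_kb, value))
    return result
```

Reading the file is left to the caller, for example `swapped_mappings(open("/proc/<redis_pid>/smaps").read())` with the real PID substituted; an empty result corresponds to the "everything is fine" case in step 3.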

Latency due to AOF and disk I/O

AOF uses two system calls to do its job: write(2), which writes data to the AOF file, and fdatasync(2), which flushes the kernel file buffer to disk.

write(2) can block if a system-wide sync is in progress, or if the output buffers are full and the kernel has to flush to disk before it can accept new writes.

The latency introduced by fdatasync(2) is even more severe: on many kernels and file systems the call can take anywhere from milliseconds to seconds, especially when other processes are doing I/O at the same time. For this reason, starting with Redis 2.4, fdatasync(2) is called in a different thread.

The AOF appendfsync configuration option affects performance as follows:

  • When set to no, Redis performs no fsync, so only write(2) can introduce latency. This mode is not commonly used.
  • When set to everysec, Redis performs an fsync every second. If an fsync is still in progress, Redis buffers the data and delays the write(2) call (for at most 2 s). However, if the fsync takes too long, the write(2) call is eventually executed while the fsync is still running, which causes latency.
  • When set to always, an fsync is performed after each write operation (before replying OK to the client). Performance in this mode is very low, and a fast disk is usually recommended.

The latency of fsync and write in Redis can be inspected in the following way.

sudo strace -p $(pidof redis-server) -T -e trace=fdatasync,write -f

Since write(2) contains a lot of data that is not related to disk I/O (e.g., data written to client sockets), you can use the following command to show only the slow system calls:

sudo strace -f -p $(pidof redis-server) -T -e trace=fdatasync,write 2>&1 | grep -v '0.0' | grep -v unfinished
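The `grep -v '0.0'` filter above is a rough heuristic. As an alternative, the following Python sketch parses the duration that `strace -T` appends to each line (in the form `<seconds>`) and keeps only calls at or above an explicit threshold. The function name and the 10 ms default threshold are assumptions.

```python
import re
from typing import Iterable, Iterator, Tuple

# strace -T appends the time spent in the call, e.g. "<0.000123>".
_DURATION = re.compile(r"<(\d+\.\d+)>\s*$")
# Matches both "write(...)" and "<... write resumed>" lines (seen with -f).
_SYSCALL = re.compile(r"\b(fdatasync|write)(?:\(| resumed)")


def slow_syscalls(
    lines: Iterable[str], threshold_s: float = 0.01
) -> Iterator[Tuple[str, float]]:
    """Yield (syscall, seconds) for write/fdatasync calls at or above threshold_s."""
    for line in lines:
        if "unfinished" in line:
            continue  # the duration is reported on the matching "resumed" line
        call = _SYSCALL.search(line)
        duration = _DURATION.search(line)
        if call and duration and float(duration.group(1)) >= threshold_s:
            yield call.group(1), float(duration.group(1))
```

The function takes any iterable of lines, so it can be fed from a saved strace log or from standard input.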

Memory for Redis

  • Starting with Redis 2.2, to save memory, many data types are stored with a special compact encoding as long as they stay below configured maximum sizes. Once the data exceeds a configured maximum, Redis converts it to the normal encoding. For example:

    #Redis <= 6.2
    hash-max-ziplist-entries 512
    hash-max-ziplist-value 64
    zset-max-ziplist-entries 128 
    zset-max-ziplist-value 64
    set-max-intset-entries 512
    
  • After keys are removed, Redis does not always return the freed memory to the OS. When provisioning memory for Redis, size it according to peak memory usage rather than average memory usage.

  • If maxmemory is not set, Redis keeps allocating memory as it sees fit until it has consumed all of the available memory. It is therefore recommended to configure maxmemory.
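To illustrate how the two hash thresholds in the configuration above interact, here is a small Python sketch of the decision as commonly described: a hash stays in the compact encoding only while both its entry count and the length of every field and value stay within the limits. It is a simplified model rather than the Redis implementation, and the function name is hypothetical.

```python
from typing import Mapping

HASH_MAX_ZIPLIST_ENTRIES = 512  # hash-max-ziplist-entries from the example above
HASH_MAX_ZIPLIST_VALUE = 64     # hash-max-ziplist-value from the example above


def fits_compact_encoding(
    fields: Mapping[bytes, bytes],
    max_entries: int = HASH_MAX_ZIPLIST_ENTRIES,
    max_value: int = HASH_MAX_ZIPLIST_VALUE,
) -> bool:
    """Return True if a hash with these fields stays within both limits."""
    if len(fields) > max_entries:
        return False
    return all(
        len(name) <= max_value and len(value) <= max_value
        for name, value in fields.items()
    )
```

A single oversized value is enough to fail the check, which is why one large field can noticeably increase the memory footprint of an otherwise small hash.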

Tips

redis startup failure

If redis startup fails and there are no useful logs, you can directly execute the redis startup command to see the results of the execution:

$ /usr/bin/redis-server /etc/redis/ --daemonize no --supervised systemd 

*** FATAL CONFIG FILE ERROR (Redis 6.0.10) ***
Reading the configuration file, at line 8
>>> 'sentinel myid 922e4ec063ea8d860811f575f08e5ac696073f52'
sentinel directive while not in sentinel mode

Redis Upgrade

To support partial resynchronization during an upgrade, see Partial synchronization under reboot and failover

How to promote a replica to master

Without Sentinel or Cluster

  1. First detach the replica from its master:

    redis-cli -h <replica_host> -p <replica_port> replicaof no one
    
  2. Verify that the replica has been promoted to master, using the info replication command:

    redis-cli -h <new_master_host> -p <new_master_port> info replication
    
  3. On each remaining replica, run the replicaof <new_master_host> <new_master_port> command to connect it to the new master:

    redis-cli -h <slave_host> -p <slave_port> replicaof <new_master_host> <new_master_port>
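When several replicas are involved, the three steps above can be generated programmatically. The following Python sketch only builds the `redis-cli` argument lists in the order given above and does not execute anything; the function name and the addresses in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple

Address = Tuple[str, int]  # (host, port)


def promotion_commands(
    new_master: Address, other_replicas: Sequence[Address]
) -> List[List[str]]:
    """Build the redis-cli invocations for a manual promotion, in order."""
    host, port = new_master
    base = ["redis-cli", "-h", host, "-p", str(port)]
    commands = [
        base + ["replicaof", "no", "one"],    # step 1: detach from the old master
        base + ["info", "replication"],       # step 2: verify the new role
    ]
    for replica_host, replica_port in other_replicas:
        commands.append(                      # step 3: repoint each replica
            ["redis-cli", "-h", replica_host, "-p", str(replica_port),
             "replicaof", host, str(port)]
        )
    return commands
```

For example, `promotion_commands(("10.0.0.2", 6379), [("10.0.0.3", 6379)])` returns three argument lists that could each be passed to `subprocess.run`; keeping generation and execution separate makes it easy to review the commands before running them.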
    

AOF file truncation error

If the server crashes while writing the AOF file, it is possible that the last command is truncated so that the following error occurs when redis loads the AOF file:

* Reading RDB preamble from AOF file...
* Reading the remaining AOF tail...
# !!! Warning: short read while loading the AOF file !!!
# !!! Truncating the AOF at offset 439 !!!
# AOF loaded anyway because aof-load-truncated is enabled

You can use the following command to repair and restart redis:

$ redis-check-aof --fix <filename>

Backup and Recovery with RDB

  1. Backup: call BGSAVE to create an RDB backup file. Check the state of the BGSAVE through rdb_bgsave_in_progress and rdb_last_bgsave_status in the output of info persistence

  2. Recovery:

    1. You need to make sure that the redis server is down before restoring, so that the restoration will be done on the first node (master) that starts up

      redis-cli shutdown
      
    2. Disable the AOF feature of redis in the configuration file

      appendonly no
      
    3. Use CONFIG GET dir to find the directory where Redis stores its backup files. With Redis stopped, put the backup file in that directory and adjust the backup file's permissions:

      chmod 660 /home/redis/
      
    4. Restart Redis. If AOF was enabled before the restore, re-enable the AOF feature

How to enable AOF while using snapshot

Redis >= 2.2

  1. Back up the latest snapshot (RDB) file
  2. Enable AOF: redis-cli config set appendonly yes
  3. If you want to disable RDB: redis-cli config set save ""
  4. Make sure write commands are being appended to the AOF file
  5. Important: update the configuration file to match the changes above; otherwise restarting Redis discards the configuration changes and data is lost.
  6. Use the INFO persistence command to wait for the AOF rewrite to finish: aof_rewrite_in_progress and aof_rewrite_scheduled should be 0, and aof_last_bgrewrite_status should be ok. If everything is fine, restart the Redis service
  7. After restarting Redis, verify that the database contains the same data as before
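The wait condition in step 6 can be expressed as a simple predicate over the fields of INFO persistence. In this Python sketch the fields are assumed to be available as a mapping (for instance the result of parsing the INFO output); the function name is hypothetical.

```python
from typing import Any, Mapping


def aof_rewrite_finished(persistence: Mapping[str, Any]) -> bool:
    """Return True when the AOF rewrite has completed successfully.

    Missing fields are treated as "not finished" to stay on the safe side.
    """
    return (
        int(persistence.get("aof_rewrite_in_progress", 1)) == 0
        and int(persistence.get("aof_rewrite_scheduled", 1)) == 0
        and str(persistence.get("aof_last_bgrewrite_status", "")) == "ok"
    )
```

A caller would poll this predicate with a short sleep between checks and restart Redis only once it returns True.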

Network isolation in sentinel

For the following scenario (M: master, S: sentinel, R: replica, C:client)

       +----+
       | M1 |
       | S1 |
       +----+
          |
+----+    |    +----+
| R2 |----+----| R3 |
| S2 |         | S3 |
+----+         +----+

Configuration: quorum = 2

If the old master is isolated by a network partition, two masters may exist at the same time. The client (C1) keeps writing data to M1 during the partition; when the network is restored, the old master becomes a replica of the new master, and those writes are lost.

         +----+
         | M1 |
         | S1 | <- C1 (writes will be lost)
         +----+
            |
            /
            /
+------+    |    +----+
| [M2] |----+----| R3 |
| S2   |         | S3 |
+------+         +----+

The problem can be mitigated by configuring the following parameters on the master: if the master is unable to write to at least 1 replica within 10 s, it stops accepting writes, and after the network is restored the client can obtain a valid configuration. The trade-off is that if both replicas are down, the master cannot process write requests.

min-replicas-to-write 1
min-replicas-max-lag 10
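The effect of these two settings can be modelled as a simple predicate: the master accepts writes only while enough replicas have been heard from recently. The Python sketch below is an illustrative model, not the Redis implementation; the function name is hypothetical, and the lag values stand for the seconds elapsed since each replica was last heard from.

```python
from typing import Iterable


def master_accepts_writes(
    replica_lags_s: Iterable[float],
    min_replicas_to_write: int = 1,
    min_replicas_max_lag: int = 10,
) -> bool:
    """Return True if at least min_replicas_to_write replicas lag <= max lag."""
    if min_replicas_to_write <= 0 or min_replicas_max_lag <= 0:
        return True  # setting either value to 0 disables the check
    good = sum(1 for lag in replica_lags_s if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write
```

In the partition scenario above, the isolated M1 sees the lag of both replicas grow past 10 s and starts refusing writes, which bounds the window in which C1 can write data that will later be lost.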

Redis migration to k8s

Redis is single-threaded, so it benefits more from CPUs with larger caches than from CPUs with more cores. k8s, on the other hand, generally allocates CPU by shares, so a process receives execution time spread across multiple CPUs, which introduces process context-switching overhead.

Python client