Last time we looked at common problems with Redis. This chapter digs into the finer details, such as how to deal effectively with double-write inconsistencies between the cache and the database from a business perspective. Let's dive in.
Cache key reconstruction optimization
Developers often use a "cache + expiration time" strategy to speed up reads and writes while ensuring data is refreshed periodically. This model meets most requirements. However, when the following two problems occur together, they can seriously impact the application:
- A hot key emerges: the current key is a hot key (for example, a trending entertainment news item) and receives a very large number of concurrent requests. Read traffic concentrates on this single key, sharply increasing the pressure on the cache.
- Cache reconstruction is expensive: when the cache entry expires, it cannot be rebuilt quickly. Rebuilding may involve complex SQL queries, multiple I/O round trips, or assembling several data dependencies. A slow rebuild degrades system performance and, in turn, the user experience.
At the moment the cache expires, if a large number of threads kick off rebuild operations simultaneously, the backend load spikes and may even crash the application. The key to solving this problem is to prevent many threads from rebuilding the cache at the same time.
An effective solution is to use a mutually exclusive locking mechanism, which ensures that only one thread is allowed to perform a cache rebuild operation at any given moment. The other threads will have to wait for the rebuild thread to complete the cache rebuild before they can re-fetch the data from the cache. This strategy not only reduces the stress on the back-end system, but also avoids performance bottlenecks caused by concurrent rebuilds and significantly improves system stability and responsiveness.
Example pseudo-code:
```java
String get(String key) throws InterruptedException {
    // Get the data from Redis
    String value = redis.get(key);
    // Cache miss: try to rebuild the cache
    if (value == null) {
        // A dedicated mutex key ensures only one thread rebuilds this entry
        String mutexKey = "mutex:key:" + key;
        // SET with NX (only set if absent) and EX (expiration) so the lock
        // is released automatically even if the rebuilding thread crashes
        boolean isMutexSet = "OK".equals(redis.set(mutexKey, "1", "NX", "EX", "180"));
        if (isMutexSet) {
            try {
                // Load the data from the underlying data source
                value = db.get(key);
                // Write it back to Redis and set the timeout
                redis.setex(key, timeout, value);
            } finally {
                // Delete the mutex key so later rebuilds are possible
                redis.del(mutexKey);
            }
        } else {
            // Another thread is rebuilding: wait 50 milliseconds, then retry
            Thread.sleep(50);
            value = get(key);
        }
    }
    return value;
}
```
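The same pattern can be simulated in-process without a Redis server: in the sketch below, a `ConcurrentHashMap` stands in for Redis and `putIfAbsent` plays the role of `SET NX`. The class and method names (`CacheRebuildDemo`, `loadFromDb`) are illustrative, not a real API, and lock expiry is omitted for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// In-process sketch of the mutex rebuild pattern. A ConcurrentHashMap stands
// in for Redis; putIfAbsent plays the role of SET NX.
class CacheRebuildDemo {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final Map<String, String> mutex = new ConcurrentHashMap<>();
    static final AtomicInteger rebuilds = new AtomicInteger();

    // Stand-in for the expensive database query / complex computation
    static String loadFromDb(String key) {
        rebuilds.incrementAndGet();
        return "value-of-" + key;
    }

    static String get(String key) {
        String value = cache.get(key);
        if (value != null) return value;
        // Only the thread that wins putIfAbsent rebuilds the entry
        if (mutex.putIfAbsent("mutex:key:" + key, "1") == null) {
            try {
                value = loadFromDb(key);
                cache.put(key, value);
            } finally {
                mutex.remove("mutex:key:" + key);
            }
            return value;
        }
        // Losing threads back off briefly, then retry
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return get(key);
    }
}
```

After the first caller rebuilds the entry, subsequent callers hit the cache and `loadFromDb` is not invoked again until the entry is evicted.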
Cache and database double write inconsistency
In highly concurrent scenarios, simultaneous database and cache operations may trigger data inconsistency problems. Specifically, when multiple threads or processes try to update the cache and the database at the same time, it may lead to data mismatch between the cache and the database.
Double-write inconsistency
The following problems may occur when multiple threads or processes are performing cache and database updates at the same time:
- Inconsistency between cache and database data: For example, two threads update the database at the same time, but only one thread updates the cache, which can lead to inconsistency between the data in the cache and the data in the database.
- Latency issues: even when the cache and the database are updated together, network latency or other factors can leave their states out of sync for a window of time.
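The kind of interleaving that produces this mismatch can be replayed deterministically. The sketch below runs two writers' "update database, then update cache" steps in an unlucky order; the maps and writer labels are purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of an unlucky interleaving: two writers each perform
// "update database, then update cache", but their steps overlap in time.
class DoubleWriteRace {
    static final Map<String, String> db = new HashMap<>();
    static final Map<String, String> cache = new HashMap<>();

    static void replayBadInterleaving(String key) {
        db.put(key, "A");    // writer A updates the database first
        db.put(key, "B");    // writer B overwrites it in the database
        cache.put(key, "B"); // writer B updates the cache
        cache.put(key, "A"); // writer A's delayed cache write lands last
    }
}
```

After this sequence the database holds "B" while the cache serves the stale "A" until the entry is evicted or expires.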
Read-write concurrency inconsistency
Read-write concurrency inconsistency arises when multiple threads or processes read and write the same data concurrently, which can leave the cache and the database holding different values.
Here are some common solutions for inconsistent read and write concurrency:
- For data with a low chance of concurrent access:
  - Per-user order data, user profile data, and similar records see few concurrent operations, and their consistency requirements are relatively loose. For this kind of data, setting a reasonable expiration time on the cache entry is enough: once the entry expires, the next read fetches the latest data from the database and repopulates the cache. This strategy is simple, effective, and greatly reduces the window for inconsistency.
- Cache data consistency under high concurrency:
  - Even in highly concurrent business scenarios, if short-lived inconsistency in cached data (product names, product category menus, and so on) can be tolerated, a cache expiration time still satisfies most requirements. The data may be briefly stale, but that usually has no serious business impact, so the expiration policy remains an effective solution.
- Scenarios where cached data inconsistency cannot be tolerated:
  - If the business has strict consistency requirements for cached data, distributed read-write locks can serialize concurrent operations. A distributed lock ensures that only one write runs at a time, avoiding write-write conflicts, while reads can proceed concurrently (or even without locking) to preserve performance. This controls concurrent writes effectively and guarantees consistency, at some cost to throughput.
- Introducing middleware to maintain consistency:
  - Alibaba's open-source Canal tool can keep the cache up to date by listening to the database's binlog: whenever data changes, the cache is updated (or invalidated) automatically, reducing drift between cache and database. However, Canal or similar middleware adds moving parts, so the extra operational complexity must be weighed against the consistency gains. Maintenance, configuration, and the potential performance impact of the middleware should all be considered to keep the system stable and reliable.
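The expiration-time strategy from the first point above can be sketched with a tiny in-process cache. A manually advanced fake clock replaces wall-clock time so the expiry behavior is easy to follow; all names here are illustrative, and a real deployment would simply use Redis's `EXPIRE`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of the "cache + expiration time" strategy. The clock is a
// plain field advanced by hand so expiry is deterministic in tests.
class TtlCache {
    private static class Entry { String value; long expiresAt; }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Function<String, String> loader; // reads the database
    long now = 0;                                  // fake clock, in ms

    TtlCache(Function<String, String> loader) { this.loader = loader; }

    String get(String key, long ttlMs) {
        Entry e = entries.get(key);
        if (e == null || e.expiresAt <= now) {
            // Expired or missing: reload from the database and reset the TTL
            e = new Entry();
            e.value = loader.apply(key);
            e.expiresAt = now + ttlMs;
            entries.put(key, e);
        }
        return e.value;
    }
}
```

Until the TTL elapses, repeated reads are served from the cache; the first read after expiry transparently reloads from the backing store.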
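The distributed read-write-lock approach can be sketched with `java.util.concurrent`'s `ReentrantReadWriteLock` standing in for a distributed implementation (Redisson's `RReadWriteLock` is one real option). The locking shape is the same: writes are serialized, while readers share the lock and never observe a half-finished write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Single-process stand-in for a distributed read-write lock: one writer at a
// time, readers run concurrently with each other.
class GuardedStore {
    private final Map<String, String> db = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void write(String key, String value) {
        lock.writeLock().lock();   // serialize writers
        try {
            db.put(key, value);    // update the database first
            cache.put(key, value); // then the cache, inside the same lock
        } finally {
            lock.writeLock().unlock();
        }
    }

    String read(String key) {
        lock.readLock().lock();    // readers do not block each other
        try {
            return cache.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```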
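The binlog-listening idea reduces to the shape below. `ChangeEvent` and `onRowChanged` are hypothetical stand-ins for what a Canal client delivers (the real Canal API differs), but the core move is the same: when a row changes, evict the matching cache entry so the next read falls through to the database.

```java
import java.util.HashMap;
import java.util.Map;

// Shape of a binlog-driven cache invalidator. ChangeEvent is a hypothetical
// payload, not the real Canal event type.
class BinlogCacheInvalidator {
    static class ChangeEvent {
        final String table, primaryKey;
        ChangeEvent(String table, String primaryKey) {
            this.table = table;
            this.primaryKey = primaryKey;
        }
    }

    final Map<String, String> cache = new HashMap<>();

    void onRowChanged(ChangeEvent e) {
        // Cache keys here follow a "table:pk" convention (an assumption)
        cache.remove(e.table + ":" + e.primaryKey);
    }
}
```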
Summary
The solutions above target read-heavy, write-light scenarios, where a cache is introduced to improve performance. When both writes and reads are frequent and stale cached data cannot be tolerated, the caching strategy itself needs to be reconsidered. Some suggestions for that situation:
- Avoid using a cache:
  - In scenarios with frequent writes and very strict consistency requirements, a cache may not be the right tool. Operating directly on the database avoids cache/database divergence entirely, because every read and write goes to a single source of truth.
- Cache as primary storage:
  - If the database is under heavy load but the system still needs to handle a large volume of reads and writes, consider making the cache the primary store and the database the backup. All reads and writes go to the cache first, and the cache asynchronously synchronizes data to the database. The cache provides fast responses under high concurrency, while the database handles long-term storage and backup. This improves read/write performance while preserving the database's copy of the data.
- Data types suited to caching:
  - Cache data whose real-time and consistency requirements are not especially strict, such as product category information and system configuration: it changes rarely, brief staleness has little business impact, and caching significantly improves access speed. Avoid caching business-critical data with strict consistency requirements, to minimize the complexity and risk the cache introduces.
- Avoid over-design:
  - When designing a caching layer, avoid over-engineering elaborate controls in pursuit of absolute consistency. Such designs increase system complexity and can hurt performance. Choose a caching strategy that matches the actual business requirements, balancing performance against consistency and avoiding unnecessary complexity and wasted resources.
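The "cache as primary storage" suggestion above can be sketched as a write-behind cache: writes land in the cache immediately and are queued, then a flusher drains the queue into the database. Here `flush()` is called explicitly so the behavior is deterministic; in production it would run on a timer or a worker thread, and all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch of a write-behind cache: the cache is the primary store, the
// database is updated asynchronously from a queue of dirty keys.
class WriteBehindCache {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> db = new HashMap<>();
    private final Queue<String> dirty = new ArrayDeque<>();

    void put(String key, String value) {
        cache.put(key, value); // fast path: cache only
        dirty.add(key);        // remember what still needs persisting
    }

    String get(String key) { return cache.get(key); }

    void flush() {
        // The asynchronous part, made synchronous for this sketch
        for (String key; (key = dirty.poll()) != null; ) {
            db.put(key, cache.get(key));
        }
    }
}
```

Between `put` and `flush` the database lags behind the cache, which is exactly the durability trade-off this strategy accepts in exchange for fast writes.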
In summary, the choice of whether to use caching and its design needs to be weighed against business scenarios and data consistency requirements. Caching should mainly be used to improve the performance of read operations, while for scenarios with many writes and reads and high consistency requirements, it may be necessary to rely on the capabilities of the database itself or use other strategies to deal with data consistency issues.
I'm Rain, a Java server-side developer exploring the mysteries of AI technology. I love technical exchange and sharing, and I'm passionate about the open-source community. I'm also a distinguished author on Nuggets, a Tencent Cloud Creative Star, an Alibaba Cloud expert blogger, and a Huawei Cloud expert.
💡 I won't be shy about sharing my personal explorations and experiences on the path of technology, in the hope that I can bring some inspiration and help to your learning and growth.
🌟 Welcome to the effortless drizzle! 🌟