
Optimizing System Performance: An In-Depth Look at the Challenges and Countermeasures of Web Tier Caching and Redis Applications


Web tier caching is critical to application performance: by reducing repetitive data processing and database queries, it speeds up response times. For example, if a user requests data that has already been cached, the server can return the result directly from the cache, avoiding a complex calculation or database query on every request. This improves the application's response time and reduces the burden on the back-end system.

Redis is a popular in-memory data structure store commonly used to implement an efficient caching layer. It supports a variety of data structures such as strings, hashes, lists, and sets, enabling rapid access to data. By caching frequently used data in Redis, applications can drastically reduce database load while improving the user experience.
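
As a minimal illustration of this cache-aside pattern, here is a sketch using the Jedis client; the key name, the stubbed-out "database" value, and the 300-second TTL are arbitrary choices for illustration:

import redis.clients.jedis.Jedis;

public class CacheAsideExample {
    public static void main(String[] args) {
        // Connect to a local Redis instance (host and port are assumptions)
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String key = "user:42:profile";

            // Try the cache first
            String value = jedis.get(key);
            if (value == null) {
                // Cache miss: stand-in for a slow database query or computation
                value = "{\"id\":42,\"name\":\"example\"}";
                // Cache the result for 5 minutes so later requests skip the database
                jedis.setex(key, 300, value);
            }
            System.out.println(value);
        }
    }
}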

Caching Issues Explained

In this chapter, instead of delving into the basic caching mechanisms of Redis, we will focus on how to guard against the unnecessary losses that Redis failures can cause. We will discuss in detail the causes of the cache penetration, cache breakdown, and cache avalanche problems and the strategies for resolving them. Let's get started with a deeper dive into these issues.

Cache Penetration

Cache penetration refers to the situation where a query for a non-existent piece of data misses in both the cache and the storage tiers. Usually, for fault-tolerance reasons, when the storage layer fails to find a piece of data, the system does not write anything to the cache layer. As a result, every request for non-existent data must go directly to the storage layer, defeating the cache's purpose of protecting the back-end storage. This not only increases the burden on the storage layer, but also reduces the overall performance of the system.

There are two main underlying causes of cache penetration:

  1. Own business code or data issues: This type of problem usually stems from flaws in business logic or data inconsistencies. For example, if the business code fails to handle certain data queries correctly, or if the data source itself is flawed (e.g., missing data, data errors, etc.), it may result in the requested query consistently failing to find the corresponding data in the cache or storage tier. In this case, the cache layer is unable to store and return query results efficiently, resulting in the need to access the storage layer directly for each request.
  2. Malicious attacks or crawling behavior: Malicious attackers or automated crawlers may launch a large number of requests for data that does not exist. These requests keep hitting the cache and storage layers and always come back empty, which not only consumes a large amount of system resources but can also significantly increase the pressure on the cache and storage layers, affecting the overall performance and stability of the system.

Solution - Cache Empty Objects

One of the effective solutions to cache penetration is to cache null objects. This approach involves storing "empty" tokens or objects in the cache layer to indicate that specific data does not exist. In this way, when subsequent requests query for the same data, the system can fetch the "empty object" directly from the cache layer without having to revisit the storage layer. This not only reduces frequent accesses to the storage layer, but also improves the overall performance and responsiveness of the system, thus effectively alleviating the cache penetration problem.

String get(String key) {
    // Try the cache first
    String cacheValue = cache.get(key);

    // Cache hit
    if (cacheValue != null) {
        return cacheValue;
    }

    // Cache miss: get the data from storage
    String storageValue = storage.get(key);

    if (storageValue == null) {
        // Storage miss: cache an empty-object marker and give it an expiration time
        cache.set(key, "");        // store the empty-object marker
        cache.expire(key, 60 * 5); // expire after 300 seconds
    } else {
        // Cache the data if it exists in storage
        cache.set(key, storageValue);
    }

    return storageValue;
}

Solution - Bloom Filter

For cache-penetration problems caused by requesting large amounts of non-existent data in malicious attacks, a Bloom filter can be used for initial filtering. A Bloom filter is a spatially efficient probabilistic data structure that is effective in determining whether an element is likely to exist in a collection. Specifically, when the Bloom filter indicates that a value may exist, the actual situation may be that the value exists, or it may be a misjudgment of the Bloom filter; however, when the Bloom filter indicates that a value does not exist, it is certain that the value does not exist.


A Bloom filter consists of a large bit array and multiple independent, unbiased hash functions. Unbiased hash functions are characterized by their ability to distribute the hash values of input elements evenly across the bit array, reducing hash collisions. When a key is added to a Bloom filter, it is first hashed with each of these hash functions, and each function produces an integer. These values are then taken modulo the length of the bit array to determine specific positions in it, and the bits at those positions are set to 1, marking the presence of the key.

Querying a Bloom filter for a key works much like adding one. First, the key is hashed with the same hash functions to obtain multiple position indexes. Then the bits at those positions are checked: if they are all 1, the key may exist; if any of them is 0, the key definitely does not exist. Note that even when all the relevant bits are 1, the key only "may" exist and cannot be absolutely confirmed, because those bits may have been set to 1 by other keys. By adjusting the size of the bit array and the number of hash functions, you can tune a Bloom filter's performance and strike a better balance between accuracy and efficiency.
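
To make the add/query mechanics above concrete, here is a minimal in-process sketch. It is a simplified illustration, not the Redisson implementation used later; deriving the k positions from two base hashes (double hashing) is one common choice, not something prescribed by this article:

import java.util.BitSet;

// Minimal Bloom filter sketch illustrating the mechanics described above
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // length of the bit array (m)
    private final int numHashes; // number of hash functions (k)

    public SimpleBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Derive the i-th position from two base hashes of the key
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // second hash, forced odd
        return Math.floorMod(h1 + i * h2, size);
    }

    // Adding a key: set the bit at each of the k derived positions to 1
    public void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // Querying: any 0 bit means the key definitely does not exist;
    // all 1s mean the key *may* exist (a false positive is possible)
    public boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}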

This approach is particularly suitable for scenarios with low data hit rates, relatively fixed datasets, and low real-time requirements, especially when the dataset is large, where Bloom filters can significantly reduce cache space usage. Although implementing a Bloom filter may increase the complexity of code maintenance, the memory efficiency and query speed it brings are usually worth the investment.
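
For a sense of scale, the standard sizing formulas (a well-known general result, not specific to any library) are m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions. With the parameters used in the example below (n = 100,000,000 expected elements and a 3% error rate), this works out to roughly 7.3 × 10⁸ bits, on the order of 90 MB, with about 5 hash functions: a small footprint for a set of that size.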

The effectiveness of Bloom filters in such scenarios comes from their ability to handle large-scale datasets while taking up only a small amount of memory. To implement Bloom filters, you can use Redisson, a Java client that supports distributed Bloom filters. To introduce Redisson into your project, add the following dependency:

<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.16.2</version> <!-- Please select the appropriate version as needed -->
</dependency>

Example pseudo-code:

import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonBloomFilter {

    public static void main(String[] args) {
        // Configure the Redisson client to connect to the Redis server
        Config config = new Config();
        config.useSingleServer().setAddress("redis://localhost:6379");

        // Create the Redisson client
        RedissonClient redisson = Redisson.create(config);

        // Get the Bloom filter instance named "nameList"
        RBloomFilter<String> bloomFilter = redisson.getBloomFilter("nameList");

        // Initialize the Bloom filter: expected number of elements 100,000,000, error rate 3%
        bloomFilter.tryInit(100_000_000L, 0.03);

        // Insert the element "xiaoyu" into the Bloom filter
        bloomFilter.add("xiaoyu");

        // Query the Bloom filter to check whether elements exist
        System.out.println("Contains 'huahua': " + bloomFilter.contains("huahua")); // should be false
        System.out.println("Contains 'lin': " + bloomFilter.contains("lin"));       // should be false
        System.out.println("Contains 'xiaoyu': " + bloomFilter.contains("xiaoyu")); // should be true

        // Close the Redisson client
        redisson.shutdown();
    }
}

When using a Bloom filter, all expected data elements must first be inserted into it in advance so that it can efficiently test for their existence through its bit array and hash functions. The Bloom filter must also be updated in real time as new data is inserted, to keep it accurate.

The following pseudo-code example of Bloom Filter cache filtering shows how to manipulate the Bloom Filter during initialization and data addition:

// Initialize the Bloom filter
RBloomFilter<String> bloomFilter = redisson.getBloomFilter("nameList");

// Set the expected number of elements and the error rate for the Bloom filter
bloomFilter.tryInit(100_000_000L, 0.03);

// Insert all existing data into the Bloom filter
void init(List<String> keys) {
    for (String key : keys) {
        bloomFilter.add(key);
    }
}

// Get the data, checking the Bloom filter before the cache
String get(String key) {
    // If the key is not in the Bloom filter, it definitely does not exist
    if (!bloomFilter.contains(key)) {
        return "";
    }

    // Get the data from the cache
    String cacheValue = cache.get(key);

    // If the cache value is empty, fetch from storage
    if (StringUtils.isBlank(cacheValue)) {
        String storageValue = storage.get(key);
        if (storageValue != null) {
            cache.set(key, storageValue); // cache the non-null data
        } else {
            cache.expire(key, 300); // keep the empty marker for 300 seconds
        }
        return storageValue;
    } else {
        // The cache value is not empty, return it
        return cacheValue;
    }
}

Note: A Bloom filter does not support deleting data; if you need to remove elements, you have to rebuild (reinitialize) the filter.

Cache Breakdown (Failure)

When a large number of cache entries expire at the same time, a flood of requests may penetrate the cache simultaneously and hit the database directly. This can put the database under enormous instantaneous pressure and may even cause it to crash.

Solution - Randomized expiration time

To alleviate this problem, we can adopt a simple strategy: when adding caches in bulk, stagger the expiration times of the batch across an interval. Specifically, give each cache item a slightly different expiration time, which prevents all cache items from expiring at the same moment and thus reduces the impact of the sudden burst of requests on the database.

Here is the specific sample pseudo-code:

String get(String key) {
    // Get the data from the cache
    String cacheValue = cache.get(key);

    // If the cache is empty
    if (StringUtils.isBlank(cacheValue)) {
        // Get the data from storage
        String storageValue = storage.get(key);

        // If the data exists in storage
        if (storageValue != null) {
            cache.set(key, storageValue);
            // Set a randomized expiration time (between 300 and 600 seconds)
            int expireTime = 300 + new Random().nextInt(301); // random range: 300 to 600
            cache.expire(key, expireTime);
        } else {
            // No data in storage: use the default 300-second expiration
            cache.expire(key, 300);
        }
        return storageValue;
    } else {
        // Return the data from the cache
        return cacheValue;
    }
}

Cache Avalanche

Cache avalanche refers to the storage layer being overloaded or brought down when the cache layer fails or is overloaded and a large number of requests pour directly into the back-end storage layer. Normally, the cache layer carries and absorbs request traffic, protecting the back-end storage layer from the pressure of highly concurrent requests.

However, when the cache tier can no longer provide service for some reason, such as being hit by massive concurrency or suffering from poor cache design (for example, frequent access to a very large cache entry, a "bigkey", causes cache performance to degrade drastically), a large number of requests are forwarded to the storage tier. The request volume on the storage tier then increases dramatically, which may overload it or bring it down, leading to system-level failures. This phenomenon is called a "cache avalanche".

Solutions

In order to effectively prevent and solve the cache avalanche problem, you can start from the following three aspects:

  1. Ensure high availability of cache layer services
    Ensuring high availability of the cache tier is a critical measure for avoiding cache avalanches. This can be achieved with tools such as Redis Sentinel or Redis Cluster. Redis Sentinel provides automated failover and monitoring, automatically promoting a slave node to a new master when the master fails, thus maintaining service continuity. Redis Cluster further improves availability and scalability through data sharding and replication between nodes, so that even if some nodes fail, the system can still operate normally and continue to process requests.
  2. Rely on isolation components for rate limiting, circuit breaking, and degradation
    Using rate limiting and circuit breaking to protect back-end services from bursts of requests can effectively relieve the pressure caused by a cache avalanche. For example, components such as Sentinel or Hystrix can implement flow control and service degradation. Different strategies can be adopted for different types of data, as shown in the sketch after this list:
    • Non-core data: for example, product attributes or user information in an e-commerce platform. If this data is missing from the cache, the application can return a predefined default degradation message, a null value, or an error alert instead of querying the back-end storage directly. This reduces the pressure on the back-end storage while still giving the user some basic feedback.
    • Core data: for example, the inventory of goods in an e-commerce platform. For such critical data, the application can still try the cache first and read from the database on a cache miss. This way, even if the cache is unavailable, reads of core data are still guaranteed, and a cache avalanche does not take core functionality down with it.
  3. Advance rehearsal and plan development
    Before the project goes live, conduct sufficient rehearsals and tests to simulate the application and back-end load after the cache layer goes down, identify potential problems and formulate appropriate plans. This includes simulating cache failures, back-end service overloads, etc., observing system performance, and adjusting system configurations and strategies based on test results. Through these exercises, system weaknesses can be identified and appropriate contingency measures can be formulated to cope with unexpected situations in the actual production environment. This not only improves the robustness of the system, but also ensures that the system can quickly return to normal operation when a cache avalanche occurs.
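
To make the per-data-type degradation strategy in point 2 concrete, here is a hedged sketch. The Cache and Db interfaces and all method and key names are hypothetical illustrations, not the API of Sentinel or Hystrix; in practice those components would wrap calls like these with flow-control and circuit-breaking rules:

// Hypothetical sketch of degradation when the cache layer is failing
interface Cache { String get(String key); }
interface Db { int queryStock(String productId); }

public class DegradationExample {
    private final Cache cache;
    private final Db db;

    public DegradationExample(Cache cache, Db db) {
        this.cache = cache;
        this.db = db;
    }

    // Non-core data: on cache failure, return a predefined default
    // instead of falling through to the storage layer
    public String getProductAttributes(String productId) {
        try {
            String value = cache.get("attrs:" + productId);
            return value != null ? value : "{}";
        } catch (Exception cacheDown) {
            return "{}"; // degraded response protects the back-end storage
        }
    }

    // Core data: on cache failure, fall back to the database so reads still succeed
    public int getStock(String productId) {
        try {
            String cached = cache.get("stock:" + productId);
            if (cached != null) {
                return Integer.parseInt(cached);
            }
        } catch (Exception cacheDown) {
            // cache unavailable; fall through to the database
        }
        return db.queryStock(productId);
    }
}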

By using a combination of these measures, the risks associated with cache avalanches can be significantly reduced and system stability and performance can be improved.

Summary

Web tier caching significantly improves application performance and speeds up response times by reducing repetitive data processing and database queries. Redis, as an efficient in-memory data structure store, plays an important role in implementing the caching tier: it supports a wide variety of data structures and enables rapid access to data, reducing the burden on databases and improving the user experience.

However, caching mechanisms also face challenges such as cache penetration, cache breakdown, and cache avalanche. Cache penetration is addressed by caching null objects, which avoids hitting the database on every query, and by Bloom filters, which effectively minimize the impact of malicious requests. Cache breakdown is mitigated by setting randomized expiration times, which prevents a large number of requests from flooding the database at the same moment. For cache avalanches, the keys are ensuring high availability of the cache layer, employing rate limiting and circuit breaking, and preparing adequate contingency plans.

Effective cache management not only improves system performance, but also enhances system stability. Understanding and solving these caching problems can ensure that the system maintains efficient and stable operation in a highly concurrent environment. Careful design and implementation of caching policies is the basis for optimizing application performance. Continuously focusing on and adjusting these policies can help the system cope with various challenges and maintain a good user experience.

