
System design: elimination of slow interfaces


1. Introduction

Some interfaces respond significantly more slowly than expected, or even time out. These interfaces drag down the overall throughput and availability of the system, and of course they also degrade the user experience.

Targeted optimization should be done for high-traffic core interfaces, for example:

  1. Asynchronous processing, or adding concurrent processing, to avoid synchronous blocking (see the sketch after this list)

  2. If the database is accessed frequently, add caching

  3. Batch access, to avoid the network overhead of calling the database in a for loop

  4. Avoid returning too much data at once
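
To illustrate point 1, here is a minimal sketch of concurrent processing with CompletableFuture; ProfilePageService, queryProfile and queryOrders are hypothetical names used only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Issue independent downstream calls concurrently instead of sequentially.
public class ProfilePageService {

    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public String buildPage(long userId) {
        CompletableFuture<String> profile =
                CompletableFuture.supplyAsync(() -> queryProfile(userId), pool);
        CompletableFuture<String> orders =
                CompletableFuture.supplyAsync(() -> queryOrders(userId), pool);

        // Both calls run in parallel; total latency is max(...) rather than sum(...).
        return profile.join() + orders.join();
    }

    // Hypothetical downstream calls, stubbed for illustration.
    private String queryProfile(long userId) { return "profile-" + userId; }

    private String queryOrders(long userId)  { return "orders-" + userId; }
}
```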

As for error-prone interfaces, there is little to discuss: eliminating them is a hard rule. If an interface has an error rate above 0.1%, or frequently prints error logs, it is a problem at the program level.

2. General approach

Observe interface throughput, response time, and error rate. These metrics can be collected through various middleware and instrumentation points.
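
As one possible way to collect these metrics in code, here is a minimal sketch using Micrometer; the in-memory registry and the metric names are illustrative assumptions, not a prescribed configuration.

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Record latency and error counts for one interface.
public class InterfaceMetrics {

    private final MeterRegistry registry = new SimpleMeterRegistry();

    public String handleOrderQuery() {
        Timer timer = registry.timer("api.latency", "endpoint", "/orders");
        try {
            return timer.recordCallable(this::doQuery); // records elapsed time
        } catch (Exception e) {
            registry.counter("api.errors", "endpoint", "/orders").increment();
            throw new RuntimeException(e);
        }
    }

    private String doQuery() {
        return "ok"; // placeholder for the real handler
    }
}
```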

3. Slow interface optimization

3.1 Downstream issues/network jitter

The slowness results from slow downstream interfaces or network jitter rather than from this application itself.

3.2 Coding issues

  • Are unused fields being populated, i.e., fields the process does not need that add extra overhead to the query?
  • Is there call amplification (one request fanning out into many downstream calls)?
  • Can it be optimized into batch queries and batch fills? (see the sketch after this list)
  • Are there slow SQL statements?
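
For the batch-query item above, here is a minimal sketch of replacing a per-ID loop with one batch query; OrderDao, Order and findByIds are hypothetical types introduced only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical data-access interface: one round trip for all IDs,
// e.g. SELECT ... WHERE id IN (...), instead of one query per ID in a for loop.
interface OrderDao {
    List<Order> findByIds(List<Long> ids);
}

record Order(long id, String status) {}

public class OrderAssembler {

    private final OrderDao orderDao;

    public OrderAssembler(OrderDao orderDao) {
        this.orderDao = orderDao;
    }

    public Map<Long, Order> loadOrders(List<Long> ids) {
        // Single database round trip, then fill from the in-memory map.
        return orderDao.findByIds(ids).stream()
                .collect(Collectors.toMap(Order::id, Function.identity()));
    }
}
```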

3.3 Common approaches to asynchronous processing

First, look at the factors to consider when choosing an approach:

  • Ease of use of the programming interface
  • Execution environment: single JVM or cluster?
  • Performance and stability, and whether tasks are persisted: processing progress may be lost if the machine suddenly fails or restarts; supporting recovery or redo requires idempotent processing.
  • How to obtain the result of asynchronous execution.

Here are some common asynchronous processing options:

| Type | Access | Execution environment | Persistence | Performance and stability | Notes |
| --- | --- | --- | --- | --- | --- |
| Java threading model | Native support | Single JVM | Self-managed | Stand-alone | |
| Java concurrency utilities (JUC) | Native support | Single JVM | Self-managed | Stand-alone | |
| Spring thread pool | Simple API, usable via annotations | Single JVM | Self-managed | Stand-alone | Using the annotation without specifying a thread pool may lead to thread pools being mixed up |
| EventBus (Guava) | Simple API, event model | Single JVM | Interrupted tasks cannot be recovered; cluster scheduling is not supported | Stand-alone | Watch out for cross-module dependencies on event classes |
| Redis queue | Requires Redis (an external dependency); event delivery and consumption code must be written | Single JVM | Interrupted tasks cannot be recovered; cluster scheduling is not supported | High performance | Possible single point of failure; requires a high-availability design |
| mircotask | Feature-rich (monitoring, etc.), with some access cost | Cluster | To be supplemented | To be supplemented | |
| Asynchronous event model built on MQ | Event model; requires MQ middleware (an external dependency); an event-processing framework must be written | Cluster-deployable, supports cross-application processing | May need to be persisted | To be supplemented | |
| Asynchronous event model built on timed-task middleware | Event model; requires timed-task middleware (an external dependency); an event-processing framework must be written | Cluster-deployable, supports cross-application processing | Requires persistence | Controlled execution speed | |
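
To illustrate the note in the Spring thread pool row, here is a minimal sketch of @Async with an explicitly named thread pool, so that tasks do not fall back to a shared default executor; the pool name and sizes are illustrative assumptions.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

import java.util.concurrent.CompletableFuture;

@Configuration
@EnableAsync
class AsyncConfig {

    // Hypothetical pool name used only for illustration.
    @Bean("reportExecutor")
    public ThreadPoolTaskExecutor reportExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);
        executor.setMaxPoolSize(8);
        executor.setQueueCapacity(200);
        executor.setThreadNamePrefix("report-");
        executor.initialize();
        return executor;
    }
}

@Service
class ReportService {

    // Reference the pool explicitly instead of relying on the default executor.
    @Async("reportExecutor")
    public CompletableFuture<String> buildReport(long orderId) {
        // ... heavy work runs off the request thread ...
        return CompletableFuture.completedFuture("report-" + orderId);
    }
}
```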

3.4 Caching

"Everything in sight is cached. "

Explanation: for anything, when an observer observes it, the signal takes time to propagate, so what the observer receives is always a signal the object emitted in the past. While the signal is propagating, the observed object may already have changed. The signal the observer receives can be viewed as a snapshot of the object's past, and every judgment the observer makes based on that signal can be thought of as working from a cached snapshot.

3.4.1 Cache Selection

  • Local (near-side) cache
  • Remote cache
| Cache type | Examples / description | Application scenarios | Access cost | Limitations |
| --- | --- | --- | --- | --- |
| JVM cache | HashMap, BloomFilter, WeakReference, SoftReference | Broad | Easy to implement | Standalone; needs warm-up; limited by JVM memory |
| Distributed cache | Redis, Memcache | Broad | Introduces external dependencies | Reliability and degradation strategies need to be considered |
| Browser cache | Uses client resources to save server resources | Limited | - | Only covers part of the experience; not controllable by back-end developers |
| CDN cache | - | Limited | Additional cloud infrastructure costs | Significantly improves access speed for large objects |
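
As a concrete illustration of the distributed-cache row, here is a minimal cache-aside sketch using Jedis; the Redis address, TTL, and the loadUserFromDb stub are assumptions for illustration, not a production-ready setup.

```java
import redis.clients.jedis.Jedis;

// Cache-aside: try the cache, fall back to the database, then populate the cache with a TTL.
public class UserCache {

    private static final int TTL_SECONDS = 60;

    private final Jedis jedis = new Jedis("localhost", 6379);

    public String getUser(String userId) {
        String key = "user:" + userId;
        String cached = jedis.get(key);            // 1. try the cache first
        if (cached != null) {
            return cached;
        }
        String fromDb = loadUserFromDb(userId);    // 2. cache miss: hit the database
        if (fromDb != null) {
            jedis.setex(key, TTL_SECONDS, fromDb); // 3. populate the cache with a TTL
        }
        return fromDb;
    }

    // Hypothetical database lookup, stubbed for illustration.
    private String loadUserFromDb(String userId) {
        return "{\"id\":\"" + userId + "\"}";
    }
}
```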

3.4.2 Caching FAQs

(by chatgpt)

  • Consistency problem: data in the cache and the database can diverge, especially when data is updated frequently under high concurrency. An appropriate cache update strategy and invalidation mechanism are needed to keep the data consistent.
  • Penetration problem: a large number of requests go straight to the database, so the cache never plays its role. Cache penetration usually happens when the requested data does not exist in the cache at all; mechanisms such as Bloom filters can prevent it (see the sketch after this list).
  • Avalanche problem: a large amount of cached data expires at the same time, so requests hit the database directly and put it under excessive pressure. Distributed locking, rate limiting, and similar mechanisms help prevent cache avalanches.
  • Memory problem: if cached data is never accessed again and never evicted, memory usage keeps growing. A suitable eviction strategy is required.
  • Capacity problem: set an appropriate cache capacity according to the actual business requirements and hardware resources.
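
For the penetration problem, here is a minimal sketch of a Bloom filter guard using Guava; the expected number of IDs and the false-positive rate are illustrative assumptions.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.nio.charset.StandardCharsets;

// Reject IDs that cannot exist before they ever reach the cache or the database.
public class PenetrationGuard {

    // Expect up to 1,000,000 known IDs with a ~1% false-positive rate.
    private final BloomFilter<String> knownIds = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    // Called when data is written, e.g. while warming up from the database.
    public void register(String id) {
        knownIds.put(id);
    }

    public boolean mightExist(String id) {
        // false means the ID is definitely absent, so the request can be
        // rejected without touching the cache or the database.
        return knownIds.mightContain(id);
    }
}
```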

3.4.3 Scenarios where caching is not appropriate

(generated by chatgpt)

  • High real-time requirements for data: if data must be up to date at all times, caching cannot be used. Examples include online payment systems and stock trading systems.
  • High cache update cost: some data is updated very frequently and each update is expensive; in that case caching can actually reduce performance, because keeping the cache up to date is costly and the hit rate ends up low. Examples include live video systems and game ranking systems.
  • High business complexity: some businesses involve interactions between multiple systems, and layering cache consistency and update strategies on top further increases system complexity. Examples include distributed transaction systems and complex financial trading systems.
  • Low traffic: for applications with little traffic, caching does not significantly improve performance but adds extra system complexity and development cost. Examples include internal management systems and small portals.

During the design phase, be sure to think ahead about how you will use caching.

3.5 Avoid returning too much data at once

Adverse effects:

  • Higher JVM memory footprint when constructing the response
  • Longer response times, i.e., elevated RT
  • Higher network bandwidth usage
  • May exceed browser or server configuration limits (the HTTP protocol itself does not limit message size)
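
One common mitigation is pagination with a hard cap on page size. A minimal sketch, assuming hypothetical OrderService and OrderView types:

```java
import java.util.List;

// Hypothetical service and view types used only for illustration.
interface OrderService {
    List<OrderView> findOrders(int offset, int limit);
}

record OrderView(long id, String status) {}

public class OrderQueryController {

    private static final int MAX_PAGE_SIZE = 100;

    private final OrderService orderService;

    public OrderQueryController(OrderService orderService) {
        this.orderService = orderService;
    }

    // Cap the page size so a single response can never grow unbounded.
    public List<OrderView> listOrders(int page, int size) {
        int boundedSize = Math.min(Math.max(size, 1), MAX_PAGE_SIZE);
        int offset = Math.max(page, 0) * boundedSize;
        return orderService.findOrders(offset, boundedSize);
    }
}
```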

4. Elimination of erroneous interfaces

Error-rate statistics criterion: the interface returns a non-2XX HTTP status.

Therefore, you only need to handle exceptions inside the application and return HttpStatus=200 externally; the code and success fields in the Result class are set according to the actual situation, and such responses are not counted in the error rate.
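
A minimal sketch of this convention with Spring's @RestControllerAdvice, assuming a hypothetical Result wrapper and BusinessException; the field and method names are illustrative, not a fixed API.

```java
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

// Convert business failures into HTTP 200 + a business code in the Result body.
@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(BusinessException.class)
    public Result<Void> handleBusiness(BusinessException e) {
        // The HTTP status stays 200, so the failure is not counted in the
        // non-2XX error rate; it is expressed through the business code instead.
        return Result.fail(e.getCode(), e.getMessage());
    }
}

// Hypothetical wrapper and exception types for illustration only.
class Result<T> {
    public int code;          // business code, e.g. 0 for success
    public boolean success;
    public String message;
    public T data;

    static <T> Result<T> fail(int code, String message) {
        Result<T> r = new Result<>();
        r.code = code;
        r.success = false;
        r.message = message;
        return r;
    }
}

class BusinessException extends RuntimeException {
    private final int code;

    BusinessException(int code, String message) {
        super(message);
        this.code = code;
    }

    int getCode() {
        return code;
    }
}
```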

Special case: some SEO specifications require that an interface return 404 when the requested data cannot be found.