Preface
This is an original article; please credit the source when reprinting: /boycelee/p/18055933
The Dockworker's Thousand and One Nights is a blog that focuses on sharing practical technical content. By following it, you can keep up with the latest technology trends in the industry and pick up hands-on experience in Java development and security. Whether you are a developer or a hobbyist interested in reverse engineering, you will find useful knowledge and insights here.
Configuration Center Article Series
[Architect's Perspective Series] Reflections on Configuration Center Design in Risk Control Scenarios /boycelee/p/18355942
[Architect's Perspective Series] Apollo Configuration Center: Architectural Design (I) /boycelee/p/17967590
[Architect's Perspective Series] Apollo Configuration Center: Client Side (II) /boycelee/p/17978027
[Architect's Perspective Series] Apollo Configuration Center: Server Side (ConfigService) (III) /boycelee/p/18005318
[Architect's Perspective Series] QConfig Configuration Center: Architectural Design (I) /boycelee/p/18013653
[Architect's Perspective Series] QConfig Configuration Center: Client Side (II) /boycelee/p/18033286
[Architect's Perspective Series] QConfig Configuration Center: Server Side (III) /boycelee/p/18055933
1. What is a configuration center?
A configuration center is an important component of a distributed system. It centralizes the management and distribution of the configuration files that services need, which simplifies and unifies configuration management across the system.
1.1. Characteristics
1.1.1 Centralized management
Configuration files for the distributed system are stored and managed centrally, and operations staff view and modify configuration through a single management interface. This avoids the data inconsistency that arises when configuration is scattered across different locations.
1.1.2 Dynamic updating
The configuration center supports dynamic configuration updates. When a configuration changes, the configuration center promptly pushes the updated configuration to all subscribed service instances, which apply it without restarting.
2. Why do we need a configuration center?
Risk control scenarios usually require frequent policy changes to keep up with attack and defense. The policy management platform and the policy execution engine are generally two separate services. The purpose is decoupling, so that changes in business requirements affect the execution engine as little as possible. The policy engine usually obtains its policy configuration in one of three ways: shared storage, remote calls, or a configuration center.
2.1 Strengths and weaknesses of each approach
2.1.1 Shared storage (e.g. shared Redis)
Pros:
- Efficient access: Shared storage provides fast data access, especially when local or network latency is low.
- Simple implementation: Storing policy data directly in shared storage such as Redis keeps policy fetching simple.
Drawbacks:
- Data size limit: The size of stored values must be controlled to avoid the big-key problem. If a stored policy configuration is too large, Redis performance can degrade, and management and maintenance become more complex.
- Coupling: The policy management platform and the policy execution engine are tightly coupled. If the platform's create, read, update, and delete operations are not optimized for performance, they can affect the stability and efficiency of the execution engine, which may misbehave because of operational delays or performance problems on the platform.
2.1.2 Remote calls (e.g. HTTP calls)
Pros:
- Decoupling: The policy management platform and the policy execution engine interact through remote calls, which reduces the direct dependency between them. The execution engine obtains policy data through APIs instead of accessing shared storage directly, which improves the flexibility and scalability of the system.
- Timeliness: Policy data can be fetched and updated in real time and reaches the execution engine quickly as long as the network is stable.
Drawbacks:
- Dependence on stability: Remote calls depend on the stability of the configuration management service. That service must stay highly available and stable; if it is unstable or the network has problems, fetching policy data may fail or be delayed, and the execution engine suffers.
- Network latency: Remote calls involve network transmission, which adds latency, especially when network conditions are poor. This can be a problem for scenarios that need a fast response.
2.1.3 Configuration Center
Pros:
- Unified management: Centralizing all configuration data simplifies maintenance and management. Administrators can modify and manage every configuration in one place.
- Dynamic updates: Configurations can be updated in real time, and policy changes take effect without restarting the service, which improves the flexibility and adaptability of the system.
- Caching mechanism: Many configuration centers implement caching to improve read speed and response time and to reduce the load on back-end storage.
- Distributed architecture: With a distributed architecture, the configuration center can handle large amounts of configuration data and provide high availability and fault tolerance.
- System decoupling: The configuration center decouples configuration management from application logic. Application services do not need direct access to databases or shared storage, which reduces the coupling between systems.
Drawbacks:
- Single point of failure: The configuration center can become a single point of failure. If it goes down, every service that depends on it can be affected, so high-availability and fault-tolerant designs are needed to reduce this risk.
- Performance bottleneck: With frequent configuration requests or very large amounts of configuration data, the configuration center can become a performance bottleneck and needs to be optimized and scaled.
- O&M costs: High availability, a distributed architecture, and large-scale data storage all increase the operation and maintenance costs of a configuration center, especially in large systems.
3. How to design a configuration center?
3.1. Architecture diagram
3.2 Architectural layering
The architecture is divided into four layers: client, network, service, and data.
3.2.1 Client layer
- Client: provides real-time configuration retrieval and updates.
- Admin: provides a web interface for managing configurations.
3.2.2 Network layer
- Nginx: distributes traffic, with the list of backend servers set in a simple configuration file. A registry such as Zookeeper or Eureka can be used instead to register and discover services dynamically.
3.2.3 Service layer
- Config Service: provides the configuration retrieval interface.
- Admin Service: provides the configuration management interface (configuration update and configuration release).
3.2.4 Data layer
- Config DB: stores the configuration center's configuration data.
- Admin DB: stores the configuration center's non-configuration data.
3.3 Sequence diagram
The sequence has three main sections: the startup section, the push section, and the configuration synchronization section.
3.3.1 Startup section
- Step 1: Start configuration warm-up. When the configuration center starts, it performs a warm-up that loads the zero-version data into the local cache.
- Step 2: checkUpdateAndLoadData. The client sends its namespace ID to the configuration center backend.
- Step 3: Return the latest version number and configuration data. The configuration center returns the latest version number cached during warm-up and the configuration data under that namespace.
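Steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the author's implementation: the class names, the in-memory `database` mapping, and the `risk-rules` namespace are hypothetical, and only `checkUpdateAndLoadData` (written here as `check_update_and_load_data`) comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceSnapshot:
    """Zero-version (full) data for one namespace, as cached during warm-up."""
    version: int
    configs: dict = field(default_factory=dict)


class ConfigServer:
    def __init__(self, database):
        # Hypothetical stand-in for Config DB: namespace -> (version, configs).
        self._database = database
        self._zero_version_cache = {}

    def warm_up(self):
        """Step 1: copy the full data of every namespace into the local cache."""
        for namespace, (version, configs) in self._database.items():
            self._zero_version_cache[namespace] = NamespaceSnapshot(version, dict(configs))

    def check_update_and_load_data(self, namespace):
        """Steps 2-3: return the cached latest version and the namespace's data."""
        snapshot = self._zero_version_cache.get(namespace)
        if snapshot is None:
            raise KeyError(f"namespace not warmed up: {namespace}")
        return snapshot.version, dict(snapshot.configs)


server = ConfigServer({"risk-rules": (1001, {"max_login_attempts": "5"})})
server.warm_up()
version, configs = server.check_update_and_load_data("risk-rules")
```

The request is answered from the warm-up cache only, so a client starting up never reaches the database in this sketch.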
3.3.2 Push section
- Step 4: checkUpdate (long polling). The client sends its namespace ID and local version number to the configuration center and holds a long poll open to listen for changes in that namespace.
- Step 5: Detect the latest version number (polling + real-time). A scheduled task polls or queries the local cache to check whether the version number in the current request matches the latest version in the configuration center.
- Step 6: Version number mismatch (return immediately). If the client's version number does not match the configuration center's, the latest version number is returned directly.
- Step 7: Time-wheel push. To avoid the thundering herd effect when a configuration changes, a time wheel pushes the update to the listening clients in batches.
- Step 8: Version number match (wait). If the client's version number matches the configuration center's, the long poll between the client and the configuration center stays open.
- Step 9: loadData. When the client finds that the version number returned by checkUpdate does not match its local version, it calls the loadData interface with the local version number and the latest version number to fetch the latest configuration data.
- Step 10: Configuration for the version interval. The server uses the local and latest version numbers supplied by the client to query the cache or database and returns the configuration for that version interval.
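The long-polling exchange in steps 4 to 10 can be sketched with a condition variable. This is a simplified single-namespace model under stated assumptions: `LongPollServer` and `publish` are hypothetical names, `check_update` and `load_data` mirror the `checkUpdate` and `loadData` interfaces from the text, and the batched time-wheel push of step 7 is not modeled (all waiters wake at once here).

```python
import threading


class LongPollServer:
    def __init__(self):
        self._latest_version = 0
        self._changes = {}  # version -> {key: value} published in that version
        self._cond = threading.Condition()

    def publish(self, changes):
        """Record a new version and wake every waiting long poll."""
        with self._cond:
            self._latest_version += 1
            self._changes[self._latest_version] = dict(changes)
            self._cond.notify_all()
            return self._latest_version

    def check_update(self, client_version, timeout):
        """Return at once on a version mismatch; otherwise wait up to `timeout` seconds."""
        with self._cond:
            self._cond.wait_for(
                lambda: self._latest_version != client_version, timeout=timeout
            )
            return self._latest_version

    def load_data(self, local_version, latest_version):
        """Merge the changes in the interval (local_version, latest_version]."""
        merged = {}
        with self._cond:
            for version in range(local_version + 1, latest_version + 1):
                merged.update(self._changes.get(version, {}))
        return merged


server = LongPollServer()
server.publish({"rule.a": "on"})   # becomes version 1
server.publish({"rule.b": "off"})  # becomes version 2

latest = server.check_update(client_version=0, timeout=0.1)      # mismatch: returns immediately
data = server.load_data(0, latest)
unchanged = server.check_update(client_version=2, timeout=0.05)  # match: waits until the timeout
```

When the returned version equals the client's own, the client simply issues the next `check_update`, which keeps the long poll going as in step 8.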
3.3.3 Configuration synchronization section
- Step 11: Configurations are added, modified, or published through the configuration center's management interface.
- Step 12: Synchronization notification. The third-party application's configuration changes are synchronized to the configuration center.
- Step 13: Synchronization status. The configuration center responds with the status and the corresponding configuration ID.
- Step 14: Configuration release. The ID returned by the configuration center is submitted back to it to complete the release.
- Step 15: Release result. The configuration center returns the result of the release.
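Steps 12 to 15 describe a two-phase flow: synchronize first, then release by ID. The sketch below is one possible reading of that flow; the `AdminService` methods, the response fields, and the status strings are all assumptions made for illustration.

```python
import itertools


class AdminService:
    def __init__(self):
        self._ids = itertools.count(1)
        self._staged = {}    # config_id -> (namespace, key, value), not yet visible
        self._released = {}  # (namespace, key) -> value, visible to clients
        self.version = 0

    def sync(self, namespace, key, value):
        """Steps 12-13: stage a change and return its configuration ID."""
        config_id = next(self._ids)
        self._staged[config_id] = (namespace, key, value)
        return {"status": "staged", "config_id": config_id}

    def release(self, config_id):
        """Steps 14-15: publish a staged change and report the result."""
        staged = self._staged.pop(config_id, None)
        if staged is None:
            return {"status": "failed", "reason": "unknown config_id"}
        namespace, key, value = staged
        self.version += 1
        self._released[(namespace, key)] = value
        return {"status": "released", "version": self.version}


admin = AdminService()
sync_result = admin.sync("risk-rules", "max_login_attempts", "3")
release_result = admin.release(sync_result["config_id"])
repeat_result = admin.release(sync_result["config_id"])  # the ID was already consumed
```

Separating the two calls means a synchronized change stays invisible to clients until it is explicitly released.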
3.4 Design challenges
3.4.1 Configuration center startup
Scenario
Startup is the stress point. The validation target is 3 namespaces, each holding 100,000 entries of 2 KB, about 600 MB in total.
Problem
If 20 configuration center instances request the full configuration from the database within the same second, each asking for 200 MB of data, that level of concurrency can easily overload the database or even bring it down.
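The figures in this scenario can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 MB = 1,000 KB), which is how the numbers in the text appear to be rounded; the constant names are illustrative.

```python
NAMESPACES = 3
ENTRIES_PER_NAMESPACE = 100_000
ENTRY_SIZE_KB = 2
INSTANCES = 20
KB_PER_MB = 1_000  # assumption: decimal units, matching the rounded figures in the text

namespace_mb = ENTRIES_PER_NAMESPACE * ENTRY_SIZE_KB / KB_PER_MB  # data in one namespace
total_mb = NAMESPACES * namespace_mb                              # data across all namespaces
burst_mb = INSTANCES * namespace_mb                               # requested within one second

print(namespace_mb, total_mb, burst_mb)
```

Under these assumptions one namespace holds 200 MB, the three together hold 600 MB, and 20 instances asking for 200 MB each would request about 4,000 MB from the database in a single second.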
Solution
- Warm-up phase: After it starts, a configuration center instance goes through a warm-up period whose length depends on the number of instances deployed. To avoid excessive query pressure on the database, a distributed lock makes the instances queue up and query the database one after another for the configuration data they need.
- Caching mechanism: The configuration center has two caches: a zero-version cache, which stores the full configuration data, and an incremental cache, which stores incremental data. The reason for this design is explained in section 3.4.4.
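The queued warm-up can be simulated in a single process. In the sketch below, threads stand in for configuration center instances and a `threading.Lock` stands in for the distributed lock; a real deployment would need a lock shared across machines, and the text does not say which one is used. `FakeDatabase` and `warm_up` are hypothetical names.

```python
import threading
import time


class FakeDatabase:
    """Counts how many full queries run at the same time."""

    def __init__(self):
        self._guard = threading.Lock()
        self.active = 0
        self.max_active = 0

    def load_full_config(self):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # stand-in for a slow full query
        with self._guard:
            self.active -= 1
        return {"rule.a": "on"}


def warm_up(instance_cache, db, distributed_lock):
    # Instances queue on the lock, so full queries run one at a time.
    with distributed_lock:
        instance_cache.update(db.load_full_config())


db = FakeDatabase()
lock = threading.Lock()  # assumption: stand-in for a real distributed lock
caches = [{} for _ in range(20)]
threads = [threading.Thread(target=warm_up, args=(cache, db, lock)) for cache in caches]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because every instance waits its turn, the total warm-up time grows with the number of instances, which matches the remark that the warm-up period depends on how many instances are deployed.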
3.4.2 Third-party application startup
Scenario
1,000 third-party applications start at the same time and pull the full configuration data within 1 second, 200 MB per instance.
Problem
If 1,000 third-party applications start at once and each pulls the full 200 MB of configuration within 1 second, the concurrent requests put enormous pressure on the system.
Solution
- Local cache: The configuration center flushes the full configuration data into the zero-version cache at startup and at regular intervals, and refreshes the incremental cache on every configuration change. Configuration pull requests therefore do not hit the database directly, which reduces its load.
- Cache invalidation: In the extreme case where the local cache fails, only one thread per configuration center instance may access the database at a time, so the database is not overwhelmed by a burst of simultaneous requests.
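The "one thread per instance" rule on cache failure can be sketched with double-checked locking. This is an illustrative pattern, not necessarily how the author implemented it; `SingleFlightCache`, `slow_loader`, and the `db_calls` counter are hypothetical.

```python
import threading
import time


class SingleFlightCache:
    """On a cache miss, lets exactly one thread reload from the database."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._data = None
        self.db_calls = 0

    def get(self):
        data = self._data
        if data is not None:
            return data  # fast path: cache hit, no lock taken
        with self._lock:
            if self._data is None:  # re-check: another thread may have reloaded already
                self.db_calls += 1
                self._data = self._loader()
            return self._data

    def invalidate(self):
        with self._lock:
            self._data = None


def slow_loader():
    time.sleep(0.01)  # stand-in for a database query
    return {"rule.a": "on"}


cache = SingleFlightCache(slow_loader)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get())) for _ in range(50)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

All 50 concurrent requests are served from a single database query; the other threads block briefly on the lock and then reuse the reloaded data.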
3.4.3 Configuration release
Scenario
When a new configuration is released, 2,000 third-party application instances are listening for configuration changes, and measures are needed to avoid a thundering herd effect.
Problem
If every listening third-party application instance is notified at the same moment and they all access the configuration center at once, the resulting resource contention can degrade the configuration center's performance.
Solution
- Timed scan: Each configuration center instance scans the database every 30 seconds for new configuration updates. When the latest version number differs from the version registered by a client, a push is triggered.
- Configuration push: A simple time wheel pushes to 5 third-party application instances every 100 milliseconds, so 2,000 instances are covered in 40 seconds (2,000 / 5 = 400 ticks of 100 ms), a response time on the order of seconds.
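A simple time wheel that produces this schedule can be sketched as follows. The wheel is driven by explicit `tick()` calls instead of a real 100 ms timer, so it runs instantly; the `TimeWheel` class, the slot count of 10, and the client names are assumptions for illustration.

```python
class TimeWheel:
    """Circular array of buckets; each tick fires the entries that are due."""

    def __init__(self, slots, tick_ms):
        self.slots = [[] for _ in range(slots)]
        self.tick_ms = tick_ms
        self.current = 0  # index of the slot that fires on the next tick

    def add(self, delay_ticks, item):
        # `rounds` counts how many full turns of the wheel must pass before firing.
        rounds, offset = divmod(delay_ticks, len(self.slots))
        index = (self.current + offset) % len(self.slots)
        self.slots[index].append((rounds, item))

    def tick(self):
        """Advance one tick and return the items that are due now."""
        bucket = self.slots[self.current]
        due = [item for rounds, item in bucket if rounds == 0]
        self.slots[self.current] = [
            (rounds - 1, item) for rounds, item in bucket if rounds > 0
        ]
        self.current = (self.current + 1) % len(self.slots)
        return due


BATCH_SIZE = 5
wheel = TimeWheel(slots=10, tick_ms=100)
clients = [f"client-{i}" for i in range(2000)]
for i, client in enumerate(clients):
    wheel.add(i // BATCH_SIZE, client)  # clients 0-4 fire on tick 0, 5-9 on tick 1, ...

batches = []
pushed = 0
while pushed < len(clients):
    batch = wheel.tick()  # a real server would call this from a 100 ms timer
    batches.append(batch)
    pushed += len(batch)

elapsed_seconds = len(batches) * wheel.tick_ms / 1000
```

With 5 clients per tick, the 2,000 clients are spread over 400 ticks, so the configuration center never handles more than 5 follow-up requests per 100 ms from this release.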
3.4.4 Cache design
Scenario
Full data is pulled at startup (zero version), and incremental data is pulled when a new configuration change is pushed (incremental).
Problem
Full configuration data is pulled by version interval. For example, when the client's current version is 1001, the interval is 0 to 1001. If the version number alone were used as the cache key, every full pull would have to iterate over all of those version numbers to collect the corresponding configuration.
Solution
- Zero-version configuration: The version interval is used as the key, and the cached data is refreshed every hour. This keeps the gap between the zero-version configuration and the latest version within reasonable limits and avoids storing too much incremental data.
- Incremental configuration: The version number is used as the key, and each fetch obtains the configuration by traversing the version numbers. Because there are few version numbers, the performance impact is manageable; because the entries are few and short-lived, falling back to a direct database query puts little pressure on the database.
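The two caches can be sketched as a pair of dictionaries, one keyed by version interval and one keyed by version number. This is a minimal in-memory model under stated assumptions: the `ConfigCache` class and its method names are hypothetical, and neither the hourly refresh timer nor the expiry of old increments is modeled.

```python
class ConfigCache:
    def __init__(self):
        self.zero_version = {}  # (0, base_version) -> merged full data
        self.base_version = 0
        self.incremental = {}   # version -> changes made in that version

    def refresh_zero_version(self, full_data, base_version):
        """Periodic refresh: fold everything up to base_version into one entry."""
        self.zero_version = {(0, base_version): dict(full_data)}
        self.base_version = base_version

    def add_increment(self, version, changes):
        """Called on each configuration change."""
        self.incremental[version] = dict(changes)

    def load(self, local_version, latest_version):
        """Return the data a client at local_version needs to reach latest_version."""
        result = {}
        start = local_version
        if local_version == 0:
            # Full pull: one lookup by interval, then only the few newer increments.
            result.update(self.zero_version.get((0, self.base_version), {}))
            start = self.base_version
        for version in range(start + 1, latest_version + 1):
            result.update(self.incremental.get(version, {}))
        return result


cache = ConfigCache()
cache.refresh_zero_version({"a": "1", "b": "2"}, base_version=1000)
cache.add_increment(1001, {"b": "3"})
cache.add_increment(1002, {"c": "4"})

full = cache.load(0, 1002)      # startup: snapshot plus increments 1001 and 1002
delta = cache.load(1001, 1002)  # after a push: only increment 1002
```

A full pull touches one interval entry plus the handful of versions published since the last refresh, instead of iterating from version 0.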
Summary
Compared with Apollo and QConfig, this configuration center is more lightweight. It uses an incremental update mechanism, whereas Apollo and QConfig pull the full data under a namespace on every configuration update, which can cause performance pressure when the configuration volume is large. This configuration center supports a single configuration store and a larger capacity per namespace, which better suits large-scale configuration management.
Final words
I still have much to learn and much to do. Criticism and corrections are welcome.