Location>code7788 >text

[Project Scenario] What's wrong with the test environment taking 1 second longer than the production environment when requesting data?

Popularity:150 ℃/2024-11-11 18:32:30

Wei brother, the first time to send you a message is to ask for advice on a problem it, recently I am optimizing the system query, in the test environment optimization up to standard, but released to the production, found that from the client initiated the request to receive the response, it took an extra 1 second (the library of the test environment and the production of the same amount of data). The library of the test environment is on the intranet, and the library of the server is the RDS Hangzhou node of AliCloud. Our server room is in Chengdu, and the majority of the group using the system is also in the Chengdu area. After grabbing packets on the server, a mysql request and response, a difference of 500 milliseconds, a list of queries back and forth is about 1 second. A list query round trip is about 1 second. Scripts run on RDS only take less than 100ms. Therefore, I suspect that the hundreds of milliseconds in between are wasted on network transmission overhead, and I can see via Prometheus that the bandwidth in our server room isn't stretched either. My suggestion is to migrate the database to the intranet to come or to the RDS in the Chengdu area.
Please advise Vigo, is there a better way to do this? Without migrating the database (which is larger).
Thanks for taking your valuable time to view this message, Will is free to help check it out yet.

This is a fan (Xie students) to V's message, thank you for a long time to V's attention, to give you a few optimization suggestions to try to see if it can be solved:

Without migrating the database, you can consider several optimization strategies:

1. Database connection optimization

  • Connection Pool Settings: Ensure that you are using a proper database connection pool (e.g. HikariCP or DBCP) and adjust its configuration (e.g. connection pool size, connection timeout, maximum free connections, etc.). If the connection pool is not properly configured, it may result in frequent connection establishment and release, which in turn increases latency.
  • persistent connection: For services that access the database frequently, try increasing the persistence of the database connection to reduce the time it takes to establish a connection on each request.

Database connection optimization is crucial to improve system performance and reduce latency. For the scenario mentioned (latency difference between test and production environments, database in AliCloud RDS Hangzhou node, server in Chengdu), V Brother would like to give a detailed optimization strategy and specific steps from several aspects to achieve better connection optimization without database migration.

1. Database Connection Pool Optimization

The use of database connection pooling can effectively reduce the connection creation and destruction overhead, especially in the case of high concurrency, database connection pooling can reuse connections to reduce the connection establishment time for each request. Commonly used database connection pools areHikariCPDBCP cap (a poem)C3P0HikariCP is one of the best performing connection pools.

Steps:

  1. Choosing the right connection pool

    • RecommendedHikariCPBecause it is efficient in performance and simple to configure.
    • If you're using the Spring Framework, you can simplify configuration with Spring Boot's built-in HikariCP connection pooling.
  2. Configuring Database Connection Pooling
    Take Spring Boot and HikariCP as an example, the specific configuration steps are as follows:

    • exist maybe Configure HikariCP in the configuration file
   # Data source configuration
   =jdbc:mysql://your-database-url:3306/weige_db?useUnicode=true&characterEncoding=utf-8&useSSL=false
   =weige
   =wg123123

   # HikariCP configuration
   -pool-size=20 # Set the maximum number of connections in the pool.
   -idle=5 # Set the minimum number of idle connections in the pool.
   -timeout=30000 # Set the maximum idle time for a connection (in milliseconds)
   -lifetime=60000 # Set the maximum lifetime of a connection (in milliseconds)
   -timeout=30000 # Set the connection timeout (in milliseconds)
   -timeout=3000 # Set connection verification timeout (in milliseconds)
   -detection-threshold=15000 # Set leak detection timeout (in milliseconds)
  • Annotation Description:
    • maximum-pool-size: Controls the maximum number of connections in the connection pool.
    • minimum-idle: Controls the minimum number of idle connections in the connection pool.
    • connection-timeout: The maximum wait time when fetching a connection from the connection pool.
    • validation-timeout: Timeout for connection authentication timeout.
  1. Dynamically Adjusting Connection Pool Parameters

    • Dynamically adjust the parameters of the connection pool according to the actual load to avoid the connection pool being too large or too small leading to performance bottlenecks or waste of resources.
    • Use a monitoring tool such asPrometheus maybeAliyun CloudMonitor) to monitor the status of the database connection pool in real time and adjust the connection pool parameters according to the real-time situation.
  2. Optimize the connection pool creation and destruction process

    • Avoid frequent creation and destruction of database connections. Connection pools should try to maintain a certain number of active connections. Frequent connections and destructions increase the pressure on the database.
    • Set a reasonable connection pool size to meet high concurrency requirements without wasting resources.

2. Database connection configuration optimization

Ensure that the database connection is configured optimally to minimize delays and errors when connecting.

Steps:

  1. Connections use the intranet instead of the public network
    • If your application servers and databases are on AliCloud, you should use an AliCloud intranet connection instead of a public network connection whenever possible.
    • Intranet connections have much lower latency than public connections and are more stable.
    • Configure the connection URL to use an intranet IP address:
     =jdbc:mysql://<internal-ip>:3306/weige_db?useUnicode=true&characterEncoding=utf-8&useSSL=false
  • In this way, the connection between the application and the database does not need to go through the public network, reducing the latency of network transmission.
  1. Connection Validation for Database Connection Pooling
    • Under high load, databases may shut down after a period of connection idleness, so connection validation needs to be enabled to ensure that the connection is available each time it is fetched.
    • In HikariCP you can configure theconnection-test-query to periodically verify the connection:
     -test-query=SELECT 1
  1. Enable connection retry mechanism

    • In case of connection loss or network instability, you can enable the connection retry mechanism to ensure that the connection is automatically retried within a certain period of time.
  2. Use long connections (if necessary)

    • If your application scenario is one that requires frequent access to a database, consider using a long connection to the database connection to reduce the connection establishment and destruction overhead for each request.

3. Database Query Optimization

Although you mentioned that the test and production environments have the same amount of data, you still need to ensure that the database queries themselves are efficient. Optimizing database queries can reduce database access times and therefore overall response times.

Steps:

  1. Using Database Indexes
    • Make sure that the fields involved in the query are indexed, especially those that are often used as query conditions.
    • utilizationEXPLAIN to see the execution plan of the query and make sure that Full Table Scan is not being used.
   EXPLAIN SELECT * FROM mytable WHERE my_column = 'value';
  1. Avoiding the N+1 query problem

    • If your query involves multiple table joins or querying the same data multiple times, you should try to avoid the N+1 query problem. This can be avoided by appropriate SQL joins (JOINs) or batch queries.
  2. Query Paging Optimization

    • For paging queries, make sure to use theLIMIT cap (a poem)OFFSET Optimize paging query performance to avoid querying too much data.
    • Pagination queries for large data volumes are recommended to use ID- or time-based range queries (instead of theOFFSET)。
  3. Using the Query Cache

    • If certain queries are very frequent and the results change infrequently, you can use theRedis maybeMemcached to cache query results and avoid accessing the database every time.
   // typical example:utilization Redis Cache query results
   String cacheKey = "user:" + userId;
   User user = ().get(cacheKey);
   if (user == null) {
       user = (userId);
       ().set(cacheKey, user);
   }
   return user;

4. Database connection pooling combined with query optimization

  • While using connection pooling, ensure that each query after obtaining a connection is efficient, avoiding creating a new connection on each query or wasting time on unnecessary queries.
  • Regularly analyze database slow query logs to identify performance bottlenecks and perform targeted optimization.

To summarize the above

  • Select and configure a suitable database connection pool (such as HikariCP).
  • Optimize the configuration of database connections to ensure that intranet connections are used and connection validation is enabled.
  • Optimize database queries to ensure efficient query execution plans, avoid N+1 queries, and use caching to reduce frequent queries.
  • Dynamically adjust connection pool parameters and monitor the operational status of the database connection pool.

With these optimization strategies, you are able to significantly reduce the latency between request and response, improving the responsiveness and stability of your system.

2. network optimization

  • Accelerated network connectivity: You mentioned that Prometheus monitors bandwidth usage, so you can check if there are any network bottlenecks. For example, is there a routing issue with more network hops and higher latency, or is the link between Aliyun and your server room itself not stable enough.
  • CDN Acceleration: While CDNs are typically used for static resource acceleration, some services also support database request optimization (e.g., by accelerating specific types of HTTP requests, etc.). Consider using Aliyun'sCloud Link(cloud link acceleration) to optimize connectivity across geographies.
  • TCP optimization: TCP window size, retransmission policy, etc. can be adjusted to reduce network latency when using the TCP protocol in the connection between the database and the application server.

Specific implementation steps for network optimization strategy

Network optimization is an important tool to improve application performance and reduce latency. For the cross-region access latency problem you mentioned (the databases of the test and production environments are in different regions), network optimization strategies can help reduce bottlenecks and latency in network transmission.

1. Optimize network architecture and communication paths

Network architecture and communication paths affect latency when accessing across geographies, so there is a need to ensure that communication paths are as simple and fast as possible.

Steps:

  1. Use of intranet communication

    • Ensure that the database and application servers are connected to each other using theintranet communication, rather than connecting through the public network. Especially in the AliCloud environment, the latency of intranet communication is much lower than public network communication.
    • Provided through AliCloudVPC(The Virtual Private Cloud (VPC) creates a virtual network within the intranet, ensuring that communication between databases and applications goes entirely over the intranet and not through the public network.

    procedure

    • Log in to the AliCloud console and go toVPC Management Console
    • Create a VPC (Virtual Proprietary Network) and assign intranet IPs to the application servers and database servers.
    • Configure the routing rules for the VPC to ensure that the application and database instances are on the same intranet to avoid cross-area access.
    • Connect using the database intranet address instead of the public address.
   # Example: Connecting to the database using an intranet IP address
   =jdbc:mysql://<my IP>:3306/weige_db?useUnicode=true&characterEncoding=utf-8&useSSL=false
  1. Optimize region selection

    • If your application server and database are located in different regions of AliCloud (e.g., the application server is in Chengdu and the database is in Hangzhou), there may be high network latency. To minimize this cross-region latency, consider deploying the database and application servers in the same region or using AliCloud's cross-region acceleration service.
    • Cross-Regional Acceleration with AliCloud(e.g.CloudLink) can optimize inter-regional network communications and reduce network latency.

    procedure

    • Evaluate the latency between different regions. You can evaluate the latency between different regions by using the AliCloud-providedPing maybeTraceroute tool to test network latency.
    • When choosing a database or application server to deploy, prioritize deploying them in the same data center or in the same region.
    • Using AliCloudCloudLink or similar cross-region acceleration services.
  2. Network Routing Optimization

    • Network latency is not only related to physical distance, but also to routing paths. By using thespecial-purpose phone line or communications link(Direct Connect) andVPC Peering, which can optimize the routing of data streams to avoid going through unnecessary transit nodes and reduce latency.

    procedure

    • configureDirect Connect: Aliyun provides a dedicated connection service, which can optimize your data transmission path through a dedicated line to avoid the latency caused by the ordinary Internet transmission path.
    • configureVPC Peering: If your application and database are located in different VPCs, you can connect to different VPCs through VPC Peering to reduce cross-VPC routing delays.

2. Adjusting network protocol and TCP parameters

Network transmission performance depends not only on the physical connection but also on the efficiency of the transmission protocol. In particular, the use ofTCP/IP When protocols are used, proper parameter tuning can significantly improve network performance.

Steps:

  1. Optimizing TCP/IP Configuration
  • TCP window size: In a network environment with high bandwidth and high latency, the TCP window size has a large impact on network transmission efficiency. Network throughput can be improved by adjusting the TCP window size of the operating system.

  • TCP Retransmission and Timeout Settings: Reduces the timeout for TCP connection retransmissions, helping to reduce transmission delays when the network is unstable.

    procedure

    Modify the TCP parameters for the operating system:

    • Linux:: Editorial/etc/ file, set the following parameters: (Windows usesnetsh command to adjust the TCP buffer size.)
.rmem_max = 16777216 # Set the maximum value of the receive buffer
.wmem_max = 16777216 # Set the maximum value of the send buffer
net.ipv4.tcp_rmem = 4096 87380 16777216 # Set the minimum, default, and maximum values for the TCP receive buffer
net.ipv4.tcp_wmem = 4096 16384 16777216 # Set the minimum, default, and maximum values for the TCP send buffer
  1. Enable TCP Fast Open (TCP Fast Open)

    • TCP Fast Open allows data to start being sent during three handshakes, reducing the time it takes for a connection to be established. Particularly suitable for low-latency, high-throughput network applications.

    procedure

    • Enable TCP fast open on Linux systems:
     echo 3 > /proc/sys/net/ipv4/tcp_fastopen
  1. Using the Keep-Alive Mechanism

    • TCP Keep-Alive You can keep connections active and avoid frequent connection establishment and destruction. In high-latency or cross-region environments, using Keep-Alive reduces the latency of connection establishment.

    procedure

    • Enable the TCP Keep-Alive mechanism in your application to ensure that connections remain active when there is no data transfer for an extended period of time.
    • Java TCP Keep-Alive can be enabled by setting up connection pooling in the
     -test-query=SELECT 1
     -timeout=30000
     -time=300000 # Set connection idle hold time

3. Optimization with Content Delivery Networks (CDNs)

(go ahead and do it) without hesitatingCDN Primarily used to accelerate the loading of static resources, but some advanced CDN services can also optimize network requests, especially when accessed across geographic regions.

Steps:

  1. Configuring a CDN to Accelerate Cross-Region Requests

    • utilizationAliCloud CDN maybeCloudflare CDNThe CDN can be used to accelerate the response time of requests across geographies. Even non-static resources can be accelerated by CDNs.

    procedure

    • Create and configure a CDN acceleration service in the AliCloud console to forward HTTP requests that need to be accelerated (e.g. API requests) through the CDN.
    • Configure a caching policy to ensure that frequently used data is cached by the CDN, avoiding the need to fetch it from the origin server each time.
  2. Optimizing CDN Caching Rules

    • For dynamic requests, you can configure caching for shorter periods of time or configure it to cache frequently requested data. For example, data returned by an API request can have a caching policy set to avoid repeated requests.

4. Increased bandwidth and network monitoring

Insufficient bandwidth can be a bottleneck for network latency, so monitoring bandwidth usage and increasing bandwidth appropriately can effectively improve network performance.

Steps:

  1. Monitor bandwidth usage

    • utilizationAliyun's CloudMonitor maybePrometheus Monitor bandwidth usage. Monitoring allows you to clearly see bottlenecks in network traffic.
  2. Increased bandwidth

    • Based on the monitoring results, if the bandwidth has reached the upper limit, you can consider increasing the bandwidth. Aliyun provides flexible bandwidth expansion options to dynamically adjust bandwidth according to demand.

To summarize the above

By optimizing network architecture, adjusting network protocols, using CDN acceleration, and increasing bandwidth, network latency can be effectively reduced and system response speed can be improved:

  1. Optimize network architecture with intranet connectivity and cross-region acceleration services.
  2. Adjust TCP parameters to enable TCP fast-open and Keep-Alive mechanisms.
  3. Configure CDN acceleration and caching policies to optimize cross-region requests.
  4. Monitor bandwidth usage to ensure adequate bandwidth.

3. Database Query Optimization

  • Query Performance Analysis: Even though the database queries perform well in the test environment, the execution plan of the queries may be different in the production environment due to the volume of data or the frequency of queries. You can use MySQL'sEXPLAINto see the execution plan of the query and ensure that there are no inefficient operations such as full table scans. Based on the execution plan, you can add indexes, or optimize SQL statements.
  • query cache: Ensure that your query has the appropriate caching policy by using theRedisand other caching services to store frequently queried data and reduce the number of direct database accesses. You can cache frequently used list queries and update the cache regularly to avoid needing to read from the database each time.

4. asynchronous processing

  • asynchronous query: If it is a list of queries and other non-real-time requirements of the request is very high , you can consider changing the request to asynchronous processing , the client can continue to deal with other operations in the background , the query results can be completed in the background and pushed to the client through the message queue . This will not block the client's main thread, but also improve the user experience.
  • message queue: If your query is a batch query or relies on the results of multiple database queries, it can be processed asynchronously via a message queue (e.g., Kafka, RabbitMQ), thus reducing direct synchronization wait time.

5. Database intranet communications

  • RDS Intranet Connection: Try to ensure that you use Aliyun intranet communication instead of public network communication between your application server and database. If your application and database are both on AliCloud, using an intranet connection instead of a public network connection will reduce network latency and lower latency for cross-region access.

5. Specific implementation steps for database intranet communication policies

In cloud environments, especially when using cloud database services such as AliCloud RDS, try to ensure that application servers and databases use intranet communication instead of public network communication, which can significantly reduce network latency and improve communication efficiency. The following are specific steps on how to configure and optimize database intranet communication.


1. Ensure that application servers and databases are on the same intranet

First of all, make sure that your application servers and databases are deployed within the same AliCloud VPC (Virtual Private Cloud) so that you can ensure intranet communication instead of public network communication.

Steps:

  1. Check that the application server and database are in the same VPC

    • Login toAliCloud ConsoleSelectionECS maybeRDS service to see the VPC where the application servers and database instances reside.
    • Ensure that they are in the same VPC or that interoperability between VPCs has been established.
  2. Creating a VPC and Deploying Resources to a VPC

    • If you don't already have a VPC, you can add it to theAliCloud Console Create a new VPC and ensure that both the application server and database are deployed to that VPC.
  3. Verify that intranet access is enabled for the database instance

    • existAliCloud RDS Console In , confirm that the database instance is enabled for intranet access. Most RDS instances provide dual intranet and public network access by default.

    procedure

    • show (a ticket)RDS Console → Select the target database instance → Configure the access method to beintranet

2. Connect to the database using the VPC intranet IP address

Once you have ensured that both the application server and the database are within the same VPC, you can use an intranet IP address to connect to the database, reducing the latency and bandwidth overhead associated with public network communications.

Steps:

  1. Get the intranet IP address of the database instance

    • Login toAliCloud RDS Console, select your database instance.
    • locateIntranet connection informationThe database is used to record the intranet IP address and port number of the database.
  2. Configuring database connections in the application server

    • Modify the database connection configuration in your application to ensure that the connection uses the database's intranet IP address instead of the public IP address. Example:
     =jdbc:mysql://<RDS intranet IP>:3306/your_database_name
     =your_db_username
     =your_db_password
  • Ensure that the application server's network has access to the RDS's intranet IP address. If the application server and the database instance are on different subnets, make sure there are no problems with network routing between them.

3. Use of private links in proprietary networks (VPC)

To enhance security and minimize latency for cross-region access, Aliyun supports the use of thePrivate links to proprietary networks (VPC) Exposing the RDS instance to other services in the same VPC ensures that data communication is done within the proprietary network.

Steps:

  1. Creating Proprietary Network Links

    • Login toAliCloud Console → SelectVPC → Selectprivate link
    • Creates a new private connection to the RDS instance as a service provider, which can be accessed by other ECS instances through the private link.
  2. Configuring Proprietary Network Access

    • In the application server, connect to the RDS instance using the DNS domain name or intranet IP address provided by the private link.
    • Ensure that the security group rules for the database instance allow intranet access and that private DNS is used in the connection string for access.

4. Configuring Security Groups and Network ACLs

Ensure that intranet traffic is not blocked by network security configurations such as security groups or network ACLs (access control lists). Security group configurations allow control of traffic for intranet communications.

Steps:

  1. Check and configure security groups

    • Login toAliCloud ConsoleSelectionECS cap (a poem)RDS The security group in which the instance resides.
    • Ensure that the security group rules of the application server and the database instance allow network communication between them. It is often necessary to ensure that the application server's security group allows connection requests to the database instance's intranet IP address and port.

    typical example: Allows the application server to send MySQL requests to the RDS database instance (default port 3306):

    • Source IP: Intranet IP or security group of the application server
    • target port:3306
    • pact:TCP
  2. Check and configure network ACLs

    • If a VPC network ACL is used, make sure it allows traffic to pass between the application server and the database instance. Network ACLs can be set in the VPC console.

5. Ensure bandwidth and latency optimization for databases and applications

Although the database and applications are on the same intranet, it is still critical to ensure bandwidth and latency between them. Optimizing bandwidth and network latency can further improve database communication performance.

Steps:

  1. Selection of appropriate instance specifications

    • Select the appropriate database instance specifications and network bandwidth based on the load requirements of the application. In the AliCloud RDS console, you can adjust the database specifications (such as I/O performance and bandwidth) according to performance requirements:
      • If the database is under a lot of read and write pressure, consider using a high-performanceSSD Storage
      • For large-scale concurrent queries, select higher-specification instances to provide greater network bandwidth.
  2. Optimizing RDS Network Configuration

    • For an RDS instance, selectEnhanced Networking maybeDedicated network bandwidthto increase database throughput and reduce latency.

    procedure

    • existAliCloud RDS Console in, selectNetworking and security, enable Enhanced Networking and adjust bandwidth configurations.

6. Avoid using the public network to access databases

If communication between the database and application servers takes place over the public network, additional latency and bandwidth consumption may be incurred. To maximize the performance of intranet communication, ensure that all database connections are made over the intranet.

Steps:

  1. Ensure that database instances are not exposed to the public network

    • existAliCloud RDS Console Ensure that the database instance does not have public IP turned on or public access turned off. Use the intranet IP address for communication.

    procedure

    • show (a ticket)RDS Console → Select the target database instance → In theconnection method choose fromIntranet access only
  2. Check if the application server is accessing the database through the public network

    • Ensure that the application server accesses the database through an intranet IP address, rather than using a public IP If you find that the application server is incorrectly using a public IP address, you can update the configuration file to ensure that an intranet address is used.

To summarize the above

By ensuring that both application servers and database instances are deployed within the same AliCloud VPC and communicate via intranet IPs, you can significantly reduce network latency and bandwidth overhead, and improve the performance of cross-region access. Specific steps include:

  1. Ensure that the application server and database instances are in the same VPC
  2. Using an Intranet IP Address for Database Connections, avoid using public network connections.
  3. Configuring Proprietary Network Links, increasing the security and reliability of network communications.
  4. Configuring Security Groups and Network ACLs, ensuring that intranet communications are not blocked.
  5. Optimize bandwidth and latency, select the appropriate instance specification and use the enhanced network.
  6. Avoid accessing databases over the public network, ensuring that the data transmission path is optimized.

The implementation of these strategies effectively reduces network latency, improves database query performance, and reduces performance bottlenecks due to cross-region network communications.

6. High Concurrency Read/Write Separation

  • read-write separation: If your system has more read operations, consider master-slave separation of databases, directing read requests to read-only slave libraries to reduce the pressure on the master library, thus improving the response speed of read requests.

  • load balancing: If there are multiple database instances, requests can be distributed via load balancing to reduce load and latency on a single database.

The high concurrency read-write separation strategy can significantly improve database performance and scalability, especially in high concurrency scenarios. The following are the key steps to implement this strategy:

  1. Design a master-slave architecture: divide the database into master (writes) and slave (reads) and configure load balancing.
  2. Data synchronization and consistency: Configure data synchronization of master and slave databases to ensure data consistency.
  3. Caching and Optimization: Reduce database access pressure with distributed caching and query optimization.
  4. Load balancing and traffic control: Reasonable distribution of read requests to multiple slave databases to ensure database load balancing.

7. Monitoring and Tuning

  • in-depth monitoring: In addition to Prometheus, one can leverage AliCloud'sCloudMonitorand other monitoring tools, in conjunction with database performance metrics (e.g., QPS, response time, number of connections, etc.) for continuous optimization.
  • Slow Query Log: View the slow query logs of AliCloud RDS to find out the queries with long execution time and optimize the SQL execution plan for these queries.

summarize

  • Optimize database connection and query efficiency to ensure that there is no bottleneck during network transmission. This needs to be continuously monitored and debugged until the project requirements are met.

  • Use caching mechanism to reduce database access. This one I believe the brother has already done and can then analyze what data is being cached.

  • Improved asynchronous processing and load balancing to avoid a single point of stress on the database.

  • Consider using intranet connectivity and cross-region acceleration services to reduce network latency across geographies.

After multi-directional diagnosis and optimization do not know whether it can solve Xie's problem, partners, you think there is a better solution, you can say your insights, let Xie try, pay attention to the Wei brother love programming, together to solve the BUG.