
In-depth analysis of Druid connection pooling: connection effectiveness detection and Keep-Alive mechanism

2025-03-17 10:57:05

Background

In Java programs, the following is a common error.

Caused by: : Communications link failure

The last packet successfully received from the server was 30,027 milliseconds ago. The last packet sent successfully to the server was 30,028 milliseconds ago.

This error usually means that the MySQL connection was dropped unexpectedly. Common causes include:

  1. Improper configuration of the client connection pool (such as HikariCP or Druid), including:

    • The idle connection timeout exceeds MySQL's wait_timeout (default 28800 seconds, i.e. 8 hours), so the MySQL server closes the connection first.
    • No proper Keep-Alive mechanism is configured, so a connection that goes unused for a long time is closed by the MySQL server.
    • No connection validity check is performed, so the client may obtain a connection that is already dead.

  2. Connections are not released in time.

    A connection that is held for a long time without being returned cannot be kept active by the pool's Keep-Alive mechanism, so it is more likely to be closed by the MySQL server or by middleware for being idle.

  3. Timeout limits in intermediate components.

    If a proxy (such as ProxySQL) or a load balancer (LB) sits between the client and MySQL, it may have its own idle connection timeout and disconnect the connection early.

  4. Network problems such as high latency, packet loss, or brief outages, which affect the stability of database connections.

  5. The connection is terminated on the MySQL server side, for example when a DBA manually runs KILL.

  This article analyzes the connection validity detection mechanism of the Druid connection pool in depth, focusing on the following questions:

    1. In what cases does Druid check if the connection is available?
    2. How does Druid keep the connection active (Keep-Alive mechanism)?
    3. The specific meaning and function of common parameters in Druid connection pools.
    4. Why do the detection statements defined by validationQuery not show up in MySQL's general log?

    I hope that through this analysis, we can help you understand the operating mechanism of Druid connection pool more deeply.

    In which scenarios is connection validity checked?

    The Druid connection pool checks the validity of a connection in the following four scenarios:

    1. Apply for a connection.
    2. Return the connection.
    3. Create a new physical connection.
    4. Periodic testing.

    Let’s take a look at the specific implementation logic of these four scenarios.

    1. Apply for a connection

    When the application requests an idle connection from the pool, the connection's validity is checked. Two parameters are involved: testOnBorrow and testWhileIdle.

    Applying for a connection is implemented in the getConnectionDirect method. Let's look at how that method works.

    public DruidPooledConnection getConnectionDirect(long maxWaitMillis) throws SQLException {
            int notFullTimeoutRetryCnt = 0;
            for (; ; ) {
                DruidPooledConnection poolableConnection;
                try {
                    // Take an idle connection from the pool
                    poolableConnection = getConnectionInternal(maxWaitMillis);
                } catch (GetConnectionTimeoutException ex) {
                    ...
                }
                // If testOnBorrow is true, validate the connection with testConnectionInternal
                if (testOnBorrow) {
                    boolean validated = testConnectionInternal(poolableConnection.holder, poolableConnection.conn);
                    if (!validated) {
                        if (LOG.isDebugEnabled()) {
                            LOG.debug("skip not validated connection.");
                        }
                        // The connection is invalid: discard it and take another idle connection from the pool
                        discardConnection(poolableConnection.holder);
                        continue;
                    }
                } else {
                    ...
                    // testOnBorrow is false: if testWhileIdle is true and the connection has been idle for
                    // at least timeBetweenEvictionRunsMillis, validate it with testConnectionInternal
                    if (testWhileIdle) {
                        final DruidConnectionHolder holder = poolableConnection.holder;
                        long currentTimeMillis = System.currentTimeMillis();
                        long lastActiveTimeMillis = holder.lastActiveTimeMillis;
                        ...
                        long idleMillis = currentTimeMillis - lastActiveTimeMillis;

                        if (idleMillis >= timeBetweenEvictionRunsMillis
                                || idleMillis < 0 // unexpected branch
                        ) {
                            boolean validated = testConnectionInternal(poolableConnection.holder, poolableConnection.conn);
                            if (!validated) {
                                if (LOG.isDebugEnabled()) {
                                    LOG.debug("skip not validated connection.");
                                }
                                // The connection is invalid: discard it and take another idle connection from the pool
                                discardConnection(poolableConnection.holder);
                                continue;
                            }
                        }
                    }
                }
                ...
                return poolableConnection;
            }
        }

    The method works as follows:

    1. First, take an idle connection from the connection pool.
    2. If testOnBorrow is true, testConnectionInternal is called to check the connection. A valid connection is returned directly; an invalid one is discarded and another idle connection is taken from the pool.
    3. If testOnBorrow is false and testWhileIdle is true, the method checks whether the connection has been idle for at least timeBetweenEvictionRunsMillis. If so, testConnectionInternal is called; if not, or if the check passes, the connection is returned directly.

    The default values of testOnBorrow, testWhileIdle, and timeBetweenEvictionRunsMillis are false, true, and 60000 (i.e. 60 seconds), respectively.

    This means that under the default configuration, a connection taken from the pool is validated only if it has been idle for 60 seconds or more. This works for most scenarios, because the probability of a connection being interrupted within 60 seconds is low.

    If the application requires extremely high connection availability (for example in financial or payment scenarios), consider setting testOnBorrow to true so that every connection obtained is validated first. Note that this adds some performance overhead.
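The borrow-time rule described above can be condensed into a single predicate. The following is a minimal, illustrative sketch in plain Java and not Druid's own code: the class name BorrowCheckSketch and the method shouldValidateOnBorrow are hypothetical, and only the parameter names come from the article.

```java
/** Illustrative sketch of the borrow-time decision; not Druid's actual implementation. */
final class BorrowCheckSketch {
    /**
     * Returns true when a borrowed connection would be validated before use.
     * idleMillis is how long the connection has been sitting idle in the pool.
     */
    static boolean shouldValidateOnBorrow(boolean testOnBorrow,
                                          boolean testWhileIdle,
                                          long idleMillis,
                                          long timeBetweenEvictionRunsMillis) {
        if (testOnBorrow) {
            return true; // every borrow is validated
        }
        // testWhileIdle validates only connections idle for at least one eviction interval;
        // a negative idle time is treated as suspicious and validated as well.
        return testWhileIdle
                && (idleMillis >= timeBetweenEvictionRunsMillis || idleMillis < 0);
    }

    public static void main(String[] args) {
        // Defaults from the article: testOnBorrow=false, testWhileIdle=true, interval=60000 ms
        System.out.println(shouldValidateOnBorrow(false, true, 30_000L, 60_000L));
        System.out.println(shouldValidateOnBorrow(false, true, 90_000L, 60_000L));
        System.out.println(shouldValidateOnBorrow(true, false, 1_000L, 60_000L));
    }
}
```

With the defaults, only the second call (idle for 90 seconds) leads to a validity check; turning on testOnBorrow makes the idle time irrelevant.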

    2. Return the connection

    When the application calls close() on a connection, the connection is not destroyed immediately. It is returned to the connection pool for later reuse.

    If testOnReturn (default false) is true, the connection is validated when it is returned, so that invalid connections do not go back into the pool.

    // 
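    protected void recycle(DruidPooledConnection pooledConnection) throws SQLException {
            final DruidConnectionHolder holder = pooledConnection.holder;
                ...
                // If testOnReturn is true, validate the connection before it goes back into the pool
                if (testOnReturn) {
                    boolean validated = testConnectionInternal(holder, physicalConnection);
                    if (!validated) {
                        // Close the invalid physical connection instead of returning it
                        JdbcUtils.close(physicalConnection);
                        ...
                    }
                }
                ...
        }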

    3. Create a new physical connection

    Creating a new physical connection is implemented in the createPhysicalConnection method.

    public PhysicalConnectionInfo createPhysicalConnection() throws SQLException {
            String url = this.getUrl();
            Properties connectProperties = getConnectProperties();
            ...
            try {
                // The driver's connect method is called here to establish the connection
                conn = createPhysicalConnection(url, physicalConnectProperties);
                connectedNanos = System.nanoTime();

                if (conn == null) {
                    throw new SQLException("connect error, url " + url + ", driverClass " + this.driverClass);
                }
                ...
                // Validate explicitly only when initSqls did not execute anything on the connection
                if (!initSqls(conn, variables, globalVariables)) {
                    validateConnection(conn);
                }
                ...
            } 
            ...
            return new PhysicalConnectionInfo(conn, connectStartNanos, connectedNanos, initedNanos, validatedNanos, variables, globalVariables);
        }

    After the connection is established, if initSqls(conn, variables, globalVariables) returns false, validateConnection is called to verify the connection.

    initSqls(conn, variables, globalVariables) returns false when all of the following conditions hold:

    1. connectionInitSqls is empty (default). connectionInitSqls is typically used for connection initialization statements such as set NAMES 'utf8mb4'.
    2. initVariants is false (default). If this parameter is true, show variables is executed to fetch the connection's session variables.
    3. initGlobalVariants is false (default). If this parameter is true, show global variables is executed to fetch the global variables.
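The three conditions above can be read as a single boolean rule. Here is a small sketch of that rule in plain Java, based only on the description in this article; the class and method names are hypothetical and the code does not call into Druid.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch: when is a newly created connection validated explicitly? */
final class CreateValidationSketch {
    /**
     * Returns true when, per the article's description, initSqls would have nothing
     * to execute and validateConnection is therefore called on the new connection.
     */
    static boolean validatesNewConnection(List<String> connectionInitSqls,
                                          boolean initVariants,
                                          boolean initGlobalVariants) {
        boolean initSqlsExecutesSomething =
                !connectionInitSqls.isEmpty() || initVariants || initGlobalVariants;
        return !initSqlsExecutesSomething;
    }

    public static void main(String[] args) {
        // Default configuration: nothing to initialize, so the connection is validated
        System.out.println(validatesNewConnection(Collections.emptyList(), false, false));
        // With an init statement configured, the explicit validation step is skipped
        System.out.println(validatesNewConnection(Arrays.asList("set NAMES 'utf8mb4'"), false, false));
    }
}
```

The likely rationale is that a successfully executed initialization statement already proves that the connection works, so a separate validation round trip would be redundant.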

    4. Periodic testing

    When Druid initializes the connection pool, it starts a background daemon thread (DestroyConnectionThread) that periodically destroys expired connections in the pool.

    At a fixed interval (set by the timeBetweenEvictionRunsMillis parameter, default 60 seconds), this thread calls the shrink(true, keepAlive) method, which performs the actual destruction.

    Next, let's look at the implementation details of this method.

    public void shrink(boolean checkTime, boolean keepAlive) {
            ...
            boolean needFill = false; // whether idle connections need to be refilled
            int evictCount = 0; // number of connections to destroy
            int keepAliveCount = 0; // number of connections to keep alive
            ...
            try {
                ...
                // Number of connections that may be recycled: pooled (idle) connections minus minIdle
                final int checkCount = poolingCount - minIdle;
                final long currentTimeMillis = System.currentTimeMillis();
                int remaining = 0;
                int i = 0;
                for (; i < poolingCount; ++i) { // iterate over the pooled connections
                    DruidConnectionHolder connection = connections[i];

                    ...
                    if (checkTime) { // checkTime is true when shrink is called by DestroyConnectionThread
                        if (phyTimeoutMillis > 0) { // physical timeout check
                            long phyConnectTimeMillis = currentTimeMillis - connection.connectTimeMillis;
                            // If the connection has been alive longer than phyTimeoutMillis, add it to the destroy list
                            if (phyConnectTimeMillis > phyTimeoutMillis) {
                                evictConnections[evictCount++] = connection;
                                continue;
                            }
                        }
                        // Idle time of the connection
                        long idleMillis = currentTimeMillis - connection.lastActiveTimeMillis;
                        // If the idle time is below both minEvictableIdleTimeMillis and keepAliveBetweenTimeMillis, stop scanning
                        if (idleMillis < minEvictableIdleTimeMillis
                                && idleMillis < keepAliveBetweenTimeMillis) {
                            break;
                        }
                        // Evict when the idle time has reached minEvictableIdleTimeMillis and either
                        // the index is below checkCount or the idle time exceeds maxEvictableIdleTimeMillis
                        if (idleMillis >= minEvictableIdleTimeMillis) {
                            if (i < checkCount) {
                                evictConnections[evictCount++] = connection;
                                continue;
                            } else if (idleMillis > maxEvictableIdleTimeMillis) {
                                evictConnections[evictCount++] = connection;
                                continue;
                            }
                        }
                        // If keepAlive is enabled and both the idle time and the time since the last Keep-Alive check
                        // have reached keepAliveBetweenTimeMillis, queue the connection for a validity check
                        if (keepAlive && idleMillis >= keepAliveBetweenTimeMillis
                                && currentTimeMillis - connection.lastKeepTimeMillis >= keepAliveBetweenTimeMillis) {
                            keepAliveConnections[keepAliveCount++] = connection;
                        } else {
                            if (i != remaining) { // move the retained connection to its new position
                                connections[remaining] = connection;
                            }
                            remaining++;
                        }
                    } 
                  ...
                }
                // Number of connections to remove from the pool array
                int removeCount = evictCount + keepAliveCount;
                // Move the connections that were not scanned to the positions after remaining, so the retained connections stay contiguous
                if (removeCount > 0) {
                    int breakedCount = poolingCount - i;
                    if (breakedCount > 0) {
                        System.arraycopy(connections, i, connections, remaining, breakedCount);
                        remaining += breakedCount;
                    }
                    // Clear the stale references left at the tail of the array
                    System.arraycopy(nullConnections, 0, connections, remaining, removeCount);
                    poolingCount -= removeCount;
                }
                keepAliveCheckCount += keepAliveCount;

                if (keepAlive && poolingCount + activeCount < minIdle) {
                    needFill = true;
                }
            } finally {
                lock.unlock();
            }
            // Close the connections that need to be destroyed
            if (evictCount > 0) {
                for (int i = 0; i < evictCount; ++i) {
                    DruidConnectionHolder item = evictConnections[i];
                    Connection connection = item.getConnection();
                    JdbcUtils.close(connection);
                    destroyCountUpdater.incrementAndGet(this);
                }
                System.arraycopy(nullConnections, 0, evictConnections, 0, evictConnections.length);
            }
            // Validate the connections in keepAliveConnections: put valid ones back into the pool, discard the rest
            if (keepAliveCount > 0) {
                for (int i = keepAliveCount - 1; i >= 0; --i) {
                    DruidConnectionHolder holder = keepAliveConnections[i];
                    Connection connection = holder.getConnection();
                    holder.incrementKeepAliveCheckCount();

                    boolean validate = false;
                    try {
                        this.validateConnection(connection);
                        validate = true;
                    } catch (Throwable error) {
                      ...
                    }

                    boolean discard = !validate;
                    if (validate) {
                        holder.lastKeepTimeMillis = System.currentTimeMillis();
                        boolean putOk = put(holder, 0L, true);
                        if (!putOk) {
                            discard = true;
                        }
                    }
                ...
                }
                this.getDataSourceStat().addKeepAliveCheckCount(keepAliveCount);
                System.arraycopy(nullConnections, 0, keepAliveConnections, 0, keepAliveConnections.length);
            }

            if (needFill) {
                ();
                try {
    // If the total number of connections in the connection pool (activeCount + poolingCount + createTaskCount) is less than minIdle, a new connection is supplemented.
                    int fillCount = minIdle - (activeCount + poolingCount + createTaskCount);
                    emptySignal(fillCount);
                } finally {
                    ();
                }
            } else if (fatalErrorIncrement > 0) {
                ();
                try {
                    emptySignal();
                } finally {
                    ();
                }
            }
        }

    This method works as follows:

    1. Calculate how many connections in the pool can be recycled: checkCount = poolingCount - minIdle. Here minIdle is a Druid parameter that specifies the minimum number of idle connections the pool retains.

    2. Iterate over the connections in the pool and perform the following checks:

    • Physical connection lifetime check: if phyTimeoutMillis > 0 and the physical connection has been alive longer than phyTimeoutMillis, the connection is added to the destroy list (evictConnections).

    • Idle time check: if the idle time has reached minEvictableIdleTimeMillis and the connection's index is less than checkCount (the number of recyclable connections), or if the idle time exceeds maxEvictableIdleTimeMillis, the connection is added to the destroy list.

    • Keep-Alive: if keepAlive is enabled and the idle time has reached keepAliveBetweenTimeMillis, the connection is added to the keepAliveConnections list.

    3. Move the connections that were not scanned to the positions after remaining, so that the retained connections stay contiguous in the array.

    4. If evictCount is greater than 0, there are connections to destroy: iterate over the destroy list (evictConnections) and close them.

    5. If keepAliveCount is greater than 0, there are connections to keep alive: iterate over the keepAliveConnections list and validate each connection. Valid connections are put back into the pool; connections that fail validation are discarded.

    6. If needFill is true, the pool is short of idle connections, so a fill signal is sent to create new connections.

    So by default, the Druid connection pool runs a connection recycling and maintenance pass every 60 seconds (controlled by the timeBetweenEvictionRunsMillis parameter) while keeping a certain number of idle connections. Its core logic is:

    • Recycle timed-out or surplus idle connections:

      • A connection is recycled when its idle time exceeds maxEvictableIdleTimeMillis, or when its lifetime exceeds phyTimeoutMillis (if phyTimeoutMillis > 0).
      • When the number of pooled connections exceeds the minimum idle count minIdle, a connection whose idle time has reached minEvictableIdleTimeMillis is also recycled.
    • Maintain Keep-Alive (if keepAlive is enabled):

      • A validity check is performed when both the connection's idle time and the time since its last Keep-Alive check have reached keepAliveBetweenTimeMillis.
      • The check is done through validateConnection. Connections that pass are put back into the pool; connections that fail are destroyed.
    • Add new connections if necessary:

      If keepAlive is enabled and the current number of connections (activeCount + poolingCount) is lower than minIdle, the fill mechanism is triggered to create new connections.

    Note that even with Keep-Alive enabled, a connection is still recycled once it exceeds the timeouts above.
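To make the per-connection decision easier to follow, the sketch below restates the idle-time branch of the loop as a pure function. It is a simplification written for this article, not Druid's code: the names ShrinkSketch, Action, and classify are hypothetical, the phyTimeoutMillis check is left out, and sinceLastKeepAliveMillis stands for the time elapsed since the connection's last Keep-Alive check.

```java
/** Illustrative sketch of how one pass of the periodic check treats a single idle connection. */
final class ShrinkSketch {
    enum Action { EVICT, KEEP_ALIVE, RETAIN, STOP_SCAN }

    static Action classify(int index, int checkCount,
                           long idleMillis, long sinceLastKeepAliveMillis,
                           boolean keepAlive,
                           long minEvictableIdleTimeMillis,
                           long maxEvictableIdleTimeMillis,
                           long keepAliveBetweenTimeMillis) {
        // Too young for both eviction and Keep-Alive: the scan stops here
        if (idleMillis < minEvictableIdleTimeMillis && idleMillis < keepAliveBetweenTimeMillis) {
            return Action.STOP_SCAN;
        }
        // Old enough to evict: evict surplus connections, or any connection past the hard limit
        if (idleMillis >= minEvictableIdleTimeMillis
                && (index < checkCount || idleMillis > maxEvictableIdleTimeMillis)) {
            return Action.EVICT;
        }
        // Otherwise probe the connection if Keep-Alive is on and a probe is due
        if (keepAlive && idleMillis >= keepAliveBetweenTimeMillis
                && sinceLastKeepAliveMillis >= keepAliveBetweenTimeMillis) {
            return Action.KEEP_ALIVE;
        }
        return Action.RETAIN;
    }

    public static void main(String[] args) {
        // Defaults from the article: 30 min, 7 h, 120 s; keepAlive switched on for the demo
        long min = 1_800_000L, max = 25_200_000L, between = 120_000L;
        System.out.println(classify(0, 2, 60_000L, 60_000L, true, min, max, between));
        System.out.println(classify(0, 2, 150_000L, 150_000L, true, min, max, between));
        System.out.println(classify(0, 2, 2_000_000L, 2_000_000L, true, min, max, between));
    }
}
```

The ordering of the checks matters: eviction is tested before Keep-Alive, which is why a connection that has exceeded the eviction thresholds is recycled even when Keep-Alive is enabled.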

    Next, let's look at the default values of the parameters above:

    • timeBetweenEvictionRunsMillis: Default 60000 milliseconds (60 seconds).
    • minEvictableIdleTimeMillis: Default 1800000 milliseconds (30 minutes).
    • maxEvictableIdleTimeMillis: Default 25200000 milliseconds (7 hours).
    • phyTimeoutMillis: Default -1.
    • keepAlive: Default is false.
    • keepAliveBetweenTimeMillis: Default 120000 milliseconds (120 seconds).
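These defaults can be compared against the idle timeout of whatever sits on the other end of the connection (MySQL's wait_timeout, or a proxy's own limit). The following is a rough sketch of such a comparison, written under stated assumptions: PoolTimingSketch, findRisks, and serverIdleTimeoutMillis are hypothetical names, and adding one eviction interval as a safety margin is an assumption of this sketch rather than something Druid checks.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: compare pool timing settings with a server-side idle timeout. */
final class PoolTimingSketch {
    static List<String> findRisks(long timeBetweenEvictionRunsMillis,
                                  long maxEvictableIdleTimeMillis,
                                  boolean keepAlive,
                                  long keepAliveBetweenTimeMillis,
                                  long serverIdleTimeoutMillis) {
        List<String> risks = new ArrayList<>();
        if (keepAlive) {
            // A probe can be delayed by up to one eviction interval beyond keepAliveBetweenTimeMillis
            long worstCaseProbeGap = keepAliveBetweenTimeMillis + timeBetweenEvictionRunsMillis;
            if (worstCaseProbeGap >= serverIdleTimeoutMillis) {
                risks.add("Keep-Alive probes may arrive after the server-side idle timeout");
            }
        } else {
            // Without Keep-Alive, an idle connection can stay pooled until maxEvictableIdleTimeMillis
            long worstCaseIdle = maxEvictableIdleTimeMillis + timeBetweenEvictionRunsMillis;
            if (worstCaseIdle >= serverIdleTimeoutMillis) {
                risks.add("Idle connections may outlive the server-side idle timeout");
            }
        }
        return risks;
    }

    public static void main(String[] args) {
        // Druid defaults against MySQL's default wait_timeout of 8 hours
        System.out.println(findRisks(60_000L, 25_200_000L, false, 120_000L, 28_800_000L));
        // Same defaults behind a hypothetical proxy that drops idle connections after 10 minutes
        System.out.println(findRisks(60_000L, 25_200_000L, false, 120_000L, 600_000L));
    }
}
```

With a direct connection to MySQL the 7-hour default stays below the 8-hour wait_timeout, whereas a much shorter timeout in an intermediate component is the case in which enabling keepAlive becomes important.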

    Why does a configured validationQuery have no effect?

    To decide whether a connection is valid, the Druid connection pool usually calls the testConnectionInternal or validateConnection method. The core logic of the two methods is essentially the same:

    1. validConnectionChecker is used first, if available:

    • validConnectionChecker implements the ValidConnectionChecker interface, which defines the isValidConnection method for checking the validity of a database connection.
    • Each database has its own implementation class; for example, MySQL uses MySqlValidConnectionChecker and Oracle uses OracleValidConnectionChecker.
    • validConnectionChecker is initialized in the initValidConnectionChecker method, which selects the implementation class according to the database driver type.

    2. If validConnectionChecker is not initialized, a default check is performed:

    • The SQL statement defined by validationQuery is executed to verify that the connection is valid.
    • This approach works for all databases, but it adds some performance overhead.

    The following is the isValidConnection method of the MySQL implementation class (MySqlValidConnectionChecker).

    // druid-1.2.24/core/src/main/java/com/alibaba/druid/pool/vendor/
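    public boolean isValidConnection(Connection conn,
                                     String validateQuery,
                                     int validationQueryTimeout) throws Exception {
        if (conn.isClosed()) {
            return false;
        }

        // With usePingMethod enabled, the user-defined query is replaced by the default ping query
        if (usePingMethod || StringUtils.isEmpty(validateQuery)) {
            validateQuery = DEFAULT_VALIDATION_QUERY;
        }

        return ValidConnectionCheckerAdapter.execValidQuery(conn, validateQuery, validationQueryTimeout);
    }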

    The usePingMethod field in this method is controlled by the druid.mysql.usePingMethod parameter, whose default value is true.

    When usePingMethod is true, validateQuery is set to DEFAULT_VALIDATION_QUERY, i.e. /* ping */ SELECT 1, rather than the user-defined validationQuery.
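The selection rule is small enough to restate on its own. This is an illustrative sketch rather than Druid's class: ValidationQuerySketch and effectiveQuery are hypothetical names, and the constant simply repeats the default query quoted above.

```java
/** Illustrative sketch of which query ends up being used for MySQL validation. */
final class ValidationQuerySketch {
    static final String DEFAULT_VALIDATION_QUERY = "/* ping */ SELECT 1";

    /** Returns the query that would be executed for the given settings. */
    static String effectiveQuery(boolean usePingMethod, String validationQuery) {
        if (usePingMethod || validationQuery == null || validationQuery.isEmpty()) {
            return DEFAULT_VALIDATION_QUERY;
        }
        return validationQuery;
    }

    public static void main(String[] args) {
        // With usePingMethod=true the user-defined query is ignored
        System.out.println(effectiveQuery(true, "SELECT 'x'"));
        // Only with usePingMethod=false does the user-defined query take effect
        System.out.println(effectiveQuery(false, "SELECT 'x'"));
    }
}
```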

    When the execValidQuery() method executes validateQuery, the MySQL JDBC driver handles the statement specially if it starts with /* ping */.

    Specifically, when the MySQL JDBC driver processes a SQL statement, it checks whether the statement starts with PING_MARKER (i.e. /* ping */). If it does, the SQL statement is not executed. Instead, doPingInstead() is called, which sends a COM_PING command directly to the MySQL server. This avoids the overhead of SQL parsing and execution and improves performance.

    // mysql-connector-j-8.0.33/src/main/user-impl/java/com/mysql/cj/jdbc/
    public java.sql.ResultSet executeQuery(String sql) throws SQLException {
            synchronized (checkClosed().getConnectionMutex()) {
                JdbcConnection locallyScopedConn = this.connection;
                ...
                if (sql.charAt(0) == '/') {
                    if (sql.startsWith(PING_MARKER)) {
                        doPingInstead(); // send the COM_PING command directly

                        return this.results;
                    }
                }
             ...
            }
        }
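The marker test in the driver is a plain prefix check, which explains why the statement never reaches the server as SQL and therefore does not appear as a query in the general log. Below is a minimal sketch of that prefix check in isolation; PingMarkerSketch and isPingRequest are hypothetical names, and only the marker string is taken from the text above.

```java
/** Illustrative sketch of the ping-marker prefix check; not the driver's own class. */
final class PingMarkerSketch {
    static final String PING_MARKER = "/* ping */";

    /** Returns true when the statement would be answered with a ping instead of being executed. */
    static boolean isPingRequest(String sql) {
        return !sql.isEmpty() && sql.charAt(0) == '/' && sql.startsWith(PING_MARKER);
    }

    public static void main(String[] args) {
        System.out.println(isPingRequest("/* ping */ SELECT 1"));
        System.out.println(isPingRequest("SELECT 1"));
        // The marker must be the very first characters; leading whitespace defeats it
        System.out.println(isPingRequest(" /* ping */ SELECT 1"));
    }
}
```

Because the check is an exact prefix match, a validation query that merely contains the comment somewhere else, or that spells it differently, is executed as ordinary SQL.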

    Some suggestions on parameter settings

    1. minEvictableIdleTimeMillis and maxEvictableIdleTimeMillis should not be set too small, because frequently destroying and creating connections adds performance overhead.

    2. Enabling the keepAlive mechanism is recommended, especially when there is a proxy or load balancer between the client and MySQL. Such components may have their own idle connection timeout and disconnect connections early.

    3. Checking validity when a connection is requested (by setting testOnBorrow to true) is the most effective way to ensure that every connection obtained is usable. However, it affects application performance to some degree, especially under high concurrency.

      It is therefore recommended to weigh performance against reliability based on business needs and choose an appropriate detection strategy.

    4. Because network failures can happen at any time, even periodic validity checks by the Druid connection pool cannot guarantee that every connection is usable, so the application must handle connection failures gracefully.

    5. Connections that the code uses but does not return in time cause two problems. First, they may leak, eventually exhausting the pool's available connections. Second, unreleased connections cannot be kept active by Druid's Keep-Alive mechanism, so they are more likely to be closed by the MySQL server or middleware for being idle.

      To avoid these problems, it is recommended to enable the following parameters in the application's test environment to identify connections that have not been returned for a long time: logAbandoned, removeAbandoned, and removeAbandonedTimeoutMillis.
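The idea behind these parameters is to flag connections that have been borrowed for longer than a threshold. The sketch below illustrates that idea only and makes no claim about how Druid implements it: AbandonedSketch, findAbandoned, and the string connection ids are hypothetical, and only removeAbandonedTimeoutMillis is a parameter name from the article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: find connections that have been borrowed for too long. */
final class AbandonedSketch {
    /**
     * borrowedAtMillis maps a (hypothetical) connection id to the time it was borrowed.
     * Returns the ids borrowed for at least removeAbandonedTimeoutMillis.
     */
    static List<String> findAbandoned(Map<String, Long> borrowedAtMillis,
                                      long nowMillis,
                                      long removeAbandonedTimeoutMillis) {
        List<String> abandoned = new ArrayList<>();
        for (Map.Entry<String, Long> entry : borrowedAtMillis.entrySet()) {
            if (nowMillis - entry.getValue() >= removeAbandonedTimeoutMillis) {
                abandoned.add(entry.getKey());
            }
        }
        return abandoned;
    }

    public static void main(String[] args) {
        Map<String, Long> borrowed = new LinkedHashMap<>();
        borrowed.put("conn-1", 0L);        // borrowed 300 seconds before "now"
        borrowed.put("conn-2", 250_000L);  // borrowed 50 seconds before "now"
        System.out.println(findAbandoned(borrowed, 300_000L, 300_000L));
    }
}
```

In a test environment, logging the call site that borrowed each flagged connection is usually what makes the leak easy to locate.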

    Summary

    1. The Druid connection pool checks connection validity in four scenarios: applying for a connection, returning a connection, creating a new physical connection, and periodic testing.

    2. With the keepAlive parameter enabled, Druid periodically validates idle connections to keep them active.

      When a connection's idle time exceeds keepAliveBetweenTimeMillis, Druid triggers a Keep-Alive check to verify the connection. A valid connection is put back into the pool; an invalid one is destroyed.

    3. By default, Druid uses MySQL's COM_PING command for connection validity checks, which is more efficient than executing a SQL statement.

      Since COM_PING takes priority over the user-defined validationQuery, validationQuery is not executed under the default configuration.

      To use a custom validationQuery for connection checks, set the druid.mysql.usePingMethod parameter to false.