Background
Recently I ran into a case where a Java application could not obtain a new database connection, and the following error appeared in the log:
com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 5001, active 20, maxActive 20, creating 0
at (:1894)
at (:1502)
at (:1482)
at (:1463)
active equals maxActive, which indicates that the connections in the pool have been exhausted.
Analysis of the database during the error period showed that the number of connections (Threads_connected) rose significantly, while the number of active threads (Threads_running) stayed low and stable. A low, stable active thread count means the connections were not being held by slow queries; a rising connection count means connections were not being released back to the pool in time.
Connections that perform no operations for a period of time yet are not returned to the connection pool have a specific name: leaked connections.
Below, we will talk about issues related to leaked connections, including:
- Hazards of leaking connections.
- Causes of leaked connections.
- How to locate leaked connections in Druid.
- How to locate leaked connections in HikariCP.
Hazards of leaking connections
A leaked connection can cause the following problems:
- Connection pool exhaustion: Leaked connections keep occupying slots in the connection pool, so the number of available connections gradually decreases until the pool is exhausted.
- Application performance degradation: Once the pool is exhausted, new database operations cannot obtain a connection. Requests block or fail, and the application may stop working properly.
- Waste of database resources: Leaked connections occupy connection resources on the database side, which may push the database to its connection limit.
- Connection failure risk: Connections that are held for a long time cannot be kept alive by the connection pool's Keep-Alive mechanism, so they are more likely to be closed by the MySQL server or by middleware because of an idle timeout. Running a database operation on such a closed connection triggers the classic "Communications link failure. The last packet successfully received from the server was xxx milliseconds ago." error.
Causes of leaked connections
Leaked connections usually have one of the following causes:
1. Long transactions or long-held connections.
A transaction is not committed for a long time, or a connection is not released for a long time.
2. The connection is not closed.
After the connection is used, close() is never called to return it to the connection pool. For example,
Connection conn = dataSource.getConnection();
// Perform database operations
// Forgot to call conn.close();
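3. Exceptions are not handled correctly.
An exception thrown during a database operation skips any close() call that sits in the try body, so the connection is never returned. Calling close() in a finally block, as in the code below, or using try-with-resources ensures the connection is returned on every path. For example,
Connection conn = null;
try {
    conn = dataSource.getConnection();
    // Perform database operations
    throw new RuntimeException("Simulate exception");
} catch (SQLException e) {
    e.printStackTrace();
} finally {
    if (conn != null) {
        try {
            // finally runs even though the RuntimeException is not caught above,
            // so the connection is still returned; a close() at the end of the
            // try body would have been skipped
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}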
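To make the difference concrete, here is a minimal sketch that needs no database. FakeConnection and the OPEN counter are hypothetical stand-ins for a pooled connection, not part of any pool's API. leaky() calls close() at the end of the try body, while safe() uses try-with-resources.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TryWithResourcesSketch {
    // Hypothetical counter of connections that are currently open.
    public static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for a pooled connection.
    public static final class FakeConnection implements AutoCloseable {
        public FakeConnection() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    // close() is the last statement of the try body, so an exception skips it.
    public static void leaky(boolean fail) {
        FakeConnection conn = new FakeConnection();
        try {
            if (fail) {
                throw new IllegalStateException("Simulate exception");
            }
            conn.close();
        } catch (IllegalStateException e) {
            // Swallowed; conn is never closed on this path.
        }
    }

    // try-with-resources closes conn on every exit path.
    public static void safe(boolean fail) {
        try (FakeConnection conn = new FakeConnection()) {
            if (fail) {
                throw new IllegalStateException("Simulate exception");
            }
        } catch (IllegalStateException e) {
            // conn has already been closed before this handler runs.
        }
    }

    public static void main(String[] args) {
        leaky(true);
        System.out.println("open after leaky: " + OPEN.get()); // 1
        OPEN.set(0);
        safe(true);
        System.out.println("open after safe: " + OPEN.get());  // 0
    }
}
```

With a real pool the same shape applies: declaring the Connection in the try-with-resources header returns it to the pool whether or not the body throws.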
How to locate leaked connections in Druid
In the Druid connection pool, detection of unreturned connections is enabled with the following parameters:
- removeAbandoned: Whether to reclaim connections that have not been returned within the timeout. The default value is false (no reclaiming).
- removeAbandonedTimeoutMillis: How long (in milliseconds) a connection may stay unreturned before it is considered abandoned. The default value is 300000 (300 seconds).
- logAbandoned: Whether to log information about connections that were not returned within the timeout. The default value is false (no logging).
Note that logAbandoned only takes effect when removeAbandoned is true. In other words, Druid does not support logging timed-out unreturned connections without also reclaiming them.
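As a configuration sketch, the three parameters can be set programmatically on DruidDataSource. The URL, credentials, pool size and the 60-second timeout below are placeholder values chosen for illustration, not recommendations from this article, and the class needs the Druid dependency on the classpath.

```java
import com.alibaba.druid.pool.DruidDataSource;

public class DruidLeakDetectionConfig {
    public static DruidDataSource build() {
        DruidDataSource ds = new DruidDataSource();
        // Placeholder connection settings; replace with real values.
        ds.setUrl("jdbc:mysql://localhost:3306/test");
        ds.setUsername("app");
        ds.setPassword("secret");
        ds.setMaxActive(20);
        // Reclaim connections that stay unreturned for more than 60 seconds
        // (illustrative value; the default is 300000 ms).
        ds.setRemoveAbandoned(true);
        ds.setRemoveAbandonedTimeoutMillis(60_000L);
        // Log the borrower's stack trace when a connection is reclaimed.
        ds.setLogAbandoned(true);
        return ds;
    }
}
```

Because removeAbandoned forcibly closes the connection, a timeout shorter than the longest legitimate transaction will break that transaction, so the value should be chosen with the workload in mind.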
Implementation details
When a connection is borrowed from the pool and removeAbandoned is true, Druid records the borrower's stack trace and the borrow time on the connection so that unreturned connections can be detected later.
public DruidPooledConnection getConnectionDirect(long maxWaitMillis) throws SQLException {
    ...
    for (; ; ) {
        DruidPooledConnection poolableConnection;
        try {
            poolableConnection = getConnectionInternal(maxWaitMillis);
        } catch (GetConnectionTimeoutException ex) {
            ...
        }
        ...
        if (removeAbandoned) {
            // Record the stack trace so the code that failed to close the connection can be located
            StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace();
            poolableConnection.connectStackTrace = stackTrace;
            // Set connectedTimeNano to the current time
            poolableConnection.setConnectedTimeNano();
            poolableConnection.traceEnable = true;
            // Add the connection to the active connection map for later unreturned-connection detection
            activeConnectionLock.lock();
            try {
                activeConnections.put(poolableConnection, PRESENT);
            } finally {
                activeConnectionLock.unlock();
            }
        }
        ...
        return poolableConnection;
    }
}
When is a timed-out connection detected?
This is done by the periodic task of DestroyConnectionThread. As mentioned in the previous article, DestroyConnectionThread calls shrink(true, keepAlive) at a fixed interval (set by the timeBetweenEvictionRunsMillis parameter, 60 seconds by default) to destroy expired connections in the pool. Besides shrink, the same task also calls removeAbandoned() to close connections that have not been returned within the timeout.
public class DestroyTask implements Runnable {
    public DestroyTask() {
    }

    @Override
    public void run() {
        shrink(true, keepAlive);
        if (isRemoveAbandoned()) {
            removeAbandoned();
        }
    }
}
Let's look at how removeAbandoned() is implemented.
public int removeAbandoned() {
    int removeCount = 0;
    // If there are no active connections (activeConnections is empty), return directly
    if (activeConnections.size() == 0) {
        return removeCount;
    }
    long currrentNanos = System.nanoTime();
    List<DruidPooledConnection> abandonedList = new ArrayList<DruidPooledConnection>();
    activeConnectionLock.lock();
    try {
        Iterator<DruidPooledConnection> iter = activeConnections.keySet().iterator();
        // Iterate over the active connections
        for (; iter.hasNext(); ) {
            DruidPooledConnection pooledConnection = iter.next();
            // Skip connections that are currently running (isRunning())
            if (pooledConnection.isRunning()) {
                continue;
            }
            // Usage time of the connection (timeMillis): current time minus the time the connection was borrowed
            long timeMillis = (currrentNanos - pooledConnection.getConnectedTimeNano()) / (1000 * 1000);
            // If the connection has been in use for at least removeAbandonedTimeoutMillis,
            // remove it from the active connections and add it to abandonedList
            if (timeMillis >= removeAbandonedTimeoutMillis) {
                iter.remove();
                pooledConnection.setTraceEnable(false);
                abandonedList.add(pooledConnection);
            }
        }
    } finally {
        activeConnectionLock.unlock();
    }
    // Traverse abandonedList and call JdbcUtils.close() on each unreturned connection
    if (abandonedList.size() > 0) {
        for (DruidPooledConnection pooledConnection : abandonedList) {
            ...
            JdbcUtils.close(pooledConnection);
            pooledConnection.abandond();
            removeAbandonedCount++;
            removeCount++;
            // If logAbandoned is true, log the details of the unreturned connection
            if (isLogAbandoned()) {
                StringBuilder buf = new StringBuilder();
                buf.append("abandon connection, owner thread: ");
                buf.append(pooledConnection.getOwnerThread().getName());
                buf.append(", connected at : ");
                ...
                }
                LOG.error(buf.toString());
            }
        }
    }
    return removeCount;
}
The method works as follows:
- Iterate over the currently active connections (activeConnections) and compute each connection's usage time, which is the current time minus the time at which the connection was borrowed. Connections that are currently running are skipped.
- If a connection's usage time is at least removeAbandonedTimeoutMillis, remove it from activeConnections and add it to abandonedList.
- Traverse abandonedList and close these unreturned connections. If logAbandoned is true, the details of each unreturned connection are printed to the log. By analyzing the log, you can locate the code that leaked the connection.
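The flow above can be sketched without Druid. The following is a simplified illustration, not Druid's implementation: AbandonedSweepSketch, Borrowed, borrow, giveBack and sweep are hypothetical names, timestamps are passed in explicitly so the result is deterministic, and the isRunning() check is omitted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AbandonedSweepSketch {
    // Hypothetical record of one borrowed connection: its name, borrow time and borrow site.
    public static final class Borrowed {
        public final String name;
        public final long borrowedAtNanos;
        public final StackTraceElement[] borrowStack;

        Borrowed(String name, long borrowedAtNanos) {
            this.name = name;
            this.borrowedAtNanos = borrowedAtNanos;
            // Captured at borrow time so the leaking code path can be reported later.
            this.borrowStack = Thread.currentThread().getStackTrace();
        }
    }

    private final Map<String, Borrowed> active = new LinkedHashMap<>();
    private final long timeoutMillis;

    public AbandonedSweepSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public synchronized void borrow(String name, long nowNanos) {
        active.put(name, new Borrowed(name, nowNanos));
    }

    public synchronized void giveBack(String name) {
        active.remove(name);
    }

    // Removes and returns every entry that has been held for at least timeoutMillis.
    public synchronized List<Borrowed> sweep(long nowNanos) {
        List<Borrowed> abandoned = new ArrayList<>();
        Iterator<Borrowed> it = active.values().iterator();
        while (it.hasNext()) {
            Borrowed b = it.next();
            long heldMillis = (nowNanos - b.borrowedAtNanos) / 1_000_000L;
            if (heldMillis >= timeoutMillis) {
                it.remove();
                abandoned.add(b);
            }
        }
        return abandoned;
    }

    public synchronized int activeCount() {
        return active.size();
    }

    public static void main(String[] args) {
        AbandonedSweepSketch pool = new AbandonedSweepSketch(300);
        pool.borrow("conn-1", 0L);                // borrowed at t = 0 ms
        pool.borrow("conn-2", 200L * 1_000_000L); // borrowed at t = 200 ms
        // Sweep at t = 350 ms: conn-1 has been held 350 ms, conn-2 only 150 ms.
        for (Borrowed b : pool.sweep(350L * 1_000_000L)) {
            System.out.println("abandoned: " + b.name
                    + " (" + b.borrowStack.length + " stack frames captured)");
        }
        System.out.println("still active: " + pool.activeCount());
    }
}
```

With a 300 ms timeout, only conn-1 is reported as abandoned and one connection remains active. A real pool would also close the underlying connection at this point, which is why Druid's detection cannot be used in a log-only mode.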
How to locate leaked connections in HikariCP
In the HikariCP connection pool, connection leak detection is enabled with the following parameter:
- leakDetectionThreshold: The connection leak detection threshold (in milliseconds). If a connection has not been closed within this time after being borrowed from the pool, it is considered leaked. The default is 0, which disables leak detection. The minimum accepted value is 2000 (2 seconds).
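As a configuration sketch, the threshold can be set on HikariConfig before the data source is created. The URL, credentials, pool size and the 10-second threshold are placeholder values for illustration only, and the class needs the HikariCP dependency on the classpath.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class HikariLeakDetectionConfig {
    public static HikariDataSource build() {
        HikariConfig config = new HikariConfig();
        // Placeholder connection settings; replace with real values.
        config.setJdbcUrl("jdbc:mysql://localhost:3306/test");
        config.setUsername("app");
        config.setPassword("secret");
        config.setMaximumPoolSize(20);
        // Report any connection held for more than 10 seconds (illustrative value).
        config.setLeakDetectionThreshold(10_000L);
        return new HikariDataSource(config);
    }
}
```

Unlike Druid's removeAbandoned, this setting only logs the leak; the connection itself is left untouched.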
When a leak is detected, HikariCP prints a message like the following to its log:
Connection leak detection triggered for @5dd31d98 on thread (), stack trace follows
: Apparent connection leak detected
at (:100)
at (:27)
...
Implementation details
After a connection is obtained from the pool, HikariCP calls leakTaskFactory.schedule(poolEntry) to schedule a ProxyLeakTask. The task's run() method fires after leakDetectionThreshold milliseconds and prints the connection leak information.
public Connection getConnection(final long hardTimeout) throws SQLException
{
   suspendResumeLock.acquire();
   final var startTime = currentTime();
   try {
      var timeout = hardTimeout;
      do {
         // Get an idle connection from the connection pool
         var poolEntry = connectionBag.borrow(timeout, MILLISECONDS);
         if (poolEntry == null) {
            break; // We timed out... break and throw exception
         }
         final var now = currentTime();
         // Close the connection if it has been marked for eviction or is found to be dead
         if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > aliveBypassWindowMs && isConnectionDead(poolEntry.connection))) {
            closeConnection(poolEntry, poolEntry.isMarkedEvicted() ? EVICTED_CONNECTION_MESSAGE : DEAD_CONNECTION_MESSAGE);
            timeout = hardTimeout - elapsedMillis(startTime);
         }
         else {
            ...
            // Return a proxy connection and schedule the connection leak detection task
            return poolEntry.createProxyConnection(leakTaskFactory.schedule(poolEntry));
         }
      } while (timeout > 0L);
      ...
}
If the connection is returned within leakDetectionThreshold (that is, close() is called on it), HikariCP calls leakTask.cancel() to cancel the scheduled task, so run() never fires.
If the connection is not returned within the threshold, run() executes and prints the connection leak information.
The following is the implementation of ProxyLeakTask.
class ProxyLeakTask implements Runnable
{
   ...
   ProxyLeakTask(final PoolEntry poolEntry)
   {
      this.exception = new Exception("Apparent connection leak detected");
      this.threadName = Thread.currentThread().getName();
      this.connectionName = poolEntry.connection.toString();
   }
   ...
   void schedule(ScheduledExecutorService executorService, long leakDetectionThreshold)
   {
      scheduledFuture = executorService.schedule(this, leakDetectionThreshold, TimeUnit.MILLISECONDS);
   }

   /** {@inheritDoc} */
   @Override
   public void run()
   {
      isLeaked = true;
      final var stackTrace = exception.getStackTrace();
      final var trace = new StackTraceElement[stackTrace.length - 5];
      System.arraycopy(stackTrace, 5, trace, 0, trace.length);
      exception.setStackTrace(trace);
      LOGGER.warn("Connection leak detection triggered for {} on thread {}, stack trace follows", connectionName, threadName, exception);
   }

   void cancel()
   {
      scheduledFuture.cancel(false);
      if (isLeaked) {
         LOGGER.info("Previously reported leaked connection {} on thread {} was returned to the pool (unleaked)", connectionName, threadName);
      }
   }
}
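The schedule-on-borrow, cancel-on-close pattern can be tried out with only the JDK. The sketch below is a simplified illustration of that pattern, not HikariCP's code: LeakTaskSketch, Handle, borrow and awaitLeak are hypothetical names, and the 100 ms threshold is only there to keep the demo short.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class LeakTaskSketch {
    // Hypothetical handle returned on borrow; close() cancels the pending leak report.
    public static final class Handle implements AutoCloseable {
        private final ScheduledFuture<?> leakTask;

        Handle(ScheduledFuture<?> leakTask) {
            this.leakTask = leakTask;
        }

        @Override
        public void close() {
            leakTask.cancel(false);
        }
    }

    // Daemon thread so a forgotten shutdown() does not keep the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "leak-detector");
                t.setDaemon(true);
                return t;
            });
    private final long thresholdMillis;
    private final CountDownLatch leakReported = new CountDownLatch(1);

    public LeakTaskSketch(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    public Handle borrow() {
        // Capture the borrower's stack now; it is only printed if the task fires.
        Exception borrowSite = new Exception("Apparent connection leak detected");
        String threadName = Thread.currentThread().getName();
        ScheduledFuture<?> task = scheduler.schedule(() -> {
            System.out.println("leak detected on thread " + threadName
                    + ", borrowed at " + borrowSite.getStackTrace()[0]);
            leakReported.countDown();
        }, thresholdMillis, TimeUnit.MILLISECONDS);
        return new Handle(task);
    }

    // Waits up to waitMillis for a leak report; returns true if one was printed.
    public boolean awaitLeak(long waitMillis) {
        try {
            return leakReported.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        LeakTaskSketch pool = new LeakTaskSketch(100);
        pool.borrow().close();         // returned at once, so its task is cancelled
        Handle leaked = pool.borrow(); // never closed
        System.out.println("leak reported: " + pool.awaitLeak(2_000));
        pool.shutdown();
    }
}
```

Creating the exception at borrow time is what makes the report useful: by the time the task fires, the borrowing thread has moved on, so the stack has to be captured up front.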
Summary
A leaked connection is a database connection that is not returned to the connection pool in time after use. Its main hazards are connection pool exhaustion, degraded application performance, wasted database resources, and a higher risk of connection failure. Typical causes are connections that are never closed, exceptions that are not handled correctly, and long transactions.
Druid and HikariCP, two commonly used connection pools, both provide a leak detection mechanism. Druid uses DestroyConnectionThread to periodically check for unreturned connections and closes them once they exceed the timeout; if logAbandoned is true, the details of each unreturned connection are also logged. HikariCP enables leak detection through the leakDetectionThreshold parameter: when a connection is not returned within the specified time, the ProxyLeakTask fires and prints the connection leak information.
In development and testing environments, it is recommended to enable connection leak detection so that problems are found and fixed as early as possible.