Problem
A user reported that the number of file descriptors held by the HiveServer2 (HS2) process keeps growing, even though the number of active HS2 connections stays in the single digits.
Investigation process
The first step is to find out which file descriptors the HS2 process is actually holding, using the lsof command: lsof -p $pid.
The output shows that HS2 does indeed have a large number of descriptors open under the /data/emr/hive/tmp/operation_logs/ directory.
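If you want to repeat this check programmatically on Linux, /proc exposes the same information that lsof reads. The sketch below is only an illustration (the class name, default paths, and argument handling are my own assumptions): it counts the descriptors of a given pid whose targets live under the operation_logs directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Rough Java equivalent of "lsof -p <pid> | grep operation_logs | wc -l" on Linux. */
public class CountOperationLogFds {
    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions matching the environment described above.
        String pid = args.length > 0 ? args[0] : "1";
        String target = args.length > 1 ? args[1] : "/data/emr/hive/tmp/operation_logs/";

        try (Stream<Path> fds = Files.list(Paths.get("/proc", pid, "fd"))) {
            long count = fds.filter(fd -> {
                try {
                    // Each entry in /proc/<pid>/fd is a symlink to the open file.
                    return Files.readSymbolicLink(fd).toString().startsWith(target);
                } catch (IOException e) {
                    return false; // fd closed between listing and readlink
                }
            }).count();
            System.out.println("descriptors under " + target + ": " + count);
        }
    }
}
```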
A similar issue exists in JIRA: [HIVE-10970] Investigate HIVE-10453: HS2 leaking open file descriptors when using UDFs - ASF JIRA.
However, that scenario is an fd leak caused by UDFs, and the leaked descriptors live under a different path than the operation_logs directory, so it does not look like the same problem.
Reading the source code, I found that the operation log does have cleanup logic, in #cleanupOperationLog.
My guess was that either the client session ends abnormally and this method is never called, or the cleanup logic itself is flawed.
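To make the failure mode concrete, here is a deliberately simplified sketch (not the actual Hive source; the class and helper names are illustrative) of why each operation pins a file descriptor until its cleanup path runs.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

/** Simplified stand-in for a per-operation log under operation_logs/<session>/<operation>. */
public class OperationLogSketch {
    private final File logFile;
    private final FileWriter writer;   // holding this keeps the fd open

    public OperationLogSketch(File sessionDir, String operationId) throws IOException {
        this.logFile = new File(sessionDir, operationId);
        this.writer = new FileWriter(logFile);   // fd acquired here
    }

    public void write(String line) throws IOException {
        writer.write(line + System.lineSeparator());
    }

    /** Mirrors what a cleanup like cleanupOperationLog must achieve: release the fd and remove the file. */
    public void cleanup() {
        try {
            writer.close();            // fd released only if this is reached
        } catch (IOException ignored) {
        }
        if (!logFile.delete()) {
            System.err.println("failed to delete " + logFile);
        }
        // If the session dies abnormally and this method never runs,
        // both the descriptor and the file linger until HS2 restarts.
    }

    public static void main(String[] args) throws IOException {
        File dir = new File("/tmp/operation_logs_demo");
        dir.mkdirs();
        OperationLogSketch log = new OperationLogSketch(dir, "op-0001");
        log.write("query started");
        log.cleanup();
    }
}
```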
First, walk through the session-close logic. A flame graph of the beeline client shows that the close path starts at #closeClientOperation.
There the client makes a thrift RPC call, and the corresponding thrift server handler in HS2 is #CloseOperation.
Tracing that method eventually leads to #close, which is where cleanupOperationLog is called.
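Put together, the close path looks roughly like the skeleton below. The class names are stand-ins for the HS2 components named above, and only the call ordering is meant to be accurate.

```java
/** Minimal sketch of the traced close path; names are hypothetical stand-ins. */
public class ClosePathSketch {

    // Server-side handler reached by the thrift CloseOperation RPC.
    static class ThriftHandlerSketch {
        private final OperationSketch operation;
        ThriftHandlerSketch(OperationSketch operation) { this.operation = operation; }

        void closeOperation() {        // server counterpart of the client's RPC
            operation.close();
        }
    }

    // Stand-in for the operation's #close, which invokes cleanupOperationLog.
    static class OperationSketch {
        void close() {
            cleanupOperationLog();     // the fd/file release happens here
        }
        void cleanupOperationLog() {
            System.out.println("operation log closed and deleted");
        }
    }

    public static void main(String[] args) {
        // Client side: beeline's closeClientOperation issues the RPC,
        // which lands in the handler below.
        ThriftHandlerSketch handler = new ThriftHandlerSketch(new OperationSketch());
        handler.closeOperation();
        // If the client dies before making this call, cleanupOperationLog
        // is never reached along this path.
    }
}
```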
So if the client session exits abnormally, that RPC never arrives, and the operation logs may never be cleaned up.
I then checked the cleanupOperationLog logic itself for a bug. Using IDEA's git branch comparison, I found that the 3.1 branch had already committed a fix for it:
[HIVE-18820] Operation doesn't always clean up log4j for operation log - ASF JIRA
Conclusion
- An abnormal client session exit leaves operation logs uncleaned, similar to the scenario where the scratch dir is not cleaned up.
- HIVE-18820 is a known community bug; consider backporting that patch.