
Firefighting at 3 a.m.: of the seven classic Java memory leak pits, you have stepped in at least three!

2025-02-25 22:40:57

Introduction: The night the whole ops team broke down

"Brother Fan! The online service response time has hit 10 seconds!" At 1 a.m., intern Xiao Li sounded close to tears.
On the monitoring screen, the JVM heap memory curve was climbing like a rocket: the freshly expanded 16G of memory was eaten up completely within 30 minutes.
I gritted my teeth and slammed the table: "Dig through every bit of code released in the past week!"


Pit 1: A static collection turned into a perpetual motion machine

▌ Crash code (real project excerpt)

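// Cache user AI dialogue history → crash version!
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChatHistoryCache {
    // Static map with no eviction: entries are added but never removed
    private static Map<Long, List<String>> cache = new HashMap<>();

    public static void addMessage(Long userId, String msg) {
        cache.computeIfAbsent(userId, k -> new ArrayList<>()).add(msg);
    }
}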

▌ Crash scene

  • As the number of users surged, cached data only went in and never came out; memory blew up within 48 hours
  • Caught in the act with Arthas: vmtool --action getInstances -c 4614556e showed a Map holding tens of millions of entries
  • MAT analysis: HashMap$Node objects accounted for 82% of heap memory

▌ The right approach

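// Use a Guava cache with an expiration time and a size cap instead
// (imports: com.google.common.cache.Cache, com.google.common.cache.CacheBuilder,
//  java.util.concurrent.TimeUnit)
private static Cache<Long, List<String>> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(1, TimeUnit.HOURS) // time unit assumed to be hours
        .maximumSize(10000)
        .build();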
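Where Guava is not on the classpath, the "capacity limit" half of the fix can be sketched with the JDK alone. The class below is a minimal illustration, not the Guava API: BoundedLruCache is a hypothetical name, and it relies on LinkedHashMap's access-order mode plus removeEldestEntry to evict the least recently used entry once the cap is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDK-only stand-in for a size-capped cache (not the Guava API).
class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // LinkedHashMap calls this after every insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

This sketch is not thread-safe and has no time-based expiry; a shared cache would need to be wrapped with Collections.synchronizedMap or replaced by a real cache library.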

Pit 2: A lambda that forgets to close the file stream

▌ Fatal code (processing AI model files)

// Load local model files → crash version!
public void loadModels(List<File> files) {
    files.forEach(file -> {
        try {
            InputStream is = new FileInputStream(file); // Never closed!
            parseModel(is);
        } catch (IOException e) { /*...*/ }
    });
}

▌ Strange symptoms

  • Three days into running, the service suddenly reported "Too many open files"
  • Linux troubleshooting: lsof -p <PID> | grep 'deleted' found a large number of unreleased file handles
  • JVM monitoring: jcmd <PID> VM.native_memory; the number of open file descriptors exceeded 10,000

▌ Rescue plan

// Correct version: try-with-resources closes the stream automatically
files.forEach(file -> {
    try (InputStream is = new FileInputStream(file)) { // Closed on exit, even if parseModel throws
        parseModel(is);
    } catch (IOException e) { /*...*/ }
});
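The claim that try-with-resources closes the stream even when parsing fails can be checked with a small JDK-only sketch. TrackedStream and failingParse are hypothetical helpers introduced for illustration; failingParse stands in for a parseModel that throws.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class CloseTrackingDemo {
    // Test double: an in-memory stream that records whether close() was called.
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical parser that always fails, standing in for parseModel.
    static void failingParse(InputStream in) throws IOException {
        throw new IOException("corrupt model file");
    }

    // Returns true if the stream was closed even though parsing threw.
    static boolean closedAfterFailure() {
        TrackedStream tracked = new TrackedStream(new byte[] {1, 2, 3});
        try (InputStream in = tracked) {
            failingParse(in);
        } catch (IOException e) {
            // Swallowed for the demo; real code should log or rethrow.
        }
        return tracked.closed;
    }
}
```

The resource is closed before the catch block runs, so the handle is released on both the success path and the failure path.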

Pit 3: A Spring event listener that refuses to leave

▌ Trap code (message notification module)

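// Listen for the AI-processing-complete event → crash version!
@Component
public class NotifyService {

    @EventListener
    public void handleAiEvent(AICompleteEvent event) {
        // Bug: hands a strong reference to this listener to an external service.
        // externalService.registerCallback is a placeholder name for the elided call.
        externalService.registerCallback(this::sendNotification);
    }
}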

▌ Memory curve

  • Every time the event fires, the listener object is strongly referenced by the external service again and is never released
  • MAT analysis: the number of NotifyService instances grows linearly over time
  • GC log: old generation occupancy grows by 5% per week

▌ How to avoid the pit

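// Decouple with a weak reference
public void handleAiEvent(AICompleteEvent event) {
    WeakReference<NotifyService> weakRef = new WeakReference<>(this);
    // externalService.registerCallback is a placeholder name for the elided call
    externalService.registerCallback(() -> {
        NotifyService service = weakRef.get();
        if (service != null) service.sendNotification();
    });
}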
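The weak-reference idea can be isolated in a JDK-only sketch. Notifier and WeakCallback are hypothetical types, not Spring classes, and simulateCollected exists only so the "target already collected" path can be exercised without waiting for a real GC.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class WeakCallbackDemo {
    // Hypothetical listener; records what it was asked to send.
    static class Notifier {
        final List<String> sent = new ArrayList<>();

        void sendNotification(String msg) {
            sent.add(msg);
        }
    }

    // Callback that holds its target only weakly, so registering it does not pin the target.
    static class WeakCallback {
        private final WeakReference<Notifier> ref;

        WeakCallback(Notifier target) {
            this.ref = new WeakReference<>(target);
        }

        // Returns false once the target is gone, so the caller can drop this callback.
        boolean fire(String msg) {
            Notifier target = ref.get();
            if (target == null) {
                return false;
            }
            target.sendNotification(msg);
            return true;
        }

        // Test hook: simulates the target having been garbage collected.
        void simulateCollected() {
            ref.clear();
        }
    }
}
```

A weak reference only stops the callback from pinning its target. The registry that stores the callbacks still grows unless dead entries are pruned, for example whenever fire returns false.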

Pit 4: Zombie tasks in the thread pool

▌ Problem code (asynchronous processing of AI requests)

// Async thread pool configuration → crash version!
@Bean
public Executor asyncExecutor() {
    return new ThreadPoolExecutor(10, 10,
        0L, TimeUnit.MILLISECONDS, // unit assumed; irrelevant with a keep-alive of 0
        new LinkedBlockingQueue<>()); // Unbounded queue!
}

▌ Disaster scene

  • During a request burst, the queue piled up 500,000 tasks, each holding an AI response object
  • Heap dump: byte[] accounts for 90% of memory, all of it pending response data
  • Monitoring: the queue_size metric stays high and never drops

▌ Correct configuration

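// Set a queue upper limit + a rejection policy
new ThreadPoolExecutor(10, 50,
    60L, TimeUnit.SECONDS, // unit assumed
    new ArrayBlockingQueue<>(1000),
    new ThreadPoolExecutor.CallerRunsPolicy()); // policy assumed; AbortPolicy is the JDK default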
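To see what a bounded queue buys, here is a small sketch that deliberately overloads a one-thread pool and counts rejections. The pool sizes are shrunk for the demo and AbortPolicy is chosen only because its effect is easy to count; BoundedPoolDemo and countRejections are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    // Submits `attempts` blocking tasks to a 1-thread pool whose queue holds
    // `queueCapacity` tasks, and returns how many submissions were rejected.
    static int countRejections(int queueCapacity, int attempts) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // throws instead of queueing forever
        int rejected = 0;
        try {
            for (int i = 0; i < attempts; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // keep the worker busy so the queue fills up
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }
}
```

With a queue capacity of 2 and 5 attempts, one task runs, two wait in the queue, and the remaining two are rejected. The overload shows up immediately as an exception instead of silently accumulating in the heap.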

Pit 5: The ghost in the MyBatis connection pool

▌ Fatal code (query user conversation history)

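public List<ChatRecord> getHistory(Long userId) {
    // sqlSessionFactory: an injected SqlSessionFactory (field name assumed)
    SqlSession session = sqlSessionFactory.openSession();
    try {
        return session.selectList("queryHistory", userId);
    } finally {
        // Forgot session.close() → the connection pool is gradually exhausted
    }
}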

▌ Evidence of the leak

  • The Druid monitoring panel shows the number of active connections pinned at the maximum
  • Log error: Cannot get connection from pool, timeout 30000ms
  • Heap analysis: the number of SqlSession instances grows abnormally

▌ The right approach

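// Use try-with-resources so the session is closed automatically
try (SqlSession session = sqlSessionFactory.openSession()) { // sqlSessionFactory: field name assumed
    return session.selectList("queryHistory", userId);
}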
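The way a forgotten close() drains a pool can be reproduced with a toy pool built on a Semaphore. This is a sketch under simplifying assumptions, not how Druid or MyBatis work internally: LeasePoolDemo and Lease are hypothetical names, and an exhausted pool returns null instead of waiting with a timeout.

```java
import java.util.concurrent.Semaphore;

class LeasePoolDemo {
    private final Semaphore permits;

    LeasePoolDemo(int size) {
        this.permits = new Semaphore(size);
    }

    // A borrowed slot; close() returns it to the pool exactly once.
    final class Lease implements AutoCloseable {
        private boolean released = false;

        @Override
        public void close() {
            if (!released) {
                released = true;
                permits.release();
            }
        }
    }

    // Returns a lease, or null when the pool is exhausted
    // (a real pool would block and eventually time out).
    Lease tryBorrow() {
        return permits.tryAcquire() ? new Lease() : null;
    }

    int available() {
        return permits.availablePermits();
    }
}
```

A lease borrowed inside try-with-resources returns to the pool when the block exits, while one borrowed without close() keeps its slot until someone releases it by hand.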

Pit 6: The gentle trap of third-party libraries

▌ Problem code (caches user preferences)

// Incorrect configuration when using Ehcache (2.x API, where CacheConfiguration is not generic)
CacheConfiguration config = new CacheConfiguration();
config.setName("user_prefs");
config.setMaxEntriesLocalHeap(10000); // Only the entry count is capped; no expiration time is set!

▌ Memory Symptoms

  • GC log shows the old generation growing by 3% per week
  • Arthas monitoring: watch getCachedUser showed returned objects surviving for more than 7 days
  • OOM was triggered during load testing, and a large number of UserPreference objects were found in the heap

▌ Correct configuration

config.setTimeToLiveSeconds(3600); // Entries expire after 1 hour (TTL setter assumed)
config.setDiskExpiryThreadIntervalSeconds(60); // Expiry check interval
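What "expiration time" means mechanically can be sketched without any cache library. TtlCache below is a hypothetical, single-threaded illustration with an injectable clock so that expiry can be tested deterministically; it is not how Ehcache implements expiry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class TtlCache<K, V> {
    // Value plus the timestamp after which it must no longer be served.
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in production

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Lazy expiry: an expired entry is removed when it is next read.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }

    int size() {
        return entries.size();
    }
}
```

Lazy expiry alone leaves entries that are never read again sitting in memory, which is why real caches also run a periodic sweep such as the expiry check interval configured above.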

Pit 7: ThreadLocal never cleaned up after use

▌ Fatal code (passing user context)

public class UserContextHolder {
    private static final ThreadLocal<User> currentUser = new ThreadLocal<>();

    public static void set(User user) {
        currentUser.set(user);
    }

    // Missing remove method!
}

▌ Memory exception

  • Once pooled threads are reused, stale user data piles up in ThreadLocal
  • MAT analysis: User objects are strongly referenced by ThreadLocalMap and cannot be released
  • Monitoring shows each thread holding an average of 50 expired user objects

▌ Repair plan

// Must be cleaned up after use!
public static void remove() {
    currentUser.remove();
}

// Force cleanup in an around advice
@Around("execution(* ..*.*(..))")
public Object clearContext(ProceedingJoinPoint pjp) throws Throwable {
    try {
        return pjp.proceed();
    } finally {
        UserContextHolder.remove(); // The key!
    }
}
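The effect of thread reuse is easy to reproduce with a single-thread executor. In this minimal sketch, ThreadLocalReuseDemo and the user name "alice" are made up for illustration; the second task runs on the same pooled thread as the first and reports what it finds in the ThreadLocal.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalReuseDemo {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Runs two tasks on the same pooled thread and returns what the second one sees.
    static String secondTaskSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable firstRequest = () -> {
            CURRENT_USER.set("alice");
            try {
                // ... handle the request ...
            } finally {
                if (cleanUp) {
                    CURRENT_USER.remove();
                }
            }
        };
        Callable<String> secondRequest = CURRENT_USER::get;
        try {
            pool.submit(firstRequest).get();
            return pool.submit(secondRequest).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the remove() call the second request sees the first request's user, which is both a memory retention problem and a data isolation problem. With remove() in a finally block it sees null.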

Ultimate troubleshooting toolbox

1. Arthas three-hit combo

# Real-time monitoring of GC situation
 dashboard -n 5 -i 2000

 # Track the frequency of suspicious method calls
 trace addCacheEntry -n 10

 # Dynamically modify log level (no restart required)
 logger --name ROOT --level debug

2. Three tricks for MAT analysis

  • Dominator Tree: expose the memory hogs
  • Path to GC Roots: follow the trail to the culprit
  • OQL tricks
    SELECT * FROM  WHERE size > 10000
    SELECT toString(msg) FROM  WHERE  LIKE "%OOM%"
    

3. Production firefighting command kit

# Quickly view the heap memory distribution
 jhsdb jmap --heap --pid <PID>

# Rank classes by instance count (the :live option forces a full GC first)
jmap -histo:live <PID> | head -n 20

# Force a Full GC (use with caution!)
jcmd <PID> GC.run
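When shell access to the production box is not available, a rough heap reading can also be taken from inside the JVM through the standard java.lang.management API. The helper below is a minimal sketch; HeapSnapshot and usedPercent are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapSnapshot {
    // Percentage of the maximum heap currently in use, or -1 if the JVM reports no maximum.
    static double usedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        return max <= 0 ? -1 : 100.0 * heap.getUsed() / max;
    }
}
```

A value like this could be exported as a metric or logged periodically; it gives the overall trend only and does not replace a histogram or heap dump when hunting for the leaking class.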

Twelve iron rules for preventing leaks

  1. Every cache needs double insurance: expiration time + capacity limit
  2. Triple protection for IO operations
    try (InputStream is = ...) { // First layer: the stream is closed automatically
        useStream(is);
    } catch (IOException e) { // Second layer: the failure is logged
        log.error("IO exception", e); // log: logger name assumed
    } finally { // Third layer: leftovers are cleaned up
        cleanupTempFiles();
    }
  3. Four principles for thread pools
    • No unbounded queues
    • No unreasonable core pool sizes
    • Never ignore the rejection policy
    • Never queue oversized objects
  4. Three checks for Spring components
    • Check the event listener reference chain
    • Check collections held by singleton beans
    • Check the thread pool configuration behind the @Async annotation
  5. Two checks for third-party libraries
    • Verify the connection pool's return mechanism
    • Verify the cache's default configuration
  6. Key points for code review
    • Every collection declared static
    • Every close()/release() call site
    • Every place an inner class holds a reference to its outer instance

Ops veteran Lao Fan's pit-avoidance diary

2024-03-20, 2 a.m.
"Xiao Wang, do you know why I have so little hair?
Back then, someone stored the user session in a ThreadLocal and never cleaned it up.
As a result, when 100,000 users were online at the same time,
memory leaked faster than hair falls to a barber's clippers!"


Self-test question: Can you see where this code will leak?

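// Dangerous code! Find the three leaks
public class ModelLoader {
    private static List<Model> loadedModels = new ArrayList<>();

    // Elided calls reconstructed as assumptions: Files.readAllBytes(Paths.get(path)),
    // Executors.newSingleThreadScheduledExecutor(), model.refresh(), TimeUnit.HOURS
    public void load(String path) throws IOException {
        Model model = new Model(Files.readAllBytes(Paths.get(path)));
        loadedModels.add(model);
        Executors.newSingleThreadScheduledExecutor()
                .scheduleAtFixedRate(() -> model.refresh(), 1, 1, TimeUnit.HOURS);
    }
}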

Answers

  1. The static collection has no cleanup mechanism
  2. The scheduled-task thread pool is never shut down
  3. The scheduled task holds a strong reference to the Model
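One possible way to address all three answers at once is sketched below. It is an illustration under stated assumptions, not the article's reference solution: SafeModelLoader, the nested Model type, its refresh() method, and unload() are hypothetical, and file loading is left out to keep the sketch self-contained.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SafeModelLoader implements AutoCloseable {
    // Hypothetical model type; a real one would hold the parsed file contents.
    static class Model {
        final String path;

        Model(String path) {
            this.path = path;
        }

        void refresh() {
            // placeholder for reloading the model
        }
    }

    // Instance-scoped (not static), so the list dies with the loader.
    private final List<Model> loadedModels = new CopyOnWriteArrayList<>();

    // One shared scheduler for all models instead of a new pool per load() call.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    SafeModelLoader() {
        // A single periodic task walks whatever is currently loaded, so no task
        // pins an individual Model after it has been unloaded.
        scheduler.scheduleAtFixedRate(
                () -> loadedModels.forEach(Model::refresh), 1, 1, TimeUnit.HOURS);
    }

    void load(String path) {
        loadedModels.add(new Model(path));
    }

    // Explicit cleanup path for models that are no longer needed.
    boolean unload(String path) {
        return loadedModels.removeIf(m -> m.path.equals(path));
    }

    int loadedCount() {
        return loadedModels.size();
    }

    boolean isShutDown() {
        return scheduler.isShutdown();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
        loadedModels.clear();
    }
}
```

Used inside try-with-resources or tied to the application's shutdown hook, close() stops the scheduler thread and releases every loaded model in one place.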