
Ad-hoc checklist for slow connections and timeouts in the ZooKeeper Java client

2025-03-04 11:31:01

TL;DR

Troubleshooting approach:

  1. First, make sure your machine can reach ZooKeeper. Run the command echo srvr | nc HOST 2181 and check whether the node information is printed normally. Windows users can run telnet HOST 2181 on the command line, then type srvr after connecting and press Enter.
  2. If step 1 checks out but the connection is still slow or times out, run jstack -l <pid> while the process is connecting to ZooKeeper during startup to get a thread dump for further analysis. If you would rather not check the causes one by one, you can try the following quick fixes:
    1. Configure the local hosts file and add the following entries:
      127.0.0.1   localhost 
      ::1         localhost 
      
      Note: replace localhost with the output of the hostname command.
    2. Set these system properties when starting the Java process: -Djava.net.preferIPv4Stack=true -Dzookeeper.sasl.client=false
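
On machines without nc or telnet, the connectivity check from step 1 can be approximated in Java. This is only a minimal sketch and not part of the original checklist: the class name SrvrCheck, the method fourLetterWord, and the 3-second timeout are illustrative assumptions. It opens a plain TCP connection, sends the four-letter command srvr, and reads the reply until the server closes the connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class SrvrCheck {
    /** Sends a four-letter command and returns everything the server writes back. */
    static String fourLetterWord(String host, int port, String cmd, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Fail fast instead of hanging when the server is unreachable.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until the server closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(fourLetterWord(host, port, "srvr", 3000));
    }
}
```

If this prints the node information quickly but the real client is still slow, the network path is probably fine and the thread dump from step 2 is the next thing to look at.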

Case records

While initializing, the ZooKeeper Java client logs environment information and gets stuck in the InetAddress.getLocalHost().getCanonicalHostName() method

"ZookeeperServiceUrlProvider-1" #1 prio=5 os_prio=0 tid=0x00000000033b0800 nid=0x432c runnable [0x000000000329e000]
   java.lang.Thread.State: RUNNABLE
	at .(Native Method)
	at $(:928)
	at (:1323)
	at .getAllByName0(:1276)
	at .getAllByName0(:1253)
	at (:634)
	at (:588)
	at (:62)
	at (:98)
	at .<clinit>(:97)
	at ...
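
When attaching jstack is inconvenient, a similar check can be made from inside the process. The sketch below is an illustration only: the class DnsStuckThreadFinder and its method are hypothetical names, and the prefix java.net.InetAddress is an assumption about which frames indicate a thread sitting in name resolution. It lists the threads whose current stack contains a frame from classes with the given prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DnsStuckThreadFinder {
    /** Returns the names of live threads that currently have a frame in a class with this prefix. */
    static List<String> threadsInside(String classPrefix) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed prefix: hostname lookups go through java.net.InetAddress.
        System.out.println(threadsInside("java.net.InetAddress"));
    }
}
```

A thread that shows up here on several consecutive samples is a reasonable candidate for the kind of stall shown in the dump.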

Solution: configure the local hosts file and add the following entries:

127.0.0.1   localhost 
::1         localhost 

Note: replace localhost with the output of the hostname command.
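
To confirm this case before and after editing the hosts file, the lookup can be timed on its own. This is a minimal sketch under the assumption that the stall comes from InetAddress.getLocalHost().getCanonicalHostName(); the class and method names are made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LocalHostLookupTimer {
    /** Runs the local host name lookup once and returns the elapsed time in milliseconds. */
    static long timeLocalHostLookupMillis() {
        long start = System.nanoTime();
        try {
            String name = InetAddress.getLocalHost().getCanonicalHostName();
            System.out.println("canonical host name: " + name);
        } catch (UnknownHostException e) {
            // The local host name has no mapping at all; the time spent failing is still reported.
            System.out.println("local host name could not be resolved: " + e.getMessage());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("lookup took " + timeLocalHostLookupMillis() + " ms");
    }
}
```

If the reported time drops from seconds to a few milliseconds after the hosts entry is added, the hosts file was very likely the missing piece.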

Slow DNS resolution leaves the ZooKeeper client's SendThread stuck for a long time

"ZookeeperServiceUrlProvider-1-SendThread()@9231" daemon prio=5 tid=0x1b nid=NA waiting
  java.lang.Thread.State: WAITING
	  at (:-1)
	  at (:502)
	  at (:1393)
	  at (:1310)
	  at .getAllByName0(:1276)
	  at .getAllByName0(:1253)
	  at (:634)
	  at (:559)
	  at (:531)
	  at $(:82)
	  at $$600(:56)
	  at (:345)
	  at $(:998)
	  at $(:1060)

Solution:

  1. Investigate why DNS resolution is slow
  2. Connect using IP addresses instead of hostnames
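
Both options can start from the same measurement: how long the JVM takes to resolve the server's hostname, and which addresses it gets back. The following sketch assumes nothing about the ZooKeeper client itself; DnsLookupTimer and resolveAll are hypothetical names, and zk.example.internal is a placeholder hostname.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class DnsLookupTimer {
    /** Resolves a host, prints how long it took, and returns the addresses as strings. */
    static String[] resolveAll(String host) {
        long start = System.nanoTime();
        String[] result;
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            result = new String[addresses.length];
            for (int i = 0; i < addresses.length; i++) {
                result[i] = addresses[i].getHostAddress();
            }
        } catch (UnknownHostException e) {
            result = new String[0]; // unresolvable host: report the time and no addresses
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(host + " -> [" + String.join(", ", result) + "] in " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // Placeholder hostname; pass the real ZooKeeper host as the first argument.
        resolveAll(args.length > 0 ? args[0] : "zk.example.internal");
    }
}
```

A slow result points to the DNS setup; the printed addresses are what would go into the connect string if you switch to IP-based connections.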

The client attempts SASL authentication, which triggers DNS resolution; the slow DNS response then causes a timeout

"ZookeeperServiceUrlProvider-1-SendThread(HOST:2181)" #34 daemon prio=5 os_prio=0 tid=0x0000024d6f94a000 nid=0xa984 runnable [0x000000ff9cffe000]
   java.lang.Thread.State: RUNNABLE
        at .(Native Method)
        at $(:933)
        at (:618)
        at (:560)
        at (:532)
        at $(:82)
        at $$600(:56)
        at (:345)
        at $(:105)
        at (:59)
        at (:41)
        at $(:1161)
        at $(:1211)

   Locked ownable synchronizers:
        - None

Solution: set -Dzookeeper.sasl.client=false when starting the Java process to disable SASL (if you don't know what SASL is, you most likely don't need it).
See: SASL Client-Server mutual authentication
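
When the JVM launch flags cannot be changed, the same switch can be set in code, provided it happens before the ZooKeeper client is created. This is a hedged sketch: it assumes the property in question is zookeeper.sasl.client, and ZkClientStartup and its method are illustrative names, not part of any library.

```java
final class ZkClientStartup {
    // Assumed property name for ZooKeeper's client-side SASL switch.
    static final String SASL_CLIENT_PROPERTY = "zookeeper.sasl.client";

    /** Disables client SASL unless the property was already set explicitly; returns the effective value. */
    static String disableSaslUnlessConfigured() {
        if (System.getProperty(SASL_CLIENT_PROPERTY) == null) {
            System.setProperty(SASL_CLIENT_PROPERTY, "false");
        }
        return System.getProperty(SASL_CLIENT_PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println(SASL_CLIENT_PROPERTY + "=" + disableSaslUnlessConfigured());
        // Create the ZooKeeper client only after this point.
    }
}
```

Leaving an explicitly configured value untouched keeps the command-line flag authoritative for deployments that do rely on SASL.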

Reference link

  • /a/39698914

About the Author

Author: Xiao Yike, an engineer working hands-on with message middleware who has also worked on RPC frameworks for many years. Feel free to get in touch in the comment section or by email.

WeChat official account: Xiao Yike

github id: shawyeok