
Master election and heartbeat detection for scheduled tasks, based on ZooKeeper

2025-04-14 17:11:54

In a microservice architecture, ZooKeeper can be used to elect a Master for distributed task scheduling, to let Follower nodes monitor the Master's status in real time, and to trigger a re-election promptly when the Master fails. One way to implement this is the following scheme:


1. Core design principle

1. ZooKeeper features used

| ZK feature | Role in Master election |
|---|---|
| Ephemeral node (EPHEMERAL) | The Master creates an ephemeral node; ZooKeeper deletes it automatically when the session ends, which serves as heartbeat detection |
| Watcher mechanism | Followers watch the Master node for changes |
| Sequential node (SEQUENTIAL) | Orders the candidates so that the election is fair |
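
The sequential-node row can be illustrated without a running ZooKeeper server. The sketch below assumes candidates register children whose names end in the zero-padded 10-digit counter that ZooKeeper appends to sequential nodes. The `candidate-` prefix used in the usage note and the class and method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Illustrative helper (not part of ZooKeeper or Curator): orders election candidates. */
class SequentialElection {

    /** Parses the 10-digit suffix of a sequential node name, e.g. "candidate-0000000007" -> 7. */
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(nodeName.length() - 10));
    }

    /** Returns a copy of the children sorted by sequence number, lowest first. */
    static List<String> sorted(List<String> children) {
        List<String> copy = new ArrayList<>(children);
        copy.sort(Comparator.comparingInt(SequentialElection::sequenceOf));
        return copy;
    }

    /** The candidate with the lowest sequence number is treated as the Master. */
    static Optional<String> leader(List<String> children) {
        List<String> s = sorted(children);
        return s.isEmpty() ? Optional.empty() : Optional.of(s.get(0));
    }

    /** Each Follower watches only its immediate predecessor; the Master watches nothing. */
    static Optional<String> nodeToWatch(List<String> children, String self) {
        List<String> s = sorted(children);
        int i = s.indexOf(self);
        if (i < 0) {
            throw new IllegalArgumentException("Unknown candidate: " + self);
        }
        return i == 0 ? Optional.empty() : Optional.of(s.get(i - 1));
    }
}
```

With children `candidate-0000000012`, `candidate-0000000003` and `candidate-0000000007`, the node ending in `3` is the Master and the node ending in `12` watches the one ending in `7`. Watching only the predecessor means that one node disappearing wakes up a single Follower rather than all of them.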

2. Status monitoring process

sequenceDiagram
    participant Master
    participant Follower1
    participant Follower2
    participant ZK
    Master->>ZK: Create the /master_leader ephemeral node
    Follower1->>ZK: Watch the /master_leader node
    Follower2->>ZK: Watch the /master_leader node
    Note over Master: Session is kept alive by heartbeats during normal operation
    Master--xZK: Session times out and is closed
    ZK->>Follower1: Fire NodeDeleted event
    ZK->>Follower2: Fire NodeDeleted event
    Follower1->>ZK: Try to create a new /master_leader node
    ZK-->>Follower1: Creation succeeds, Follower1 becomes the new Master
    Follower2->>ZK: Watch the new /master_leader node
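
The flow in the diagram can be walked through with a toy in-memory model. This is only a sketch of the semantics (create succeeds once, watches are one-shot, session expiry deletes the node); `ToyLeaderNode` and its methods are hypothetical and are not ZooKeeper or Curator APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy stand-in for the single /master_leader ephemeral node. */
class ToyLeaderNode {
    private String owner; // session that created the node, or null if it does not exist
    private final List<Consumer<String>> watchers = new ArrayList<>();

    /** Mimics creating an ephemeral node: succeeds only if the node does not exist yet. */
    synchronized boolean tryCreate(String sessionId) {
        if (owner != null) {
            return false;
        }
        owner = sessionId;
        return true;
    }

    /** Registers a one-shot watch: it fires once and is then discarded. */
    synchronized void watch(Consumer<String> onEvent) {
        watchers.add(onEvent);
    }

    /** Mimics session expiry: the node disappears and every pending watch fires once. */
    void expireSession(String sessionId) {
        List<Consumer<String>> toFire;
        synchronized (this) {
            if (!sessionId.equals(owner)) {
                return; // this session does not own the node
            }
            owner = null;
            toFire = new ArrayList<>(watchers);
            watchers.clear();
        }
        for (Consumer<String> w : toFire) {
            w.accept("NodeDeleted");
        }
    }

    synchronized String owner() {
        return owner;
    }
}
```

If the Master creates the node, two Followers register watches that call `tryCreate`, and the Master's session then expires, the first Follower notified becomes the new owner and the second one's `tryCreate` returns `false`. Because the watches are one-shot, the losing Follower has to register a new watch on the new node, which is the last arrow in the diagram.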

2. Complete implementation plan

1. Add dependencies

<!-- Curator client (recommended) -->
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>5.5.0</version>
</dependency>

2. Master selection service implementation

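import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.CancelLeadershipException;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListener;
import org.apache.curator.framework.state.ConnectionState;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct; // jakarta.annotation.* on Spring Boot 3+
import javax.annotation.PreDestroy;

@Component
public class ZkLeaderElection {

    private final CuratorFramework zkClient;
    private LeaderSelector leaderSelector;
    private volatile boolean isLeader = false;

    public ZkLeaderElection(CuratorFramework zkClient) {
        this.zkClient = zkClient;
    }

    @PostConstruct
    public void init() throws Exception {
        leaderSelector = new LeaderSelector(zkClient, "/scheduler/leader",
            new LeaderSelectorListener() {
                @Override
                public void takeLeadership(CuratorFramework client) throws Exception {
                    // Leadership is held only while this method is running;
                    // returning or being interrupted gives it up.
                    isLeader = true;
                    System.out.println("The current node is elected as Leader");
                    try {
                        while (true) {
                            Thread.sleep(1000); // Simulate continuous work
                        }
                    } finally {
                        isLeader = false;
                    }
                }

                @Override
                public void stateChanged(CuratorFramework client, ConnectionState newState) {
                    // Connection state change handling
                    if (newState == ConnectionState.LOST) {
                        isLeader = false;
                        // Makes LeaderSelector interrupt takeLeadership() so the loop above exits
                        throw new CancelLeadershipException();
                    }
                }
            });

        leaderSelector.autoRequeue(); // Automatically re-enter the election after losing leadership
        leaderSelector.start();
    }

    @PreDestroy
    public void shutdown() {
        if (leaderSelector != null) {
            leaderSelector.close();
        }
    }

    public boolean isLeader() {
        return isLeader;
    }
}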

3. Enhanced status monitoring (production level)

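// Add the following logic to the init() method
// (needs java.util.concurrent.Executors and java.util.concurrent.TimeUnit)
public void init() throws Exception {
    // ...original code...

    // Extra check whenever the connection is re-established
    zkClient.getConnectionStateListenable().addListener((client, newState) -> {
        if (newState == ConnectionState.RECONNECTED) {
            // Force a Leader status check after reconnecting
            checkLeaderStatus();
        }
    });

    // Start the periodic check (seconds as the unit is an assumption; the source omits it)
    Executors.newSingleThreadScheduledExecutor()
        .scheduleAtFixedRate(this::checkLeaderStatus, 0, 5, TimeUnit.SECONDS);
}

private void checkLeaderStatus() {
    try {
        // LeaderSelector keeps its participants as ephemeral sequential children of this path
        if (zkClient.checkExists().forPath("/scheduler/leader") == null) {
            System.out.println("Leader node does not exist; a re-election is needed");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}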

3. Key optimization points

1. Dual Watch Mechanism

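// In addition to LeaderSelector's built-in listener, register an extra data watch.
// A watch registered this way fires once, so it must be re-registered after each event.
zkClient.getData().usingWatcher((Watcher) event -> {
    if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
        System.out.println("Leader node was deleted; an election starts immediately");
    }
}).forPath("/scheduler/leader");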

2. Election performance optimization

| Parameter | Recommended value | Description |
|---|---|---|
| sessionTimeoutMs | 10000-15000 ms | Adjust according to network conditions |
| autoRequeue() | Must be enabled | Ensures a node re-enters the election after giving up leadership |
| baseSleepTimeMs | 1000 ms | Delay before the first retry |
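
To see what a 1000 ms base delay means across retries, here is a small re-implementation of exponential backoff with jitter. It assumes the delay before retry n (counting from 0) is the base delay multiplied by a random factor between 1 and 2^(n+1) - 1, which is how Curator's `ExponentialBackoffRetry` is usually described; check the source of the Curator version you use before relying on the numbers. `BackoffSketch` is a hypothetical name and is not Curator's class.

```java
import java.util.Random;

/** Illustrative exponential backoff with jitter (an assumption-based sketch, not Curator code). */
class BackoffSketch {

    /** Delay before retry number retryCount (0-based): base * max(1, random in [0, 2^(retryCount+1))). */
    static long sleepMs(int baseSleepTimeMs, int retryCount, Random random) {
        return (long) baseSleepTimeMs * Math.max(1, random.nextInt(1 << (retryCount + 1)));
    }

    /** Largest possible delay before retry number retryCount under the formula above. */
    static long maxSleepMs(int baseSleepTimeMs, int retryCount) {
        return (long) baseSleepTimeMs * ((1 << (retryCount + 1)) - 1);
    }

    /** Upper bound on the total time spent sleeping across maxRetries retries. */
    static long worstCaseTotalMs(int baseSleepTimeMs, int maxRetries) {
        long total = 0;
        for (int i = 0; i < maxRetries; i++) {
            total += maxSleepMs(baseSleepTimeMs, i);
        }
        return total;
    }
}
```

Under that assumption, a base of 1000 ms with 3 retries gives a first delay of exactly 1000 ms and worst-case delays of 1000, 3000 and 7000 ms, so at most 11000 ms of sleeping before the operation is given up.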

3. Failover time control

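// Tune the timeouts in the ZK client configuration
@Bean(initMethod = "start", destroyMethod = "close") // the client must be started before use
public CuratorFramework zkClient() {
    return CuratorFrameworkFactory.builder()
        .connectString("zk1:2181,zk2:2181,zk3:2181")
        .sessionTimeoutMs(15000) // Session timeout
        .connectionTimeoutMs(5000) // Connection timeout
        .retryPolicy(new ExponentialBackoffRetry(1000, 3)) // Retry policy: 1000 ms base delay, up to 3 retries
        .build();
}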

Failover time ≈ session timeout + election time. With the 15000 ms session timeout configured above, a crashed Master is usually replaced within about 15 seconds plus a short election.
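
The formula can be turned into a back-of-the-envelope estimate. One detail worth modeling is that the ZooKeeper server clamps the requested session timeout to the range between `minSessionTimeout` and `maxSessionTimeout`, which default to 2 and 20 times `tickTime`. The class name and the 500 ms election time in the usage note are assumptions for illustration, not measurements.

```java
/** Rough failover estimate; names and inputs are illustrative. */
class FailoverEstimate {

    /** Session timeout after server-side clamping to [2 * tickTime, 20 * tickTime] (the defaults). */
    static int negotiatedSessionTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    /** Worst case for a crashed Master: the full session timeout elapses, then the election runs. */
    static int worstCaseFailoverMs(int requestedSessionTimeoutMs, int tickTimeMs, int electionMs) {
        return negotiatedSessionTimeoutMs(requestedSessionTimeoutMs, tickTimeMs) + electionMs;
    }
}
```

For example, `worstCaseFailoverMs(15000, 2000, 500)` gives 15500 ms. Requesting a 1000 ms timeout against a server with a 2000 ms `tickTime` would still yield a 4000 ms session, so lowering the client value alone does not always shorten failover.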


4. Production environment suggestions

1. Monitoring indicators

| Metric | Collection method | Alarm threshold |
|---|---|---|
| Number of ZK elections | ZK's leader_election counter | More than 5 within 1 hour |
| Master survival time | Timestamp in node data | Under 30 seconds, 3 times in a row |
| Node connection status | Curator event listener | RECONNECTED status lasts more than 1 minute |
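
As an illustration of the first threshold, the following sketch counts elections in a sliding window and reports when the count exceeds the limit. In practice this rule would more likely live in the monitoring system; `ElectionAlarm` is a hypothetical class and the timestamps are supplied by the caller so that the logic stays easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sliding-window counter for a "more than N elections within a window" alarm. */
class ElectionAlarm {
    private final Deque<Long> electionTimesMs = new ArrayDeque<>();
    private final long windowMs;
    private final int threshold;

    ElectionAlarm(long windowMs, int threshold) {
        this.windowMs = windowMs;
        this.threshold = threshold;
    }

    /** Records one election at nowMs and returns true when the alarm should fire. */
    synchronized boolean recordElection(long nowMs) {
        electionTimesMs.addLast(nowMs);
        // Drop elections that have fallen out of the window
        while (!electionTimesMs.isEmpty() && nowMs - electionTimesMs.peekFirst() >= windowMs) {
            electionTimesMs.removeFirst();
        }
        return electionTimesMs.size() > threshold;
    }
}
```

With `new ElectionAlarm(3_600_000L, 5)`, five elections inside one hour do not raise the alarm and the sixth does. A natural place to call `recordElection(System.currentTimeMillis())` would be the start of `takeLeadership`.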

2. Deployment architecture

[Microservice instance 1]  [Microservice instance 2]  [Microservice instance 3]
            |                          |                          |
            +--------------------------+--------------------------+
                                       |
                             [ZooKeeper Ensemble]
                                       |
                  [Monitoring System (Prometheus + Grafana)]

3. Handling exceptional scenarios

  • Split-brain protection: rely on ZooKeeper's quorum mechanism (run at least 3 nodes)
  • Network partition: use a sidecar agent to detect the real network status
  • Persistence: back up the data under the /scheduler node regularly
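
The "at least 3 nodes" advice follows from majority-quorum arithmetic, sketched below under the assumption that the ensemble uses ZooKeeper's default majority quorum. `QuorumMath` is a hypothetical helper.

```java
/** Majority-quorum arithmetic for an ensemble of voting servers. */
class QuorumMath {

    /** Smallest number of servers that forms a majority. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can fail while a majority still remains. */
    static int toleratedFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    /** Whether a partition that can reach this many servers is able to keep serving. */
    static boolean canServe(int ensembleSize, int reachableServers) {
        return reachableServers >= quorumSize(ensembleSize);
    }
}
```

A 3-node ensemble tolerates 1 failure, a 4-node ensemble still tolerates only 1, and a 5-node ensemble tolerates 2, which is why odd sizes are the usual choice. If 5 nodes split 3/2, only the side with 3 can serve, so two Masters cannot be elected on opposite sides of a partition.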

5. Integrate with Spring Cloud

1. Health Check Endpoint

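import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/leader")
public class LeaderController {

    @Autowired
    private ZkLeaderElection election;

    // Reports whether this instance currently holds leadership
    @GetMapping("/status")
    public ResponseEntity<String> status() {
        return election.isLeader()
            ? ResponseEntity.ok("MASTER")
            : ResponseEntity.ok("FOLLOWER");
    }
}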

2. Scheduling task example

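@Scheduled(fixedRate = 5000)
public void scheduledTask() {
    // "election" is assumed to be the injected ZkLeaderElection bean
    if (election.isLeader()) {
        System.out.println("Task executed only by the Master...");
    }
}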
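
When several scheduled methods need the same guard, the leadership check can be factored into a small wrapper. This is a sketch using only the JDK; `LeaderOnlyTask` is a hypothetical name, and the leadership check is passed in as a `BooleanSupplier` so that the wrapper does not depend on ZooKeeper.

```java
import java.util.function.BooleanSupplier;

/** Illustrative wrapper: runs the task only while the leadership check returns true. */
class LeaderOnlyTask implements Runnable {
    private final BooleanSupplier isLeader;
    private final Runnable task;

    LeaderOnlyTask(BooleanSupplier isLeader, Runnable task) {
        this.isLeader = isLeader;
        this.task = task;
    }

    @Override
    public void run() {
        // Followers skip the run entirely; the check is repeated on every invocation
        if (isLeader.getAsBoolean()) {
            task.run();
        }
    }
}
```

In the scheduling class it could be built as `new LeaderOnlyTask(election::isLeader, this::doWork)`, where `doWork` stands for the actual job. The check only guards the start of a run: a long task that loses leadership halfway keeps going unless it checks again before side effects.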

6. Comparison with the Redisson solution

| Dimension | ZooKeeper solution | Redisson solution |
|---|---|---|
| Timeliness | Seconds (depends on the ZK session timeout) | Seconds (depends on the Redis TTL) |
| Reliability | High (CP system) | Depends on Redis persistence |
| Operational complexity | Higher (a ZK cluster must be maintained) | Lower (reuses an existing Redis) |
| Applicable scenarios | Systems with strong consistency requirements | Scenarios that tolerate a brief split brain |

With the scheme above, your microservices gain:

  1. Second-level fault detection: based on ZK ephemeral nodes and the Watcher mechanism
  2. Fast, automatic Master election: using Curator's election recipe
  3. Production-grade reliability: multiple monitoring and protection mechanisms
  4. Seamless Spring ecosystem integration: works together with components such as @Scheduled