In a microservice architecture, ZooKeeper can be used to implement leader (Master) election for distributed task scheduling, so that Follower nodes monitor the Master's status in real time and trigger a re-election promptly when it fails. The scheme below shows one way to implement this:
1. Core design principle
1. ZooKeeper feature utilization
ZooKeeper feature | Role in leader election |
---|---|
Ephemeral node (EPHEMERAL) | The Master creates an ephemeral node; ZooKeeper deletes it automatically when the session ends (equivalent to heartbeat detection) |
Watcher mechanism | Followers watch the Master node for changes |
Sequential node (SEQUENTIAL) | Provides fair ordering among election candidates |
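To make the "fair ordering" row concrete, here is a minimal sketch in plain Java with no ZooKeeper connection. It assumes each candidate has created a sequential node, relies on the fact that ZooKeeper appends a 10-digit zero-padded counter to sequential node names, and treats the lowest number as the leader. The class `SequentialElectionSketch` and the `candidate-` prefix are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Illustrative only: models how candidates are ordered by their sequence suffix. */
class SequentialElectionSketch {

    /** Parses the trailing 10-digit counter that ZooKeeper appends to sequential nodes. */
    static long sequenceOf(String nodeName) {
        return Long.parseLong(nodeName.substring(nodeName.length() - 10));
    }

    /** Returns a copy of the children sorted by sequence number, lowest first. */
    static List<String> sorted(List<String> children) {
        List<String> copy = new ArrayList<>(children);
        copy.sort(Comparator.comparingLong(SequentialElectionSketch::sequenceOf));
        return copy;
    }

    /** The leader is the candidate holding the lowest sequence number. */
    static String leaderOf(List<String> children) {
        return sorted(children).get(0);
    }

    /** Every non-leader watches only its immediate predecessor, which avoids a herd effect. */
    static Optional<String> predecessorOf(String me, List<String> children) {
        List<String> order = sorted(children);
        int idx = order.indexOf(me);
        return idx > 0 ? Optional.of(order.get(idx - 1)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> children = List.of(
                "candidate-0000000012", "candidate-0000000007", "candidate-0000000009");
        System.out.println("leader = " + leaderOf(children));
        System.out.println("candidate-0000000012 watches "
                + predecessorOf("candidate-0000000012", children));
    }
}
```

Watching only the predecessor means that a single node deletion wakes up one candidate instead of all of them.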
2. Status monitoring process
sequenceDiagram
participant Master
participant Follower1
participant Follower2
participant ZK
Master->>ZK: Create ephemeral node /master_leader
Follower1->>ZK: Watch /master_leader
Follower2->>ZK: Watch /master_leader
Note over Master: Session is kept alive by heartbeats during normal operation
Master--xZK: Session times out and disconnects
ZK->>Follower1: Fire NodeDeleted event
ZK->>Follower2: Fire NodeDeleted event
Follower1->>ZK: Try to create a new /master_leader node
ZK-->>Follower1: Creation succeeds; Follower1 becomes the new Master
Follower2->>ZK: Watch the new /master_leader node
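The sequence above can be sketched as a small in-memory simulation. This is plain Java, not the ZooKeeper or Curator API: `EphemeralNodeSketch` is a hypothetical stand-in for the single `/master_leader` node, `tryCreate` models the atomic create, and `expireSession` models the session timeout that deletes the ephemeral node and fires each watcher once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative in-memory stand-in for one ephemeral znode; not the ZooKeeper API. */
class EphemeralNodeSketch {
    private String owner; // session that currently owns the node, or null if absent
    private final List<Consumer<String>> watchers = new ArrayList<>();

    /** Atomic create: succeeds only if the node does not exist yet. */
    synchronized boolean tryCreate(String session) {
        if (owner != null) {
            return false;
        }
        owner = session;
        return true;
    }

    /** Registers a one-shot watcher that fires when the node is deleted. */
    synchronized void watch(Consumer<String> onDeleted) {
        watchers.add(onDeleted);
    }

    /** Simulates session expiry: the node is removed and every watcher is notified once. */
    void expireSession(String session) {
        List<Consumer<String>> toNotify;
        synchronized (this) {
            if (!session.equals(owner)) {
                return; // only the owning session's expiry deletes the node
            }
            owner = null;
            toNotify = new ArrayList<>(watchers);
            watchers.clear();
        }
        // Notify outside the lock so watchers can call tryCreate safely.
        toNotify.forEach(w -> w.accept("NodeDeleted"));
    }

    synchronized String currentOwner() {
        return owner;
    }

    public static void main(String[] args) {
        EphemeralNodeSketch node = new EphemeralNodeSketch();
        node.tryCreate("master");
        // Followers race to recreate the node when they see the deletion event.
        node.watch(event -> node.tryCreate("follower1"));
        node.watch(event -> node.tryCreate("follower2"));
        node.expireSession("master");
        System.out.println("new master = " + node.currentOwner());
    }
}
```

In this simulation the first notified follower wins the create and the second one fails, which mirrors the race in the diagram; a real follower that loses would then set a new watch.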
2. Complete implementation plan
1. Add dependencies
<!-- Curator client (recommended) -->
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>5.5.0</version>
</dependency>
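2. Leader election service implementation
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.CancelLeadershipException;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListener;
import org.apache.curator.framework.state.ConnectionState;
import org.springframework.stereotype.Component;
// javax.annotation.* on Spring Boot 2; use jakarta.annotation.* on Spring Boot 3
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;

@Component
public class ZkLeaderElection {
    private final CuratorFramework zkClient;
    private LeaderSelector leaderSelector;
    private volatile boolean isLeader = false;

    public ZkLeaderElection(CuratorFramework zkClient) {
        this.zkClient = zkClient;
    }

    @PostConstruct
    public void init() throws Exception {
        leaderSelector = new LeaderSelector(zkClient, "/scheduler/leader",
            new LeaderSelectorListener() {
                @Override
                public void takeLeadership(CuratorFramework client) throws Exception {
                    // Logic after becoming the leader; leadership is held until this method returns
                    isLeader = true;
                    System.out.println("The current node is elected as Leader");
                    try {
                        while (true) {
                            Thread.sleep(1000); // Simulate continuous work
                        }
                    } finally {
                        // Runs on interrupt or exit, so the flag never stays stale
                        isLeader = false;
                    }
                }

                @Override
                public void stateChanged(CuratorFramework client, ConnectionState newState) {
                    // Connection state change handling
                    if (newState == ConnectionState.LOST) {
                        isLeader = false;
                        // Interrupts takeLeadership() so this node stops acting as leader
                        throw new CancelLeadershipException();
                    }
                }
            });
        leaderSelector.autoRequeue(); // Automatically re-enter the election after losing leadership
        leaderSelector.start();
    }

    @PreDestroy
    public void shutdown() {
        if (leaderSelector != null) {
            leaderSelector.close();
        }
    }

    public boolean isLeader() {
        return isLeader;
    }
}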
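3. Enhanced status monitoring (production level)
// Add the following logic to the init() method
// Requires java.util.concurrent.Executors and java.util.concurrent.TimeUnit
public void init() throws Exception {
    // ...original code...
    // Add an extra check driven by connection state changes
    zkClient.getConnectionStateListenable().addListener((client, newState) -> {
        if (newState == ConnectionState.RECONNECTED) {
            // Force a Leader status check after reconnecting
            checkLeaderStatus();
        }
    });
    // Start the periodic check task (every 5 seconds)
    Executors.newSingleThreadScheduledExecutor()
            .scheduleAtFixedRate(this::checkLeaderStatus, 0, 5, TimeUnit.SECONDS);
}

private void checkLeaderStatus() {
    try {
        if (zkClient.checkExists().forPath("/scheduler/leader") == null) {
            // This only logs; with autoRequeue() the LeaderSelector rejoins the election by itself
            System.out.println("Leader node does not exist; a re-election is expected");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}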
3. Key optimization points
1. Dual Watch Mechanism
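// In addition to LeaderSelector's built-in watch, register an extra data watch
// (org.apache.zookeeper.Watcher; a ZooKeeper watch fires once and must be re-registered)
zkClient.getData().usingWatcher((Watcher) event -> {
    if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
        System.out.println("Leader node was deleted; an election is triggered immediately");
    }
}).forPath("/scheduler/leader");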
2. Election performance optimization
Parameter | Recommended value | Description |
---|---|---|
sessionTimeoutMs | 10000-15000ms | Adjust according to network conditions |
autoRequeue() | Must be enabled | Ensures that a node re-enters the election after giving up leadership |
baseSleepTimeMs | 1000ms | Initial delay before the first retry |
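To illustrate what a 1000 ms initial retry delay means in practice, below is a generic randomized exponential backoff in plain Java. `BackoffSketch` is a hypothetical class written for this example. It is loosely modeled on the idea behind a policy such as `ExponentialBackoffRetry(1000, 3)` and is not Curator's implementation, so check the Curator source for the exact formula.

```java
import java.util.Random;

/** Illustrative randomized exponential backoff; not Curator's actual class. */
class BackoffSketch {
    private final long baseSleepMs;
    private final int maxRetries;
    private final Random random;

    BackoffSketch(long baseSleepMs, int maxRetries, Random random) {
        this.baseSleepMs = baseSleepMs;
        this.maxRetries = maxRetries;
        this.random = random;
    }

    /** Retries are allowed while the 0-based retry count is below maxRetries. */
    boolean allowRetry(int retryCount) {
        return retryCount < maxRetries;
    }

    /** Sleep is base * factor, where factor is random in [1, 2^(retryCount+1) - 1]. */
    long sleepMs(int retryCount) {
        int bound = 1 << (retryCount + 1);
        return baseSleepMs * Math.max(1, random.nextInt(bound));
    }

    public static void main(String[] args) {
        BackoffSketch backoff = new BackoffSketch(1000, 3, new Random());
        for (int retry = 0; backoff.allowRetry(retry); retry++) {
            System.out.println("retry " + retry + " sleeps " + backoff.sleepMs(retry) + " ms");
        }
    }
}
```

The first retry always waits exactly the base delay, while later retries spread out randomly so that many clients do not reconnect at the same instant.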
3. Failover time control
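// Tune the ZK client configuration
// initMethod = "start" is needed because a CuratorFramework must be started before use
@Bean(initMethod = "start", destroyMethod = "close")
public CuratorFramework zkClient() {
    return CuratorFrameworkFactory.builder()
            .connectString("zk1:2181,zk2:2181,zk3:2181")
            .sessionTimeoutMs(15000)     // Session timeout
            .connectionTimeoutMs(5000)   // Connection timeout
            .retryPolicy(new ExponentialBackoffRetry(1000, 3)) // Retry policy: 1000 ms base delay, up to 3 retries
            .build();
}
Failover time = session timeout + election time. The session timeout dominates, so with the settings above failover usually completes in roughly 15 seconds.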
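The formula can be worked through with concrete numbers. The sketch below is plain Java with a hypothetical class name, `FailoverEstimateSketch`. It assumes the server uses ZooKeeper's default bounds for the negotiated session timeout (between 2 and 20 times `tickTime`) and a `tickTime` of 2000 ms; the 500 ms election time is an illustrative guess, not a measured value.

```java
/** Illustrative arithmetic for the failover-time formula; all inputs are assumptions. */
class FailoverEstimateSketch {

    /** The server clamps the requested timeout into [2 * tickTime, 20 * tickTime] by default. */
    static long negotiatedSessionTimeoutMs(long requestedMs, long tickTimeMs) {
        long min = 2 * tickTimeMs;
        long max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    /** Worst case: the master dies right after a heartbeat, so the full timeout elapses first. */
    static long worstCaseFailoverMs(long requestedSessionTimeoutMs, long tickTimeMs, long electionMs) {
        return negotiatedSessionTimeoutMs(requestedSessionTimeoutMs, tickTimeMs) + electionMs;
    }

    public static void main(String[] args) {
        long tickTimeMs = 2000; // assumed server tickTime
        long electionMs = 500;  // assumed election time
        System.out.println("10 s session: " + worstCaseFailoverMs(10000, tickTimeMs, electionMs) + " ms");
        System.out.println("15 s session: " + worstCaseFailoverMs(15000, tickTimeMs, electionMs) + " ms");
    }
}
```

Because the session timeout is the dominant term, lowering it shortens failover at the cost of more false expirations on an unstable network.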
4. Production environment suggestions
1. Monitoring indicators
Metric name | Collection method | Alarm threshold |
---|---|---|
ZK election count | ZK's leader_election counter | >5 times within 1 hour |
Master tenure | Timestamp in node data | <30 seconds, 3 times in a row |
Node connection status | Curator event listening | RECONNECTED status lasts >1 minute |
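As an illustration of the first threshold (more than 5 elections within 1 hour), the following plain-Java sketch keeps a sliding window of election timestamps. `ElectionRateAlertSketch` is a hypothetical helper written for this example; in practice the counting would more likely live in the monitoring system's alert rules.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sliding-window counter for the "too many elections" alarm. */
class ElectionRateAlertSketch {
    private final Deque<Instant> elections = new ArrayDeque<>();
    private final Duration window;
    private final int threshold;

    ElectionRateAlertSketch(Duration window, int threshold) {
        this.window = window;
        this.threshold = threshold;
    }

    /** Records one election; returns true if the count inside the window exceeds the threshold. */
    synchronized boolean recordElection(Instant at) {
        elections.addLast(at);
        Instant cutoff = at.minus(window);
        // Drop elections that fell out of the window.
        while (!elections.isEmpty() && elections.peekFirst().isBefore(cutoff)) {
            elections.removeFirst();
        }
        return elections.size() > threshold;
    }

    public static void main(String[] args) {
        ElectionRateAlertSketch alert = new ElectionRateAlertSketch(Duration.ofHours(1), 5);
        Instant start = Instant.now();
        for (int i = 0; i < 6; i++) {
            boolean fire = alert.recordElection(start.plus(Duration.ofMinutes(i)));
            System.out.println("election " + (i + 1) + " -> alarm=" + fire);
        }
    }
}
```

Calls are expected in chronological order, which holds when the method is invoked from the election callback itself.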
2. Deployment architecture
[Microservice instance 1]  [Microservice instance 2]  [Microservice instance 3]
            |                          |                          |
            +--------------------------+--------------------------+
                                       |
                             [ZooKeeper Ensemble]
                                       |
                   [Monitoring System (Prometheus + Grafana)]
3. Exception scenario handling
- Split-brain protection: enable ZK's quorum mechanism (at least 3 nodes)
- Network partition: work with a sidecar agent to detect the real network status
- Persistence issues: regularly back up the /scheduler node data
5. Integrate with Spring Cloud
1. Health Check Endpoint
@RestController
@RequestMapping("/leader")
public class LeaderController {
    @Autowired
    private ZkLeaderElection election;

    // Reports whether this instance currently holds leadership
    @GetMapping("/status")
    public ResponseEntity<String> status() {
        return election.isLeader()
                ? ResponseEntity.ok("MASTER")
                : ResponseEntity.ok("FOLLOWER");
    }
}
2. Scheduling task example
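// "election" is the injected ZkLeaderElection bean, as in the controller above
@Scheduled(fixedRate = 5000)
public void scheduledTask() {
    if (election.isLeader()) {
        System.out.println("Task executed only by the Master...");
    }
}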
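The leader check in the scheduled task can also be factored into a reusable wrapper. The sketch below is plain Java with no Spring or Curator dependency. `LeaderGatedTaskSketch` is a hypothetical helper, and the `BooleanSupplier` is assumed to be backed by something like the `isLeader()` method shown earlier.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Illustrative wrapper that runs a task only while this instance is the leader. */
class LeaderGatedTaskSketch implements Runnable {
    private final BooleanSupplier isLeader;
    private final Runnable task;

    LeaderGatedTaskSketch(BooleanSupplier isLeader, Runnable task) {
        this.isLeader = isLeader;
        this.task = task;
    }

    @Override
    public void run() {
        // Leadership is re-checked on every tick, so a demoted node stops on its next run.
        if (isLeader.getAsBoolean()) {
            task.run();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean leader = new AtomicBoolean(false); // stands in for the election state
        Runnable gated = new LeaderGatedTaskSketch(leader::get,
                () -> System.out.println("running master-only task"));
        gated.run();      // follower: nothing happens
        leader.set(true);
        gated.run();      // leader: the task runs
    }
}
```

The check happens only at the start of each run, so a long-running task should re-check leadership at its own safe points if it must stop promptly after a demotion.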
6. Comparison with the Redisson solution
Dimension | ZooKeeper solution | Redisson solution |
---|---|---|
Real-time performance | Seconds (depends on ZK session timeout) | Seconds (depends on Redis TTL) |
Reliability | High (CP system) | (Depends on Redis persistence) |
Operational complexity | Higher (requires maintaining a ZK cluster) | Lower (reuses existing Redis) |
Applicable scenarios | Systems with strong consistency requirements | Scenarios that tolerate brief split-brain |
With the scheme above, your microservice gains:
- Second-level fault detection: based on ZK ephemeral nodes and the Watcher mechanism
- Fast automatic leader election: using Curator's election recipe
- Production-grade reliability: multiple monitoring and protection mechanisms
- Seamless Spring ecosystem integration: works together with @Scheduled components