Location>code7788 >text

zk basics—3. Cluster and core parameters

Popularity:459 ℃/2025-04-02 22:34:19

Outline

How is stand-alone mode started

How clusters are deployed and started

What kind of configuration machine should be used to deploy a cluster

4. How to reasonably set the JVM parameters and memory size of zk

The core parameters of the configuration are tickTime, dataDir and dataLogDir

6. InitLimit and syncLimit, two core parameters that affect the formation of the Leader and Follower cluster.

When to take a snapshot of data (snapCount = 10000)

8. How many zk clients can be started on a machine at most (maxClientCnxns = 60)

9. How much data can be stored in a znode at most (= 1M)

10. The runtime leader and Follower will communicate through ports 2888 and 3888.

11. How to clean up transaction logs and data snapshots regularly

12. Transaction logs written in the 2PC stage also have the risk of loss

The two related parameters: leaderServers and cnxTimeout

Command instructions provided for operation and maintenance

 

How is stand-alone mode started

Next, first start the zk in stand-alone mode through bin/script, and then connect to zk through bin/script + 2181 port.

$ cd zookeeper
$ sudo bin/ start
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /Users/demo/Documents/zookeeper/bin/../conf/
Starting zookeeper ... STARTED

$ ps -ef |grep java
$ sudo bin/
/usr/bin/java
Connecting to localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]

 

How clusters are deployed and started

To deploy a zk cluster, you can prepare three machines, and one zk process is deployed on each machine. The zk configuration file on each machine is added as follows to make each node aware of how many machines there are in the cluster.

server.1=zookeeper01:2888:3888
server.2=zookeeper02:2888:3888
server.3=zookeeper03:2888:3888

Each node in the zk cluster has an ID number. You need to create a myid file in the specified dataDir directory and write the ID number of the zk node in it.

 

The port numbers of zk are divided into 2888 and 3888. Among them, 3888 is used for Leader elections, and 2888 is used for data synchronization.

 

What kind of configuration machine should be used to deploy a cluster

zk belongs to the infrastructure middleware system, and generally deploys fewer machines (usually 3-5 units), but it is not recommended to use too ordinary configurations. When deploying machines such as zk, Kafka, Hbase, HDFS, etc., the required configuration is relatively high.

 

A 4-core 8G machine can withstand 1,000 concurrency per second.

An 8-core 16G machine can withstand several thousand concurrency per second.

A 16-core 32G machine can withstand tens of thousands or tens of thousands of concurrency per second.

 

Generally, 3 machines (8-core 16G/16-core 32G + SSD solid-state drive) are used to deploy a zk cluster. The small cluster of three such machines can withstand more than 100,000 concurrency per second.

 

For example, 3 16-core 32G machines: 1 Leader + 2 Followers. Among them, the leader is mainly responsible for writing, which can withstand tens of thousands of writing concurrency per second. Follower is responsible for reading, which can withstand concurrency of 50,000 to 100,000 per second.

 

Although writing QPS cannot be expanded linearly, reading QPS can be expanded through the Observer node. Setting up an Observer node is simple. Just modify the configuration and force its role to be Observer, not Leader or Follower.

 

4. How to reasonably set the JVM parameters and memory size of zk

zk itself is written in Java and runs based on a JVM virtual machine. Starting zk is to start a JVM process. All the code of zk will be executed in the JVM process, so the three major parameters of the JVM need to be set reasonably.

 

1. First allocate the size of each block of memory area

Including heap memory, stack memory, and memory in the Metaspace area, it is set according to the machine memory. For example, the machine has 16G of memory, 10G of heap memory can be allocated, stack memory can be allocated 1MB of each thread's stack, and 512MB of Metaspace area can be allocated.

 

2. Then set up a garbage collector

Usually, the new generation + old-age garbage collector for Java applications will be set to ParNew + CMS, but large memory machines do not recommend using this combination, and it is recommended to use the G1 garbage collector.

 

3. Some parameters should be added before starting JVM

For example, which directory should the GC log be written into, and when OOM occurs, which directory should be automatically dumped and placed.

 

After all these three JVM parameters are set reasonably, you can start zk and start using it. In various situations and some peaks of luck, you should also use jstat to observe the JVM situation of zk: including the object growth rate of the new generation + the frequency of YGC + the growth rate of the old generation + the frequency of FGC. If there is a problem with GC, GC tuning should be performed to reasonably optimize JVM parameters. If there is a monitoring system, it is necessary to monitor the JVM situation when the zk runs.

 

The core parameters of the configuration are tickTime, dataDir and dataLogDir

Before officially launching zk, you need to configure some parameters of zk:

$ cat conf/

(1)tickTime

The default value of tickTime is 2000ms, which represents the minimum time unit in zk. Some other parameters will be set based on this tickTime, for example, some parameters are tickTime * 2.

 

(2)dataDir

Used to store data snapshots in zk. A lot of data will be stored in zk, and there will be a snapshot of memory data on disk, so that the previous data can be restored after zk is shut down and restarted.

 

(3)dataLogDir

There is a 2PC mechanism when writing data. First, we will write a Proposal proposal, that is, the transaction log. Each machine writes the transaction log to the local disk. Therefore, if you use SSD solid-state drives, you can improve your writing performance.

 

6. InitLimit and syncLimit, two core parameters that affect the formation of the Leader and Follower cluster.

(1)initLimit

The default value of initLimit is 10, which means 10 * tickTime, i.e. 20s. It means that the leader will wait for the Follower to establish a connection with it after starting, and the maximum waiting time is 20s. Therefore, within the 20s, the Follower must establish a connection with the Leader, otherwise the Leader will not wait for the Follower and will directly enter the message broadcast mode to provide services to the outside world. If the amount of data stored in zk is relatively large, it takes a long time for Follower to recover data, and this parameter can be adjusted at this time.

 

(2)syncLimit

The default value of syncLimit is 5, which means 5 * tickTime, i.e. 10s. The leader will initiate a heartbeat detection to the Follower. If there is no heartbeat response for more than 10 seconds, the Leader will kick the Follower out.

 

When to take a snapshot of data (snapCount = 10000)

The data in zk is divided into two parts: one is the transaction log on disk (dataLogDir), and the other is the data in memory.

 

The two data are theoretically consistent. Even if Follower goes down, only the data in memory is lost, and the transaction log on the disk exists. Even if the Follower fails to receive the transaction log and it goes down, you can restart and find the Leader to synchronize the data.

 

For example, zk has a transaction log on disk. How should it rebuild the data in memory after it is started? Should I play back the transaction log? Re-execute each transaction log into memory?

 

Therefore, zk has a data snapshot mechanism. After each execution of a certain number of transactions, the data in memory will be stored in the dataDir directory as a current data snapshot of zk.

 

For example, there are 1,000 transactions in the transaction log, and the memory data corresponding to 1,000 transactions has been written into the dataDir as a data snapshot. After that, when there are 1032 transactions in the transaction log, zk will restart. When zk restarts, you can directly load a snapshot of 1,000 transactions into memory, and then play back the 32 transactions after 1,000 in memory to restore the data before restart.

 

The corresponding parameter is snapCount, which is the default of 100,000 transactions to store a snapshot. It means that if zk is restarted before 100,000 transactions, then there is no data snapshot at this time, that is, no snapshot is needed within 100,000 transactions.

 

8. How many zk clients can be started on a machine at most (maxClientCnxns = 60)

How many zk clients can a machine create and how many connections can it establish with the zk server? This is limited, with a maximum of 60 by default, that is, the zk server only allows only one machine to establish 60 connections with it, but it can be set through the maxClientCnxns parameter.

 

If you do not pay attention to the singleton mode during development, create a zk client for each request to establish a connection for communication, and then destroy the zk client. When multiple concurrent requests are connected to zk together, it will cause many zk clients to be on one machine, which may eventually lead to the zk request issued by this machine being rejected by the zk server.

 

9. How much data can be stored in a znode at most (= 1M)

The maximum amount of data that can be stored in a znode is set by parameters, and the default is 1M. The corresponding value is 1048575, the unit is bytes, which is converted into M and is 1M. It is generally recommended not to store too much data in a znode, with only a few hundred bytes and hundreds of K.

 

10. The runtime leader and Follower will communicate through ports 2888 and 3888.

Port 3888 for Leader election voting in cluster recovery mode. Port 2888 for data synchronization and runtime communication between Leader and Follower.

 

11. How to clean up transaction logs and data snapshots regularly

When zk is running constantly, the transaction logs will increase, but it cannot be infinite, so zk will cut out multiple transaction log files. Each time a data snapshot is performed, there will be an independent data snapshot file. Finally, there will be multiple transaction log files and multiple data snapshot files on the disk of zk.

 

The default is that the function of timed cleaning data files is not enabled, so it should be enabled as follows. This will allow zk to automatically clean up unnecessary transaction log files and data snapshot files in the background every 1 hour, and save up to 3 files.

=1
=3

 

12. Transaction logs written in the 2PC stage also have the risk of loss

Is there a risk of log loss when a transaction is committed? Yes, there are. In the first stage of 2PC, when Proposal proposes, each machine will write the transaction log to disk, and will generally write it to OS Cache. In the second stage of 2PC, the written transaction log fsync will be forced to disk only when Commit commits.

 

Therefore, the parameter "forceSync: yes" refers to: when committing, the transaction log in OS Cache will be forced to be fsync to disk.

 

When committing, it is best to fsync the transaction logs in OS Cache to disk. Otherwise, if the machine hangs up, the transaction log data that has not been flashed into the disk in the OS Cache may be lost.

 

Although zk will involve multiple machines writing transaction logs to disk when processing write requests. However, since writing transaction logs is written sequentially, this is similar to the performance of random memory writing, so the performance is also very high. Moreover, writing transaction logs is written in OS Cache at the beginning, so the performance can be higher.

 

The two related parameters: leaderServers and cnxTimeout

(1) Parameter leaderServers, the default value is yes

Indicates whether the leader receives the client's connection. All write requests are forwarded by the Follower to the Leader. The Leader only receives write requests forwarded by the Follower.

 

(2) Parameter cnxTimeout, default 5000

During the Leader election, each machine will establish a long TCP connection based on port 3888. The timeout time for establishing a TCP connection during this process is 5000 milliseconds by default.

 

Generally speaking, before starting the zk cluster, you need to set parameters in the configuration file, such as setting the information of zk cluster server.1 and server.2, setting the directories of dataDir and dataLogDir, and turning on automatic cleaning of data files. Then use the command to start multiple nodes such as Leader and Follower respectively.

 

Command instructions provided for operation and maintenance

(1) Commonly used commands on the server

(2) Commonly used commands on the client

(3) Common ACL commands

(4) Commonly used four-character command

(5) zk log visualization

 

(1) Commonly used commands on the server

After preparing the corresponding configuration, you can directly perform relevant services through this script:

Start zk service: sh bin/ start
 Check the zk service status: sh bin/ status
 Stop zk service: sh bin/ stop
 Restart zk service: sh bin/ restart

(2) Commonly used commands on the client

You can try to connect to the zk service using -server 127.0.0.1:2181. After the connection is successful, the system will output the relevant environment and configuration information of zk.

 

Some simple operations of command line tools are as follows:

1. Display the file in the root directory: ls /
 Use the ls command to view the content contained in the current zk;

 2. Display the file in the root directory: ls2 /
 Check the current node data and see data such as updates;

 3. Create a file and set the initial content: create /zk "test"
 Create a new znode node "zk" and the string associated with it;
 [-e] [-s] (-e temporary node) (-s sequential node)

 4. Get the file content: get /zk
 Confirm whether the ZNode contains the string we created, [watch](watch listen);

 5. Modify the file content: set /zk "zkbak"
 Set the string associated with zk;

 6. Delete file: delete /zk
 Delete the ZNode you just created, if there is a child node that fails to delete;

 7. Recursive deletion: rmr or deleteall /zk
 Delete the ZNode you just created and delete the child nodes at the same time;

 8. Exit the client: quit; Help command: help

Nodes with the same name cannot be created:

[zk: 127.0.0.1:2181(CONNECTED) 0] create /node1 value
Created /node1
[zk: 127.0.0.1:2181(CONNECTED) 1] ls /
[aaa, node1, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 2] create /node1 value 
Node already exists: /node1

Create a temporary node to use -e:

[zk: 127.0.0.1:2181(CONNECTED) 5] create -e /node3 value
Created /node3
[zk: 127.0.0.1:2181(CONNECTED) 7] ls /
[aaa, node1, node2, node3, zookeeper]

After creating the temporary node node3, exit the client and the server connection, and then the client connects to the server again, and then executes ls to view it, and finds that node3 is gone.

[zk: 127.0.0.1:2181(CONNECTED) 8] quit
$ ./ -server 127.0.0.1:2181
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[aaa, node1, node2, zookeeper]

Whether it is a persistent node or a temporary node, once a node is created, it is unsuccessful to repeatedly create a node with the same name.

[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[aaa, node1, node2, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 1] create /seqnode abc
Created /seqnode
[zk: 127.0.0.1:2181(CONNECTED) 2] create /seqnode abc
Node already exists: /seqnode

[zk: 127.0.0.1:2181(CONNECTED) 3] create -e /nodee aaa
Created /nodee
[zk: 127.0.0.1:2181(CONNECTED) 4] create -e /nodee aaa
Node already exists: /nodee

Create sequential nodes using -s, you can repeatedly create sequential nodes with "same name" (the command line looks like the same name, but it is not actually the same name).

[zk: 127.0.0.1:2181(CONNECTED) 6] create -e -s /nodes aaa
Created /nodes0000000007
[zk: 127.0.0.1:2181(CONNECTED) 7] create -e -s /nodes aaa
Created /nodes0000000008

To modify node content use the set command:

[zk: 127.0.0.1:2181(CONNECTED) 9] get /aaa
ccc
[zk: 127.0.0.1:2181(CONNECTED) 10] stat /aaa
cZxid = 0x2
ctime = Wed Feb 09 16:01:50 CST 2022
mZxid = 0x5
mtime = Wed Feb 09 16:03:01 CST 2022
pZxid = 0x8
cversion = 2
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 3
numChildren = 2
[zk: 127.0.0.1:2181(CONNECTED) 11] set /aaa ccc
[zk: 127.0.0.1:2181(CONNECTED) 12] get /aaa
ccc
[zk: 127.0.0.1:2181(CONNECTED) 13] stat /aaa
cZxid = 0x2
ctime = Wed Feb 09 16:01:50 CST 2022
mZxid = 0x17
mtime = Wed Feb 09 16:53:46 CST 2022
pZxid = 0x8
cversion = 2
dataVersion = 3
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 3
numChildren = 2

Use delete (cannot delete children) or deleteall (can be recursively deleted):

[zk: 127.0.0.1:2181(CONNECTED) 21] delete /aaa
Node not empty: /aaa
[zk: 127.0.0.1:2181(CONNECTED) 22] ls /aaa
[aaa_1, aaa_2]
[zk: 127.0.0.1:2181(CONNECTED) 23] delete /aaa/aaa_1
[zk: 127.0.0.1:2181(CONNECTED) 24] ls /aaa
[aaa_2]
[zk: 127.0.0.1:2181(CONNECTED) 25] delete /aaa
Node not empty: /aaa
[zk: 127.0.0.1:2181(CONNECTED) 26] rmr /aaa
The command 'rmr' has been deprecated. Please use 'deleteall' instead.

(3) Common ACL commands

zk's ACL is divided into three dimensions: scheme, id, and permission, which is usually expressed as: scheme:id:permission. where schema represents authorization policy, id represents user, and permission represents permission.

[zk: 127.0.0.1:2181(CONNECTED) 1] getAcl /seqnode
'world,'anyone
: cdrwa

1.Scheme (authorization policy)

world: the default method, equivalent to access by the whole world;
 auth: represents the user who has passed the authentication, and the password here is plain text;
 digest: that is, user name: password authentication, which is also the most commonly used in business systems;
 ip: Use IP address authentication;

2.id(user)

id is the verification mode, and the values ​​of id are also different for different schemes;

When the scheme is auth: id is username:password, the password here is plain text
 When the scheme is digest: id is username: BASE64(SHA1(password))
 When scheme is ip: id is the client's ip address
 When scheme is world: id is anyone

3.Permission (Permission)

CREATE, READ, WRITE, DELETE, ADMIN, that is, the permissions for adding, deleting, modifying, checking, and managing. These five permissions are abbreviated as crwda (that is, the abbreviation of each word).

CREATE(c): permission to create child nodes;
 DELETE(d): Delete permissions of nodes;
 READ(r): permission to read node data;
 WRITE(w): permission to modify node data;
 ADMIN(a): Set permissions for child node permissions;

IV.ACL commands

Command 1: getAcl

Get the ACL information of the specified node:

create /testDir/testAcl deer # Create a child node
 getAcl /testDir/testAcl # Get the acl permission information of this node
 [zk: 127.0.0.1:2181(CONNECTED) 1] create /testDir testDir
 Created /testDir
 [zk: 127.0.0.1:2181(CONNECTED) 2] create /testDir/testAcl deer
 Created /testDir/testAcl
 [zk: 127.0.0.1:2181(CONNECTED) 3] getAcl /testDir/testAcl
 'world,'anyone
 : cdrwa

Command 2: setAcl

Set the ACL information for the specified node:

setAcl /testDir/testAcl world:anyone:crwa # Set the acl permissions for this node
 getAcl /testDir/testAcl # Get the acl permission information of the node. After success, the node will lose the d permissions
 create /testDir/testAcl/xyz xyz-data # Create child nodes
 delete /testDir/testAcl/xyz # Since there is no d permission, it is impossible to delete

 [zk: 127.0.0.1:2181(CONNECTED) 3] getAcl /testDir/testAcl
 'world,'anyone
 : cdrwa
 [zk: 127.0.0.1:2181(CONNECTED) 4] setAcl /testDir/testAcl world:anyone:crwa
 [zk: 127.0.0.1:2181(CONNECTED) 5] getAcl /testDir/testAcl
 'world,'anyone
 : crwa
 [zk: 127.0.0.1:2181(CONNECTED) 6] create /testDir/testAcl/xyz xyz-data
 Created /testDir/testAcl/xyz
 [zk: 127.0.0.1:2181(CONNECTED) 7] delete /testDir/testAcl/xyz
 Authentication is not valid : /testDir/testAcl/xyz

Command 3: addauth

Register session authorization information. If you set permissions for ZNode and log out of the client later, you need to register and authorize again when logging in to have ZNode permissions.

//Scheme's auth
 addauth digest user1:123456 # You need to add a user first
 setAcl /testDir/testAcl auth:user1:123456:crwa # Then you can use this user to set permissions
 getAcl /testDir/testAcl # Passwords are stored in ciphertext
 create /testDir/testAcl/testa aaa
 delete /testDir/testAcl/test # Since there is no d permission, it is impossible to delete

 [zk: 127.0.0.1:2181(CONNECTED) 8] addauth digest user1:123456
 [zk: 127.0.0.1:2181(CONNECTED) 9] setAcl /testDir/testAcl auth:user1:123456:crwa
 [zk: 127.0.0.1:2181(CONNECTED) 10] getAcl /testDir/testAcl
 'digest,'user1:HYGa7IZRm2PUBFiFFu8xY2pPP/s=
 : crwa
 [zk: 127.0.0.1:2181(CONNECTED) 11] create /testDir/testAcl/testa aaa
 Created /testDir/testAcl/testa
 [zk: 127.0.0.1:2181(CONNECTED) 12] delete /testDir/testAcl/testa
 Authentication is not valid : /testDir/testAcl/testa

After exiting the client:

ls /testDir/testAcl #No access without permission
 create /testDir/testAcl/testb bbb #No access without permission
 addauth digest user1:123456 # After re-adding permission, you can access it, which is equivalent to adding user1 to the current user and the password is 123456

 [zk: 127.0.0.1:2181(CONNECTED) 13] quit
 $./ -server 127.0.0.1:2181
 [zk: 127.0.0.1:2181(CONNECTED) 0]
 [zk: 127.0.0.1:2181(CONNECTED) 0] ls /testDir/testAcl
 Authentication is not valid : /testDir/testAcl
 [zk: 127.0.0.1:2181(CONNECTED) 1] create /testDir/testAcl/testb bbb
 Authentication is not valid : /testDir/testAcl/testb
 [zk: 127.0.0.1:2181(CONNECTED) 2] addauth digest user1:123456
 [zk: 127.0.0.1:2181(CONNECTED) 3] ls /testDir/testAcl
 [testa, xyz]
 [zk: 127.0.0.1:2181(CONNECTED) 4] create /testDir/testAcl/testb bbb
 Created /testDir/testAcl/testb

The difference between auth and digest is that the former uses a plaintext password to log in, while the latter uses a ciphertext password to log in.

create /testDir/testDigest data
 addauth digest user1:123456 # Add user
 setAcl /testDir/testDigest digest:user1:HYGa7IZRm2PUBFiFFu8xY2pPP/s=:crwa # Use digest to set permissions
 setAcl /testDir/testAcl auth:user1:123456:crwa # Use auth to set permissions

Note: If you use plaintext here, the znode will be inaccessible. Obtain ciphertext through plaintext: Use zk's class DigestAuthenticationProvider.

$ java -=/Users/demo/Documents/apache-zookeeper-3.5.9-bin/lib -cp /Users/juntaowong/Documents/apache-zookeeper-3.5.9-bin/lib/zookeeper-3.5.  deer:123456
deer:123456->deer:UwPiSoSNChqcQfSgAat6bc3IzwQ=

Command 4: ACL command line ip

create  /testDir/testIp data
setAcl  /testDir/testIp ip:192.168.30.10:cdrwa
getAcl  /testDir/testIp

Command 5: ACL enables super permissions

When setAcl permission is set, only the r permission is set, resulting in the node being unable to be deleted:

create /enjoy1/node1  enjoy
setAcl /enjoy1  world:anyone:r

At this time, neither delete nor rmr has permission to delete. The solution is to enable super permission:

Step 1: Use ("super:admin"); to obtain the password
 Step 2: Modify the zkServer startup script to add: "-=super:xQJmxLMiHGwaqBvst5y6rkB6HQs="
 Step 3: Start the client and log in with the administrator: addauth digest super:admin

(4) Commonly used four-character command

zk supports certain specific four-character command letters combined with nc to interact with it, so that they can be used to obtain the current status and related information of the zk service, that is, submit corresponding commands to zk through telnet or nc.

echo stat | nc 127.0.0.1 2181, to see which node is selected as a follower or leader;
 echo ruok | nc 127.0.0.1 2181, test whether the server has been started. If the reply is imok, it means it has been started;
 echo dump | nc 127.0.0.1 2181, listing unprocessed sessions and temporary nodes;
 echo kill | nc 127.0.0.1 2181, turn off the server;
 echo conf | nc 127.0.0.1 2181, output detailed information about related service configuration;
 echo cons | nc 127.0.0.1 2181, lists the full connection/session details of all clients connected to the server;
 echo envi | nc 127.0.0.1 2181, output detailed information about the service environment (different from the conf command);
 echo reqs | nc 127.0.0.1 2181, listing unprocessed requests;
 echo wchs | nc 127.0.0.1 2181, listing the detailed information of the server watch;
 echo wchc | nc 127.0.0.1 2181, lists the details of the server watch through session, and its output is a list of sessions related to watch;
 echo wchp | nc 127.0.0.1 2181, lists the detailed information of the server watch through the path, and it outputs a path related to the session;

(5) zk log visualization

zk has two very important configurations: one is dataDir, which stores snapshot data. One is dataLogDir, which stores transaction log data. You can use zk's class LogFormatter to view logs and zk's class SnapshotFormatter to view snapshots:

$ java -cp /soft/zookeeper-3.4.12/zookeeper-3.4.:/soft/zookeeper-3.4.12/lib/slf4j-api-1.7.  log.1
$ java -cp /soft/zookeeper-3.4.12/zookeeper-3.4.:/soft/zookeeper-3.4.12/lib/slf4j-api-1.7.  log.1