
An Important HDFS Mechanism: Checkpoint


Core concepts

The HDFS checkpoint mechanism is critical to protecting NameNode metadata, and whether checkpoints complete properly is a key indicator of the health of an HDFS cluster.

  • editslog: the transaction log of HDFS operations, similar to a WAL. Edit log file names start with edits_ followed by a txid range, and consecutive edit logs join end to end; the log currently being written is named edits_inprogress_txid.
  • fsimage: a snapshot of the file system metadata. The standby NameNode periodically merges the in-memory metadata into a new fsimage; by default two fsimage files are kept, named fsimage_txid.
  • seen_txid: records the transaction ID after the last checkpoint or edit-log roll (rolling the edits_inprogress_xxx file over to a new edits file); it is mainly used to check whether any edits files were lost during NameNode startup.
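
For illustration, a rough sketch of how these files appear under the NameNode metadata directory (the path and txids are examples; check dfs.namenode.name.dir on your cluster):

ls /data/hadoop/dfs/name/current
# fsimage_0000000000000008000                      <- older checkpoint
# fsimage_0000000000000009000                      <- latest checkpoint
# fsimage_0000000000000009000.md5                  <- md5 of the latest fsimage
# edits_0000000000000009001-0000000000000009500    <- finalized edit segment
# edits_inprogress_0000000000000009501             <- segment currently being written
# seen_txid                                        <- txid recorded at the last checkpoint/roll
cat /data/hadoop/dfs/name/current/seen_txid        # prints a single number, e.g. 9501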


The checkpoint will merge the old fsimage with the edit log to create a new fsimage.


The checkpoint trigger is controlled by three parameters: dfs.namenode.checkpoint.period, dfs.namenode.checkpoint.txns, and dfs.namenode.checkpoint.check.period.
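
A quick way to confirm their effective values on a running cluster is shown below; the defaults noted in the comments are the stock Hadoop defaults and may differ in your distribution:

hdfs getconf -confKey dfs.namenode.checkpoint.period        # seconds between checkpoints, default 3600
hdfs getconf -confKey dfs.namenode.checkpoint.txns          # uncheckpointed transactions that force a checkpoint, default 1000000
hdfs getconf -confKey dfs.namenode.checkpoint.check.period  # how often the standby checks the two conditions, default 60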



The checkpoint process for an HA cluster


Here the standby namenode is called SBNN and the active namenode is called ANN.

  1. SBNN checks whether the conditions for creating a checkpoint are met: the time since the last checkpoint has reached the configured interval, or the number of transactions in the edit log has reached the configured limit.
  2. SBNN saves its current in-memory state to a new file named fsimage.ckpt_txid, where txid is the ID of the last transaction in the latest finalized edit log (the inprogress segment is excluded). It then creates an MD5 file for that fsimage file and renames the fsimage file to fsimage_txid.
  3. SBNN sends an HTTP GET request to ANN containing SBNN's domain name, its port, and the txid of the new fsimage.
    Once ANN receives the request, it uses that information to in turn send an HTTP GET request back to SBNN to fetch the new fsimage file. When the new fsimage file arrives at ANN, it is likewise first named fsimage.ckpt_txid and an MD5 file is created for it; it is then renamed to fsimage_txid, and the checkpoint process is complete.

 

Production practice

For large-scale clusters, if checkpoints fail to complete for a long time, a large number of edit log files accumulates. When the NameNode is restarted, these edit logs must be replayed to bring the in-memory directory tree up to date. Replay proceeds file by file, so with a large backlog of edit log files this can take three hours or more. Increasing the NameNode's memory can speed up the process.
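
A quick way to gauge how many finalized edit segments have piled up (the paths are examples; check dfs.namenode.name.dir and dfs.journalnode.edits.dir on your cluster):

# count finalized edit segments in the NameNode metadata directory
ls /data/hadoop/dfs/name/current/edits_0* | wc -l
# or on a JournalNode (the nameservice directory name is an example)
ls /data/hadoop/journal/mycluster/current/edits_0* | wc -l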

  • Edit log file accumulation

If edit log files have built up over time, you can enter safe mode and run the saveNamespace command manually to perform a merge. In online environments, however, you usually cannot enter safe mode, so you can instead trigger a checkpoint by restarting the standby NameNode, as sketched below.
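
A sketch of such a restart, assuming Hadoop 3.x daemon commands run on the standby host (on Hadoop 2.x use hadoop-daemon.sh stop/start namenode), with nn2 as an illustrative NameNode ID:

hdfs haadmin -getServiceState nn2     # confirm this node really is the standby
hdfs --daemon stop namenode
hdfs --daemon start namenode
# once the namespace is loaded, the standby will attempt a fresh checkpoint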

I once hit a production issue where a lock problem on the ANN prevented the SBNN from uploading the fsimage to the ANN, and restarting the SBNN still could not complete the checkpoint. In that case, wait for the SBNN namespace to start up normally, then perform an active/standby failover so that the previously locked ANN becomes the SBNN, and restart that node. The checkpoint can then complete, and the backlog of edit files gets cleaned up.
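
A sketch of that failover, assuming nn1 is the locked active and nn2 the healthy standby (the NameNode IDs are illustrative):

hdfs haadmin -getServiceState nn1     # should report active
hdfs haadmin -getServiceState nn2     # should report standby
hdfs haadmin -failover nn1 nn2        # demote nn1, promote nn2
# then restart the formerly locked node (now the standby) so it can checkpoint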

  • Manual checkpoint methods

hdfs dfsadmin -fs 10.0.0.26:4007 -safemode enter    # enter safe mode
hdfs dfsadmin -fs 10.0.0.26:4007 -saveNamespace     # merge edits into a new fsimage
hdfs dfsadmin -fs 10.0.0.26:4007 -safemode leave    # leave safe mode
hdfs dfsadmin -safemode forceExit                   # force exit safe mode

  • Monitoring checkpoint

There are two important indicators.

  1. TransactionsSinceLastCheckpoint: the number of transactions since the last checkpoint; it is recommended to alert when it exceeds 3,000,000.
  2. LastCheckpointTime: the time of the last checkpoint; it is recommended to alert when more than 12 hours have elapsed since it.
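
Both metrics can be read from the NameNode JMX endpoint; a minimal sketch, assuming the default Hadoop 3.x HTTP port 9870 (Hadoop 2.x uses 50070):

curl -s "http://nn-host:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" | grep -E "TransactionsSinceLastCheckpoint|LastCheckpointTime"
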
  • Restart the NameNode with caution

HDFS needs to load the edit logs during a restart, and if the accumulated edit logs contain unnoticed errors, the NameNode will be unable to finish starting, resulting in a production accident.

The more common causes are: mistakenly deleted edit logs, JournalNode power failure, a full data-directory disk, persistent network abnormalities, and so on.

A common error looks like this:

Gap in transactions. Expected to be able to read up until at least txid 813248390 but unable to find any edit logs containing txid 363417469

You can dynamically enable the DEBUG log level to locate where the error occurs.
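
The level can be changed at runtime with the daemonlog command; a sketch, assuming the default HTTP port 9870 and that the edit-log loader class is the one in your stack trace (adjust the class name to match):

hadoop daemonlog -setlevel nn-host:9870 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader DEBUG
# revert once the problem is located
hadoop daemonlog -setlevel nn-host:9870 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader INFO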

Solution:

  1. Check the other JournalNodes' data directories, or the NameNode data directory, for an edits file covering the missing transaction range. If one is found, copy it to the faulty JournalNode.
  2. Use NameNode recovery mode to skip the erroneous edits.
  3. Use the offline edits viewer to repair the broken edits file.
  4. Recover the standby NameNode from the active NameNode.
  5. If none of the above works, the only option is to recover the NameNode from the fsimage to bring it back online.
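
For reference, hedged sketches of the commands behind options 2-4 (back up the metadata directories first; file names and behavior details may vary by version):

# option 2: recovery mode, interactively skip corrupt edits
hdfs namenode -recover
# option 3: offline edits viewer, dump edits to XML, fix by hand, convert back
hdfs oev -i edits_0000000000000000001-0000000000000000100 -o edits.xml -p xml
hdfs oev -i edits.xml -o edits_0000000000000000001-0000000000000000100 -p binary
# option 4: rebuild the standby NameNode's metadata from the active
hdfs namenode -bootstrapStandby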

A high-frequency fault of this kind is tracked in:
https://issues.apache.org/jira/browse/HDFS-15175