Why print a log?
1. Monitoring system operation
Regularly reviewing system logs is an important means of knowing if services are running properly. Logs provide operations personnel with critical information to monitor the status of the system in real time and identify potential problems.
2. Troubleshooting (e.g., exception stacks)
Logs record detailed error information, especially exception stack traces, which help to quickly locate the root cause of a problem. For intermittent bugs, logs are often the only way to troubleshoot, especially in production environments.
When using try-catch, note that exceptions must not be silently swallowed ("eaten") except in special circumstances; at the very least, log them.
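For instance, a minimal sketch (assuming SLF4J as the logging facade; OrderService and doPlaceOrder are illustrative names): log the exception with its full stack trace and rethrow it, instead of swallowing it.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderService {

    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    public void placeOrder(String orderId) {
        try {
            doPlaceOrder(orderId); // hypothetical business method
        } catch (Exception e) {
            // Bad:  catch (Exception e) {}  -- the exception is "eaten" and leaves no trace.
            // Good: record the context and the full stack trace, then rethrow or handle it.
            log.error("placeOrder failed, orderId={}", orderId, e);
            throw e;
        }
    }

    private void doPlaceOrder(String orderId) {
        // ... business logic ...
    }
}
```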
3. Performance monitoring
By recording the elapsed time of specific operations, we can monitor the system performance in real time, which helps us to identify performance bottlenecks in time and optimize the operations that take a long time.
In Java, a StopWatch utility (for example, Spring's org.springframework.util.StopWatch) can be used to measure elapsed time.
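A minimal sketch of timing an operation and logging its cost, assuming Spring's StopWatch is on the classpath (plain System.currentTimeMillis() works just as well; the class and task names are illustrative):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.util.StopWatch;

public class ReportService {

    private static final Logger log = LoggerFactory.getLogger(ReportService.class);

    public void generateDailyReport() {
        StopWatch stopWatch = new StopWatch("daily-report");

        stopWatch.start("load data");
        loadData();                      // hypothetical step
        stopWatch.stop();

        stopWatch.start("render report");
        renderReport();                  // hypothetical step
        stopWatch.stop();

        // Log the total cost; slow operations can then be found by searching the logs.
        log.info("generateDailyReport finished in {} ms\n{}",
                stopWatch.getTotalTimeMillis(), stopWatch.prettyPrint());
    }

    private void loadData() { /* ... */ }

    private void renderReport() { /* ... */ }
}
```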
4. Network attack monitoring
Security information in logs is critical for timely detection of network attacks. Firewalls or other security components record attacks, which helps us respond quickly by taking measures such as enhancing protection mechanisms or switching service nodes.
What operations need to print logs?
1. User operations
User operations, especially those performed in back-office (admin) management systems, need to be logged in detail. This not only helps us understand how the system is being used, but also provides a basis for recourse if problems arise.
For example, users may not remember (or may deny) having performed certain actions; the operation time, IP address, and specific task performed, as recorded in the logs, can serve as strong evidence.
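A minimal sketch of recording such an operation log (SLF4J again; the AuditLogger class, logger name, and field names are illustrative assumptions):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AuditLogger {

    private static final Logger log = LoggerFactory.getLogger("AUDIT");

    /** Record who did what, from where; the timestamp is added by the logging framework. */
    public void record(long userId, String username, String clientIp, String action, String detail) {
        log.info("AUDIT userId={}, username={}, ip={}, action={}, detail={}",
                userId, username, clientIp, action, detail);
    }
}
```

Such entries can go to a dedicated appender or, as discussed later, to a database table, so they can be queried when a dispute arises.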
2. Scheduled tasks
It is very important to log the execution of scheduled tasks, so that we can confirm whether a task ran on time, whether it completed successfully, and whether any exceptions occurred.
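A minimal sketch for a Spring @Scheduled task, assuming Spring scheduling is enabled (the job name, cron expression, and settle() method are placeholders):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class SettlementJob {

    private static final Logger log = LoggerFactory.getLogger(SettlementJob.class);

    // Runs every day at 02:00; the schedule itself is an illustrative assumption.
    @Scheduled(cron = "0 0 2 * * ?")
    public void run() {
        long start = System.currentTimeMillis();
        log.info("SettlementJob started");
        try {
            settle(); // hypothetical business logic
            log.info("SettlementJob finished successfully in {} ms",
                    System.currentTimeMillis() - start);
        } catch (Exception e) {
            // Never let a failed run disappear silently.
            log.error("SettlementJob failed after {} ms",
                    System.currentTimeMillis() - start, e);
        }
    }

    private void settle() { /* ... */ }
}
```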
3. External requests
When the system interacts with external interfaces, recording the request parameters and the response content is crucial.
With these logs, we can not only verify that the external system is returning the correct data, but we can also ensure that our system is working as expected.
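A minimal sketch of logging an outbound call with Java 11's java.net.http.HttpClient (the gateway URL and PaymentGatewayClient class are placeholders; in production, large or sensitive bodies should be truncated or masked before logging):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PaymentGatewayClient {

    private static final Logger log = LoggerFactory.getLogger(PaymentGatewayClient.class);
    private final HttpClient httpClient = HttpClient.newHttpClient();

    public String queryOrder(String orderId) throws Exception {
        String url = "https://gateway.example.com/orders/" + orderId; // placeholder endpoint
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();

        log.info("external request: GET {}", url);
        HttpResponse<String> response =
                httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        log.info("external response: status={}, body={}", response.statusCode(), response.body());

        return response.body();
    }
}
```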
What to print?
1. Required ID information
The log should contain the required identifiers (e.g., user IDs, transaction IDs); this information is critical for locating problems quickly. Recording identifiers helps us trace operations, identify who initiated them, and quickly pinpoint the source of a problem when one occurs.
2. Readability of information
Simply recording ID information may not be intuitive and can make later analysis difficult. Therefore, it is best to also record the related human-readable information, such as the username alongside the user_id.
For example, recording only a user's ID (e.g. user_id=12345) may not directly tell us which user is having the problem, whereas also recording the username (e.g. username=Zhang San) makes the problem more intuitive to locate and easier to handle.
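A minimal sketch, again with SLF4J parameterized logging, recording both the identifiers and the readable name (the class and field names are illustrative):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WithdrawService {

    private static final Logger log = LoggerFactory.getLogger(WithdrawService.class);

    public void withdraw(long userId, String username, String transactionId, long amountCents) {
        // Record both the identifiers (for machines) and the readable name (for humans).
        log.info("withdraw requested: user_id={}, username={}, transaction_id={}, amount_cents={}",
                userId, username, transactionId, amountCents);
        // ... business logic ...
    }
}
```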
What are the main things to look out for?
1. Printing frequency
Too high a log printing frequency will lead to rapid growth of log files, occupy a lot of disk space, and even affect system performance. Therefore, the frequency of logging should be reasonably controlled to avoid recording unnecessary information.
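One common way to keep volume under control is to log verbose detail at DEBUG level and guard expensive message construction; a sketch assuming SLF4J (class and method names are illustrative):

```java
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class InventorySync {

    private static final Logger log = LoggerFactory.getLogger(InventorySync.class);

    public void sync(List<String> skus) {
        // High-level progress at INFO: one line per batch, not one line per item.
        log.info("inventory sync started, batchSize={}", skus.size());

        if (log.isDebugEnabled()) {
            // Detailed, high-frequency output only when DEBUG is enabled,
            // so the expensive String.join(...) is only built in that case.
            log.debug("inventory sync skus: {}", String.join(",", skus));
        }
    }
}
```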
2. Sensitive information
Logs may contain sensitive information (e.g., user passwords, payment details). Although most technicians do not actively look at this information, a log leak can seriously damage a company's reputation. To prevent leaks of sensitive information, avoid logging such data, or mask it: match the keys of an object against a list of sensitive keywords and replace their values with **** before the entry is written.
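A minimal sketch of keyword-based masking before logging (the set of sensitive keys is an illustrative assumption; real projects often do this in a logging filter or a custom serializer):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public final class LogMasker {

    // Keys considered sensitive; extend as needed (illustrative list).
    private static final Set<String> SENSITIVE_KEYS =
            Set.of("password", "cardNo", "cvv", "idNumber", "phone");

    /** Returns a copy of the map with sensitive values replaced by "****". */
    public static Map<String, Object> mask(Map<String, Object> fields) {
        Map<String, Object> masked = new LinkedHashMap<>(fields);
        for (String key : SENSITIVE_KEYS) {
            if (masked.containsKey(key)) {
                masked.put(key, "****");
            }
        }
        return masked;
    }
}
```

Logging log.info("create user: {}", LogMasker.mask(params)) then prints password=**** instead of the real value.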
How do I view and use the logs?
1. Implementing link tracing (distributed tracing)
By adding a TAG to each log entry, it becomes easy to filter logs and trace the execution of an operation.
For example, each user request or business operation can be assigned a unique request ID or transaction ID and recorded in the log as a TAG, so that the execution of the same request can be traced across different systems and modules.
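A minimal sketch using SLF4J's MDC to carry such a trace ID (the key name traceId is an assumption, and the log pattern must include %X{traceId} for it to appear in the output):

```java
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class TraceIdDemo {

    private static final Logger log = LoggerFactory.getLogger(TraceIdDemo.class);

    public void handleRequest() {
        // In a web application this would typically live in a servlet filter or interceptor.
        MDC.put("traceId", UUID.randomUUID().toString());
        try {
            log.info("order created");      // every line in this request now carries the traceId
            log.info("notification sent");
        } finally {
            MDC.remove("traceId");          // always clean up, especially with thread pools
        }
    }
}
```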
2. Keyword search
Log files usually contain a large amount of information, especially in highly concurrent environments where log volume can be enormous. To locate and troubleshoot problems quickly, we can use keyword search.
For example, if the system reports an error whose exception message contains NullPointerException, we can search the log for that keyword to quickly find where the error occurred, then narrow the scope of investigation and resolve the problem.
Alternatively, when scanning logs by hand, searching for Exception or error quickly filters out the exception entries and shows whether anything unexpected occurred.
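On a server this is usually just a grep; the same idea expressed in Java, reading a hypothetical logs/app.log, looks like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LogSearch {

    public static void main(String[] args) throws IOException {
        // The path is a placeholder; the keywords can be narrowed to e.g. "NullPointerException".
        try (Stream<String> lines = Files.lines(Path.of("logs/app.log"))) {
            lines.filter(line -> line.contains("Exception") || line.contains("ERROR"))
                 .forEach(System.out::println);
        }
    }
}
```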
File logs or database logs?
- Runtime logs: In general, runtime logs are written directly to files. File logs record the overall running status of the system, which facilitates monitoring and performance tuning.
- Operation logs: Higher-level logs, such as user operations and business processing records, are better written to a database. This makes them easier to query and analyze, supports report generation, and facilitates long-term retention and auditing.
Why are log files sliced? How to slice?
Log files cannot grow indefinitely or they will interfere with viewing and management. For example, when viewing logs for a specific date, if all logs are grouped together in one file, the file will become huge and difficult to read, and opening the file may also lag.
Deleting old logs also becomes difficult, because the file is locked while new entries are still being written to it.
Log slicing, also known as log rotation, usually splits files by time (e.g., per day or per month) or by file size. The most common approach combines date and sequence number: when the current file reaches a certain size, a new file is created and named with the next sequence number, e.g. app.2025-01-15.0.log, app.2025-01-15.1.log.
How are massive logs monitored?
For a single simple application, the amount of logs is relatively small and can be downloaded directly and checked manually. However, for complex systems with microservice architecture, the log volume is huge and scattered, and manual scanning is obviously impractical.
- Centralized log management: A log collection stack (e.g., the ELK Stack) can store and manage logs centrally. In this way, the logs of all microservices are gathered in one place, making analysis and monitoring easier.
- Why a log collector is needed: Why not write logs to centralized storage in the first place? Because that creates a single point of failure and can add network latency. A better approach is for each service to write logs to a local file first, and then have a dedicated collector (e.g., Filebeat or Logstash) process and ship them.
- Monitoring logs with Kafka: For high-frequency log data, logs can be shipped to Kafka and then consumed and analyzed in real time by a dedicated consumer application (see the sketch after this list). The consumer can inspect log content, detect serious errors or anomalies, and raise timely alerts, for example by notifying operations staff via email.
- Keeping the monitoring service independent: Why is it not recommended to embed log-monitoring logic directly in the service code? Because monitoring should be separated from the application service, which makes it easier to update and extend and follows the principle of decoupling. A separate monitoring service also allows flexible configuration and management.
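A minimal sketch of such a Kafka consumer, assuming the standard Kafka Java client and a topic named app-logs (the topic name, broker address, and alert() hook are all assumptions):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LogAlertConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");                // placeholder address
        props.put("group.id", "log-alerting");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("app-logs"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    String line = record.value();
                    if (line.contains("ERROR") || line.contains("Exception")) {
                        alert(line);   // hypothetical hook: e-mail, IM message, etc.
                    }
                }
            }
        }
    }

    private static void alert(String logLine) {
        System.out.println("ALERT: " + logLine);
    }
}
```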
Summary
Logging is an indispensable part of any system. A reasonable logging strategy not only helps developers to efficiently troubleshoot problems, but also helps operation and maintenance teams to monitor system health, optimize performance, and ensure security. When designing logs, we should pay special attention to readability, protection of sensitive information, and reasonable control of logging frequency. Through effective log management, analysis and alert mechanisms, we can better protect the stability and security of the system.