Preface
The classification tree query function can be found in almost every business system, especially in e-commerce systems.
Yet this seemingly simple classification tree query ended up being optimized five times.
What on earth happened?
Background
Our website uses Spring Boot's recommended template engine, Thymeleaf, for dynamic page rendering.
Thymeleaf is an XML/XHTML/HTML5 template engine that can be used for application development in both web and non-web environments.
It provides an optional module for integration with Spring MVC, so in application development we can use Thymeleaf to completely replace JSP or other template engines such as Velocity and FreeMarker.
Front-end developers write Thymeleaf template files and call back-end interfaces to fetch data; with dynamic binding, the desired content is displayed to the user.
Since this was a brand-new project built from scratch, in order to get the functionality out quickly, the first version of the interface queried the category data directly from the database, assembled it into a classification tree, and returned it to the front end (a rough sketch of that assembly step follows below).
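For illustration only, here is a minimal sketch of how a flat list of categories read from the database might be assembled into a tree. The CategoryTreeBuilder class, the root-detection convention, and the use of the Category class (shown later in this article, with Lombok-generated getters and setters) are all assumptions made for this example, not the project's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CategoryTreeBuilder {

    /**
     * Assembles a flat list of categories (as queried from the database)
     * into a classification tree by linking each node to its parent.
     */
    public static List<Category> buildTree(List<Category> flatList) {
        Map<Long, Category> byId = new HashMap<>();
        for (Category category : flatList) {
            category.setChildren(new ArrayList<>());
            byId.put(category.getId(), category);
        }

        List<Category> roots = new ArrayList<>();
        for (Category category : flatList) {
            Category parent = byId.get(category.getParentId());
            if (parent == null) {
                // no parent found: treat it as a top-level category
                roots.add(category);
            } else {
                parent.getChildren().add(category);
            }
        }
        return roots;
    }
}
```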
In this way, the data flow stayed simple, and the whole page feature was delivered quickly.
1st optimization
We deployed the interface to a dev environment, and at first there were no problems.
But as developers added more and more categories, a performance bottleneck quickly became apparent.
We had to optimize.
Our first thought: add a Redis cache.
The flowchart is shown below:
So, for the time being, the flow was optimized like this:
- When a user calls the interface to get the classification tree, the data is first queried from Redis.
- If Redis has the data, it is returned directly.
- If Redis has no data, the category data is queried from the database and assembled into a classification tree.
- The classification tree built from the database data is saved to Redis with an expiration time of 5 minutes.
- The classification tree is returned to the user.
We defined a single key in Redis whose value is the classification tree serialized to a JSON string, i.e. a simple key/value structure (a rough sketch of this read path follows below).
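A rough sketch of this cache-aside read path, using Spring's StringRedisTemplate and Jackson. The key name category:tree, the CategoryMapper data-access interface with its findAll() method, and the CategoryTreeBuilder helper from the earlier sketch are assumptions, not the actual project code:

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.concurrent.TimeUnit;

@Service
public class CategoryTreeService {

    // hypothetical key name for the cached tree
    private static final String TREE_KEY = "category:tree";

    private final StringRedisTemplate redisTemplate;
    private final CategoryMapper categoryMapper;   // assumed DB access layer
    private final ObjectMapper objectMapper = new ObjectMapper();

    public CategoryTreeService(StringRedisTemplate redisTemplate, CategoryMapper categoryMapper) {
        this.redisTemplate = redisTemplate;
        this.categoryMapper = categoryMapper;
    }

    public List<Category> getTree() throws Exception {
        // 1. try Redis first
        String cached = redisTemplate.opsForValue().get(TREE_KEY);
        if (cached != null) {
            return objectMapper.readValue(cached, new TypeReference<List<Category>>() {});
        }

        // 2. fall back to the database and assemble the tree
        List<Category> tree = CategoryTreeBuilder.buildTree(categoryMapper.findAll());

        // 3. write it back to Redis with a 5-minute expiration
        redisTemplate.opsForValue().set(TREE_KEY, objectMapper.writeValueAsString(tree), 5, TimeUnit.MINUTES);
        return tree;
    }
}
```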
After this optimization, debugging and self-testing in the dev environment went smoothly.
2nd optimization
We then deployed the feature to the ST environment.
At first the testers didn't find any problems, but as testing went deeper, every once in a while the home page became very slow to load.
So, we immediately performed the 2nd optimization.
We decided to use a scheduled job to periodically refresh the classification tree in Redis, so that a copy of the data is already in place before the system goes live.
Of course, to be on the safe side and guard against Redis suddenly going down at some point, the previous logic of writing the classification tree to Redis on a cache miss was kept.
So the flowchart was changed to this:
A job was added that runs every 5 minutes, queries the category data from the database, assembles it into a classification tree, and updates the Redis cache.
The rest of the process remains the same.
Also, the Redis expiration time, previously set to 5 minutes, was changed so that the key never expires (a sketch of such a job is shown below).
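A sketch of such a scheduled refresh, assuming scheduling is enabled with @EnableScheduling and reusing the assumed key name, CategoryMapper, and CategoryTreeBuilder from the previous sketch:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class CategoryTreeRefreshJob {

    private static final String TREE_KEY = "category:tree";   // hypothetical key name

    private final StringRedisTemplate redisTemplate;
    private final CategoryMapper categoryMapper;               // assumed DB access layer
    private final ObjectMapper objectMapper = new ObjectMapper();

    public CategoryTreeRefreshJob(StringRedisTemplate redisTemplate, CategoryMapper categoryMapper) {
        this.redisTemplate = redisTemplate;
        this.categoryMapper = categoryMapper;
    }

    /**
     * Runs every 5 minutes: rebuilds the classification tree from the database
     * and overwrites the Redis value. No expiration is set, so the key is permanent.
     */
    @Scheduled(fixedRate = 5 * 60 * 1000)
    public void refreshTree() throws Exception {
        List<Category> tree = CategoryTreeBuilder.buildTree(categoryMapper.findAll());
        redisTemplate.opsForValue().set(TREE_KEY, objectMapper.writeValueAsString(tree));
    }
}
```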
After this optimization, the ST environment had no further performance issues with classification tree queries.
3rd optimization
After a while of testing, the whole site was about to go live.
To be on the safe side, we needed to stress-test the site's home page first.
Sure enough, the stress test exposed a problem: the site's home page maxed out at just over 100 QPS, and we eventually realized that fetching the classification tree from Redis on every request was the home page's performance bottleneck.
We need to do a 3rd optimization.
What to optimize?
Answer: add an in-memory cache.
Adding an in-memory cache means data consistency has to be considered.
The in-memory cache lives on each server node, and the refresh timing may differ slightly from node to node, which can lead to inconsistent data.
However, categories are updated infrequently and users are not very sensitive to them, so even if users see small differences in the classification tree for a short period, the impact is minimal.
Therefore, a business scenario like the classification tree is well suited to an in-memory cache.
So we used Caffeine, which Spring recommends, as the in-memory cache.
The flowchart of the revamped process is shown below:
- When users access the interface, the classification tree data is now queried from the local cache first.
- If the local cache has the data, it is returned directly.
- If the local cache has no data, the data is queried from Redis.
- If Redis has the data, the local cache is updated and the data is returned.
- If Redis has no data either (which suggests Redis is down), the data is queried from the database, written back to Redis (in case Redis recovers), then written to the local cache and returned.
Note that the local cache must be given an expiration time, set here to 5 minutes; otherwise there would be no way to pick up new data. (A rough sketch of this three-level lookup follows.)
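A rough sketch of this local-cache → Redis → database lookup with Caffeine; as before, the key name, CategoryMapper, and CategoryTreeBuilder are assumptions for illustration only:

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.concurrent.TimeUnit;

@Service
public class CategoryTreeQueryService {

    private static final String TREE_KEY = "category:tree";   // hypothetical key name

    // local cache with a 5-minute expiration so new data is eventually picked up
    private final Cache<String, List<Category>> localCache = Caffeine.newBuilder()
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .maximumSize(10)
            .build();

    private final StringRedisTemplate redisTemplate;
    private final CategoryMapper categoryMapper;               // assumed DB access layer
    private final ObjectMapper objectMapper = new ObjectMapper();

    public CategoryTreeQueryService(StringRedisTemplate redisTemplate, CategoryMapper categoryMapper) {
        this.redisTemplate = redisTemplate;
        this.categoryMapper = categoryMapper;
    }

    public List<Category> getTree() throws Exception {
        // 1. local cache first
        List<Category> tree = localCache.getIfPresent(TREE_KEY);
        if (tree != null) {
            return tree;
        }

        // 2. then Redis
        String cached = redisTemplate.opsForValue().get(TREE_KEY);
        if (cached != null) {
            tree = objectMapper.readValue(cached, new TypeReference<List<Category>>() {});
            localCache.put(TREE_KEY, tree);
            return tree;
        }

        // 3. finally the database (Redis is probably down); write back to both caches
        tree = CategoryTreeBuilder.buildTree(categoryMapper.findAll());
        redisTemplate.opsForValue().set(TREE_KEY, objectMapper.writeValueAsString(tree));
        localCache.put(TREE_KEY, tree);
        return tree;
    }
}
```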
After this optimization, we stress-tested the home page again, and QPS rose to over 500, meeting the requirements for going live.
4th optimization
After that, the feature went live without a hitch.
It ran for a long time without any problems.
Then one day, about two years later, a user reported that the site's home page was a bit slow.
We investigated and found that the classification tree had grown too large, returning tens of thousands of categories at once.
It turned out that in the two years or so the system had been online, the operations team had added a huge number of categories in the admin backend.
We need to do a 4th optimization.
How do you optimize at this point?
Limit the number of categories in the tree?
Answer: That's not really feasible either. The business genuinely has that many categories, and you can't stop users from choosing the category they need.
The quickest fix we could think of was to enable nginx's GZip compression.
The data is compressed before it is transmitted, and the user's browser automatically decompresses it and displays the real classification tree (a sketch of the relevant nginx directives follows below).
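The exact directives depend on the deployment, but a minimal sketch of an nginx gzip configuration might look like the following (values chosen purely for illustration):

```nginx
http {
    gzip            on;                 # compress responses before sending them
    gzip_min_length 1k;                 # skip very small responses
    gzip_comp_level 5;                  # balance CPU cost against compression ratio
    gzip_types      application/json text/css application/javascript text/plain;
    gzip_vary       on;                 # emit "Vary: Accept-Encoding" for intermediate caches
}
```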
The classification tree returned by the interface used to be about 1MB; after the optimization it was about 100KB, ten times smaller in one go.
After this simple optimization, performance improved somewhat.
5th optimization
After the above optimizations, users didn't report performance issues for a long time.
But one day, a colleague troubleshooting big keys in Redis flagged the classification tree: it was still stored as a single key/value pair, and that value had become a big key.
We had to do a 5th optimization.
To shrink the data stored in Redis, the first step was to slim the data down:
save only the fields that are actually used.
Example:
```java
@AllArgsConstructor
@Data
public class Category {
    private Long id;
    private String name;
    private Long parentId;
    private Date inDate;
    private Long inUserId;
    private String inUserName;
    private List<Category> children;
}
```
Fields like inDate, inUserId and inUserName in this category object do not need to be saved.
Next, shorten the serialized field names.
Example:
```java
@AllArgsConstructor
@Data
public class Category {

    /**
     * Category id
     */
    @JsonProperty("i")
    private Long id;

    /**
     * Category level
     */
    @JsonProperty("l")
    private Integer level;

    /**
     * Category name
     */
    @JsonProperty("n")
    private String name;

    /**
     * Parent category id
     */
    @JsonProperty("p")
    private Long parentId;

    /**
     * List of child categories
     */
    @JsonProperty("c")
    private List<Category> children;
}
```
Since the same field names are repeated in every one of the 10,000+ records, the duplication rate is extremely high.
By mapping them to short names during JSON serialization, the returned data becomes noticeably smaller (a quick example follows).
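For example, serializing one of these slimmed-down objects with Jackson yields the short keys; the sample data below is made up purely for illustration:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.Collections;

public class ShortNameDemo {
    public static void main(String[] args) throws Exception {
        // hypothetical sample categories
        Category child = new Category(2L, 2, "Laptops", 1L, Collections.<Category>emptyList());
        Category root = new Category(1L, 1, "Electronics", 0L, Collections.singletonList(child));

        // @JsonProperty maps each field to its one-letter name
        String json = new ObjectMapper().writeValueAsString(root);
        System.out.println(json);
        // prints something like:
        // {"i":1,"l":1,"n":"Electronics","p":0,"c":[{"i":2,"l":2,"n":"Laptops","p":1,"c":[]}]}
    }
}
```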
But that alone is not enough; the stored data also needs to be compressed.
Previously, the value saved in Redis was a JSON-formatted string.
In fact, RedisTemplate supports saving the value as a byte array.
So the JSON string is first compressed into a byte array with a GZip utility class and then saved to Redis.
When the data is read back, the byte array is decompressed into a JSON string, which is then converted back into the classification tree (a sketch of such a utility follows below).
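A minimal sketch of that compression step, using the JDK's GZIPOutputStream and GZIPInputStream; the class name is an assumption:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipUtil {

    /** Compresses a JSON string into a GZip byte array. */
    public static byte[] compress(String json) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    /** Decompresses a GZip byte array back into a JSON string. */
    public static String decompress(byte[] bytes) throws Exception {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = gzip.read(buffer)) != -1) {
                bos.write(buffer, 0, len);
            }
            return bos.toString(StandardCharsets.UTF_8.name());
        }
    }
}
```

Writing would then look roughly like redisTemplate.opsForValue().set(TREE_KEY, GzipUtil.compress(json)), and reading like GzipUtil.decompress(redisTemplate.opsForValue().get(TREE_KEY)), assuming a RedisTemplate<String, byte[]> configured with a byte-array value serializer.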
After this optimization, the size of the classification tree saved in Redis shrank by a factor of 10, and the Redis big key problem was solved.