
Interview scenario question: a discussion about a thread pool usage scenario.


Hello, I'm Crooked.

Let's look at a scenario-style interview question about thread pool usage.

Here's the thing:

It's a bit wordy, so I'll just give you a picture so you can understand:

The front-end initiates a request to generate a report page, and the data on this page is returned by multiple back-end interfaces. In addition, because of the microservices split, the data is scattered across the various microservices, so multiple downstream interfaces need to be called and the results merged.

Since there are no data dependencies between these interfaces, the calls to the different downstream interfaces can be issued asynchronously, all at the same time.

That is, there's a thread pool here:

For this thread pool, the number of core threads is only 30, the maximum number of threads is 100, and the queue length is 1000.

When a report-page request comes in, more than 20 downstream interfaces are called to gather and consolidate the data.

In the questioner's words: when 3 people open the page at the same time, it slows down noticeably, because the tasks (3 requests × 20-plus calls, i.e. 60-plus tasks) exceed the number of free core threads and the overflow gets parked in the blocking queue.

That is indeed how the theory goes.
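To put numbers on it, here's a minimal sketch of that pool with the JDK's ThreadPoolExecutor. The 30/100/1000 values come from the question; the task bodies and timings are made up purely for illustration:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ReportPoolDemo {
    public static void main(String[] args) {
        // The pool from the question: 30 core threads, 100 max threads, queue length 1000.
        ThreadPoolExecutor reportPool = new ThreadPoolExecutor(
                30, 100,
                60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1000));

        // One report request fans out to 20+ downstream calls, so 3 concurrent
        // requests mean 60+ tasks: 30 of them occupy the core threads, the rest
        // wait in the queue, because the JDK pool only creates non-core threads
        // once the queue is full. That's exactly the slowdown described above.
        for (int i = 0; i < 60; i++) {
            reportPool.execute(() -> {
                try { Thread.sleep(2_000); } catch (InterruptedException ignored) { }
            });
        }
        System.out.println("queued tasks: " + reportPool.getQueue().size()); // ~30
        reportPool.shutdown();
    }
}
```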

What to do?

The most straightforward option for this scenario is obviously to increase the number of core threads.

The questioner thought of this option too, and took it a step further:

I think we should change how the Spring thread pool works. To get requests processed as early as possible, we can borrow the way the Tomcat thread pool works.

The Spring thread pool (the JDK ThreadPoolExecutor underneath) uses core threads first, then the queue, and only creates non-core threads once the queue is full.

The Tomcat thread pool uses core threads first, then non-core threads, and only starts queueing after that.

The thought process is good, but frankly, I don't think it's necessary in this scenario.

For example, say your Tomcat-style thread pool is configured with 30 core threads, a maximum of 300 threads, and a queue length of 1000.

In fact, if you configure the Spring thread pool with 300 core threads, a maximum of 300 threads, and a queue length of 1000, the effect is the same as the Tomcat thread pool.

The only slight difference is that idle non-core threads get recycled, but that sliver of memory footprint is so small that I personally consider it negligible.
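In other words, a configuration like the following hedged sketch with Spring's ThreadPoolTaskExecutor (bean name and numbers are illustrative) already behaves like the Tomcat pool for all practical purposes:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ExecutorConfig {

    @Bean
    public ThreadPoolTaskExecutor reportExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        // core == max: all 300 threads are created on demand before the queue
        // is used, which is effectively the Tomcat-style "threads first,
        // queue later" behavior.
        executor.setCorePoolSize(300);
        executor.setMaxPoolSize(300);
        executor.setQueueCapacity(1000);
        // Let idle threads be reclaimed, mimicking non-core thread recycling.
        executor.setAllowCoreThreadTimeOut(true);
        executor.setKeepAliveSeconds(60);
        executor.setThreadNamePrefix("report-");
        executor.initialize();
        return executor;
    }
}
```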

In addition, on the thread pool configuration front, besides raising the number of core threads, another common tweak is to set the thread pool's rejection policy to CallerRunsPolicy, which means that when the thread pool is full, the thread that submitted the task executes the task itself.
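Continuing the reportExecutor sketch above, switching the rejection policy is a single setter call:

```java
import java.util.concurrent.ThreadPoolExecutor;

// When both the pool and its queue are full, the submitting thread
// (here that would be the Tomcat worker thread) runs the task itself
// instead of the task being rejected with an exception.
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
```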

This option was also mentioned during the discussion:

So, is this option viable?

It is, but you need to be aware of a hidden pitfall.

After submitting a task to the thread pool, the Tomcat thread is in one of two scenarios:

  • Scenario 1: the request is finished, the response goes back to the front-end, and the thread returns to the Tomcat thread pool to handle new requests.
  • Scenario 2: the thread waits for the downstream response before continuing.

Let's look at Scenario 1. Think about it: with CallerRunsPolicy, who is the "caller" that runs the task?

It's a thread from the Tomcat container.

Okay, now imagine this: you have a custom thread pool, but because one of the downstream interfaces is slow, the custom thread pool fills up and the rejection policy is triggered.

And that rejection policy is CallerRunsPolicy.

So a Tomcat thread ends up calling the slow interface itself: the thread that would otherwise have returned right after submitting the task to the pool is dragged into the slow call and tied up for a long time.

What does that lead to?

Critical resources get hogged for long periods, and in severe cases the service becomes unavailable to the outside world.

Think about it, let's say Tomcat has 200 threads in total.

190 of them are being held up by your slow interface, leaving only 10 threads to serve external traffic.

It's even possible that all 200 are tied up by this slow interface, and surely your service exposes more than just this one interface to the outside world, right?

The other interfaces are affected because the slow interface, through the thread pool's CallerRunsPolicy, has eaten up all the resources.

Even if one of your other interfaces only takes 10 ms, waiting in the queue is useless, because right now there are no threads left to process the request.

And from the caller's point of view, the time spent waiting in the queue counts toward the interface's response time.

So when using the CallerRunsPolicy rejection policy, you need to pay special attention and analyze whether it might hog critical resources and slow the whole service down.

But in this report scenario we are in Scenario 2: the Tomcat thread has to wait for the data to come back anyway.

It would be tied up waiting either way, so there's nothing wrong with using CallerRunsPolicy here.

However, Tomcat threads are a valuable resource, and if they are occupied for an extended period of time, that becomes a performance bottleneck.

So fundamentally, there shouldn't be a noticeably slow interface in the first place.

On the point of "freeing up valuable resources as quickly as possible", you can also look at Dubbo's provider-side threading model, which amounts to a best practice here:

/zh-cn/overview/mannual/java-sdk/advanced-features-and-usage/performance/threading-model/provider/

The default configuration for the Dubbo Protocol Provider threading model is AllDispatcher.

There is an official schematic for AllDispatcher:

The diagram shows two thread pools, the IO thread pool and the Dubbo thread pool.

Why do you need two thread pools?

Because IO threads are a very valuable resource, they should be responsible only for receiving requests and sending responses on the network.

The purpose of having a Dubbo thread pool is to free up valuable IO thread resources as quickly as possible.

With AllDispatcher, events such as received, connected, disconnected, and caught are all executed on Dubbo threads, and deserialization is also done in the Dubbo thread pool.

Analogous to our previous example, the IO thread pool is the Tomcat thread pool, and the Dubbo thread pool is the custom thread pool in our project.
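For reference, both pools are configured on the protocol. Here is a rough sketch using Dubbo's ProtocolConfig in Java; take the attribute names as my reading of the Dubbo configuration reference rather than gospel, and the numbers as placeholders:

```java
import org.apache.dubbo.config.ProtocolConfig;

public class DubboProtocolExample {

    public static ProtocolConfig providerProtocol() {
        ProtocolConfig protocol = new ProtocolConfig();
        protocol.setName("dubbo");
        protocol.setPort(20880);
        // "all" selects AllDispatcher: received/connected/disconnected/caught
        // events (and deserialization) run on the Dubbo business thread pool,
        // so the IO threads only do network reads and writes.
        protocol.setDispatcher("all");
        protocol.setThreadpool("fixed");
        protocol.setThreads(200);   // Dubbo business thread pool size
        protocol.setIothreads(8);   // IO (Netty) thread count
        return protocol;
    }
}
```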

The pattern is still that same pattern.

The reasoning is still that same reasoning.

Keep digging.

If this is really an interview scenario question, answers like "increase the core thread count" and "use CallerRunsPolicy" are not going to satisfy the interviewer; you'll have to keep digging down.

For example, why is the downstream return so slow?

Is there room for optimization on the interface?

Isn't there a slow SQL?

Is it returning more information than is needed?

Is there an unreasonable data structure?

Is it doing something else inside the interface?

Is it the downstream's own downstream that dropped the ball?

...

Don't stare only at your own side; it's perfectly reasonable to point out other people's problems too, right?

Anyway, with more than 20 asynchronous interfaces, there must be a relatively slow one.

That one is the weak link: find it, then analyze it in a targeted way.

If the downstream says there really is no room for optimization, then spend a bit of money, add a few more machines, and scale horizontally; it doesn't cost that much.

This situation is really quite common in practice, and Master Crooked has encountered it.

The upstream service had 8 machines and I only had 4. When the upstream's concurrency ramped up, they complained that my interface was responding slowly.

With half as many machines and that many requests coming in, of course they pile up and things get slow.

Of course, passing the blame to the downstream doesn't necessarily mean the downstream will accept it; in the end you still have to solve the problem yourself.

From the analysis above we know the core thread count can be raised, but then the interviewer follows up directly: raised to how much is appropriate?

"Appropriate" is such a delicate word.

Generally, we use "dynamic adjustment" to deal with this problem.

But in this "statement" scenario, Master Crooked thinks that it can really be adjusted to a suitable value, and can even use "accurate" to describe this value.

How?

Get a picture:

Create a thread pool inside the interface whose lifecycle is tied to a single request.

That is, the thread pool is shut down when the request ends.

You know, at the time you write the code, exactly how many downstream interfaces your interface needs to call asynchronously.

Say it's 15; then inside this request you create a thread pool with 15 core threads.

I'm just asking you, is it precise?
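Here's a minimal sketch of the idea. All class and method names are hypothetical; the only point is a per-request pool sized to the known fan-out and shut down in a finally block:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ReportAggregator {

    /** Hypothetical handler: one pool per request, sized to the known fan-out. */
    public String buildReport(long reportId) {
        // We know at coding time this report needs exactly 15 downstream calls,
        // so the pool size is "precise" by construction.
        ExecutorService perRequestPool = Executors.newFixedThreadPool(15);
        try {
            List<CompletableFuture<String>> futures = IntStream.range(0, 15)
                    .mapToObj(i -> CompletableFuture.supplyAsync(
                            () -> callDownstream(reportId, i), perRequestPool))
                    .collect(Collectors.toList());

            // Block until every downstream result is back, then merge.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.joining(","));
        } finally {
            // Lifecycle tied to the request: recycle the threads right away.
            perRequestPool.shutdown();
        }
    }

    private String callDownstream(long reportId, int index) {
        // Placeholder for the real remote call to downstream service #index.
        return "part-" + index;
    }
}
```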

This usage, of course, applies mainly to this rather specific scenario.

What makes it special is that the data needs to be aggregated: you call multiple downstream services asynchronously, and you can't respond to the caller until all of them have returned.

But how many calls the upstream initiates is uncertain: it could be 10 concurrent requests or 1000.

That is what makes it such a headache to size a global thread pool sensibly.

So, think differently.

Finding Certainty in Uncertainty.

The uncertainty is not knowing how many requests will come in. The certainty is that each front-end request corresponds to a fixed number of downstream interface calls.

So don't use a global thread pool; give each request its own thread pool, created on demand and recycled promptly.

Of course, one drawback of this solution is that there is no thread reuse anymore; the pool exists purely for the sake of asynchrony.

The second disadvantage is the possibility of instantly generating a large number of threads, which puts some pressure on memory, but theoretically these threads will be reclaimed very quickly, so this pressure should be within acceptable limits.

But think about it, the fundamental purpose of cranking up your core thread count is to assign a thread to every asynchronous task.

The approach above accomplishes the same thing, with more precise control.

Keep digging.

In fact, have you noticed that the options mentioned so far, raising the core thread count or giving each request its own thread pool, all feel awkward, like they don't quite solve the problem?

Yes, that's how I feel.

So, I'll look at the question again:

"A report-like system."

If I were only three years into the job, I'd probably start thinking about the technical solution the moment I got the requirement.

But now, with more years of experience, I look at requirements with a "skeptical" eye to determine whether it's a "pseudo-requirement".

The goal is to find a balance between the requirement and the technical implementation that is comfortable for both the business and the developers.

For example, if it's a report, why does it need to respond in real time?

The front-end initiates a request, the back-end acknowledges it immediately and returns a hint: buddy, got your request; generating the report takes a while, please check it under the xx menu in ten minutes.

Then you take your time generating it in the background; it's actually done in five minutes, and you send the user a text alert: the data is ready.

The user will still think: yep, pretty fast, these tech folks are great.
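A rough sketch of that "acknowledge first, generate later" flow, assuming a Spring setup with @EnableAsync; all names here are hypothetical:

```java
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class ReportController {

    private final ReportGenerator generator;

    ReportController(ReportGenerator generator) {
        this.generator = generator;
    }

    // Acknowledge immediately; the heavy lifting happens in the background.
    @PostMapping("/report")
    public String submit(@RequestParam long reportId) {
        generator.generateAsync(reportId);
        return "Got it. The report takes a few minutes; check it under the xx menu later.";
    }
}

@Service
class ReportGenerator {

    // Runs on an async executor (requires @EnableAsync somewhere in the app),
    // so the Tomcat thread is released as soon as the acknowledgement is sent.
    @Async
    public void generateAsync(long reportId) {
        // Call the 20+ downstream interfaces, assemble and persist the report,
        // then notify the user (SMS, in-app message, ...).
    }
}
```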

Besides, don't reports generally show T-1 day data?

Since it's T-1 data, why not run a scheduled task in the small hours of the morning that proactively aggregates the previous day's data from each system and keeps a copy locally?

That way, when the front-end request comes in, you don't have to call all those interfaces and aggregate in real time; you just query the local data. Isn't that fast? Performance goes right up.
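A minimal sketch of such a job, assuming Spring's @Scheduled with @EnableScheduling; the class name and cron value are illustrative:

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DailyReportJob {

    // Runs at 02:00 every day: pull yesterday's data from each microservice
    // once, off the request path, aggregate it, and store a local copy so the
    // report page only ever reads local data.
    @Scheduled(cron = "0 0 2 * * *")
    public void aggregateYesterday() {
        // 1. Call each downstream system for its T-1 data.
        // 2. Merge the results into the report tables owned by this service.
    }
}
```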

Or think one step further: why would you want to call another system's interface to get data?

Because you don't have the data in your own system.

Why doesn't your own system have the data?

Because under a microservices architecture, the data is scattered across the various microservice systems.

And when splitting microservices, have you considered the various reporting requirements?

If they were considered, shouldn't a big data platform have been built, one that pulls business data from the various microservice systems, integrates it, and produces all kinds of reports on top of it?

Microservices systems, on the other hand, only need to focus on the business.

If you don't have a big data platform, you should make a full case to your leaders for why such a platform is needed now and how important it will be in the future.

And then, this feature is no longer yours to build.