I: Background
1. Storytelling
The other day a friend reached out and said their system was stuck at 100% CPU, and asked me to help figure out what was going on. They had already captured a dump. For this kind of problem I asked them to grab a few more, but since one was already at hand, let's analyze what we have.
II: WinDbg Analysis
1. Why is the CPU so high?
Since we are talking about 100% CPU, as debuggers we have to let the data speak. We can start with the !tp command to observe the thread pool.
0:000:x86> !tp
CPU utilization: 100%
Worker Thread: Total: 382 Running: 382 Idle: 0 MaxLimit: 8191 MinLimit: 8
Work Request in Queue: 8694
Unknown Function: 6f62b650 Context: 4a36bbbc
Unknown Function: 6f62b650 Context: 4a36e1d4
Unknown Function: 6f62b650 Context: 4a372384
Unknown Function: 6f62b650 Context: 239adfec
Unknown Function: 6f62b650 Context: 4a374994
Unknown Function: 6f62b650 Context: 239b9e14
Unknown Function: 6f62b650 Context: 2399fd9c
...
From the output, the CPU really is at 100% and every thread in the thread pool is busy. On top of that, the thread pool queue has accumulated 8694 pending work items, which means the pool is completely saturated. To find out why, we need the ~*e !clrstack command to observe what each thread is doing at that moment. The output looks like this:
0:000:x86> ~*e !clrstack
OS Thread Id: 0x22f4 (429)
Child SP IP Call Site
4bc1e060 0000002b [GCFrame: 4bc1e060]
4bc1e110 0000002b [HelperMethodFrame_1OBJ: 4bc1e110] (Boolean, Int32, )
4bc1e19c 24aad7da (, Int32, Boolean)
4bc1e1ac 2376f0d6 ()
4bc1e1dc 2420bbc6 ()
...
4bc1e234 24206fbe (`2<,>)
4bc1e3e0 216e25f9 DynamicClass.lambda_method(, , [])
4bc1e3f0 238b86b7 (, [])
...
4bc1eee0 2353d448 (IntPtr, IntPtr, IntPtr, Int32)
4bc1efb8 00a9e3c2 [ContextTransitionFrame: 4bc1efb8]
From the output you can see that 371 threads are currently stuck in a Wait they cannot get out of. Why can't they get out of the Wait?
2. Exploring the source code
The only way to find this answer is to look at the source code, which is simplified as follows:
public IRedisClient GetClient()
{
    lock (writeClients)
    {
        AssertValidReadWritePool();
        RedisClient inActiveWriteClient;
        while ((inActiveWriteClient = GetInActiveWriteClient()) == null)
        {
            // Wait for another thread to return a client to the pool;
            // give up once PoolTimeout elapses.
            if (!Monitor.Wait(writeClients, PoolTimeout))
            {
                throw new TimeoutException("Redis Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use.");
            }
        }
        inActiveWriteClient.Active = true;
        return inActiveWriteClient;
    }
}
private RedisClient GetInActiveWriteClient()
{
    int num = WritePoolIndex % writeClients.Length;
    for (int i = 0; i < ReadWriteHosts.Count; i++)
    {
        int num2 = (num + i) % ReadWriteHosts.Count;
        RedisEndPoint redisEndPoint = ReadWriteHosts[num2];
        for (int j = num2; j < writeClients.Length; j += ReadWriteHosts.Count)
        {
            // A slot can only be handed out if it is not currently rented (Active)
            // and has not faulted.
            if (writeClients[j] != null && !writeClients[j].Active && !writeClients[j].HadExceptions)
            {
                return writeClients[j];
            }
        }
    }
    return null;
}
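For completeness, it helps to see the other half of the picture: how a client gets back into the pool so that a waiting thread can wake up. The snippet below is a minimal sketch, not the library's exact code, assuming the pool follows the usual Monitor.Wait / Monitor.PulseAll pattern; the method and field names are illustrative.

    // Hypothetical sketch of the return path; names are assumptions.
    public void DisposeClient(RedisClient client)
    {
        lock (writeClients)
        {
            client.Active = false;          // hand the slot back to the pool
            Monitor.PulseAll(writeClients); // wake the threads parked in Monitor.Wait
        }
    }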
Reading the code carefully, a thread enters Wait because GetInActiveWriteClient() returns null, and the exception message tells us this happens when the writeClients pool is exhausted. So is the pool really full? We can dig the writeClients array out of the thread stack with the !dso command.
0:429:x86> !dso
OS Thread Id: 0x22f4 (429)
ESP/REG Object Name
...
4BC1E0D0 0ea38d18 []
4BC1E100 0ea38bb0
...
0:429:x86> !da 0ea38d18
Name: []
MethodTable: 237af1c0
EEClass: 0129a224
Size: 52(0x34) bytes
Array: Rank 1, Number of elements 10, Type CLASS
Element Methodtable: 237ae954
[0] 0ea38dd4
[1] 0a9f9f58
[2] 0296e468
[3] 0c9786a0
[4] 0a9fe768
[5] 04a21f24
[6] 0aa0d758
[7] 10946d90
[8] 04a8c8b0
[9] 02a2a2a0
0:429:x86> !DumpObj /d 0ea38dd4
Name:
MethodTable: 237ae954
EEClass: 2375d154
Size: 152(0x98) bytes
File: C:\Windows\xxx\
Fields:
...
0129aa48 4000169 7d 1 instance 1 <Active>k__BackingField
...
From the output, the writeClients pool only has 10 slots, and every one of them has Active=1, so it is no surprise that GetInActiveWriteClient() keeps returning null.
3. Why are all the clients in use?
To answer that, we need to look one level up the stack at how the method is called. The details are blurred here for privacy.
As you can see from the (blurred) call site, the problem is that a client is rented inside a foreach on every iteration, which depletes the writeClients pool and leaves a large number of requests blocking. Don't forget that 371 threads are contending here at the same time, which makes it far worse. A hypothetical reconstruction of the pattern is sketched below.
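The sketch is purely illustrative: the real types and method names are blurred in the dump, so the list variable, info.Key and the Set call are all assumptions; only the shape of the loop is implied by the stacks and the 307-element list found later.

    // Hypothetical reconstruction of the call site; xxxInfoList, info.Key and
    // redis.Set are assumed names, not taken from the dump.
    foreach (var info in xxxInfoList)                 // 307 items in this dump
    {
        using (var redis = redisManager.GetClient())  // rents 1 of only 10 writeClients
        {
            redis.Set(info.Key, info);                // one Redis round trip per element
        }
    }
    // With hundreds of threads running this loop concurrently, ten clients are never
    // enough, so most threads end up parked in Monitor.Wait inside GetClient().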
Next, how many times does this foreach actually iterate? Keep digging with !dso.
0:429:x86> !DumpObj /d 077cec20
Name: `1[[xxxInfo, xxx]]
MethodTable: 241ad794
EEClass: 0193166c
Size: 24(0x18) bytes
File: C:\Windows\\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\
Fields:
MT Field Offset Type VT Attr Value Name
01860eec 4001891 4 System.__Canon[] 0 instance 077e0048 _items
0129c9b0 4001892 c System.Int32 1 instance 307 _size
0129c9b0 4001893 10 System.Int32 1 instance 307 _version
01296780 4001894 8 0 instance 00000000 _syncRoot
01860eec 4001895 4 System.__Canon[] 0 static <no information>
From the output, the loop runs 307 times, which once again confirms the pool exhaustion. I know some careful readers will say: I can accept that this hangs, but I don't see how it drives the CPU up to 100%. Read the source code carefully and you will see it: this is the classic lock convoy phenomenon, because the following two conditions are both satisfied (a minimal illustration follows the list).
- Many threads call GetClient() at high frequency inside a foreach.
- Monitor.Wait suspends each thread and puts it into the lock's wait queue; whenever clients are returned, a crowd of waiters wakes up and stampedes for the same lock.
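As a rough illustration of the effect, the self-contained sketch below (not the original code) has hundreds of threads fighting over ten slots guarded by a single lock; most CPU time goes into waking up, contending for the lock, and going back to sleep rather than useful work, which is exactly what a lock convoy looks like.

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class LockConvoyDemo
    {
        static readonly object Gate = new object();
        static int _available = 10;          // mimic a pool with only 10 slots

        static void RentAndReturn()
        {
            lock (Gate)
            {
                while (_available == 0)
                    Monitor.Wait(Gate);      // park until a slot is pulsed back
                _available--;
            }

            Thread.Sleep(1);                 // simulate a short Redis call

            lock (Gate)
            {
                _available++;
                Monitor.PulseAll(Gate);      // wake *all* waiters -> they stampede for the lock
            }
        }

        static void Main()
        {
            var tasks = new Task[400];       // hundreds of threads vs. 10 slots
            for (int i = 0; i < tasks.Length; i++)
                tasks[i] = Task.Run(() => { for (int j = 0; j < 1000; j++) RentAndReturn(); });
            Task.WaitAll(tasks);
            Console.WriteLine("done");
        }
    }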
4. How to address the issue
Now that we know the cause and effect, the fix is relatively simple. There are three approaches:
- Change the per-item foreach into batch processing to reduce how often a writeClient is rented (see the sketch after this list).
- Increase the writeClients pool size, as described in the official documentation.
- The library version in use is very old (a paid release), so it is best to upgrade or replace it to eliminate future problems.
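For the first option, a hedged sketch of what batching might look like with this kind of Redis client is shown below; the collection name and key selector are assumptions, and the batching method should be checked against the API of the version actually in use.

    // Hypothetical batching sketch; verify SetAll (or an equivalent bulk call)
    // exists in the client version actually deployed.
    using (var redis = redisManager.GetClient())       // one rental per request
    {
        var map = xxxInfoList.ToDictionary(i => i.Key, i => i);
        redis.SetAll(map);                             // one batched round trip
    }

For the second option, the pool size is normally raised through the client manager's configuration (for example a MaxWritePoolSize-style setting) rather than in code like this, so again check the documentation for the exact version in use.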
III: Summary
This production incident was an interesting one to analyze: a seemingly pure blocking problem can also drive the CPU sky high, which may be beyond what some people expect. In fact it is the classic lock convoy phenomenon. If you have a dump you'd like looked at, feel free to reach out to me; as always, the analysis is free.