I: Background
1. Storytelling
The other day a friend came to me and said their program had crashed. They had already done a preliminary analysis of their own and asked me to help confirm the result. Since I was asked, let's begin the dump analysis journey.
II: WinDbg Analysis
1. Why did it crash
One of the powerful things about WinDbg is that it ships with an automated analysis command, !analyze -v, which gives a quick first diagnosis. The output is as follows:
0:000> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
CONTEXT: (.ecxr)
rax=00007ff95c5a9877 rbx=00007ff959d6d8e0 rcx=0000000000000000
rdx=0000000000000000 rsi=000000e394b98de0 rdi=000000e394b99530
rip=00007ff959c7b699 rsp=000000e394b99510 rbp=000000e394b99d00
r8=0000000000000000 r9=0000000000000007 r10=0000000000000000
r11=0000000000000000 r12=0000022da11451d0 r13=0000000000000000
r14=000000e394b9a9e0 r15=0000000000040ae4
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000200
KERNELBASE!RaiseException+0x69:
00007ff9`59c7b699 0f1f440000 nop dword ptr [rax+rax]
Resetting default scope
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ff959c7b699 (KERNELBASE!RaiseException+0x0000000000000069)
ExceptionCode: c000041d
ExceptionFlags: 00000001
NumberParameters: 0
PROCESS_NAME:
ERROR_CODE: (NTSTATUS) 0xc000041d - <Unable to get error code text>
EXCEPTION_CODE_STR: c000041d
...
You can see from the output above that the crash code is c000041d, namely "An unhandled exception was encountered during a user callback". This is a generic wrapper code, which implies that the real exception code is hidden somewhere inside. So what is the real exception code?
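As a side note on reading the code itself: an NTSTATUS value is a packed 32-bit field (2 severity bits, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code). The sketch below splits c000041d into those parts. The function name and the one-entry lookup table are my own illustration, not part of any debugger API; the symbolic name comes from the Windows headers.

```python
# Split a 32-bit NTSTATUS value into its documented bit fields.
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Illustrative lookup table holding only the code seen in this dump.
KNOWN_STATUS = {
    0xC000041D: "STATUS_FATAL_USER_CALLBACK_EXCEPTION",
}


def decode_ntstatus(code):
    """Return the severity, customer flag, facility, code and known name."""
    code &= 0xFFFFFFFF
    return {
        "severity": SEVERITY_NAMES[(code >> 30) & 0x3],  # bits 31-30
        "customer": bool((code >> 29) & 0x1),            # bit 29
        "facility": (code >> 16) & 0xFFF,                # bits 27-16
        "code": code & 0xFFFF,                           # bits 15-0
        "name": KNOWN_STATUS.get(code, "unknown"),
    }


if __name__ == "__main__":
    print(decode_ntstatus(0xC000041D))
```

The severity comes out as "error" with facility 0, which is consistent with a generic system status rather than anything specific to the application.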
2. Where is the real exception code?
To find out, switch to the exception context and look at the callers of RaiseException on the call stack. The output is as follows:
0:000> k 5
# Child-SP RetAddr Call Site
00 000000e3`94b99510 00007ff8`eb52cb19 KERNELBASE!RaiseException+0x69
01 000000e3`94b995f0 00007ff8`eb52cb4b coreclr!NakedThrowHelper2+0x9
02 000000e3`94b99620 00007ff8`eb52cb55 coreclr!NakedThrowHelper_RspAligned+0x1e
03 000000e3`94b99b48 00007ff8`8da3caa3 coreclr!NakedThrowHelper_FixRsp+0x5
04 000000e3`94b99b50 00007ff8`8d5a5e23 Avalonia_Base!+0x83
0:000> ub 00007ff8`eb52cb19
...
00007ff8`eb52cb14 e857910b00 call coreclr!LinkFrameAndThrow (00007ff8`eb5e5c70)
0:000> uf coreclr!LinkFrameAndThrow
Flow analysis was incomplete, some code may be missing
coreclr!LinkFrameAndThrow [D:\a\_work\1\s\src\coreclr\vm\ @ 6934]:
6934 00007ff8`eb5e5c70 4053 push rbx
6934 00007ff8`eb5e5c72 4883ec20 sub rsp,20h
6937 00007ff8`eb5e5c76 488d05bb771f00 lea rax,[coreclr!FaultingExceptionFrame::`vftable' (00007ff8`eb7dd438)]
...
6949 00007ff8`eb5e5cea 448b05c7682800 mov r8d,dword ptr [coreclr!g_SavedExceptionInfo+0x18 (00007ff8`eb86c5b8)]
6949 00007ff8`eb5e5cf1 8b15ad682800 mov edx,dword ptr [coreclr!g_SavedExceptionInfo+0x4 (00007ff8`eb86c5a4)]
6949 00007ff8`eb5e5cf7 8b0da3682800 mov ecx,dword ptr [coreclr!g_SavedExceptionInfo (00007ff8`eb86c5a0)]
6950 00007ff8`eb5e5cfd 4883c420 add rsp,20h
6950 00007ff8`eb5e5d01 5b pop rbx
6949 00007ff8`eb5e5d02 48ff2537581b00 jmp qword ptr [coreclr!_imp_RaiseException (00007ff8`eb79b540)] Branch
...
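The addresses WinDbg prints in parentheses in the listing above are not stored in the instructions; each instruction carries a signed 32-bit displacement relative to the address of the next instruction. The sketch below recomputes those targets from the raw bytes shown above. It assumes the displacement is the last four bytes of the instruction, which holds for these particular call, mov and jmp encodings but not for x64 instructions in general; the helper name is hypothetical.

```python
def rip_relative_target(instr_addr, instr_hex):
    """Resolve a rel32 / RIP-relative operand.

    Assumes the 32-bit displacement is the final four bytes of the
    instruction (true for the encodings in the listing above, which have
    no trailing immediate).
    """
    raw = bytes.fromhex(instr_hex)
    disp = int.from_bytes(raw[-4:], "little", signed=True)
    next_ip = instr_addr + len(raw)  # RIP already points past the instruction
    return (next_ip + disp) & 0xFFFFFFFFFFFFFFFF


# (address, bytes) pairs copied from the ub / uf output above.
LISTING = {
    "call LinkFrameAndThrow": (0x7FF8EB52CB14, "e857910b00"),
    "mov r8d, [g_SavedExceptionInfo+0x18]": (0x7FF8EB5E5CEA, "448b05c7682800"),
    "mov edx, [g_SavedExceptionInfo+0x4]": (0x7FF8EB5E5CF1, "8b15ad682800"),
    "mov ecx, [g_SavedExceptionInfo]": (0x7FF8EB5E5CF7, "8b0da3682800"),
    "jmp [_imp_RaiseException]": (0x7FF8EB5E5D02, "48ff2537581b00"),
}

if __name__ == "__main__":
    for text, (addr, code) in LISTING.items():
        print(f"{text:40} -> {rip_relative_target(addr, code):#x}")
```

Under the x64 calling convention the first three integer arguments travel in rcx, rdx and r8, so the three loads line up with RaiseException's dwExceptionCode, dwExceptionFlags and nNumberOfArguments, taken from offsets +0x0, +0x4 and +0x18 of the saved record.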
As you can see from the output above, the arguments to RaiseException come from the global exception information variable g_SavedExceptionInfo. This variable holds the real exception record of the current crash together with its register context, and has the following data structure in the CLR:
struct SavedExceptionInfo
{
EXCEPTION_RECORD m_ExceptionRecord;
CONTEXT m_ExceptionContext;
CrstStatic m_Crst;
}
With that in hand, the next step is to dig into it with dt and dx. The output is as follows:
0:000> dt coreclr!g_SavedExceptionInfo 00007ff8eb86c5a0
+0x000 m_ExceptionRecord : _EXCEPTION_RECORD
+0x0a0 m_ExceptionContext : _CONTEXT
+0x570 m_Crst : CrstStatic
0:000> dx -r1 (*((coreclr!_EXCEPTION_RECORD *)0x7ff8eb86c5a0))
(*((coreclr!_EXCEPTION_RECORD *)0x7ff8eb86c5a0)) [Type: _EXCEPTION_RECORD]
[+0x000] ExceptionCode : 0xc0000005 [Type: unsigned long]
[+0x004] ExceptionFlags : 0x0 [Type: unsigned long]
[+0x008] ExceptionRecord : 0x0 [Type: _EXCEPTION_RECORD *]
[+0x010] ExceptionAddress : 0x7ff88da3caa3 [Type: void *]
[+0x018] NumberParameters : 0x2 [Type: unsigned long]
[+0x020] ExceptionInformation [Type: unsigned __int64 [15]]
From the output above, the real cause of the crash is 0xc0000005, namely an access violation, and the crash point is RIP=0x7ff88da3caa3.
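The field offsets in the dx output, and the +0x0a0 offset of m_ExceptionContext in the dt output, follow from the x64 layout of EXCEPTION_RECORD. The sketch below models that layout with Python's struct module and parses a record from raw bytes. The sample buffer is synthetic: the code, flags, address and parameter count are taken from the output above, but the two ExceptionInformation values are placeholders, since the dump's actual values are not shown. The helper names are hypothetical.

```python
import struct

# x64 EXCEPTION_RECORD: ExceptionCode, ExceptionFlags (DWORD each),
# ExceptionRecord, ExceptionAddress (pointers), NumberParameters (DWORD),
# 4 bytes of padding, then ExceptionInformation[15] (ULONG_PTR each).
EXCEPTION_RECORD_FMT = "<IIQQI4x15Q"
EXCEPTION_RECORD_SIZE = struct.calcsize(EXCEPTION_RECORD_FMT)


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def parse_exception_record(buf):
    """Unpack one x64 EXCEPTION_RECORD from the start of buf."""
    fields = struct.unpack_from(EXCEPTION_RECORD_FMT, buf)
    code, flags, nested, address, count = fields[:5]
    return {
        "code": code,
        "flags": flags,
        "nested": nested,
        "address": address,
        "information": list(fields[5:5 + min(count, 15)]),
    }


def build_sample_record():
    """Synthetic record; the ExceptionInformation values are placeholders."""
    info = [0x0, 0x8] + [0] * 13  # placeholder: read access, address 0x8
    return struct.pack(EXCEPTION_RECORD_FMT,
                       0xC0000005, 0x0, 0x0, 0x7FF88DA3CAA3, 2, *info)


if __name__ == "__main__":
    print(hex(EXCEPTION_RECORD_SIZE))                # record size
    print(hex(align_up(EXCEPTION_RECORD_SIZE, 16)))  # CONTEXT is 16-byte aligned
    print(parse_exception_record(build_sample_record()))
```

For an access violation, the first ExceptionInformation entry encodes the kind of access (0 for read, 1 for write, 8 for a DEP violation) and the second holds the inaccessible address, so in a real dump expanding that array is usually worth the extra command.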
3. What logic led to the crash
This is relatively simple to do with !U or uf. Either will work; the output is as follows:
0:000> !U 0x7ff88da3caa3
Normal JIT generated code
()
ilAddr is 0000022DC65AE2D4 pImport is 00000238EE6FECA0
Begin 00007FF88DA3CA20, size 96
...
00007ff8`8da3ca9b 488bce mov rcx,rsi
00007ff8`8da3ca9e e8cdeaa5fe call 00007ff8`8c49b570 ((), mdToken: 00000000060009D9)
>>> 00007ff8`8da3caa3 488b4008 mov rax,qword ptr [rax+8]
00007ff8`8da3caa7 8b4008 mov eax,dword ptr [rax+8]
...
0:000> dt coreclr!g_SavedExceptionInfo 00007ff8eb86c5a0
+0x000 m_ExceptionRecord : _EXCEPTION_RECORD
+0x0a0 m_ExceptionContext : _CONTEXT
+0x570 m_Crst : CrstStatic
0:000> dx -r1 (*((coreclr!_CONTEXT *)0x7ff8eb86c640))
...
[+0x078] Rax : 0x0 [Type: unsigned __int64]
...
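The arithmetic behind the two observations above is small enough to write down. The sketch below derives the address passed to dx from the dt offsets, then computes the effective address of the faulting mov rax, qword ptr [rax+8] with the saved Rax value. The 64 KB constant reflects the region at the bottom of the user address space that Windows keeps unmapped by default; the function names are hypothetical.

```python
SAVED_EXCEPTION_INFO = 0x7FF8EB86C5A0  # address of coreclr!g_SavedExceptionInfo
CONTEXT_OFFSET = 0x0A0                 # m_ExceptionContext, from the dt output
RAX_OFFSET = 0x078                     # Rax inside _CONTEXT, from the dx output
NULL_GUARD_SIZE = 0x10000              # lowest 64 KB, unmapped by default

MASK64 = 0xFFFFFFFFFFFFFFFF


def effective_address(base, displacement):
    """Address accessed by an operand of the form [base + displacement]."""
    return (base + displacement) & MASK64


def is_null_deref(address):
    """True when the address lies in the region reserved to catch null pointers."""
    return address < NULL_GUARD_SIZE


if __name__ == "__main__":
    context_addr = SAVED_EXCEPTION_INFO + CONTEXT_OFFSET
    print(hex(context_addr))               # the address handed to dx
    print(hex(context_addr + RAX_OFFSET))  # where the saved Rax lives

    saved_rax = 0x0                        # value shown in the dx output
    target = effective_address(saved_rax, 8)
    print(hex(target), is_null_deref(target))
```

A small non-zero fault address like this one is the typical signature of reading a field through a null object reference rather than of a wild pointer.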
Looking at the assembly in the output above, the crash happens because the Avalonia framework method RequestCompositionBatchCommitAsync returned null, that is, rax = 0. Isn't Avalonia that cross-platform take on WPF? Interesting. Next, let's go to the source code to confirm which variable is involved.
From the code logic, _nextCommit is a class field rather than a method-local variable, so under high concurrency the method can return null if any other method sets _nextCommit = null in between. To verify this idea I searched the class, and there really is a method that sets it to null, as shown in the screenshot below:
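The following is not Avalonia's code. It is a minimal sketch of the general pattern described above, in which a method checks a shared field and then reads it again while another method may clear it in between. All class, method and hook names are hypothetical, and the interleaving is forced deterministically through a hook instead of real threads so that the effect is repeatable.

```python
class Batch:
    """Stand-in for whatever object the shared field refers to."""


class Committer:
    """Hypothetical class with a shared field that another method can clear."""

    def __init__(self):
        self._next_commit = None
        self.between_reads = None  # hook simulating another thread running

    def clear(self):
        # The "other method" that resets the shared field.
        self._next_commit = None

    def request_racy(self):
        if self._next_commit is None:
            self._next_commit = Batch()
        if self.between_reads:
            self.between_reads()      # another thread runs here
        return self._next_commit      # second read of the field: may be None

    def request_safe(self):
        batch = self._next_commit     # read the field once into a local
        if batch is None:
            batch = Batch()
            self._next_commit = batch
        if self.between_reads:
            self.between_reads()
        return batch                  # the local cannot be cleared by others


if __name__ == "__main__":
    c = Committer()
    c.between_reads = c.clear
    print(c.request_racy())
    print(c.request_safe())
```

Capturing the field in a local only guarantees that the caller never sees null; it does not by itself make the class thread-safe, and a real fix would also need to decide which thread owns the field.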
It's basically clear that this is a bug in Avalonia. Finally, let's check the Avalonia version, which turns out to be very recent. The output is as follows:
0:000> lmvm Avalonia_Base
...
Timestamp: A0BE2821 (This is a reproducible build file hash, not a timestamp)
CheckSum: 001CDA05
ImageSize: 001D4000
File version: 11.1.0.0
Product version: 11.1.0.0
File flags: 0 (Mask 3F)
File OS: 4 Unknown Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0000.04b0
Information from resource tables:
CompanyName: Avalonia Team
ProductName: Avalonia
InternalName:
OriginalFilename:
ProductVersion: 11.1.0+2a8ea17985fd739234fa0d93c3437948535d35c4
FileVersion: 11.1.0.0
FileDescription:
LegalCopyright: Copyright 2013-2024 © The AvaloniaUI Project
4. How can this be resolved?
Knowing that this is a bug in Avalonia, and that the version in use is already very recent, upgrading is not an option. All I can do is file an issue with the project at /AvaloniaUI/Avalonia and wait for a fix.
III: Summary
I learned something new from this production incident, and I'm a bit curious whether Avalonia is now being used as a replacement for WPF in the industrial control industry. At this stage, though, its stability is not yet comparable to WPF's, and I look forward to a more robust version in the future.