1: Background
1. The story
A friend from the advanced debugging training camp came to me and said that their .NET program running on Linux had a memory leak. They had looked at it in WinDbg and found that the memory was being eaten by Image regions, all of them labeled doublemapper__deleted_, and asked me why this was happening. To be honest, as debuggers we really enjoy working with people like this. Communication is smooth, and it stimulates the other party's desire to explore. That is one of the benefits of the training camp.
2: Analysis of memory explosion
1. Why did it skyrocket?
Friends who have followed this series know that I usually inspect memory with the !address -summary command. That command is built for Windows, however, so it does not work on a Linux dump. SOS provides a dedicated replacement, !maddress, so let's use !maddress -orderBySize to observe the memory distribution.
0:000> !maddress -orderBySize
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Memory Kind | StartAddr | EndAddr-1 | Size | Type | State | Protect | Image |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Image | 7f4000000000 | 7f4007ff6000 | 127.96mb | MEM_IMAGE | MEM_COMMIT | PAGE_READWRITE | doublemapper__deleted_ |
| Image | 7f3fc4000000 | 7f3fcbff5000 | 127.96mb | MEM_IMAGE | MEM_COMMIT | PAGE_READWRITE | doublemapper__deleted_ |
| Image | 7f404c021000 | 7f4051b4c000 | 91.17mb | MEM_IMAGE | MEM_UNKNOWN | PAGE_UNKNOWN | doublemapper__deleted_ |
| Image | 7f3fae82e000 | 7f3fb4000000 | 87.82mb | MEM_IMAGE | MEM_COMMIT | PAGE_EXECUTE_READ | doublemapper__deleted_ |
| Image | 7f406c021000 | 7f40701ff000 | 65.87mb | MEM_IMAGE | MEM_UNKNOWN | PAGE_UNKNOWN | doublemapper__deleted_ |
...
+----------------------------------------------------------------------+
| Memory Type | Count | Size | Size (bytes) |
+----------------------------------------------------------------------+
| Image | 980 | 3.54gb | 3,801,517,056 |
| PAGE_READWRITE | 1,178 | 1.17gb | 1,255,059,968 |
| Stack | 66 | 499.35mb | 523,604,992 |
...
| NewStubPrecodeHeap | 4 | 64.00kb | 65,536 |
+----------------------------------------------------------------------+
| [TOTAL] | 8,254 | 6.01gb | 6,451,347,968 |
+----------------------------------------------------------------------+
Judging from the output, total memory usage is 6.01 GB (6,451,347,968 bytes), and Image alone accounts for 3.54 GB (3,801,517,056 bytes). The details confirm that these regions are indeed labeled doublemapper__deleted_.
To be honest, I have analyzed more than 300 dumps, and this is only the second time Image has turned out to be the biggest consumer, so this kind of failure is quite rare. Next, let's find out what doublemapper__deleted_ really is.
2. What is doublemapper__deleted_
To answer this, start by looking for clues in the CoreCLR source code. A global search for the keyword doublemapper quickly turns up the relevant code:
bool VMToOSInterface::CreateDoubleMemoryMapper(void** pHandle, size_t *pMaxExecutableCodeSize)
{
#ifndef TARGET_OSX
#ifdef TARGET_FREEBSD
    int fd = shm_open(SHM_ANON, O_RDWR | O_CREAT, S_IRWXU);
#elif defined(TARGET_SUNOS) // has POSIX implementation
    char name[24];
    sprintf(name, "/shm-dotnet-%d", getpid());
    name[sizeof(name) - 1] = '\0';
    shm_unlink(name);
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL | O_NOFOLLOW, 0600);
#else // TARGET_FREEBSD
    int fd = memfd_create("doublemapper", MFD_CLOEXEC);
#endif // TARGET_FREEBSD
    *pMaxExecutableCodeSize = MaxDoubleMappedSize;
    *pHandle = (void*)(size_t)fd;
#else // !TARGET_OSX
    *pMaxExecutableCodeSize = SIZE_MAX;
    *pHandle = NULL;
#endif // !TARGET_OSX
    return true;
}
Judging from the code, CoreCLR really has it rough. To stay compatible with all kinds of operating systems it is littered with #if and #else branches, which left me speechless. In the end, on anything that is not OSX, FreeBSD, or SunOS, it falls through to the memfd_create function, so we have made some progress.
Friends who are familiar with Linux will know that memfd_create is a Linux system call that creates an anonymous, memory-backed file and returns a file descriptor for it. The closest equivalent on Windows is the Win32 API CreateFileMapping, that is, a memory-mapped file, and the Windows counterpart can also be found in the CoreCLR source directory.
Some friends may still be a bit hazy about how memfd_create is used, so I asked ChatGPT to generate a simple demo to help everyone understand. The simplified version is as follows:
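#define _GNU_SOURCE // required for memfd_create
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *name = "example_memfd";
    int fd;
    size_t size = 1024; // 1 KB
    void *map;
    const char *text = "Hello, memfd_create!";
    // Create the memory file descriptor
    fd = memfd_create(name, MFD_CLOEXEC);
    if (fd == -1) { perror("memfd_create"); return 1; }
    // Resize the memory file to the desired size
    if (ftruncate(fd, size) == -1) { perror("ftruncate"); return 1; }
    // Map the memory file into the address space
    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }
    // Write some data to the memory file (the mapping starts zero-filled,
    // so the copied string stays NUL-terminated)
    strncpy(map, text, size - 1);
    // Print the data from the memory file
    printf("Data in memory file: %s\n", (char *)map);
    // Unmap the memory
    munmap(map, size);
    // Close the file descriptor
    close(fd);
    return 0;
}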
The logic of the demo is simple. The important step is using mmap to map the fd into the process address space, that is, fd -> mmap <- memory. Once it is mapped, you can write data into it just like ordinary memory.
With this foundation in place, look at the name doublemapper__deleted_ again. Does it ring a bell? Probably, during resource release, only close(fd) was executed and munmap was not, something like this:
// Unmap the memory (not executed for some reason)
//munmap(map, size)
// Close the file descriptor
close(fd);
Haha, of course this is only my guess, and friends who know Linux well are welcome to point me toward the right direction for further research. It looks like there may be some kind of logic bug here, but the CoreCLR code is not ours to change, so can we work around it?
3. Can memfd_create be bypassed?
To find out whether it can be bypassed, we again have to look for the answer in the source code. Fortunately, the search paid off. The simplified code is as follows:
bool ExecutableAllocator::Initialize()
{
    if (IsDoubleMappingEnabled())
    {
        if (!VMToOSInterface::CreateDoubleMemoryMapper(&m_doubleMemoryMapperHandle, &m_maxExecutableCodeSize))
        {
            g_isWXorXEnabled = false;
            return true;
        }
        m_CriticalSection = ClrCreateCriticalSection(CrstExecutableAllocatorLock,CrstFlags(CRST_UNSAFE_ANYMODE | CRST_DEBUGGER_THREAD));
    }
    return true;
}

bool ExecutableAllocator::IsDoubleMappingEnabled()
{
#if defined(HOST_OSX) && defined(HOST_ARM64)
    return false;
#else
    return g_isWXorXEnabled;
#endif
}

bool ExecutableAllocator::g_isWXorXEnabled = CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_EnableWriteXorExecute) != 0;

RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableWriteXorExecute, W("EnableWriteXorExecute"), 1, "Enable W^X for executable memory.")
Judging from the code, everything is ultimately controlled by the external variable EnableWriteXorExecute. What does this variable mean? It refers to a protection feature provided jointly by the operating system and the CPU. Wikipedia (/wiki/W%5EX) introduces the W^X feature roughly as follows:
It is a memory protection policy under which every page in a process or kernel address space is either writable or executable, but never both at the same time. Without this protection, a program can write CPU instructions (as data, "W") into a memory area originally intended for data, and then run those instructions (as executable code, "X", or read-execute, "RX"). This can be dangerous if the party writing to the memory has malicious intent.
There are many discussions on GitHub and elsewhere about problems caused by EnableWriteXorExecute:
- /dotnet/runtime/issues/97765
- /questions/77164379/how-do-i-debug-a-net-core-console-app-with-windbg-by-launch-executable
- /dotnet/runtime/issues/79469
The advice everyone gives is to turn it off, like this:
export DOTNET_EnableWriteXorExecute=0
After my friend turned this option off, they reported that the program was running normally.
4. What code caused it?
Although export DOTNET_EnableWriteXorExecute=0 works around the problem, what in the business code is actually producing so many doublemapper regions? To answer that, we have to look inside these memory segments. If you think about it, since this is a memory file, it most likely carries .NET dll files, and dll files all begin with the magic bytes MZ. So let's use s-a to spot check one of the memory segments.
0:000> s-a 7f3fc4000000 7f3fcbff5000-0x1 "MZ"
00007f3f`c4059ce4 4d 5a 00 00 00 00 00 00-00 00 00 00 7c 00 00 00 MZ..........|...
00007f3f`c44f2989 4d 5a 3c 40 7f 00 00 b1-05 00 00 94 99 00 00 80 MZ<@............
00007f3f`c44f2b69 4d 5a 3c 40 7f 00 00 b1-05 00 00 98 99 00 00 40 MZ<@...........@
00007f3f`c44f3d99 4d 5a 3c 40 7f 00 00 b2-05 00 00 ac 99 00 00 80 MZ<@............
00007f3f`c44f4d49 4d 5a 3c 40 7f 00 00 b2-05 00 00 b6 99 00 00 80 MZ<@............
00007f3f`c45a3c61 4d 5a c4 3f 7f 00 00 00-00 00 00 00 00 00 00 cd MZ.?............
00007f3f`c45a3ca1 4d 5a c4 3f 7f 00 00 00-00 00 00 00 00 00 00 cd MZ.?............
00007f3f`c45a3ce1 4d 5a c4 3f 7f 00 00 00-00 00 00 00 00 00 00 cd MZ.?............
00007f3f`c45a3d21 4d 5a c4 3f 7f 00 00 00-00 00 00 00 00 00 00 cd MZ.?............
...
I then used a private script to export the contents and found a large number of project dlls inside. I won't post screenshots here. My friend also confirmed that their program contains logic that generates code dynamically.
3: Summary
The default value of EnableWriteXorExecute changed from 0 to 1 starting with .NET 7. On some Linux distributions this can lead to all kinds of odd problems because of compatibility issues, so I think disabling the feature is still a reasonable option.