
Case study: analyzing a memory explosion in a .NET digital collaborative management system

2025-01-20 13:51:50

1: Background

1. The story

A friend from my advanced debugging training camp came to me with a problem. Their .NET program running on Linux had a memory leak. Analysis in WinDbg showed that the memory was being consumed by IMAGE regions, all labeled doublemapper__deleted_. He asked me why this was happening. To be honest, as debuggers we really enjoy working with people like this: communication is smooth, and it sparks their desire to dig deeper. That is one of the benefits the training camp offers.

2: Analyzing the memory explosion

1. Why did it skyrocket?

Readers who have followed this series know that I usually observe memory with the !address -summary command. That command is built for Windows, however, and does not work on Linux. SOS provides a dedicated replacement, !maddress, so here I use !maddress -orderBySize to observe the memory distribution.


0:000> !maddress -orderBySize
 +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
 | Memory Kind         |        StartAddr |        EndAddr-1 |         Size | Type        | State       | Protect           | Image                                                             | 
 +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
 | Image               |     7f4000000000 |     7f4007ff6000 |     127.96mb | MEM_IMAGE   | MEM_COMMIT  | PAGE_READWRITE    | doublemapper__deleted_                                            | 
 | Image               |     7f3fc4000000 |     7f3fcbff5000 |     127.96mb | MEM_IMAGE   | MEM_COMMIT  | PAGE_READWRITE    | doublemapper__deleted_                                            | 
 | Image               |     7f404c021000 |     7f4051b4c000 |      91.17mb | MEM_IMAGE   | MEM_UNKNOWN | PAGE_UNKNOWN      | doublemapper__deleted_                                            | 
 | Image               |     7f3fae82e000 |     7f3fb4000000 |      87.82mb | MEM_IMAGE   | MEM_COMMIT  | PAGE_EXECUTE_READ | doublemapper__deleted_                                            | 
 | Image               |     7f406c021000 |     7f40701ff000 |      65.87mb | MEM_IMAGE   | MEM_UNKNOWN | PAGE_UNKNOWN      | doublemapper__deleted_                                            | 
 ...
 +----------------------------------------------------------------------+ 
 | Memory Type         |          Count |         Size |   Size (bytes) | 
 +----------------------------------------------------------------------+ 
 | Image               |            980 |       3.54gb |  3,801,517,056 | 
 | PAGE_READWRITE      |          1,178 |       1.17gb |  1,255,059,968 | 
 | Stack               |             66 |     499.35mb |    523,604,992 | 
...
 | NewStubPrecodeHeap  |              4 |      64.00kb |         65,536 | 
 +----------------------------------------------------------------------+ 
 | [TOTAL]             |          8,254 |       6.01gb |  6,451,347,968 | 
 +----------------------------------------------------------------------+ 

From the output, total memory usage is about 6.4 GB (6,451,347,968 bytes), and Image alone accounts for about 3.8 GB (3,801,517,056 bytes). The detail rows confirm that these regions are marked doublemapper__deleted_. To be honest, I have analyzed more than 300 dumps, and this is only the second time I have seen Image be the largest consumer. Failures like this are quite rare. Next, let's find out what doublemapper__deleted_ actually is.
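!maddress works on a dump. On a live Linux process, you can get a rough equivalent of the "group by kind, order by size" view from /proc/<pid>/maps, which lists one region per line as `start-end perms offset dev inode [pathname]`. The sketch below is only an illustration, and the names `summarize_maps`, `struct kind_total` and `MAX_KINDS` are hypothetical. It groups regions by pathname, labels unnamed regions "[anon]", and sorts the totals in descending order.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KINDS 512   /* hypothetical cap on distinct region names */

/* One aggregated row: a region name, total bytes, and region count. */
struct kind_total {
    char name[256];
    unsigned long long bytes;
    unsigned count;
};

/* qsort comparator: larger totals first. */
static int by_size_desc(const void *a, const void *b) {
    const struct kind_total *x = a, *y = b;
    if (x->bytes == y->bytes) return 0;
    return x->bytes < y->bytes ? 1 : -1;
}

/* Reads a maps file (e.g. "/proc/self/maps"), sums region sizes per
   pathname, and sorts rows by size. Returns the row count, or -1 if the
   file cannot be opened. Names beyond max_rows are ignored. */
int summarize_maps(const char *path, struct kind_total *rows, int max_rows) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;

    char line[1024];
    int n = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long long start, end;
        char name[256] = "";
        /* Skip perms, offset, dev and inode; the pathname may be absent. */
        if (sscanf(line, "%llx-%llx %*s %*s %*s %*s %255[^\n]",
                   &start, &end, name) < 2)
            continue;

        const char *key = name[0] != '\0' ? name : "[anon]";
        int i;
        for (i = 0; i < n; i++)
            if (strcmp(rows[i].name, key) == 0) break;
        if (i == n) {                       /* first region with this name */
            if (n == max_rows) continue;
            snprintf(rows[n].name, sizeof rows[n].name, "%s", key);
            rows[n].bytes = 0;
            rows[n].count = 0;
            n++;
        }
        rows[i].bytes += end - start;
        rows[i].count++;
    }
    fclose(f);

    qsort(rows, (size_t)n, sizeof rows[0], by_size_desc);
    return n;
}
```

Calling `summarize_maps("/proc/<pid>/maps", rows, MAX_KINDS)` on the affected process and printing `rows[i].name` and `rows[i].bytes` should give a view similar to the Image rows above.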

2. What is doublemapper__deleted_?

To find the answer, I first looked for clues in the coreclr source code. A global search for the keyword doublemapper quickly turned up the relevant code:


bool VMToOSInterface::CreateDoubleMemoryMapper(void** pHandle, size_t *pMaxExecutableCodeSize)
{
#ifndef TARGET_OSX

#ifdef TARGET_FREEBSD
    int fd = shm_open(SHM_ANON, O_RDWR | O_CREAT, S_IRWXU);
#elif defined(TARGET_SUNOS) // has POSIX implementation
    char name[24];
    sprintf(name, "/shm-dotnet-%d", getpid());
    name[sizeof(name) - 1] = '\0';
    shm_unlink(name);
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL | O_NOFOLLOW, 0600);
#else // TARGET_FREEBSD
    int fd = memfd_create("doublemapper", MFD_CLOEXEC);
#endif // TARGET_FREEBSD

    *pMaxExecutableCodeSize = MaxDoubleMappedSize;
    *pHandle = (void*)(size_t)fd;
#else // !TARGET_OSX

    *pMaxExecutableCodeSize = SIZE_MAX;
    *pHandle = NULL;
#endif // !TARGET_OSX

    return true;
}

The code shows how hard coreclr works to support different operating system kernels, with branch after branch of #if/#elif. Frankly, it left me speechless. On platforms other than macOS, FreeBSD, and SunOS (which includes Linux), it ends up calling memfd_create. That gives us something to work with.

Readers familiar with Linux will know that memfd_create is a Linux system call that creates an anonymous file and returns a file descriptor referring to it. The closest Windows counterpart is the Win32 CreateFileMapping function, i.e. memory-mapped files. You can also find this in the corresponding place in the source tree.

Some readers may still be unclear about how memfd_create is used, so I had ChatGPT generate a simple demo. The simplified version is as follows:


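#define _GNU_SOURCE   // memfd_create() is a GNU extension in glibc
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const char *name = "example_memfd";
    int fd;
    size_t size = 1024; // 1 KB
    void *map;
    const char *text = "Hello, memfd_create!";

    // Create an anonymous, memory-backed file and get its descriptor
    fd = memfd_create(name, MFD_CLOEXEC);
    if (fd == -1) { perror("memfd_create"); return 1; }

    // Resize the memory file to the desired size (new bytes read as zero)
    if (ftruncate(fd, size) == -1) { perror("ftruncate"); return 1; }

    // Map the memory file into the address space
    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    // Write the string, including its terminating '\0', into the memory file
    memcpy(map, text, strlen(text) + 1);

    // Print the data from the memory file
    printf("Data in memory file: %s\n", (char *)map);

    // Unmap the memory
    munmap(map, size);

    // Close the file descriptor
    close(fd);

    return 0;
}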

The logic is very simple. The key step is using mmap to map the fd into the process's address space, i.e. fd -> mmap <- memory. Once it is mapped, you can write data into it like ordinary memory.
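The name "doublemapper" hints at why coreclr wants a memfd at all. As I understand coreclr's W^X implementation, the same memfd-backed pages are mapped twice. One view is read-execute, for running generated code. The other is read-write, for writing that code. This way no single view is both writable and executable. The sketch below extends the demo above to illustrate this. It maps one memfd through two views and shows that a write through the RW view is visible through the other. The function and memfd names are hypothetical. To keep it portable, the second view is mapped PROT_READ only; a JIT would request PROT_READ | PROT_EXEC.

```c
#define _GNU_SOURCE             /* memfd_create() on glibc */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: map ONE memfd through TWO views. View 1 is read/write;
   view 2 is read-only here (a JIT would use PROT_READ | PROT_EXEC).
   Copies what view 2 sees into out. Returns 0 on success, -1 on failure. */
int double_map_demo(const char *msg, char *out, size_t out_len) {
    size_t size = (size_t)sysconf(_SC_PAGESIZE);
    if (strlen(msg) >= size) return -1;

    int fd = memfd_create("doublemapper_demo", MFD_CLOEXEC);
    if (fd == -1) return -1;
    if (ftruncate(fd, (off_t)size) == -1) { close(fd); return -1; }

    char *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *ro = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                              /* the mappings keep the file alive */

    int rc = -1;
    if (rw != MAP_FAILED && ro != MAP_FAILED) {
        memcpy(rw, msg, strlen(msg) + 1);   /* write through the RW view... */
        snprintf(out, out_len, "%s", ro);   /* ...read it back via the other view */
        rc = 0;
    }
    if (rw != MAP_FAILED) munmap(rw, size);
    if (ro != MAP_FAILED) munmap(ro, size);
    return rc;
}
```

The two views get different virtual addresses but share the same physical pages.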

With this background, does the doublemapper__deleted_ label start to make sense? My guess is that when the resource was released, only close(fd) was called and munmap was not, roughly like this:

    // Unmap the memory (not executed for some reason)
    // munmap(map, size);

    // Close the file descriptor
    close(fd);
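The mechanism behind this guess is easy to check. A MAP_SHARED mapping holds its own reference to the underlying file, so close(fd) by itself releases neither the mapping nor its memory. The helper below is a hypothetical sketch; its name and the memfd name are mine. It creates a memfd, maps it, and closes the fd without calling munmap. It then shows that the data is still readable and that the region still appears in /proc/self/maps under a memfd: name.

```c
#define _GNU_SOURCE             /* memfd_create() on glibc */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo of the suspected pattern: close(fd) without munmap().
   Copies the mapping's contents into data and its /proc/self/maps line into
   maps_line. Returns the still-live mapping (caller must munmap it), or NULL. */
char *close_without_munmap(size_t size, char *data, size_t data_len,
                           char *maps_line, size_t line_len) {
    int fd = memfd_create("leak_demo", MFD_CLOEXEC);
    if (fd == -1) return NULL;
    if (ftruncate(fd, (off_t)size) == -1) { close(fd); return NULL; }
    char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                               /* the descriptor is gone...      */
    if (map == MAP_FAILED) return NULL;

    snprintf(map, size, "still here");       /* ...but the mapping still works */
    snprintf(data, data_len, "%s", map);

    /* Find the region in /proc/self/maps by its start address. */
    char prefix[32], line[512];
    snprintf(prefix, sizeof prefix, "%lx-", (unsigned long)map);
    maps_line[0] = '\0';
    FILE *f = fopen("/proc/self/maps", "r");
    if (f != NULL) {
        while (fgets(line, sizeof line, f) != NULL) {
            if (strncmp(line, prefix, strlen(prefix)) == 0) {
                snprintf(maps_line, line_len, "%s", line);
                break;
            }
        }
        fclose(f);
    }
    return map;
}
```

Pass the page size (sysconf(_SC_PAGESIZE)) as size, and munmap the returned pointer once done. Until then, the memory stays accounted to the process even though no fd refers to it.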

Of course, this is only my guess, and readers familiar with Linux are welcome to weigh in. So where should the investigation go next? Even if I suspect some kind of logic bug, the coreclr code isn't ours to change. Can I work around it instead?

3. Can memfd_create be bypassed?

To find out whether it can be bypassed, we have to look in the source code again. Luckily, the answer was there. The simplified code is as follows:


bool ExecutableAllocator::Initialize()
{
    if (IsDoubleMappingEnabled())
    {
        if (!VMToOSInterface::CreateDoubleMemoryMapper(&m_doubleMemoryMapperHandle, &m_maxExecutableCodeSize))
        {
            g_isWXorXEnabled = false;
            return true;
        }

        m_CriticalSection = ClrCreateCriticalSection(CrstExecutableAllocatorLock,CrstFlags(CRST_UNSAFE_ANYMODE | CRST_DEBUGGER_THREAD));
    }

    return true;
}

bool ExecutableAllocator::IsDoubleMappingEnabled()
{

#if defined(HOST_OSX) && defined(HOST_ARM64)
    return false;
#else
    return g_isWXorXEnabled;
#endif
}

bool ExecutableAllocator::g_isWXorXEnabled = CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_EnableWriteXorExecute) != 0;

RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableWriteXorExecute, W("EnableWriteXorExecute"), 1, "Enable W^X for executable memory.");

From the code, it is all controlled by the EnableWriteXorExecute configuration value. What does this value mean? It refers to W^X, a protection feature provided jointly by the operating system and the CPU. Wikipedia's W^X article (/wiki/W%5EX) introduces it roughly as follows:

It is a memory protection policy whereby every page in a process's or the kernel's address space may be either writable or executable, but not both. Without such protection, a program can write CPU instructions (as data, "W") into a memory area intended for data and then run those instructions (as executable code, "X", or read-execute, "RX"). This can be dangerous if whoever writes to the memory is malicious.

EnableWriteXorExecute has also caused its share of problems, and there are many discussions about it on GitHub and elsewhere:

  1. /dotnet/runtime/issues/97765
  2. /questions/77164379/how-do-i-debug-a-net-core-console-app-with-windbg-by-launch-executable
  3. /dotnet/runtime/issues/79469

The common advice is to turn it off, like this:


export DOTNET_EnableWriteXorExecute=0
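The ExecutableAllocator code earlier boils down to one decision. The runtime reads EnableWriteXorExecute, which defaults to 1. When the value is nonzero, W^X is on, and with it the memfd-based double mapping. The sketch below models that decision. The name `wxorx_enabled` is hypothetical. I am also assuming two things about CLRConfig that the snippet does not show: that the value is parsed as hexadecimal, and that the legacy `COMPlus_` prefix is honored as a fallback.

```c
#include <stdlib.h>

/* Hypothetical model of g_isWXorXEnabled: default 1, nonzero means enabled.
   Assumptions: DOTNET_ takes precedence over the legacy COMPlus_ prefix,
   and values are parsed as hex; unparsable values keep the default. */
int wxorx_enabled(void) {
    const char *v = getenv("DOTNET_EnableWriteXorExecute");
    if (v == NULL) v = getenv("COMPlus_EnableWriteXorExecute");
    if (v == NULL || *v == '\0') return 1;   /* not set: default is 1 */

    char *end;
    unsigned long val = strtoul(v, &end, 16);
    if (*end != '\0') return 1;              /* garbage: keep the default */
    return val != 0;
}
```

With `export DOTNET_EnableWriteXorExecute=0`, this returns 0. That corresponds to IsDoubleMappingEnabled() returning false, so CreateDoubleMemoryMapper is never called.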

After my friend turned this option off, he reported that the program ran normally.

4. Which code caused it?

Setting export DOTNET_EnableWriteXorExecute=0 solves the problem. But what business logic actually produced so many doublemapper regions? The answer has to come from the memory segments themselves. Since these are memory files, they most likely carry .NET DLL content, and DLL files all begin with the magic bytes MZ. So I used s -a to spot-check one of the memory segments.


0:000> s-a 7f3fc4000000 7f3fcbff5000-0x1 "MZ"
00007f3f`c4059ce4  4d 5a 00 00 00 00 00 00-00 00 00 00 7c 00 00 00  MZ..........|...
00007f3f`c44f2989  4d 5a 3c 40 7f 00 00 b1-05 00 00 94 99 00 00 80  MZ<@............
00007f3f`c44f2b69  4d 5a 3c 40 7f 00 00 b1-05 00 00 98 99 00 00 40  MZ<@...........@
00007f3f`c44f3d99  4d 5a 3c 40 7f 00 00 b2-05 00 00 ac 99 00 00 80  MZ<@............
00007f3f`c44f4d49  4d 5a 3c 40 7f 00 00 b2-05 00 00 b6 99 00 00 80  MZ<@............
00007f3f`c45a3c61  4d 5a c4 3f 7f 00 00 00-00 00 00 00 00 00 00 cd  MZ.?............
00007f3f`c45a3ca1  4d 5a c4 3f 7f 00 00 00-00 00 00 00 00 00 00 cd  MZ.?............
00007f3f`c45a3ce1  4d 5a c4 3f 7f 00 00 00-00 00 00 00 00 00 00 cd  MZ.?............
00007f3f`c45a3d21  4d 5a c4 3f 7f 00 00 00-00 00 00 00 00 00 00 cd  MZ.?............
...

I then exported them with a private script and found a large number of the project's DLLs (I won't include screenshots here). My friend also confirmed that their program has logic that generates code dynamically.
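The export script is private, so what follows is only a hypothetical sketch of the filtering step such a script needs. Not every "MZ" hit is an image header. Several hits above, such as the `MZ<@` rows, appear to be just bytes of pointer values. A PE image starts with a DOS header whose e_lfanew field (a little-endian 32-bit value at offset 0x3C) points to the "PE\0\0" signature. Checking that link filters out most false positives. The function names are mine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if buf[off..] starts a plausible PE image: "MZ", then an
   e_lfanew (little-endian uint32 at +0x3C) pointing to "PE\0\0". */
int looks_like_pe(const unsigned char *buf, size_t len, size_t off) {
    if (off + 0x40 > len || buf[off] != 'M' || buf[off + 1] != 'Z') return 0;
    uint32_t lfanew = (uint32_t)buf[off + 0x3C]
                    | (uint32_t)buf[off + 0x3D] << 8
                    | (uint32_t)buf[off + 0x3E] << 16
                    | (uint32_t)buf[off + 0x3F] << 24;
    if (lfanew > len - off - 4) return 0;    /* signature would be out of range */
    return memcmp(buf + off + lfanew, "PE\0\0", 4) == 0;
}

/* Scans buf for plausible PE images and stores their offsets in hits.
   Returns the number of hits stored (at most max_hits). */
size_t find_pe_images(const unsigned char *buf, size_t len,
                      size_t *hits, size_t max_hits) {
    size_t n = 0;
    for (size_t i = 0; i + 1 < len && n < max_hits; i++)
        if (looks_like_pe(buf, len, i)) hits[n++] = i;
    return n;
}
```

Run over a memory segment saved from the dump, the surviving offsets are the candidates worth carving out and opening as DLLs.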

3: Summary

Since .NET 7, EnableWriteXorExecute defaults to 1 (enabled) instead of 0. On some open-source Linux distributions, various compatibility issues can cause strange problems. In those cases, I think disabling this feature is a reasonable choice.