.NET String in a few diagrams

String

String is a special reference type and by far the most used type in .NET programs. It is fair to say that almost anything can be represented as a string.

So when analyzing dumps, it is common to see a large number of string objects.

The immutability of string

String is a reference type, which would normally suggest that it is mutable. In .NET, however, strings are immutable by default.
In other words, they behave much like value types, which makes them a special case among reference types.

However, "strings are immutable" only holds at the .NET API level, simply because the BCL (Base Class Library) offers no way to change the content of a string.
In languages such as C/C++, string contents can be changed in place. It is therefore entirely possible to modify a string's contents through the underlying implementation.

Seeing is believing

Example 1

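Sample code

        // Requires: using System; using System.Diagnostics;
        static void Main(string[] args)
        {
            var teststr = "aaa";
            Debugger.Break();             // pause: inspect (and edit) the string's memory in the debugger
            Console.WriteLine(teststr);
            Debugger.Break();             // pause again: confirm the value after the edit
        }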

As you can see, the value of the string is "aaa".

Using the formula address + 0x10 + 2 * sizeof(char), we locate the character data and modify the memory contents directly.

As you can see, the value of the same memory address has changed from "aaa" to "aab".

Example 2


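        // Requires: using System; using System.Runtime.InteropServices;
        static void Main(string[] args)
        {
            var str1 = "aaa";

            // Get a writable reference to a character and overwrite it in place.
            ref var c0 = ref MemoryMarshal.GetReference<char>(str1.AsSpan(0));
            c0 = '0';
            ref var c1 = ref MemoryMarshal.GetReference<char>(str1.AsSpan(1));
            c1 = '1';

            Console.WriteLine(str1); // from aaa to 01a
        }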

Mutable behavior of strings

So how do we improve things when everyday code needs to concatenate a lot of strings?
The most common approach is to use StringBuilder.
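As a minimal sketch of the difference (the class and method names here are made up for illustration), the following builds the same text twice: once with +=, where every iteration creates a new string, and once with StringBuilder, where characters are appended to an internal buffer and the final string is created in ToString().

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Builds "0,1,2,...," with +=; every iteration allocates a brand-new string.
    public static string WithConcat(int count)
    {
        string result = string.Empty;
        for (int i = 0; i < count; i++)
        {
            result += i + ",";
        }
        return result;
    }

    // Builds the same text with StringBuilder; characters go into an internal
    // buffer and the result string is created in ToString().
    public static string WithBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i).Append(',');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WithConcat(5));   // 0,1,2,3,4,
        Console.WriteLine(WithBuilder(5));  // 0,1,2,3,4,
    }
}
```

Both methods return the same text; the difference is only in how many intermediate strings are allocated along the way.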

StringBuilder source code analysis

 public sealed partial class StringBuilder : ISerializable
 {
        // Stores the characters of this chunk
        internal char[] m_ChunkChars;

        // StringBuilder chunks are linked together as a linked list
        internal StringBuilder? m_ChunkPrevious;

        public StringBuilder(string? value, int startIndex, int length, int capacity)
        {
            ArgumentOutOfRangeException.ThrowIfNegative(capacity);
            ArgumentOutOfRangeException.ThrowIfNegative(length);
            ArgumentOutOfRangeException.ThrowIfNegative(startIndex);

            value ??= string.Empty;

            if (startIndex > value.Length - length)
            {
                throw new ArgumentOutOfRangeException(nameof(length), SR.ArgumentOutOfRange_IndexLength);
            }

            m_MaxCapacity = int.MaxValue;
            if (capacity == 0)
            {
                capacity = DefaultCapacity;
            }
            capacity = Math.Max(capacity, length);

            m_ChunkChars = GC.AllocateUninitializedArray<char>(capacity);
            m_ChunkLength = length;

            value.AsSpan(startIndex, length).CopyTo(m_ChunkChars);
        }

        public StringBuilder Append(char value, int repeatCount)
        {
            if (repeatCount == 0)
            {
                return this;
            }

            char[] chunkChars = m_ChunkChars;
            int chunkLength = m_ChunkLength;

            // Try to fit all of the repeated characters into the current chunk.
            // This uses the same check as Span<T>.Slice so that it can be folded on 64-bit systems.
            // Since repeatCount cannot be negative, it cannot overflow on 32-bit systems.
            if (((nuint)(uint)chunkLength + (nuint)(uint)repeatCount) <= (nuint)(uint)chunkChars.Length)
            {
                // Use Span to fill the char[] efficiently
                chunkChars.AsSpan(chunkLength, repeatCount).Fill(value);
                m_ChunkLength += repeatCount;
            }
            else
            {
                // Not enough space, so expand the capacity
                AppendWithExpansion(value, repeatCount);
            }
            return this;
        }

        public override string ToString()
        {
            // Allocate a new string to hold the result
            string result = string.FastAllocateString(Length);
            StringBuilder? chunk = this;
            do
            {
                if (chunk.m_ChunkLength > 0)
                {
                    // Copy these values into locals so they stay stable in multi-threaded scenarios
                    char[] sourceArray = chunk.m_ChunkChars;
                    int chunkOffset = chunk.m_ChunkOffset;
                    int chunkLength = chunk.m_ChunkLength;

                    // Use a memory move to copy the chunk's data into result
                    Buffer.Memmove(
                        ref Unsafe.Add(ref result.GetRawStringData(), chunkOffset),
                        ref MemoryMarshal.GetArrayDataReference(sourceArray),
                        (nuint)chunkLength);
                }
                // Move on to the previous StringBuilder chunk and keep reading along the chain
                chunk = chunk.m_ChunkPrevious;
            }
            while (chunk != null);

            return result;
        }
 }

Internally, StringBuilder stores its text in char[] m_ChunkChars and uses Span to operate on that memory directly and efficiently.

Avoiding object allocation is the most common way to improve code performance.
Common features such as $"name={name}" are also built on a StringBuilder-like buffer internally.
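The chunked storage can be observed from the outside through the public StringBuilder.GetChunks() API. The sketch below is only an illustration (the class and helper names are invented here): it starts with a deliberately small capacity so that the builder has to grow, counts the resulting chunks, and checks that concatenating them gives the same text as ToString(). The exact number of chunks depends on the runtime's growth policy, so it is printed without an expected value.

```csharp
using System;
using System.Text;

public static class ChunkDemo
{
    // Counts how many chunks the builder currently consists of.
    public static int CountChunks(StringBuilder sb)
    {
        int chunks = 0;
        foreach (ReadOnlyMemory<char> _ in sb.GetChunks())
        {
            chunks++;
        }
        return chunks;
    }

    // Concatenates the chunks in order, mimicking what ToString() does internally.
    public static string JoinChunks(StringBuilder sb)
    {
        var joined = new StringBuilder();
        foreach (ReadOnlyMemory<char> chunk in sb.GetChunks())
        {
            joined.Append(chunk.Span);
        }
        return joined.ToString();
    }

    public static StringBuilder BuildSample()
    {
        // Small initial capacity so that appends force the builder to add chunks.
        var sb = new StringBuilder(capacity: 4);
        for (int i = 0; i < 10; i++)
        {
            sb.Append("abc");
        }
        return sb;
    }

    public static void Main()
    {
        StringBuilder sb = BuildSample();
        Console.WriteLine(CountChunks(sb));
        Console.WriteLine(JoinChunks(sb) == sb.ToString()); // True
    }
}
```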

Why are strings immutable?

So given how counter-intuitive this behavior is, why was string designed this way? There are a few reasons.

Safety
Strings are used everywhere: as keys in new Dictionary<string, string>(), as user tokens, as file paths. In all of these uses the string acts as a key. If the program could modify that key in place, none of them could be trusted.
Concurrency
Precisely because string is so widely used, it is often accessed concurrently. If strings were mutable, every such access would incur additional synchronization overhead.
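To make the safety argument concrete, here is a small sketch (with invented names) that deliberately breaks immutability using MemoryMarshal and shows what happens to a dictionary entry when its key is changed in place. It uses new string('a', 3) rather than a literal so that the hack does not touch a shared literal.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class KeyMutationDemo
{
    public static bool LookupAfterMutation()
    {
        // A fresh, non-literal instance, so no shared literal is affected.
        string key = new string('a', 3);
        var map = new Dictionary<string, string> { [key] = "secret" };

        // Deliberately break immutability: overwrite the first character in place.
        ref char first = ref MemoryMarshal.GetReference(key.AsSpan());
        first = 'b';

        // The entry was stored under the hash of "aaa", but its key now reads "baa":
        // the hash still matches, the equality check does not.
        return map.ContainsKey("aaa");
    }

    public static void Main()
    {
        Console.WriteLine(LookupAfterMutation()); // False
    }
}
```

The entry is still in the dictionary, but it can no longer be found under the key it was stored with, which is exactly the kind of corruption that immutability rules out.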

Why is string not a struct?

Having said that, structs satisfy the immutability and concurrency-safety criteria perfectly, so why not define string as a struct?
The core reason is that the pass-by-value semantics of structs would cause strings to be copied frequently.
Copying large strings is too expensive, so pass-by-reference semantics are far more efficient.

JSON serialization/deserialization is a typical example.
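A quick way to see the reference semantics in action is to pass a large string through a method and compare references. This is only an illustrative sketch; Echo is a made-up helper.

```csharp
using System;

public static class PassByReferenceDemo
{
    // Returns the string it received; only the reference travels in and out,
    // the characters themselves are never copied.
    public static string Echo(string value) => value;

    public static void Main()
    {
        string big = new string('x', 1_000_000);
        string same = Echo(big);
        Console.WriteLine(object.ReferenceEquals(big, same)); // True
    }
}
```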

String interning

The .NET runtime has an internal string interning mechanism.
When two strings are exactly the same, there is no need to store two copies in memory. Keeping one copy is enough.

However, string interning has a limitation. By default, only statically created strings (literals known at compile time) are interned. The JIT decides whether or not a string is interned.

For example:

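        // Requires: using System;
        static void Main(string[] args)
        {
            var global = "hello world"; // assumed declaration; it is not shown in the original snippet
            var s1 = "hello world";
            var s2 = "hello ";
            var s3 = "world";

            Console.WriteLine(object.ReferenceEquals(global, s1));  // True: the contents match, so only one instance is kept
            Console.WriteLine(object.ReferenceEquals(s1, s2 + s3)); // False: s2 + s3 is built at run time and is not interned

            Console.ReadLine();
        }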

The reason is cost: to intern every string, the runtime would have to check dynamically, each time a new string is created, whether it has already been interned. If the strings are large, or if there are a great many of them, that check becomes very expensive.

The FCL provides explicit APIs, string.Intern / string.IsInterned, that let us intern strings ourselves.
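A minimal sketch of explicit interning with string.Intern and string.IsInterned follows. The class and the Build helper are invented for illustration, and the outputs in the comments are what is expected on a regular JIT-compiled run.

```csharp
using System;

public static class InternDemo
{
    // Concatenates at run time; the result is a new instance that is not interned.
    public static string Build(string left, string right) => left + right;

    public static void Main()
    {
        string literal = "hello world";
        string built = Build("hello ", "world");

        Console.WriteLine(object.ReferenceEquals(literal, built));    // False

        // Intern returns the pooled instance with the same contents.
        string interned = string.Intern(built);
        Console.WriteLine(object.ReferenceEquals(literal, interned)); // True

        // IsInterned returns the pooled instance, or null if there is none.
        Console.WriteLine(string.IsInterned(built) != null);          // True
    }
}
```

Explicit interning trades a lookup on every call for fewer duplicate instances, so it tends to pay off only when the same values recur many times.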

Where is the string stored?

/dotnet/runtime/blob/main/src/coreclr/vm/

At this point, consider: what is the difference between an interned string and a static variable? Both are objects that are never released.
So you can guess that interned strings are kept in the AppDomain, next to the high-frequency heap.

Inside the .NET AppDomain there is a private heap object called the String Literal Map, which stores each string's hash together with a memory address.
That memory address points to another data structure, LargeHeapHandleTable. It lives on the LOH and holds references to the string instances.

Normally, only objects larger than 85,000 bytes are allocated on the LOH; LargeHeapHandleTable is a typical exception. Some objects that are never reclaimed, or are hard to reclaim, are allocated on the LOH even if they are smaller than 85,000 bytes, because this reduces the GC workload (no generation promotion, no compaction).

Seeing is believing

To be filled in later: as far as I can tell, SOS does not expose the heap address of the String Literal Map, so I will need a few days to dig into it.

Secure strings

A string may contain sensitive data, for example a password.
String objects hold their characters internally as plain char data. Therefore, when unsafe or unmanaged code runs, that memory may be scanned.
The data is only safe once the GC has reclaimed the object, but the window before that is long enough for the memory to be scanned many times over.

To solve this problem, the SecureString class was added to the FCL as a more secure replacement.

Internally it uses an UnmanagedBuffer instead of a char[]:

public sealed partial class SecureString : IDisposable
{
        private readonly object _methodLock = new object(); // synchronization lock
        private UnmanagedBuffer? _buffer; // UnmanagedBuffer is used in place of char[]

        public SecureString()
        {
            Initialize(ReadOnlySpan<char>.Empty);
        }

        private void Initialize(ReadOnlySpan<char> value)
        {
            _buffer = UnmanagedBuffer.Allocate(GetAlignedByteSize(value.Length));
            _decryptedLength = value.Length;

            SafeBuffer? bufferToRelease = null;
            try
            {
                Span<char> span = AcquireSpan(ref bufferToRelease);
                value.CopyTo(span);
            }
            finally
            {
                ProtectMemory();
                bufferToRelease?.DangerousRelease();
            }
        }

        public void AppendChar(char c)
        {
            lock (_methodLock)
            {
                EnsureNotDisposed();
                EnsureNotReadOnly();

                Debug.Assert(_buffer != null);

                SafeBuffer? bufferToRelease = null;

                try
                {
                    // Decrypt the memory so it can be modified
                    UnprotectMemory();

                    EnsureCapacity(_decryptedLength + 1);

                    Span<char> span = AcquireSpan(ref bufferToRelease);
                    span[_decryptedLength] = c;
                    _decryptedLength++;
                }
                finally
                {
                    // Re-encrypt
                    ProtectMemory();
                    bufferToRelease?.DangerousRelease();
                }
            }
        }
}

It implements the IDisposable interface, so developers can call Dispose() explicitly. This zeroes the memory buffer directly, ensuring that malicious code cannot obtain the sensitive information.


        public void Dispose()
        {
            lock (_methodLock)
            {
                if (_buffer != null)
                {
                    _buffer.Dispose();
                    _buffer = null;
                }
            }
        }
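From the caller's side, typical usage looks roughly like the sketch below. The class and method names are invented for illustration. Note that reading the value back into a managed string, as RoundTrip does, defeats the purpose of SecureString and is done here only to show that the content survives; the unmanaged copy is zeroed and freed in the finally block.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

public static class SecureStringDemo
{
    public static string RoundTrip(string plain)
    {
        // Dispose() runs at the end of the method and clears the protected buffer.
        using var secure = new SecureString();
        foreach (char c in plain)
        {
            secure.AppendChar(c);
        }
        secure.MakeReadOnly(); // further AppendChar calls would now throw

        IntPtr ptr = IntPtr.Zero;
        try
        {
            // Copy the decrypted characters into unmanaged memory.
            ptr = Marshal.SecureStringToGlobalAllocUnicode(secure);
            return Marshal.PtrToStringUni(ptr) ?? string.Empty;
        }
        finally
        {
            // Zero the unmanaged copy before freeing it.
            if (ptr != IntPtr.Zero)
            {
                Marshal.ZeroFreeGlobalAllocUnicode(ptr);
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("p@ss")); // p@ss
    }
}
```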

Are secure strings really secure?

The purpose of SecureString is to avoid storing confidential information as plain text in process memory.
Underneath, SecureString is still essentially an unencrypted char buffer, with encryption and decryption of the data performed by the FCL.
So the contents are only actually encrypted on .NET Framework, where Windows provides the support for protecting the internal buffer.
On .NET Core, however, other platforms do not provide system-level support for this.

/dotnet/platform-compat/blob/master/docs/

So personally, I think the real silver bullet is to encrypt the data itself, for example encrypting it when it is stored in a database or in a configuration file, because the operating system has no concept of a secure string.

If malicious code can read the memory and the memory itself is not encrypted, the data is laid bare at the CLR layer.