
.NET Core Generics: Underlying Principles

2024-11-07 17:08:11

Introduction

image

Introductory material on generics is everywhere, so I won't go over the basics, such as how to use generic interfaces/delegates/methods, or covariance and contravariance.

The benefits of generics are as follows:

  1. Code reuse
    Algorithms are reusable: define sorting, searching, swapping, comparing, and so on once, and the same logic works for any type.
  2. Type safety
    The compiler guarantees that an int is never passed where a string is expected.
  3. Simple and clear
    Less type-conversion code.
  4. Better performance
    Less boxing/unboxing, and more efficient generic algorithms.
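
The code reuse and type safety points can be illustrated with a minimal sketch. The class and method names (GenericAlgorithms, Swap, Max) are hypothetical and are not taken from the BCL or from this article; they only show one algorithm being written once and reused for several types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class GenericAlgorithms
{
    // One swap routine reused for every T: no casts and no boxing.
    public static void Swap<T>(ref T left, ref T right)
    {
        T temp = left;
        left = right;
        right = temp;
    }

    // One "largest element" routine for any T that can compare with itself.
    public static T Max<T>(IReadOnlyList<T> items) where T : IComparable<T>
    {
        if (items.Count == 0)
            throw new ArgumentException("Sequence is empty.", nameof(items));

        T best = items[0];
        for (int i = 1; i < items.Count; i++)
        {
            if (items[i].CompareTo(best) > 0)
                best = items[i];
        }
        return best;
    }
}
```

The same Max method accepts an int[] or a string[], while a call such as Swap(ref someInt, ref someString) is rejected at compile time because no single T fits both arguments.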

Why are generics more performant?

This is mainly due to the managed heap allocations and the CPU cost that come with boxing.

  1. Boxing value types takes up extra memory
            var a = new List<int>()
            {
                1,2, 3, 4
            };
            var b = new ArrayList()
            {
                1,2,3,4
            };

Variable a: 72 bytes
image
Variable b: 184 bytes (an object[] of references plus one boxed object per int)
image
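
The memory gap can also be observed without a memory profiler. The following is a rough sketch, assuming a runtime that provides GC.GetAllocatedBytesForCurrentThread (.NET Core 3.0 or later); the class and method names are hypothetical, and the exact byte counts depend on the runtime and the process bitness.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class BoxingAllocationSketch
{
    // Managed bytes allocated on the current thread while 'build' runs.
    public static long MeasureAllocatedBytes(Action build)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        build();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static long ListOfIntBytes(int count) => MeasureAllocatedBytes(() =>
    {
        var list = new List<int>(count);   // one int[] backing array, no boxes
        for (int i = 0; i < count; i++) list.Add(i);
        GC.KeepAlive(list);
    });

    public static long ArrayListBytes(int count) => MeasureAllocatedBytes(() =>
    {
        var list = new ArrayList(count);   // object[] plus one boxed object per int
        for (int i = 0; i < count; i++) list.Add(i);
        GC.KeepAlive(list);
    });
}
```

Comparing ArrayListBytes(1000) with ListOfIntBytes(1000) should show the ArrayList allocating several times more, because every element is a separate heap object plus a reference to it.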

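  2. Boxing/unboxing consumes additional CPU
	// Requires: using System.Collections; using System.Collections.Generic; using System.Diagnostics;
	public void ArrayTest()
	{
		Stopwatch stopwatch = new Stopwatch();
		stopwatch.Start();
		ArrayList arrayList = new ArrayList();
		for (int i = 0; i < 10000000; i++)
		{
			arrayList.Add(i);        // boxes i into a new heap object
			_ = (int)arrayList[i];   // unboxes it again
		}
		stopwatch.Stop();
		Console.WriteLine($"array time is {stopwatch.ElapsedMilliseconds}");
	}

	public void ListTest()
	{
		Stopwatch stopwatch = new Stopwatch();
		stopwatch.Start();
		List<int> list = new List<int>();
		for (int i = 0; i < 10000000; i++)
		{
			list.Add(i);             // stored directly in the int[] backing array
			_ = list[i];             // read back without a cast or unboxing
		}
		stopwatch.Stop();
		Console.WriteLine($"list time is {stopwatch.ElapsedMilliseconds}");
	}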

image

A difference this large undoubtedly means more work for the GC as well as additional CPU consumption.

A question to think about: if the type argument were a reference type, would the difference still be so large?
And if the gap is small, what is our reason for using generics?

Open/closed types

The CLR has a variety of types, such as reference types, value types, interface and delegate types, and generic types.

Based on whether they can be instantiated, they are further divided into open types and closed types.

Why is this important? One of the advantages of generics is code reuse: define the algorithm once and fill in the rest later. For example, List<> is open to any type argument, so the same set of algorithms can be reused for all of them.

For example:

  1. An open type is one whose type parameters have not yet been specified, such as List<> or Dictionary<,>. Like an interface, it cannot be instantiated. It only lays out the basic framework and stays open to different type arguments.
            Type it = typeof(ITest);
            Activator.CreateInstance(it); // creation fails

            Type di = typeof(Dictionary<,>);
            Activator.CreateInstance(di); // creation fails
  2. A closed type is one whose type arguments have all been specified, so it can be instantiated. List<string> and String are closed types. They accept only the specific type arguments already supplied.
            Type li = typeof(List<string>);
            Activator.CreateInstance(li); // creation succeeds
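
The open/closed distinction can be checked through reflection. The sketch below uses only Type.ContainsGenericParameters, Type.MakeGenericType and Activator.CreateInstance; the wrapper class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class OpenClosedTypeSketch
{
    // True while the type still has unassigned type parameters (an open type).
    public static bool IsOpen(Type type) => type.ContainsGenericParameters;

    // Tries to instantiate the type via its parameterless constructor;
    // returns false instead of throwing when the runtime refuses.
    public static bool CanCreate(Type type)
    {
        try
        {
            return Activator.CreateInstance(type) != null;
        }
        catch (Exception)
        {
            return false;
        }
    }

    // Closes Dictionary<,> by supplying both type arguments.
    public static Type CloseDictionary(Type key, Type value) =>
        typeof(Dictionary<,>).MakeGenericType(key, value);
}
```

With these helpers, IsOpen(typeof(Dictionary<,>)) is true and CanCreate on it is false, while the type returned by CloseDictionary(typeof(string), typeof(int)) is closed and can be created.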

Code explosion

So using open types raises a problem. During JIT compilation, the CLR takes the IL of the generic type, substitutes the specified type arguments, and generates the corresponding native code.
The drawback is that native code is generated for every distinct combination of generic type/method and type arguments. This can significantly increase the amount of native code in the program and thus hurt performance.
The CLR has a special optimization to mitigate this: code sharing.

  1. Identical type arguments share one set of methods
    If List<Struct> is used in one assembly and another assembly also uses List<Struct>,
    the CLR generates only one set of native code.

  2. Reference-type arguments share one set of methods
    The type arguments of List<String> and List<Stream> are both reference types, and their values are pointers to objects on the managed heap. So the CLR can operate on all such pointers in the same way.
    This is not the case for value types, such as int vs. long. One takes up 4 bytes and the other 8. Because their sizes differ, the same native code cannot be reused for both.

Seeing is believing 1

Sample code
    internal class Program
    {
        static void Main(string[] args)
        {
            var a = new Test<string>();
            var b = new Test<Stream>();
            
            Console.ReadLine(); // assumed call: pauses so a and b can be inspected in a debugger
        }
    }

    public class Test<T>
    {
        public void Add(T value)
        {
		
        }
        public void Remove(T value)
        {

        }
    }

Variable a:
image

Variable b:
image

A closer look reveals that their EEClass is identical, and the MethodDesc entries of their Add/Remove methods are identical too. This confirms the statement above: reference-type arguments share the same set of methods.

Seeing is believing 2

    internal class Program
    {
        static void Main(string[] args)
        {
            var a = new Test<int>();
            var b = new Test<long>();
            var c = new Test<MyStruct>();
            
            Console.ReadLine(); // assumed call: pauses so a, b and c can be inspected in a debugger
        }
    }

    public class Test<T>
    {
        public void Add(T value)
        {

        }
        public void Remove(T value)
        {

        }
    }

    public struct MyStruct
    {
        public int Age;
    }

Let's replace the reference types with value types and look at their method tables again.
Variable a:
image
Variable b:
image
Variable c:
image

A quick glance shows that their MethodDesc entries are completely different. This means the CLR generates 3 separate sets of native code for this one generic type.

Careful readers may notice that a reference-type argument shows up as a type called System.__Canon, which the CLR uses internally as a "placeholder" for all reference types.
Interested readers can refer to its source code: coreclr\\src\System__Canon.cs

Why can't value types share the same set of methods?

This is easy to understand: a reference is a pointer of fixed size (4 bytes on 32-bit, 8 bytes on 64-bit), whereas value types differ in size. The native code generated for different value types therefore cannot be unified, so value types cannot share the same set of methods.
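
The size argument can be checked directly. This is a minimal sketch, assuming .NET 5 or later where System.Runtime.CompilerServices.Unsafe ships with the runtime; the wrapper class and the name SlotSize are hypothetical.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical helper class, for illustration only.
public static class SlotSizeSketch
{
    // Bytes needed to store one T in a field, local, or array element.
    // For a reference type this is the size of the reference, not of the object.
    public static int SlotSize<T>() => Unsafe.SizeOf<T>();
}
```

SlotSize<int>() is 4 and SlotSize<long>() is 8, while SlotSize<string>() and SlotSize<System.IO.Stream>() both equal System.IntPtr.Size. That uniform width is what lets one body of native code serve every reference-type argument.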

Seeing is believing 3

    internal class Program
    {
        static void Main(string[] args)
        {
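            var a = new Test<int>();
            a.Add(1);
            var b = new Test<long>();
            b.Add(1);

            var c = new Test<string>();
            c.Add("");
            var d = new Test<Stream>();
            d.Add(null);
            
            Console.ReadLine(); // assumed call: pauses so the generated code can be inspected in a debugger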
        }
    }

    public class Test<T>
    {
        public void Add(T value)
        {
            var s = value;
        }
        public void Remove(T value)
        {

        }
    }
// variable a
00007FFBAF7B7435 mov eax,dword ptr [rbp+58h]
00007FFBAF7B7438 mov dword ptr [rbp+2Ch],eax // int: 4-byte (dword) store to [rbp+2Ch]

// variable b
00007FFBAF7B7FD7 mov rax,qword ptr [rbp+58h]
00007FFBAF7B7FDB mov qword ptr [rbp+28h],rax // long: 8-byte (qword) store to [rbp+28h]; the assembly differs from int

// variable c
00007FFBAF7B8087 mov rax,qword ptr [rbp+58h]
00007FFBAF7B808B mov qword ptr [rbp+28h],rax // string: 8-byte (qword) store to [rbp+28h]

// variable d
00007FFBAF7B8087 mov rax,qword ptr [rbp+58h]
00007FFBAF7B808B mov qword ptr [rbp+28h],rax // Stream: references have the same width, so the assembly is identical to c

Generic math

Prior to .NET 7, there was no direct way to use generics for math operations. The only workaround was to go through dynamic.

image
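
The dynamic workaround looked roughly like the sketch below. It is a reconstruction under assumptions rather than the code from the screenshot, and the class and method names are hypothetical.

```csharp
// Hypothetical helper class, for illustration only.
public static class DynamicMathSketch
{
    // Pre-.NET 7 workaround: 'dynamic' defers the choice of '+' to run time.
    // Value-type operands are boxed, and a type without a '+' operator
    // only fails when the call executes, not when it compiles.
    public static T Add<T>(T left, T right)
    {
        dynamic a = left;
        dynamic b = right;
        return (T)(a + b);
    }
}
```

Add(1, 2) and Add(1.5, 2.0) both work, but so does the compilation of Add(new object(), new object()), which then throws at run time. That loss of compile-time checking is the main cost of this approach.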

.NET 7 introduces new math-related generic interfaces, and the BCL provides implementations of them.
image

/zh-cn/dotnet/standard/generics/math

The underlying implementation of the generic math interfaces

C# layer:
The addition operation relies mainly on the IAdditionOperators interface.
image

IL layer:
The + operator is compiled into a call to the abstract op_Addition method
image

For int, the int implementation is called
System.
image

For long, the long implementation is called
System.
image

The principle is very simple: the BCL implements the + - * / operators for all the basic value types, and as long as the generic constraints are in place, the JIT automatically calls the corresponding implementation.
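
A minimal sketch of the constraint-based approach, assuming .NET 7 or later with C# 11. IAdditionOperators and IAdditiveIdentity are the real System.Numerics interfaces; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper class, for illustration only.
public static class GenericMathSketch
{
    // '+' binds to the static abstract op_Addition declared by IAdditionOperators.
    public static T Add<T>(T left, T right)
        where T : IAdditionOperators<T, T, T>
        => left + right;

    // Summation also needs a starting value, which IAdditiveIdentity supplies.
    public static T Sum<T>(IEnumerable<T> values)
        where T : IAdditionOperators<T, T, T>, IAdditiveIdentity<T, T>
    {
        T total = T.AdditiveIdentity;
        foreach (T value in values)
            total += value;
        return total;
    }
}
```

Add(2, 3) resolves to the int implementation and Add(2L, 3L) to the long one, with no boxing and with unsupported types rejected at compile time.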

Conclusion

Nothing more to say: just use generics. The one thing worth a little attention is the code explosion caused by value-type instantiations (though hard drives are much cheaper than programmers).