Location>code7788 >text

My friends complained about why I was so stupid not to use it in the source generator, instead of using an under-optimized hash method.

Popularity:982 ℃/2024-08-10 22:09:42

There's obviously a better hash method.

A friend was spouting off to me about the optimization points I listed the other day for generating db-mapped entities in the source generatorGenerate a partial hashcode in advance for comparison

Code shown

public static void GenerateReadTokens(this IDataReader reader, Span<int> s)
{
    for (int i = 0; i < ; i++)
    {
        var name = (i);
        var type = (i);
        switch ((name))
        {
            
            case 742476188U:
                s[i] = type == typeof(int) ? 1 : 2; 
                break;

            case 2369371622U:
                s[i] = type == typeof(string) ? 3 : 4; 
                break;

            case 1352703673U:
                s[i] = type == typeof(float) ? 5 : 6; 
                break;

            default:
                break;
        }
    }
}

Why don't you use it here?, and to useSlowNonRandomizedHash(name)It's silly not to use a better way.

At that time, I could only explain it with an embarrassed face.It really doesn't work.

Unfortunately, no amount of verbal explanation can get rid of TA's disdainful gaze.

I can only write a few more words of "sophistry" here.

"sophistry."

First of all, actually.NormalizedHash The performance is very strong and is implemented as follows

public static uint SlowNonRandomizedHash(this string? value)
{
    uint hash = 0;
    if (!(value))
    {
        hash = 2166136261u;
        foreach (char c in value!)
        {
            hash = ((c) ^ hash) * 16777619;
        }
    }
    return hash;
}

But strong performance or not, that's not the only reason to use this method

In fact, the real reason many people know, are everyone's default common sense: net codeis random, run the program multiple times and the same string may have a different hash value each time it is run

For example, the '18 articleWhy is () different each time I run my program in .NET Core?

Here's a brief recap of the original

Take this very simple program, for example, which calls a string GetHashCode() twice in succession

using System;

static class Program
{
    static void Main(string[] args)
    {
        ("Hello World!".GetHashCode());
        ("Hello World!".GetHashCode());
    }
}

If you run this program on the .NET Framework, you will get the same value every time you run the program:

> dotnet run -c Release -f net471
-1989043627
-1989043627

> dotnet run -c Release -f net471
-1989043627
-1989043627

> dotnet run -c Release -f net471
-1989043627
-1989043627

In contrast, if you compile the same program for .NET Core, you will get the same value for each call in the same program execution, but different values for different program executions: GetHashCode()

> dotnet run -c Release -f netcoreapp2.1
-1105880285
-1105880285

> dotnet run -c Release -f netcoreapp2.1
1569543669
1569543669

> dotnet run -c Release -f netcoreapp2.1
-1477343390
-1477343390

After making an effort to look up theMicrosoft's official documentation gives advice on using the GetHashCode() method.The GetHashCode() method should not be used as if it were the same persistent value. It explicitly suggests that the hash value produced by the GetHashCode() method should not be used as the same value that can be persisted.

The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across .NET implementations, across .NET versions, and across .NET platforms (such as 32-bit and 64-bit) for a single version of .NET. In some cases, they can even differ by application domain. This implies that two subsequent runs of the same program may return different hash codes.

Why use randomized hash?

Stephen Toub in aissue The answer to this question is mentioned in.

Q: Why .NET Core utilize randomized string hashing?
Q: Why does .NET Core utilize randomized string hashing?
A: Security, prevention against DoS attacks, etc.
A: Security, prevention against DoS attacks, etc.

The original article explains in great detail what is involved in security, so I won't repeat it in detail here.

So is there a better hash method?

Of course there is, the string class actually has one.

For those interested, you can read the source code/dotnet/runtime/blob/main/src/libraries//src/System/#L923

Inside, both case-sensitive and case-insensitive implementations are available, and the code has more performance optimizations than the methods listed in the '18 article above

There's no way we can just use the internal methods, though.

But what? We have black magic we can just use.

public static partial class StringHashing
{
    [UnsafeAccessor(, Name = "GetNonRandomizedHashCodeOrdinalIgnoreCase")]
    public static extern int Hash(this string c);
}

Compare.

We've written all this and people will not be convinced if we don't compare performance

A quick comparison.

[ShortRunJob, MemoryDiagnoser, Orderer(summaryOrderPolicy: ), GroupBenchmarksBy(), CategoriesColumn]
public class StringHashingBenchmarks
{
    [Params(0, 1, 10, 100)]
    public int Count { get; set; }

    public string Str { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        var s = ("", ("_", Count));
        var b = Encoding.(s);
        (b);
        Str = Encoding.(b);
    }

    [Benchmark(Baseline = true)]
    public int GetHashCode()
    {
        return ();
    }

    [Benchmark]
    public uint SlowNonRandomizedHash()
    {
        return ();
    }

    [Benchmark]
    public int NonRandomizedHash()
    {
        return ();
    }
}

in the end


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
13th Gen Intel Core i9-13900KF, 1 CPU, 32 logical and 24 physical cores
.NET SDK 9.0.100-preview.6.24328.19
  [Host]   : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1  
WarmupCount=3  

Method Count Mean Error StdDev Ratio RatioSD Allocated Alloc Ratio
SlowNonRandomizedHash 0 0.3286 ns 0.0727 ns 0.0040 ns 0.69 0.01 - NA
GetHashCode 0 0.4751 ns 0.1093 ns 0.0060 ns 1.00 0.00 - NA
NonRandomizedHash 0 0.6614 ns 0.0339 ns 0.0019 ns 1.39 0.02 - NA
GetHashCode 1 0.5686 ns 0.0881 ns 0.0048 ns 1.00 0.00 - NA
NonRandomizedHash 1 0.6559 ns 0.0254 ns 0.0014 ns 1.15 0.01 - NA
SlowNonRandomizedHash 1 7.3752 ns 0.2379 ns 0.0130 ns 12.97 0.11 - NA
GetHashCode 10 3.1627 ns 0.2081 ns 0.0114 ns 1.00 0.00 - NA
NonRandomizedHash 10 16.1921 ns 1.1773 ns 0.0645 ns 5.12 0.02 - NA
SlowNonRandomizedHash 10 44.4825 ns 2.8742 ns 0.1575 ns 14.06 0.01 - NA
GetHashCode 100 40.4233 ns 0.7217 ns 0.0396 ns 1.00 0.00 - NA
NonRandomizedHash 100 110.2494 ns 13.1581 ns 0.7212 ns 2.73 0.02 - NA
SlowNonRandomizedHash 100 362.0329 ns 11.0681 ns 0.6067 ns 8.96 0.02 - NA

Of course, the hash code we're comparing is case-sensitive, while the other two are case-insensitive.

But their gaps are all very small, so it's safe to say they're all very strong methods

regrettablyUnsafeAccessor These black magic spells don't work in the source generator