Optimization points
The previous post left one question open: does Dapper AOT have any other special optimizations?
After carefully reading both the generated code and the source code, I finally found the answer.
I had always assumed that Dapper AOT was implemented with iterators, in which case its code should be almost identical to mine, yet there was a large performance gap. I was stuck on this for a while and at one point suspected some kind of black magic.
It turns out that Dapper AOT is not implemented with iterators at all! I had really thought there was some new way to optimize iterators.
Iterators are no longer used
List<T> results = new();
try
{
while (reader.Read())
{
results.Add(ReadOne(reader, readOnlyTokens));
}
return results;
}
// ...
Of course, this only works if the user calls the AsList method, because ToList copies the list, which would turn the optimization into a regression.
For example:
connection.Query<Dog>("select * from dog").AsList();
// AsList implementation
public static List<T> AsList<T>(this IEnumerable<T>? source) => source switch
{
null => null!,
List<T> list => list,
_ => Enumerable.ToList(source),
};
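The difference between the two calls is easy to check. The sketch below is only an illustration: the class name ListExtensions is made up for the demo, and the AsList body simply mirrors the one shown above. It prints whether each call returns the original list instance.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3 };

// AsList: the source already is a List<int>, so the same instance comes back.
List<int> same = list.AsList();
Console.WriteLine(ReferenceEquals(list, same));   // True

// ToList: always allocates a new list and copies every element into it.
List<int> copy = list.ToList();
Console.WriteLine(ReferenceEquals(list, copy));   // False

// Hypothetical holder class for the demo; the method body mirrors the AsList above.
static class ListExtensions
{
    public static List<T> AsList<T>(this IEnumerable<T>? source) => source switch
    {
        null => null!,
        List<T> list => list,
        _ => Enumerable.ToList(source),
    };
}
```

When the query has already materialized its rows into a List<T>, AsList costs nothing, while ToList pays for a second list of the same size.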
Use Span
An iterator method cannot keep a Span<T> alive across a yield return. Once the iterator is gone, that restriction disappears and Span can be used freely.
public static Dog ReadOne(this IDataReader reader, ref ReadOnlySpan<int> ss)
{
var d = new Dog();
for (int j = 0; j < ss.Length; j++)
{
// ...
Use ArrayPool to reduce memory usage
public Span<int> GetTokens()
{
FieldCount = Reader!.FieldCount;
if (Tokens is null || Tokens.Length < FieldCount)
{
// no leased array, or existing lease is not big enough; rent a new array
if (Tokens is not null) ArrayPool<int>.Shared.Return(Tokens);
Tokens = ArrayPool<int>.Shared.Rent(FieldCount);
}
return MemoryMarshal.CreateSpan(ref MemoryMarshal.GetArrayDataReference(Tokens), FieldCount);
}
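To see the lease pattern in isolation, here is a minimal sketch. TokenLease and its members are names invented for this example, and it uses the safe AsSpan call instead of MemoryMarshal, so it is not the Dapper AOT code itself. It rents an array on first use, swaps it for a bigger one only when needed, and hands the array back to the pool on Dispose.

```csharp
#nullable enable
using System;
using System.Buffers;

var lease = new TokenLease();

Span<int> small = lease.GetTokens(3);      // first call rents an array
Console.WriteLine(small.Length);           // 3

Span<int> big = lease.GetTokens(500);      // too small: old array returned, larger one rented
Console.WriteLine(big.Length);             // 500

lease.Dispose();                           // give the array back to the pool

// Hypothetical helper type, named for this example only.
sealed class TokenLease : IDisposable
{
    private int[]? _tokens;

    public Span<int> GetTokens(int fieldCount)
    {
        if (_tokens is null || _tokens.Length < fieldCount)
        {
            // no leased array, or the existing lease is not big enough
            if (_tokens is not null) ArrayPool<int>.Shared.Return(_tokens);
            _tokens = ArrayPool<int>.Shared.Rent(fieldCount);
        }
        // Rent may return a larger array than requested, so expose only the part we need.
        return _tokens.AsSpan(0, fieldCount);
    }

    public void Dispose()
    {
        if (_tokens is not null)
        {
            ArrayPool<int>.Shared.Return(_tokens);
            _tokens = null;
        }
    }
}
```

Note that a span obtained before a later GetTokens call may point into an array that has already gone back to the pool, so it should not be used afterwards.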
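Use stack allocation when the data is small
var s = reader.FieldCount <= 64 ? MemoryMarshal.CreateSpan(ref MemoryMarshal.GetReference(stackalloc int[reader.FieldCount]), reader.FieldCount) : GetTokens();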
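The same idea can be sketched with safe code only. In this example the 64-element threshold comes from the line above, while TokenDemo, SumOfIndexes and MaxStackTokens are names made up for the illustration. Small buffers live on the stack and larger ones come from ArrayPool.

```csharp
#nullable enable
using System;
using System.Buffers;

Console.WriteLine(TokenDemo.SumOfIndexes(8));     // 28   (stack buffer)
Console.WriteLine(TokenDemo.SumOfIndexes(100));   // 4950 (pooled buffer)

// Hypothetical demo type, named for this example only.
static class TokenDemo
{
    private const int MaxStackTokens = 64;

    public static int SumOfIndexes(int fieldCount)
    {
        int[]? rented = null;

        // Small case: fixed-size stack buffer. Large case: rent from the shared pool.
        Span<int> buffer = fieldCount <= MaxStackTokens
            ? stackalloc int[MaxStackTokens]
            : (rented = ArrayPool<int>.Shared.Rent(fieldCount));
        Span<int> tokens = buffer.Slice(0, fieldCount);

        try
        {
            for (int i = 0; i < tokens.Length; i++) tokens[i] = i;

            int sum = 0;
            foreach (int t in tokens) sum += t;
            return sum;
        }
        finally
        {
            // Only the pooled array needs to be handed back; stack memory is freed on return.
            if (rented is not null) ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

A fixed stackalloc size keeps the stack usage bounded no matter how many columns the reader reports.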
Generate the column-name hash codes in advance for comparison
Since the comparison is no longer time-consuming, the cache is unnecessary and has been removed.
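public static void GenerateReadTokens(this IDataReader reader, Span<int> s)
{
for (int i = 0; i < reader.FieldCount; i++)
{
var name = reader.GetName(i);
var type = reader.GetFieldType(i);
switch (HashColumnName(name)) // HashColumnName is a placeholder name for the generator's hash helper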
{
case 742476188U:
s[i] = type == typeof(int) ? 1 : 2;
break;
case 2369371622U:
s[i] = type == typeof(string) ? 3 : 4;
break;
case 1352703673U:
s[i] = type == typeof(float) ? 5 : 6;
break;
default:
break;
}
}
}
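A rough sketch of why this is cheap: each column name is hashed once per result set, and everything after that is an integer comparison. The example below makes several assumptions. The hash is a case-insensitive FNV-1a picked for the demo, not necessarily the function the generator uses. The hashes are computed once at startup rather than baked in as literals. ColumnTokens and its members are invented names.

```csharp
using System;

string[] columns = { "Name", "AGE", "weight", "Unknown" };
Span<int> tokens = stackalloc int[columns.Length];
ColumnTokens.Generate(columns, tokens);
Console.WriteLine(string.Join(",", tokens.ToArray()));

// Hypothetical demo type, named for this example only.
static class ColumnTokens
{
    // Case-insensitive FNV-1a (an assumption made for this demo).
    public static uint Hash(string name)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in name)
            {
                hash = (hash ^ char.ToUpperInvariant(c)) * 16777619;
            }
            return hash;
        }
    }

    // Known member names are hashed once; a generator would emit these as constants.
    private static readonly uint AgeHash = Hash("Age");
    private static readonly uint NameHash = Hash("Name");
    private static readonly uint WeightHash = Hash("Weight");

    // Token meaning in this demo: 1 = Age, 2 = Name, 3 = Weight, 0 = unmapped column.
    public static void Generate(string[] columns, Span<int> tokens)
    {
        for (int i = 0; i < columns.Length; i++)
        {
            uint h = Hash(columns[i]);
            tokens[i] = h == AgeHash ? 1
                      : h == NameHash ? 2
                      : h == WeightHash ? 3
                      : 0;
        }
    }
}
```

The row-reading loop then only switches on the small integer tokens, so no string comparison happens per row.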
Performance Test Description
BenchmarkDotNet
A special note on BenchmarkDotNet: it already accounts for JIT optimization and similar effects by running warm-up iterations and a very large number of invocations.
It also processes the results statistically, taking the distribution of the measurements into account: outliers (for example, a handful of isolated very large or very small values) are removed, the mean is reported when the spread is small, and the median is shown when the spread is large.
Interested readers can learn more at /dotnet/BenchmarkDotNet
Chloe is a bit tricky to mock, so I copied part of its source code and compared only the entity-mapping part.
DapperAOT and plain Dapper are hard to run side by side, so I did not compare them; plain Dapper is certainly slower anyway.
Test Data
As mentioned before, the test data comes from a hand-written mock, which avoids differences caused by database drivers, database execution, mocking libraries, and so on.
Class
A very simple class. It certainly does not represent every case, but it is simple enough for this test.
public class Dog
{
public int? Age { get; set; }
public string Name { get; set; }
public float? Weight { get; set; }
}
Mock data
public class TestDbConnection : DbConnection
{
public int RowCount { get; set; }
public IDbCommand CreateCommand()
{
return new TestDbCommand() { RowCount = RowCount };
}
}
public class TestDbCommand : DbCommand
{
public int RowCount { get; set; }
public IDataParameterCollection Parameters { get; } = new TestDataParameterCollection();
public IDbDataParameter CreateParameter()
{
return new TestDataParameter();
}
protected override DbDataReader ExecuteDbDataReader(CommandBehavior behavior)
{
return new TestDbDataReader() { RowCount = RowCount };
}
}
public class TestDbDataReader : DbDataReader
{
public int RowCount { get; set; }
private int calls = 0;
public override object this[int ordinal]
{
get
{
switch (ordinal)
{
case 0:
return "XX";
case 1:
return 2;
case 2:
return 3.3f;
default:
return null;
}
}
}
public override int FieldCount => 3;
public override Type GetFieldType(int ordinal)
{
switch (ordinal)
{
case 0:
return typeof(string);
case 1:
return typeof(int);
case 2:
return typeof(float);
default:
return null;
}
}
public override float GetFloat(int ordinal)
{
switch (ordinal)
{
case 2:
return 3.3f;
default:
return 0;
}
}
public override int GetInt32(int ordinal)
{
switch (ordinal)
{
case 1:
return 2;
default:
return 0;
}
}
public override string GetName(int ordinal)
{
switch (ordinal)
{
case 0:
return "Name";
case 1:
return "Age";
case 2:
return "Weight";
default:
return null;
}
}
public override string GetString(int ordinal)
{
switch (ordinal)
{
case 0:
return "XX";
default:
return null;
}
}
public override object GetValue(int ordinal)
{
switch (ordinal)
{
case 0:
return "XX";
case 1:
return 2;
case 2:
return 3.3f;
default:
return null;
}
}
public override bool Read()
{
calls++;
return calls <= RowCount;
}
}
Benchmark Code
[MemoryDiagnoser, Orderer(summaryOrderPolicy: SummaryOrderPolicy.FastestToSlowest), GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByParams), CategoriesColumn]
public class ObjectMappingTest
{
[Params(1, 1000, 10000, 100000, 1000000)]
public int RowCount { get; set; }
[Benchmark(Baseline = true)]
public void SetClass()
{
var connection = new TestDbConnection() { RowCount = RowCount };
var dogs = new List<Dog>();
try
{
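connection.Open();
var cmd = connection.CreateCommand();
cmd.CommandText = "select ";
using (var reader = cmd.ExecuteReader())
{
while (reader.Read())
{
var dog = new Dog();
dogs.Add(dog);
dog.Name = reader.GetString(0);
dog.Age = reader.GetInt32(1);
dog.Weight = reader.GetFloat(2);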
}
}
}
finally
{
connection.Close();
}
}
[Benchmark]
public void DapperAOT()
{
var connection = new TestDbConnection() { RowCount = RowCount };
var dogs = connection.Query<Dog>("select * from dog").AsList();
}
[Benchmark]
public void SourceGenerator()
{
var connection = new TestDbConnection() { RowCount = RowCount };
List<Dog> dogs;
try
{
connection.Open();
var cmd = connection.CreateCommand();
cmd.CommandText = "select ";
using (var reader = cmd.ExecuteReader())
{
dogs = <Dog>().AsList();
}
}
finally
{
connection.Close();
}
}
[Benchmark]
public void Chloe()
{
var connection = new TestDbConnection() { RowCount = RowCount };
try
{
connection.Open();
var cmd = connection.CreateCommand();
var dogs = new InternalSqlQuery<Dog>(cmd, "select").AsList();
}
finally
{
connection.Close();
}
}
}
The full code can be found at /fs7744/SlowestEM
Test results
BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4651/22H2/2022Update)
Intel Core i7-10700 CPU 2.90GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.5.24307.3
[Host] : .NET 8.0.6 (8.0.624.26715), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.6 (8.0.624.26715), X64 RyuJIT AVX2
Method | RowCount | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|
DapperAOT | 1 | 446.3 ns | 8.81 ns | 8.65 ns | 0.60 | 0.03 | 0.0525 | 0.0515 | - | 440 B | 1.00 |
SourceGenerator | 1 | 690.0 ns | 13.72 ns | 32.34 ns | 0.95 | 0.07 | 0.0525 | 0.0515 | - | 440 B | 1.00 |
SetClass | 1 | 728.3 ns | 14.59 ns | 37.41 ns | 1.00 | 0.00 | 0.0525 | 0.0515 | - | 440 B | 1.00 |
Chloe | 1 | 909.7 ns | 17.49 ns | 22.75 ns | 1.25 | 0.06 | 0.1020 | 0.1011 | - | 856 B | 1.95 |
SetClass | 1000 | 8,593.3 ns | 169.90 ns | 390.38 ns | 1.00 | 0.00 | 6.7902 | 1.6937 | - | 56912 B | 1.00 |
SourceGenerator | 1000 | 16,967.8 ns | 310.02 ns | 258.88 ns | 1.91 | 0.08 | 6.7749 | 1.6785 | - | 56912 B | 1.00 |
DapperAOT | 1000 | 18,299.7 ns | 267.72 ns | 250.43 ns | 2.06 | 0.09 | 6.7749 | 1.3428 | - | 56912 B | 1.00 |
Chloe | 1000 | 116,049.4 ns | 297.71 ns | 263.91 ns | 13.06 | 0.54 | 6.8359 | 1.7090 | - | 57328 B | 1.01 |
SetClass | 10000 | 309,255.1 ns | 3,945.26 ns | 3,294.47 ns | 1.00 | 0.00 | 83.0078 | 82.5195 | 41.5039 | 662782 B | 1.00 |
DapperAOT | 10000 | 402,700.7 ns | 7,676.45 ns | 7,180.56 ns | 1.31 | 0.03 | 83.0078 | 82.5195 | 41.5039 | 662782 B | 1.00 |
SourceGenerator | 10000 | 414,226.2 ns | 8,149.22 ns | 10,007.97 ns | 1.34 | 0.04 | 83.0078 | 82.5195 | 41.5039 | 662782 B | 1.00 |
Chloe | 10000 | 1,453,166.1 ns | 19,660.10 ns | 17,428.16 ns | 4.70 | 0.07 | 82.0313 | 80.0781 | 41.0156 | 663199 B | 1.00 |
SetClass | 100000 | 2,176,860.4 ns | 42,449.84 ns | 63,536.93 ns | 1.00 | 0.00 | 496.0938 | 496.0938 | 496.0938 | 6098015 B | 1.00 |
SourceGenerator | 100000 | 3,045,760.4 ns | 59,378.23 ns | 63,534.04 ns | 1.39 | 0.05 | 496.0938 | 496.0938 | 496.0938 | 6098015 B | 1.00 |
DapperAOT | 100000 | 3,053,510.0 ns | 35,015.61 ns | 29,239.62 ns | 1.40 | 0.04 | 496.0938 | 496.0938 | 496.0938 | 6098015 B | 1.00 |
Chloe | 100000 | 13,152,653.6 ns | 65,400.49 ns | 51,060.40 ns | 6.02 | 0.14 | 484.3750 | 484.3750 | 484.3750 | 6098433 B | 1.00 |
SetClass | 1000000 | 105,420,410.0 ns | 2,093,734.23 ns | 3,380,990.50 ns | 1.00 | 0.00 | 6800.0000 | 6800.0000 | 2200.0000 | 56780029 B | 1.00 |
SourceGenerator | 1000000 | 115,534,043.8 ns | 1,828,036.86 ns | 1,795,376.62 ns | 1.09 | 0.03 | 6800.0000 | 6800.0000 | 2200.0000 | 56780118 B | 1.00 |
DapperAOT | 1000000 | 115,751,485.5 ns | 2,120,239.39 ns | 2,603,844.38 ns | 1.10 | 0.04 | 6800.0000 | 6800.0000 | 2200.0000 | 56780029 B | 1.00 |
Chloe | 1000000 | 208,295,919.3 ns | 4,031,590.18 ns | 4,481,101.81 ns | 1.97 | 0.06 | 6666.6667 | 6666.6667 | 2333.3333 | 56781907 B | 1.00 |
The SourceGenerator version performs essentially the same as DapperAOT. The difference is that it does not use an interceptor and does not handle the many edge cases that DapperAOT covers.
Source generators are definitely the best way to optimize this kind of performance today. After all, they produce real code files, and they are much easier to get started with than Emit and similar techniques.