EF Core Performance Optimization Tips

Code-Level Optimization

1. Use of instance pools

EF Core 2.0 introduced a new way to register a DbContext: DbContext pooling. Instead of creating a new instance every time one is needed, EF Core resets a used instance's state and stores it in an internal pool; when a new instance is requested, the pooled instance is returned instead of a new one being set up.

Example of use:

services.AddDbContext<HandshakesWebDBContext>(options => options.UseSqlServer(connectionString)); // "services" and the provider call are assumed; the original snippet lost its identifiers

Replace with

services.AddDbContextPool<HandshakesWebDBContext>(options => options.UseSqlServer(connectionString), poolSize: 80);
// Be careful when setting the pool size: once the configured maximum is exceeded, EF Core falls back to creating instances on demand

Official benchmark (from the EF Core test code):

Method                 Count  Mean      Error     StdDev    Gen 0    Gen 1  Gen 2  Allocated
WithoutContextPooling  1      701.6 us  26.62 us  78.48 us  11.7188  -      -      50.38 KB
WithContextPooling     1      350.1 us  6.80 us   14.64 us  0.9766   -      -      4.63 KB

Caveat: in most cases this is not a dramatic performance improvement, but it is good practice: it avoids wasting resources while still providing some speedup.

2. Use of split queries

First, understand what a Cartesian product is.

In layman's terms, the Cartesian product of two sets pairs every element of one set with every element of the other. By analogy with McDonald's: pairing every burger with every drink gives the full set of possible combo meals.

In a database, this takes the form of a join between tables. When data volumes are small, the performance impact of a join may be negligible; but for wide tables that keep gaining columns as the business grows, or tables with a large stock of data, joins lead to slow queries and redundant result data.
Split queries are especially suitable for one-to-many relationships where the child tables hold large amounts of data.

Look at this LINQ code:

// "context" and the navigation names below are reconstructed from the generated SQL
var data = context.As
    .Include(x => x.Bs)
    .Include(x => x.Cs)
    .ThenInclude(x => x.D1s)
    .Include(x => x.Cs)
    .ThenInclude(x => x.C1s)
    .ThenInclude(x => x.D2s)
    .ToList();

Monitoring shows the generated SQL statement:

SELECT [A].[Id], [A].[Name], 
       [B].[Id], [B].[AId], [B].[Name],
       [C].[Id], [C].[AId], [C].[Name],
       [D1].[Id], [D1].[CId], [D1].[Name],
       [C1].[Id], [C1].[CId], [C1].[Name],
       [D2].[Id], [D2].[C1Id], [D2].[Name]
FROM [As] AS [A]
LEFT JOIN [Bs] AS [B] ON [A].[Id] = [B].[AId]
LEFT JOIN [Cs] AS [C] ON [A].[Id] = [C].[AId]
LEFT JOIN [D1s] AS [D1] ON [C].[Id] = [D1].[CId]
LEFT JOIN [C1s] AS [C1] ON [C].[Id] = [C1].[CId]
LEFT JOIN [D2s] AS [D2] ON [C1].[Id] = [D2].[C1Id]

Undoubtedly, this is a bad SQL statement. If each table holds a large amount of data, the query becomes a heavy burden, and with more complex conditions the statement would be even harder for the database to analyze. As for the Alibaba development guideline that reportedly forbids joins across more than three tables (not verified), that is likely a matter of coding standards and manageability; from a technical point of view there is no such hard rule.

Solution: use a split query (SplitQuery), which, as the name suggests, splits one big join query into individual queries that are executed separately.

Sample code (recommended):

var data = context.As
    .Include(x => x.Bs)
    .Include(x => x.Cs)
    .ThenInclude(x => x.D1s)
    .Include(x => x.Cs)
    .ThenInclude(x => x.C1s)
    .ThenInclude(x => x.D2s)
    .AsSplitQuery() // enable split query for this query
    .ToList();

It can also be configured globally (generally not recommended; prefer the per-query approach shown above):

services.AddDbContext<CRSGEntityDbContext>(options =>
    options.UseSqlServer(configuration["ConnectionStrings:FiinGroupDB"],
        o => o.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery))); // provider call assumed; the original snippet lost its identifiers

Generated SQL:

SELECT [a].[Id], [a].[OtherColumns]
FROM [As] AS [a]

SELECT [b].[Id], [b].[AId], [b].[OtherColumns]
FROM [Bs] AS [b]
INNER JOIN [As] AS [a] ON [b].[AId] = [a].[Id]

SELECT [c].[Id], [c].[AId], [c].[OtherColumns]
FROM [Cs] AS [c]
INNER JOIN [As] AS [a] ON [c].[AId] = [a].[Id]

SELECT [d1].[Id], [d1].[CId], [d1].[OtherColumns]
FROM [D1s] AS [d1]
INNER JOIN [Cs] AS [c] ON [d1].[CId] = [c].[Id]
WHERE [c].[AId] IN (SELECT [a].[Id] FROM [As] AS [a])

SELECT [c1].[Id], [c1].[CId], [c1].[OtherColumns]
FROM [C1s] AS [c1]
INNER JOIN [Cs] AS [c] ON [c1].[CId] = [c].[Id]
WHERE [c].[AId] IN (SELECT [a].[Id] FROM [As] AS [a])

SELECT [d2].[Id], [d2].[C1Id], [d2].[OtherColumns]
FROM [D2s] AS [d2]
INNER JOIN [C1s] AS [c1] ON [d2].[C1Id] = [c1].[Id]
WHERE [c1].[CId] IN (SELECT [c].[Id] FROM [Cs] AS [c] WHERE [c].[AId] IN (SELECT [a].[Id] FROM [As] AS [a]))

You can see the query has been split into separate statements; the logic is clearer and execution is more efficient for the database.

Note: although split queries avoid the performance problems caused by Cartesian explosion, whether to use them still depends on the actual query. For operations such as sorting, paging, and grouping, you must consider whether a split query still guarantees correct results, because the data is read in several round trips with no consistency guarantee between them.
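As a minimal sketch (reusing the assumed entity names from above): a paged split query needs a deterministic ordering, and because its statements run as separate round trips, concurrent writes between them can skew the combined result unless the query runs inside a transaction.

// Paging combined with AsSplitQuery; "context.As" and the navigations are assumed names.
// The Include runs as its own SQL statement, so rows modified between the two
// round trips can make the combined result inconsistent.
var page = context.As
    .Include(x => x.Bs)
    .OrderBy(x => x.Id) // a deterministic order keeps the pages stable
    .Skip(20)
    .Take(10)
    .AsSplitQuery()
    .ToList();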

A related topic: lazy loading. The problem it causes is equivalent to executing SQL statements inside a loop. Sample code:

// Iterate over the child collections directly, without explicitly loading them
foreach (var blog in context.Blogs.ToList())
{
    foreach (var post in blog.Posts) // lazy loading fires one query per blog here
    {
        Console.WriteLine($"Blog {blog.Url}, Post: {post.Title}");
    }
}

Observe the sql log:

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (1ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT [b].[BlogId], [b].[Rating], [b].[Url]
      FROM [Blogs] AS [b]
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (5ms) [Parameters=[@__p_0='1'], CommandType='Text', CommandTimeout='30']
      SELECT [p].[PostId], [p].[BlogId], [p].[Content], [p].[Title]
      FROM [Post] AS [p]
      WHERE [p].[BlogId] = @__p_0
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (1ms) [Parameters=[@__p_0='2'], CommandType='Text', CommandTimeout='30']
      SELECT [p].[PostId], [p].[BlogId], [p].[Content], [p].[Title]
      FROM [Post] AS [p]
      WHERE [p].[BlogId] = @__p_0
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (1ms) [Parameters=[@__p_0='3'], CommandType='Text', CommandTimeout='30']
      SELECT [p].[PostId], [p].[BlogId], [p].[Content], [p].[Title]
      FROM [Post] AS [p]
      WHERE [p].[BlogId] = @__p_0
... and so on

The correct approach is to use Include (eager loading) or Load (explicit loading) to load the related data up front.
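A minimal sketch of both fixes, reusing the Blog/Post names from the log above:

// Eager loading: a single query loads blogs together with their posts
var blogs = context.Blogs
    .Include(b => b.Posts)
    .ToList();

// Explicit loading: load the collection for an already-tracked entity on demand
var blog = context.Blogs.First();
context.Entry(blog).Collection(b => b.Posts).Load();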

3. Use of batch statements

Batch statements (ExecuteUpdate / ExecuteDelete) are an important feature added in EF Core 7. They solve the problem that earlier versions needed third-party libraries to perform bulk updates and deletes, and they bring a large performance improvement.

3.1 Batch deletion

What previous versions did (without resorting to third-party libraries):

foreach (var blog in context.Blogs.Where(b => b.Rating < 3))
{
    context.Blogs.Remove(blog);
}
context.SaveChanges();

With ExecuteDelete, batch operations are superior both syntactically and performance-wise.

context.Blogs.Where(b => b.Rating < 3).ExecuteDelete();

If your EF Core version is lower than 7.0, you can also execute the SQL statement directly with ExecuteSqlRaw:

("DELETE FROM [Blogs] WHERE [Rating] < 3");

3.2 Batch updates

Usage is basically the same as for batch deletes:


context.Blogs
    .Where(b => b.Rating < 3)
    .ExecuteUpdate(setters => setters.SetProperty(b => b.IsVisible, false)); // "IsVisible" is an illustrative property

Note: currently only relational databases are supported. Also, ExecuteUpdate/ExecuteDelete are sent to the database immediately, bypassing the change tracker and SaveChanges, so if they need to participate in a transaction together with other changes, you must use an explicit transaction.
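A minimal sketch of such an explicit transaction (entity names assumed), combining a batch statement with ordinary tracked changes so that both commit or roll back together:

using var transaction = context.Database.BeginTransaction();

// Batch statement: executed against the database immediately, outside SaveChanges
context.Blogs.Where(b => b.Rating < 3).ExecuteDelete();

// Tracked change: applied when SaveChanges runs
var blog = context.Blogs.First(b => b.Rating >= 3);
blog.Url = "http://example.com";
context.SaveChanges();

transaction.Commit(); // both operations become visible atomically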

4. Use of non-tracking queries

This one is relatively simple: in scenarios where you don't need to update the query results, prefer a no-tracking query.

var blogs = context.Blogs
    .AsNoTracking()
    .ToList();

or

context.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
var blogs = context.Blogs.ToList();

Test code:

// The database is warmed up before the code is executed
// Each query is executed 5 times; MeasureTime is the author's timing helper (not shown)
double elapsedTime4 = MeasureTime(() => context.Blogs.FirstOrDefault(x => x.Id == 1), 5);
double elapsedTime5 = MeasureTime(() => context.Blogs.AsNoTracking().FirstOrDefault(x => x.Id == 1), 5);

Console.WriteLine($"Tracked time took : {elapsedTime4} ms");
Console.WriteLine($"AsNoTracking() time took : {elapsedTime5} ms");

// Console output:
// Tracked time took : 318.26 ms
// AsNoTracking() time took : 229.86 ms

5. Project only the required fields

Strictly speaking, this is a matter of awareness. Most of the time, to save code, you query directly with the entity types defined on the context, or use Include to load the related table data. But for queries over large amounts of data, or with many table joins, projecting only the required properties makes a significant difference to performance.

Sample code:

// Reconstructed example; the Include and property names are assumed
var data = context.Blogs
    .Include(x => x.Posts)
    .Where(x => x.Name.Contains("xxx"))
    .ToList();

foreach (var item in data)
{
    Console.WriteLine($"Name :{item.Name},Id: {item.Id}");
}

In the code above we only need the Id and Name of the main table and its sub-tables, yet the query loads every column of all of them, which wastes performance.

Solution: project only the required fields with Select:

var data = context.Blogs
    .Where(x => x.Id == 1)
    .Select(x => new { x.Id, x.Name })
    .ToList();

A personal habit:


// 1. Does not depend on foreign keys being set up in the database
// (range variables and property names below are reconstructed)
var query = from b in context.Blogs
            join c in context.Posts on b.BlogId equals c.BlogId
            join d in context.Comments on c.PostId equals d.PostId
            select new A { blogId = b.BlogId, postId = c.PostId, postValue = d.Content };

This approach works well for multi-table lookups and queries over large amounts of data, but note that it is not suitable when the data needs to be updated afterwards, because EF's change tracking only applies to entity instances, not to projected results.
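A minimal sketch of that pitfall (BlogDto is a hypothetical projection type): changes to a projected object are invisible to the change tracker, while changes to a tracked entity are persisted:

// Projected result: not an entity instance, so the change tracker ignores it
var dto = context.Blogs
    .Select(b => new BlogDto { BlogId = b.BlogId, Url = b.Url })
    .First();
dto.Url = "http://example.com/updated";
context.SaveChanges(); // no UPDATE is generated

// Tracked entity: the modification is detected and saved
var blog = context.Blogs.First(b => b.BlogId == dto.BlogId);
blog.Url = "http://example.com/updated";
context.SaveChanges(); // issues an UPDATE

public class BlogDto { public int BlogId { get; set; } public string Url { get; set; } }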

6. Maximize the use of asynchronous methods

EF Core provides an asynchronous counterpart for essentially every synchronous operation. Use them wherever possible to avoid blocking: they reduce the number of threads needed and the thread context switches that must occur, and thus improve performance.

//ToListAsync
var data = await context.Blogs.ToListAsync();
//FirstOrDefaultAsync
var item = await context.Blogs.FirstOrDefaultAsync(it => it.Id == 1);
item.Rating = 2;
//SaveChangesAsync
await context.SaveChangesAsync();
//AsAsyncEnumerable (the client-side ToListAsync here requires System.Linq.Async)
var groupedHighlyRatedBlogs = await context.Blogs
    .AsQueryable()
    .Where(b => b.Rating > 3) // server-evaluated
    .AsAsyncEnumerable()
    .GroupBy(b => b.Rating) // client-evaluated
    .ToListAsync();

Asynchronous programming is recommended for EF Core in most cases, but beware of using async methods to read large text or binary columns, which can cause severe performance problems instead (the fault lies with SqlClient). See the issue reports: EF Core - Memory and performance issues with async methods; Reading large data (binary, text) asynchronously is extremely slow.

Also avoid mixing synchronous and asynchronous methods; under a high volume of requests this is likely to exhaust the connection pool and cause performance problems.
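A minimal sketch of the anti-pattern and the fix (context and entity names assumed):

// Bad: blocking on an async call ("sync over async") ties up a thread-pool thread
// and under load can hold connections long enough to exhaust the pool
var blogs = context.Blogs.ToListAsync().Result;

// Good: stay asynchronous end to end
var blogsAsync = await context.Blogs.AsNoTracking().ToListAsync();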

7. Using Find to fetch a single entity by key

Find is designed to efficiently look up a single entity when the primary key is known. It first checks whether the entity is already being tracked and, if so, returns it immediately; a database query happens only when the entity is not tracked locally. First/FirstOrDefault, by contrast, always query the database.

//The database is warmed up before the code is executed
double elapsedTime4 = MeasureTime(() => context.Blogs.Find(1));
double elapsedTime5 = MeasureTime(() => context.Blogs.Find(1));

Console.WriteLine($"Find() first time took : {elapsedTime4} ms");
Console.WriteLine($"Find() second time took : {elapsedTime5} ms");

//Console output:
//Find() first time took : 268.41 ms
//Find() second time took : 0.16 ms

Note that it can only be used when querying by key.
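For composite keys, Find accepts the key values in the order they were configured. A minimal sketch (OrderDetails with a composite key of OrderId and ProductId is a hypothetical entity):

// Pass the primary key components in their configured order
var detail = context.OrderDetails.Find(1, 42);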

8. Use Any to check whether data exists

When checking whether certain data exists, prefer Any: the query can stop as soon as the first match is found. First is worse because it must return the data, adding the overhead of data transfer and object instantiation, and Count because it must scan the table.

double elapsedTime1 = MeasureTime(() => context.Blogs.Any(it => it.Id == 1));
double elapsedTime2 = MeasureTime(() => context.Blogs.Count(it => it.Id == 1), 1);
double elapsedTime3 = MeasureTime(() => context.Blogs.FirstOrDefault(it => it.Id == 1), 1);

Console.WriteLine($"Any() time took: {elapsedTime1} ms");
Console.WriteLine($"Count() time took: {elapsedTime2} ms");
Console.WriteLine($"FirstOrDefault() time took: {elapsedTime3} ms");

// Console output:
// Any() time took: 237.42 ms
// Count() time took: 239.69 ms
// FirstOrDefault() time took: 258.28 ms

9. Using streaming processing

First, understand what buffering and streaming are:

  • Buffering: Load all needed data into memory for subsequent business logic processing
  • Streaming: Get the data you need on demand and apply it to subsequent logical processing

Figuratively speaking, buffering is scooping water with a bucket and pouring it all into the tank at once, while streaming is running water into the tank through a hose.

In principle, a streaming query has fixed memory requirements: they are the same whether the query returns 1 row or 1000. A buffering query, on the other hand, needs more memory the more rows are returned. For queries that produce large result sets this can be an important performance factor. Conversely, for queries with small result sets, buffering may perform better.

// Load all the data at once (buffering)
var blogsList = context.Blogs.Where(p => p.Url.StartsWith("A")).ToList();
var blogsArray = context.Blogs.Where(p => p.Url.StartsWith("A")).ToArray();

// Stream the results, processing one row at a time
foreach (var blog in context.Blogs.Where(p => p.Url.StartsWith("A")))
{
    // do some things...
    SomeDotNetMethod(blog);
}

// Streaming can also be achieved with AsEnumerable
var doubleFilteredBlogs = context.Blogs
    .Where(p => p.Url.StartsWith("A")) // executed as a database query
    .AsEnumerable()
    .Where(p => SomeDotNetMethod(p)); // executed on the client

Streaming suits scenarios where a large amount of data needs per-row business logic, or where the processing depends on a method or function the database cannot evaluate; in such cases, stream the rows and run the operation on the client.

10. Using SQL queries

In some special cases, such as complex SQL that cannot be expressed directly in LINQ, EF Core also supports using SQL statements directly for queries or data updates.

10.1 Basic queries (entities)

Scenario: the final result maps to entity types defined on the context (DbSet)

// Use FromSql

//Query a table
var blogs = context.Blogs
    .FromSql($"SELECT * FROM dbo.Blogs")
    .ToList();

//Execute a stored procedure that returns entities (the procedure name is illustrative)
var blogs = context.Blogs
    .FromSql($"EXECUTE dbo.GetMostPopularBlogs")
    .ToList();

10.2 Scalar queries (non-entity)

Scenario: the final result is a custom structure, not a database entity

// Use SqlQuery

//Execute a query that returns a single column
var ids = context.Database
    .SqlQuery<int>($"SELECT [BlogId] FROM [Blogs]")
    .ToList();

//Execute a query that returns a custom data structure (the JOIN condition is assumed)
var comments = context.Database
    .SqlQuery<CustomBlog>($"SELECT b.[BlogId], c.[CommnetContent] FROM [Blogs] b JOIN [Comments] c ON c.[BlogId] = b.[BlogId]")
    .ToList();

public class CustomBlog
{
    public int BlogId { get; set; }
    public string CommnetContent { get; set; }
}

10.3 Executing non-query SQL

Scenario: submit updates, deletions, etc., without focusing on the returned results

// Use ExecuteSql

//Execute an update
context.Database.ExecuteSql($"UPDATE [Blogs] SET [Url] = NULL WHERE Id = 1");
//Execute a delete
context.Database.ExecuteSql($"DELETE FROM [Blogs] WHERE Id = 1");

10.4 SQL parameters

// Use FromSql

//This code fails: the database does not allow parameterizing column names (or any other part of the schema)
var propertyName = "User";
var propertyValue = "johndoe";

var blogs = context.Blogs
    .FromSql($"SELECT * FROM [Blogs] WHERE {propertyName} = {propertyValue}")
    .ToList();

// The correct approach: use FromSqlRaw
var columnName = "Url";
var columnValue = new SqlParameter("columnValue", "http://SomeURL");

var blogs = context.Blogs
    .FromSqlRaw($"SELECT * FROM [Blogs] WHERE {columnName} = @columnValue", columnValue)
    .ToList();

Other related optimizations

Besides EF Core's own optimization techniques, there are other ways to improve data-query efficiency. Use the VS debugging tools to monitor memory usage, CPU occupancy and other metrics to find the bottleneck. The main directions are summarized below:

  1. Avoid queries inside loops as much as possible: analyze the actual business logic, load all the required data from the database in one go, then loop over it in memory (see the sketch after this list).
  2. Process large datasets in batches, e.g. with chunks or streaming.
  3. Use sensible data structures, e.g. Dictionary or HashSet instead of List when data order does not matter.
  4. Use caching to reduce access to hot data (design as needed).
  5. Use database table indexes and materialized views (database-level).
  6. Split databases and tables, separate reads from writes, and use ES for search (architecture-level optimization).
  7. Use multi-threaded concurrency to improve efficiency (a last resort; use with caution).
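Here is a minimal sketch of points 1-3 (all names are assumed): load the needed rows once, index them for O(1) lookup, and process large id lists in batches:

// Point 1: load everything the loop needs in a single query
var posts = context.Posts.AsNoTracking().ToList();

// Point 3: index by key instead of querying (or scanning a List) per iteration
var postsByBlogId = posts.ToLookup(p => p.BlogId);

foreach (var blog in context.Blogs.AsNoTracking().ToList())
{
    foreach (var post in postsByBlogId[blog.BlogId])
    {
        // process in memory; no database round trip here
    }
}

// Point 2: fetch a large id list in batches (Chunk is available since .NET 6)
IEnumerable<int> ids = GetPendingPostIds(); // hypothetical source of ids
foreach (var batch in ids.Chunk(500))
{
    var rows = context.Posts.Where(p => batch.Contains(p.PostId)).ToList();
    // handle this batch...
}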

Summary

EF Core optimization falls along a few main lines:
1. Reduce database interaction, e.g. through connection reuse and context caching.
2. Reduce memory usage, e.g. with streaming and paged queries.
3. Reduce query complexity; handle complex logic in the application where practical.

Maintain good coding habits and use appropriate data structures and processing logic. Optimization should be progressive: first meet the requirements correctly; when a performance problem appears, analyze the bottleneck with code or tools, then optimize in a targeted way. Do not optimize for its own sake at the cost of the requirements and wasted effort.

Finally, here is a sample of problem code. Interested readers can try using the techniques above to optimize it and see how much efficiency can be gained:


// Reconstructed sample: identifiers lost in translation have been filled in with
// assumed names (YourContext, Relationships, Temp, Bo, property names, etc.)
var configuration = new ConfigurationBuilder()
            .SetBasePath(Directory.GetCurrentDirectory())
            .AddUserSecrets<Program>()
            .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true)
            .Build();

var serviceProvider = new ServiceCollection()
            .AddDbContext<YourContext>(options =>
                options.UseSqlServer(configuration["ConnectionStrings:YourContext"])
                        .EnableSensitiveDataLogging()
                        .UseLoggerFactory(LoggerFactory.Create(builder =>
                            {
                                builder.AddConsole().AddFilter((category, level) =>
                                    category == DbLoggerCategory.Database.Command.Name
                                    && level == LogLevel.Information);
                            })))
            .BuildServiceProvider();

using (var scope = serviceProvider.CreateScope())
{
    var context = scope.ServiceProvider.GetRequiredService<YourContext>();
    context.Database.SetCommandTimeout(999);
    var data = context.Relationships.Where(x => x.workflow_state == 3)
        .OrderBy(it => it.relationship_guid).Take(100000).ToList();
    var tempData = data.Select(it => new Temp { aId = it.entity_from_guid, bId = it.entity_to_guid, deg = 0 }).ToList();

    foreach (var item in tempData)
    {
        item.deg = GetInterConnectResult(item.aId, item.bId);
    }

    int GetInterConnectResult(Guid aId, Guid bId)
    {
        // Seed with the starting entity (assumed; the original line was garbled)
        HashSet<Bo> boData = new() { new Bo { guid = aId, deg = 0 } };
        for (int i = 1; i <= 3; i++)
        {
            if (boData.Any(it => it.guid == bId)) break;
            var addIds = boData.Where(it => it.deg == i - 1).Select(it => it.guid).Distinct().ToList();
            var addRelationships = context.Relationships
                .Where(it => addIds.Contains(it.entity_from_guid) || addIds.Contains(it.entity_to_guid))
                .ToList();

            var addDegEntities = addRelationships.Select(it => new
            {
                efguid = it.entity_from_guid,
                etguid = it.entity_to_guid
            }).Union(addRelationships.Select(it => new
            {
                efguid = it.entity_to_guid,
                etguid = it.entity_from_guid
            }))
            .Select(it => new Bo { guid = it.etguid, deg = i })
            .ToHashSet() ?? new();
            boData.UnionWith(addDegEntities ?? new());
        }

        return boData?.OrderByDescending(it => it.deg)?.FirstOrDefault(it => it.guid.Equals(bId))?.deg ?? 0;
    }
}

Author: Baibao Men - Zhou Zhifei