How not to use Includes in Lambda Statements

Published: 3rd June 2015

I had been looking into performance issues with one of our products. This is the second in a series of posts in where I have found major issues in query performance. Read Why not to use In Memory Collections in LINQ to Entities for more information. As I stated in that post, data access is using entity framework talking to a SQL Database with a lot of queries being built using LINQ or Lambda statements. For the most part this is good, however it is very important to stop and look at the SQL that is generated by these LINQ queries. In the example I am about to describe, the SQL that was generated was quite excessive.

The following example was found in our production code, I have cleansed this to hide business logic. The includes are string parameters passed in the Lambda statement that tell Entity Framework to return not only the selected table of data, but the associated child records in that table. Where a string of table names separated by periods is specified, entity framework looks across the relationships to get the data.

var filter = "some string";
var list = context.Record.FindWhere(
    x => x.RecordCategories.Id == RecordCategoryId
        && x.FieldToFilter == filter
        && x.SomeBoolean1,
    "SubRecord1",
    "SubRecord2",
    "SubRecord3.ParameterTable.SubRecord3")
    .OrderBy(x => x.Id).ToList();

You may notice that the SubRecord3 line looks a little odd. Not only are we getting records in the SubRecord3 table, we’re also joining out to a parameters table and then finding all the records in the SubRecords3 table that are linked to the same parameters table, even if the record doesn’t belong to the parent table. This alone would be bad, however given the purpose of this query is to return six total values it gets even worse. On our ten thousand record dataset I found the following:

This was very bad, so I set about rebuilding the data queries to get just what I wanted. Granted this required more code than the original query:

var filter = "some string";

var subRecord3Classification = OurEnum.OurClassification.ToString();

var subRecord2s = (from item in context.SubRecord2.FindAll()
                   group item by new { item.ParentId }
                   into items
                   select new
                       {
                           items.Key.ParentId,
                           Total1 = items.Sum(x => x.Field1),
                           Total2 = items.Sum(x => x.Field2)
                       }).ToList();

var subRecord3s = (from item in context.SubRecord3.FindWhere(x => x.ParameterTable.Classification == subRecord3Classification)
                   group item by new { item.ParentId }
                   into items
                   select new
                       {
                           items.Key.ParentId,
                           Total3 = items.Sum(x => x.Field3)
                       }).ToList();

var parentRecords = (from item in context.Record.FindWhere(x => x => x.RecordCategories.Id == RecordCategoryId && x.FieldToFilter == filter && x.SomeBoolean1, "SubRecord1")
                     select new
                         {
                             RecordId = item.Id,
                             RecordStatus = item.RecordStatus,
                             OwnerId = item.SubRecord1.OwnerId,
                             OwnerName = item.SubRecord1.OwnerName,
                             Value1 = item.RecordStatus ? item.Value1 : 0,
                             Value2 = item.RecordStatus ? item.Value2 : 0
                         }).ToList();

Since most of the data in the previous query was surplus to requirements, I re-built this as three separate queries on each required table pulling back just the bare minimum I needed. In both cases there is code I have omitted that converts the results into a simple object model returning a row per parent record, however it can be argued that another step would have been to return a single set of totals. The difference however was as follows:

For the most part the original functionality works well enough on our smaller clients, but on our bigger clients it is very sluggish. The question could easily be raised as to how we didn’t spot this before, the answer to that there were tight time-scales imposed by government legislation meaning that one way or another the base functionality had to go to the customer by a set date.

Now if you can imagine my groans at finding out what the original query above was doing, imagine how I felt when I found something like the following shortly afterwards. This one took 2 minutes to execute and SQL Profiler refused to show the SQL Generated:

var list = context.Record.FindWhere(
    x => x.RecordCategories.Id == RecordCategoryId
        && x.FieldToFilter == filter
        && x.SomeBoolean1,
    "SubRecord1.TertiaryRecord1",
    "SubRecord2",
    "SubRecord3.ParameterTable1.SubRecord3",
    "SubRecord4",
    "SubRecord5",
    "SubRecord6",
    "SubRecord7",
    "SubRecord8",
    "SubRecord9",
    "SubRecord10",
    "SubRecord11",
    "SubRecord12",
    "SubRecord13.ParameterTable2").OrderBy(x => x.ID).ToListd);

When it comes to writing any database query using LINQ to Entities or Lambda Expressions I have the following advice from my investigations:

Development