Whilst the benefits of using an OR/M such as Entity Framework are obvious, so are some of the disadvantages. The most obvious one (to me at least, and as I’m writing this article, I can drone on about what bothers me, OK?) is the performance impact. This comes in two flavours, the first-time hit and the not-first-time-hit (OK, so not a great name, feel free to suggest something better).
The first-time hit is caused when you do your first query after starting the application. Entity Framework starts up a conversation with the database, queries the structure of all of the tables, and builds the model that you will use to query the database. Other than doing a quiet query when the application loads, while distracting the user with a dancing ferret or something, I don’t think there is a lot you can do about this.
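If the dancing ferret isn't an option, one possible mitigation is to fire a throwaway "warm-up" query on a background thread as soon as the application starts, so EF has already built its model by the time the user runs a real query. A rough sketch, reusing the context helper from the queries below (the `WarmUp` method and the `Task.Run` wrapping are mine, not from the original code):

```csharp
// Sketch only: run a trivial query in the background at startup so that
// EF's model-building cost is paid before the user's first real query.
private static void WarmUp()
{
    Task.Run(() =>
    {
        // Any cheap query will do; it just forces EF to build its model
        // and open the first connection.
        getContext().CreateObjectSet<GeographicalLocation>()
            .Take(1)
            .ToList();
    });
}
```

The query result is thrown away; the only point is to trigger the model-building conversation with the database early.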
The other type of performance issue comes when hydrating entities from the database. No, that isn’t my expression, it’s something that Julie Lerman (the leading expert on Entity Framework outside of Microsoft) popularised, and being the gullible type, I picked up on. It’s quite a cute expression really. Anyway, I digress… when you make a query, EF generates the SQL, runs it, and then uses the returned data to assemble (ie hydrate) the entities. It’s this last part that can be a stinger.
For example, in one of our databases, we have a table cunningly named GeographicalLocations, which contains, erm, geographical locations (told you it was cunningly named). As it includes a whole pile of obscure American one-dog towns, there are about 57K rows in there. Even though the entire table consists of just a few fields, pulling them all out (sorry, hydrating them) took around 11 seconds. Unsurprisingly, the users were not impressed by this delay. Annoyingly, the SQL generated by EF took less than 2 seconds when run directly against the database.
The query that took so long looked like this:
```csharp
List<GeographicalLocation> geographicalLocations = getContext()
    .CreateObjectSet<GeographicalLocation>()
    .Where(g => g.Active)
    .ToList();
```
So, what’s a boy to do? The query is so simple that it doesn’t leave any scope for optimisation.
The good news
After some investigation, I discovered that the delay is caused by EF setting up the entity tracking. Whilst this is useful, it’s not vital in a query like this, and can be abandoned in favour of a faster query.
The following, slightly modified query takes less than 200 milliseconds to run! Yes, you read that right: something like 50 times faster.
```csharp
List<GeographicalLocation> geographicalLocations = getContext()
    .CreateObjectSet<GeographicalLocation>()
    .AsNoTracking()
    .Where(g => g.Active)
    .ToList();
```
Note that extra line? That made all the difference.
The not so good news
OK, so you knew it was too good to be true didn’t you? You can’t go from 11 seconds down to 200 milliseconds without paying a price.
It turns out that the tracking is what's responsible for creating all the navigation properties. For those not familiar with the EF jargon, those are the entity properties that EF adds for each foreign key in the table. So, if you have an Orders table, which has a foreign key reference to the Customers table, the Customer entity created by EF will have an Orders property, and the Order entity will have a Customer property. This makes navigating the object graph really easy. If you have a customer called Jim, and you wanted to know the total value of his orders, you could do this…
decimal jimsSpending = jim.Orders.Sum(o => o.Cost);
When you have tracking enabled, these navigation properties are created for you. If you have lazy loading enabled, then the navigation properties are loaded as you reference them; otherwise you have to specify them yourself when you create the query, as follows:
```csharp
var jim = GetObjectSet()
    .Include(c => c.Sites)
    .Include(c => c.Systems)
    .FirstOrDefault(c => c.Name == "Jim");
```
This will create the Customer entity for Jim, and create the navigation properties for Jim’s sites and systems.
Sadly, when you disable tracking, you lose that goodness. However, all is not lost: we can load the related entities manually.
```csharp
var jim = GetObjectSet()
    .FirstOrDefault(c => c.Name == "Jim");
// Note the ToList() calls: without them we would be assigning an
// unexecuted query, not the actual related entities.
jim.Sites = getContext().CreateObjectSet<Site>()
    .Where(s => s.CustomerID == jim.ID)
    .ToList();
jim.Systems = getContext().CreateObjectSet<VRTSystem>()
    .Where(s => s.CustomerID == jim.ID)
    .ToList();
```
Now, this can become a pain if you have a lot of navigation properties, so you have to make sure it’s worth it. This neatly brings us onto…
The dangers of premature optimisation
Before you run off to rewrite all of your EF queries, you have to consider if it’s really worth it. Whilst performance is a very important factor when writing code, so is maintainability. If you write a query that is so complex that even you won’t be able to work out what it does when you come back to it in six months, then you have to question the wisdom of committing such a query.
The most important thing to do before beginning to optimise your code is to benchmark the existing code. If this is a new feature, and you don’t have existing code, then ignore what you’ve read above and write a normal EF query. If and only if you find performance is an issue, then you can think about rewriting the query without tracking.
Thankfully, benchmarking the code is very easy: all you need is a System.Diagnostics.Stopwatch object. Usage is trivial…
```csharp
Stopwatch sw = new Stopwatch();
sw.Start();
List<GeographicalLocation> geographicalLocations = getContext()
    .CreateObjectSet<GeographicalLocation>()
    .Where(g => g.Active)
    .ToList();
sw.Stop();
Debug.WriteLine("Geolocs query with Linq2Entities took " + sw.ElapsedMilliseconds + "ms");
```
You can then rewrite the query (preferably in a separate method, so you can benchmark the two side by side), and compare the execution speed.
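As a sketch of what that side-by-side comparison might look like (the method structure and debug labels here are mine, not from the original code):

```csharp
// Sketch: time the tracked and no-tracking versions of the same query
// one after the other, using the same Stopwatch.
private static void CompareQueries()
{
    Stopwatch sw = Stopwatch.StartNew();
    List<GeographicalLocation> tracked = getContext()
        .CreateObjectSet<GeographicalLocation>()
        .Where(g => g.Active)
        .ToList();
    sw.Stop();
    Debug.WriteLine("Tracked query took " + sw.ElapsedMilliseconds + "ms");

    sw.Restart();
    List<GeographicalLocation> untracked = getContext()
        .CreateObjectSet<GeographicalLocation>()
        .AsNoTracking()
        .Where(g => g.Active)
        .ToList();
    sw.Stop();
    Debug.WriteLine("No-tracking query took " + sw.ElapsedMilliseconds + "ms");
}
```

One caveat worth bearing in mind: run the comparison more than once, as whichever query runs first also pays the first-time model-building cost described earlier, which will skew a single measurement.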
As always, you have to balance being clever with being clear.