Note 1: that most (all?) of what is written here applies equally when using Linq with in-memory collections. However, as it’s rare to have performance issues in such cases, I decided the page would be more usefully placed here, even though the information is not actually specific to Linq2Entities
Note 2: Like all optimisation techniques, caution should be exercised before applying this one. You need to be certain that you know the root cause of your your performance issue, otherwise you end up optimising the wrong thing, which is a waste of time and can make the code harder to read without any performance benefit
Many of the extension methods that we use have a signature like this…
public static IQueryable<TSource> Where<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
Notice how it takes an
Expression<Func<TSource, bool>> rather than just a
Func<TSource, bool>? Ever wondered why? Do you even know what the difference is? No, nor do most people, but it’s worth getting to know, as it can be useful.
Very simple explanation of the difference between a Func and an Expression
When you create a Func, which is what you do when you use a lambda, you are creating a piece of code that can be executed, just as if you had written it as a separate method.
When you create an Expression, all you are doing is telling the framework what you want to happen at some point, it doesn’t actually create the method for you (yet). In other words, an Expression is the representation of a method that has yet to be turned into code. Normally, the only time you would want to create an Expression is if you want the run-time ability to manipulate what the method will eventually do. That’s a pretty edge-case scenario for most of us.
“So who cares?” I hear you think.
Well, it turns out that there is a fundamental difference in the way the framework handles them, specifically when using Linq2Something. The provider that turns your Linq into Something (such as SQL in the case of Linq2Sql and Linq2Entities) has to convert the Linq into something the data source understands, such as SQL. If you use a Func, it has a ready-created method, and will apply that to your data source. If it has an Expression, it can be a little more intelligent in how it handles it.
For example, the same code when passed as a Func can end up pulling entire tables out of the database and then filtering them. This can result in poor performance, and even OutOfMemoryExceptions being thrown. By contrast, an Expression has the ability to apply the filtering at source.
Hmm, that wasn’t very clear was it? Truth is, you don’t really need to understand the details (although it’s worth it if you want to be a Grown Up Developer), you can just apply the changes shown below.
Note: Having revisited this issue recently, I can clarify the above a little. The quick answer is that whatever you do to an IQueryable will be sent to the server, whereas what you do to an IEnumerable will be executed in memory. Therefore, if you have any filtering to do, you should do it on an IQueryable, as that will reduce the amount of data pulled out of the database, resulting in faster execution. You’ll need to enumerate the collection (say by calling ToList) to convert from IQueryable to IEnumerable, before doing anything that the database can’t translate into SQL.
How to change your code to use an Expression
Consider the following piece of Linq…
List<Systems> systemsFunc = Systems .Where(s => s.DHRNumber.StartsWith("100-001")).ToList();
The lambda inside the
Where() method gets turned into a
Func<System, bool>. You can force this to be an Expression as follows…
List<Systems> systemsFunc = Systems .Where((Expression<Func<Systems, bool>>)(s => s.DHRNumber.StartsWith("100-001"))).ToList();
Note that we cast the lambda as an Expression<Func<Systems, bool>>, and then have to wrap the actual lambda in brackets to make sure the compiler knows what it’s casting. This makes the code harder to read, but can result in a noticeable improvement in performance, especially with large data sets.
You can do the same into two lines of code as follows…
Expression<Func<Systems, bool>> exp = s => s.DHRNumber.StartsWith("100-001"); List<Systems> systemsFunc = Systems.Where(exp).ToList();
You lose a lot of the benefits of the functional style of Linq this way, but if it helps performance, it may well be worth it.
However, it needs to be said again that you should only try this at home when you are sure that the database query is the cause of your performance issues. Even then, proceed with caution and test the performance carefully. Only use this technique if you are certain that it really makes a difference. When I ran the queries above against a table with 1362 rows, the Func version took around 30ms, and the Expression version took about 2ms. That’s an impressive improvement, until you realise that the actual query only took 30ms in the first place, so is unlikely to be the cause of a performance issue!
There are no end of blog posts and questions about this subject, and about the difference between a Func and an Expression, but one that contains some useful information is this StackOverflow question.
Final word – to be noted
Whilst testing this, I noticed an interesting difference between the two. When using Linq methods that matched text, such as Contains() and StartsWith(), a Func will cause a case-sensitive search to be done, whereas an Expression will cause a case-insensitive search to be done. I spotted this with the following queries…
Func<Users, bool> func = u => u.Surname.StartsWith("Smith"); List<Users> usersFunc = Users.Where(func).ToList(); Expression<Func<Users, bool>> exp = u => u.Surname.StartsWith("Smith"); List<Users> usersExp = Users.Where(exp).ToList();
The second query produced one more user than the first, which confused me until I realised that the first query is case-sensitive, whereas the second one isn’t. I suspect that this is due to the difference in where the filtering takes place (in the database or in the C#), but am not 100% sure. Worth noting the point though.