Thinking in LINQ: Harnessing the power of functional programing in .NET applications (2014)
APPENDIX A. Lean LINQ Tips
LINQ allows users to query any data source in a unified way. However, even with the LINQ standard query operators, used to create these queries, it’s easy to misuse these operators unless you have a solid understanding of how they work internally. Such misuse leads to inefficient queries that are slower—in some cases much slower—than the equivalent properly tuned query.
For example, .NET 4.0 offers implicit typing, in which the compiler figures out the intended data type of a query result at runtime. That’s convenient, so developers tend to be comfortable projecting the result to a strongly typed collection like List<T> via the projection operatorsToList() and such. But such conversions and projections are computationally expensive and offer little benefit. People may also use the convenient range methods such as ForEach() that List<T>) exposes. Again, these provide no benefits; in fact, they simply contribute to the performance issues.
This appendix contains a list of tips (or in some cases “micro-tips,” if you will) that you can use to tune your queries to yield results faster.
Tip 1
Avoid projection such as ToList() or ToArray(), when possible They make the program slow.
Explanation
ToList(), and ToArray() project a list to a newly created list. These projections also force the list values to be evaluated, which is not helpful because that negates the benefits of lazy evaluation and deferred execution that LINQ offers.
Tip 2
Combine multiple Where clauses into a single Where clause unless they are done on separate lists.
Explanation
For each Where clause, you must loop through the collection once. Therefore, the looping process will occur as many times as there are Where loops on any given list. So while refactoring multiple nested if statements together in a Where clause, try putting all of the clauses together.
Tip 3
Use the TrueForAll() method available on the IList<T> implementation instead of All().
Explanation
For long-running operations and cross-collection membership checks, All<T>() offers worse performance than TrueForAll. TrueForAll() uses a for loop, whereas All<T>() uses a foreach loop.
Tip 4
Use OrderBy() and OrderByDescending() wherever you can instead of the native Sort() implementations for lists.
Explanation
To use Sort(), you have to project the collection as a List<T>(). Moreover, Sort() is an in-place implementation. This way, using Sort() the original collection will be corrupted.
Tip 5
Avoid membership lookup using Contains() on any native container within lambda expressions.
Explanation
With the exception of HashSet<T>, collections are not optimized for membership lookup. Therefore, using Contains() inside a lambda means it will have to loop over the entire collection to find the membership status. This makes the query slower.
Tip 6
Use public property Count in the Boolean expression list Object.Count > 0 instead of Any().
Explanation
The Count property is updated whenever a new entry gets added to the collection. Therefore, using Count > 0 to determine whether the collection contains any elements is faster than calling Any(), which has to loop through the collection and update a local variable with the count, as long as the collection remains iterable. Thus, Any() is slower.
Tip 7
Avoid using ElementAt() on containers that support native array-based indexing.
Explanation
For collections that aren’t IList based, ElementAt() loops through the collection to find the element at the specified index, which takes time. So when you need to repeatedly use indexing over the collection, projecting it to an array by using ToArray() once is more efficient than using ElementAt().
Tip 8
Don’t use ElementAt() on dictionaries. It’s not guaranteed to return what you think it should.
Explanation
Dictionaries are implemented by using binary search trees, which by nature are nonlinear data structures. So, by default, integer indexing is not available for dictionaries. In general, if you need integer indexing over a dictionary, it’s time to rethink the design. For example, you might be better off using a List<KeyValuePair<T,U>> type.
Tip 9
Prefer exception handling over ElementAtOrDefault(), FirstOrDefault(), or SingleOrDefault().
Explanation
ElementAtOrDefault() and the other methods in this tip throw exceptions when no element is found at the specified index or when no element is found that matches the given criterion. So don’t rely on these methods to deal with exceptions—it won’t happen. The default versions give you the default values of the type of the collection.
Tip 10
For IList-based containers that support native indexing, don't use First() or Last(). Instead, use [0] and [Count-1].
Explanation
Integer indexing over an array or an IList implementation is the fastest performance you can get in terms of indexing over a generic collection in .NET. In contrast, Last() loops through the entire collection as long as it remains iterable. However, whenever a new element is added to a collection, the Count property gets updated. Thus Count – 1 gets evaluated in constant time, regardless of the size of the collection. Therefore, retrieving the last element using [Count – 1] is usually faster than using Last(). The exception is when you are using the overload ofLast() that takes a predicate. In that case, the execution time is essentially the same.
Tip 11
Use IEnumerable<T> as much as possible in public APIs.
Explanation
Because the entire LINQ API is based on the IEnumerable<T> type, if you take advantage of that by exposing IEnumerable<T> in your public APIs, consumers of your API can immediately take full advantage of everything that LINQ has to offer—most important, deferred execution. In LINQ, unless you’re using a projection operator (such as ToList() and so forth), a query doesn’t get executed immediately. However, when you expose strongly typed collections, any queries performed against your collections must be evaluated immediately, which leads to slower performance.
Tip 12
Consider using home-grown mathematical routines over Sum(), and Average(). Typically, these are about three times slower than their loop-based cousins.
Explanation
Sum(), and Average() use foreach loops internally. In addition, they check for null references. These are slower than a home-grown solution that uses a straight for loop. When you need to find a sum or average, particularly repeatedly, resist the temptation to use the built-in methods and write your own, using a for loop.
Tip 13
Create your own ForEach(Action<T>) that can work on an IEnumerable<T> instance.
Explanation
By doing that, you save the time required to convert the collection to an IList<T> implementation. If you recall, you can use the ForEach operator from MoreLINQ, as discussed in Chapter 5.
Tip 14
Prefer Select<T>() over Cast<T>().
Explanation
Projecting is essentially casting with some sidebars to hold on to when things go wrong (think “exceptions”). Cast<T> throws an exception when it can’t cast the collection to the type specified. However, if you use Select<T>(), you gain complete control on how the casting happens and what to do when an exception occurs.
Tip 15
For any container that supports a public Count property, prefer <instance of container>.Count == 0 over Any().
Explanation
As explained in earlier tips, calling the Count property is a constant-time operation, whereas Any() iterates over the collection. Thus the first version is much faster.
Tip 16
Don’t use Aggregate over a long list. Use a straight for loop instead.
Explanation
At each step, Aggregate has to determine whether it’s reached the edge of the collection. This makes it slower than a straight for loop, where programmers are responsible for handling the number of elements and edge conditions.
Tip 17
Don’t use FirstOrDefault in a loop. It can be terribly slow when called repeatedly (such as in a loop).
Explanation
FirstOrDefault loops from the beginning to the end to find an element that matches a specified predicate condition. Therefore, when placed inside a loop, it results in quadratic time increases.