Lean LINQ Tips - Thinking in LINQ: Harnessing the power of functional programing in .NET applications (2014)

Thinking in LINQ: Harnessing the power of functional programing in .NET applications (2014)

APPENDIX A. Lean LINQ Tips

LINQ allows users to query any data source in a unified way. However, even with the LINQ standard query operators, used to create these queries, it’s easy to misuse these operators unless you have a solid understanding of how they work internally. Such misuse leads to inefficient queries that are slower—in some cases much slower—than the equivalent properly tuned query.

For example, .NET 4.0 offers implicit typing, in which the compiler figures out the intended data type of a query result at runtime. That’s convenient, so developers tend to be comfortable projecting the result to a strongly typed collection like List<T> via the projection operatorsToList() and such. But such conversions and projections are computationally expensive and offer little benefit. People may also use the convenient range methods such as ForEach() that List<T>) exposes. Again, these provide no benefits; in fact, they simply contribute to the performance issues.

This appendix contains a list of tips (or in some cases “micro-tips,” if you will) that you can use to tune your queries to yield results faster.

Tip 1

Avoid projection such as ToList() or ToArray(), when possible They make the program slow.

Explanation

ToList(), and ToArray() project a list to a newly created list. These projections also force the list values to be evaluated, which is not helpful because that negates the benefits of lazy evaluation and deferred execution that LINQ offers.

Tip 2

Combine multiple Where clauses into a single Where clause unless they are done on separate lists.

Explanation

For each Where clause, you must loop through the collection once. Therefore, the looping process will occur as many times as there are Where loops on any given list. So while refactoring multiple nested if statements together in a Where clause, try putting all of the clauses together.

Tip 3

Use the TrueForAll() method available on the IList<T> implementation instead of All().

Explanation

For long-running operations and cross-collection membership checks, All<T>() offers worse performance than TrueForAll. TrueForAll() uses a for loop, whereas All<T>() uses a foreach loop.

Tip 4

Use OrderBy() and OrderByDescending() wherever you can instead of the native Sort() implementations for lists.

Explanation

To use Sort(), you have to project the collection as a List<T>(). Moreover, Sort() is an in-place implementation. This way, using Sort() the original collection will be corrupted.

Tip 5

Avoid membership lookup using Contains() on any native container within lambda expressions.

Explanation

With the exception of HashSet<T>, collections are not optimized for membership lookup. Therefore, using Contains() inside a lambda means it will have to loop over the entire collection to find the membership status. This makes the query slower.

Tip 6

Use public property Count in the Boolean expression list Object.Count > 0 instead of Any().

Explanation

The Count property is updated whenever a new entry gets added to the collection. Therefore, using Count > 0 to determine whether the collection contains any elements is faster than calling Any(), which has to loop through the collection and update a local variable with the count, as long as the collection remains iterable. Thus, Any() is slower.

Tip 7

Avoid using ElementAt() on containers that support native array-based indexing.

Explanation

For collections that aren’t IList based, ElementAt() loops through the collection to find the element at the specified index, which takes time. So when you need to repeatedly use indexing over the collection, projecting it to an array by using ToArray() once is more efficient than using ElementAt().

Tip 8

Don’t use ElementAt() on dictionaries. It’s not guaranteed to return what you think it should.

Explanation

Dictionaries are implemented by using binary search trees, which by nature are nonlinear data structures. So, by default, integer indexing is not available for dictionaries. In general, if you need integer indexing over a dictionary, it’s time to rethink the design. For example, you might be better off using a List<KeyValuePair<T,U>> type.

Tip 9

Prefer exception handling over ElementAtOrDefault(), FirstOrDefault(), or SingleOrDefault().

Explanation

ElementAtOrDefault() and the other methods in this tip throw exceptions when no element is found at the specified index or when no element is found that matches the given criterion. So don’t rely on these methods to deal with exceptions—it won’t happen. The default versions give you the default values of the type of the collection.

Tip 10

For IList-based containers that support native indexing, don't use First() or Last(). Instead, use [0] and [Count-1].

Explanation

Integer indexing over an array or an IList implementation is the fastest performance you can get in terms of indexing over a generic collection in .NET. In contrast, Last() loops through the entire collection as long as it remains iterable. However, whenever a new element is added to a collection, the Count property gets updated. Thus Count – 1 gets evaluated in constant time, regardless of the size of the collection. Therefore, retrieving the last element using [Count – 1] is usually faster than using Last(). The exception is when you are using the overload ofLast() that takes a predicate. In that case, the execution time is essentially the same.

Tip 11

Use IEnumerable<T> as much as possible in public APIs.

Explanation

Because the entire LINQ API is based on the IEnumerable<T> type, if you take advantage of that by exposing IEnumerable<T> in your public APIs, consumers of your API can immediately take full advantage of everything that LINQ has to offer—most important, deferred execution. In LINQ, unless you’re using a projection operator (such as ToList() and so forth), a query doesn’t get executed immediately. However, when you expose strongly typed collections, any queries performed against your collections must be evaluated immediately, which leads to slower performance.

Tip 12

Consider using home-grown mathematical routines over Sum(), and Average(). Typically, these are about three times slower than their loop-based cousins.

Explanation

Sum(), and Average() use foreach loops internally. In addition, they check for null references. These are slower than a home-grown solution that uses a straight for loop. When you need to find a sum or average, particularly repeatedly, resist the temptation to use the built-in methods and write your own, using a for loop.

Tip 13

Create your own ForEach(Action<T>) that can work on an IEnumerable<T> instance.

Explanation

By doing that, you save the time required to convert the collection to an IList<T> implementation. If you recall, you can use the ForEach operator from MoreLINQ, as discussed in Chapter 5.

Tip 14

Prefer Select<T>() over Cast<T>().

Explanation

Projecting is essentially casting with some sidebars to hold on to when things go wrong (think “exceptions”). Cast<T> throws an exception when it can’t cast the collection to the type specified. However, if you use Select<T>(), you gain complete control on how the casting happens and what to do when an exception occurs.

Tip 15

For any container that supports a public Count property, prefer <instance of container>.Count == 0 over Any().

Explanation

As explained in earlier tips, calling the Count property is a constant-time operation, whereas Any() iterates over the collection. Thus the first version is much faster.

Tip 16

Don’t use Aggregate over a long list. Use a straight for loop instead.

Explanation

At each step, Aggregate has to determine whether it’s reached the edge of the collection. This makes it slower than a straight for loop, where programmers are responsible for handling the number of elements and edge conditions.

Tip 17

Don’t use FirstOrDefault in a loop. It can be terribly slow when called repeatedly (such as in a loop).

Explanation

FirstOrDefault loops from the beginning to the end to find an element that matches a specified predicate condition. Therefore, when placed inside a loop, it results in quadratic time increases.