LINQ standard query operators - C# in Depth (2012)

C# in Depth (2012)

Appendix A. LINQ standard query operators

There are many standard query operators in LINQ, only some of which are supported directly in C# query expressions—the others have to be called manually as normal methods. Some of the standard query operators are demonstrated in the main text of the book, but they’re all listed in this appendix.

Most of the examples use the following two sample sequences:

string[] words = {"zero", "one", "two", "three", "four"};

int[] numbers = {0, 1, 2, 3, 4};

For completeness, I’ve included the operators we’ve already looked at, although in most cases chapter 11 contains more detail on them than I’ve provided here.

The behavior specified here is that of LINQ to Objects; other providers may work differently. For each operator, I’ve specified whether it uses deferred or immediate execution. If an operator uses deferred execution, I’ve also indicated whether it streams or buffers its data.

A while ago, I reimplemented LINQ to Objects from scratch in a project called Edulinq, blogging details about every single operator and considering possibilities for optimization, lazy evaluation, and so on. For more detail than you’re ever likely to want to know about LINQ to Objects, visit the Edulinq project home page at http://edulinq.googlecode.com.

A.1. Aggregation

The aggregation operators (see table A.1) all result in a single value rather than a sequence. Average and Sum operate either on a sequence of numbers (any of the built-in numeric types) or on a sequence of elements with a delegate to convert from each element to one of the built-in numeric types. Min and Max have overloads for numeric types but can also operate on any sequence either using the default comparer for the element type or using a conversion delegate. Count and LongCount are equivalent to each other, just with different return types. Both of these have two overloads—one that just counts the length of the sequence, and one that takes a predicate, and only elements matching the predicate are counted.

Table A.1. Examples of aggregation operators

Expression

Result

numbers.Sum()

10

numbers.Count()

5

numbers.Average()

2

numbers.LongCount(x => x % 2 == 0)

3 (as a long; there are three even numbers)

words.Min(word => word.Length)

3 ("one" and "two")

words.Max(word => word.Length)

5 ("three")

numbers.Aggregate("seed",
(current, item) => current + item,
result=> result.ToUpper())

"SEED01234"

The most generalized aggregation operator (shown in the bottom row of table A.1) is just called Aggregate. All the other aggregation operators could be expressed as calls to Aggregate, although it’d be relatively painful to do so. The basic idea is that there’s always a “result so far,” starting with an initial seed. An aggregation delegate is applied for each element of the input sequence; the delegate takes the result so far and the input element, and produces the next result. As a final optional step, a conversion is applied from the aggregation result to the return value of the method. This conversion may result in a different type, if necessary. It’s not quite as complicated as it sounds, but still you’re unlikely to use it often.

All of the aggregation operators use immediate execution. The overload for Count that doesn’t use a predicate is optimized for implementations of ICollection and ICollection<T>; in that situation, it’ll use the Count property of the collection without reading any data.[1]

1 There’s no such shortcut for LongCount. I’ve personally never seen this method used in LINQ to Objects.

A.2. Concatenation

There’s a single concatenation operator: Concat (see table A.2). As you might expect, this operates on two sequences and returns a single sequence consisting of all the elements of the first sequence followed by all the elements of the second. The two input sequences must be of the same type, execution is deferred, and all data is streamed.

Table A.2. Concat example

Expression

Result

numbers.Concat(new[] {2, 3, 4, 5, 6})

0, 1, 2, 3, 4, 2, 3, 4, 5, 6

A.3. Conversion

The conversion operators cover a fair range of uses, but they all come in pairs.

The examples in table A.3 use two additional sequences to demonstrate Cast and OfType:

object[] allStrings = {"These", "are", "all", "strings"};

object[] notAllStrings = {"Number", "at", "the", "end", 5};

Table A.3. Conversion examples

Expression

Result

allStrings.Cast<string>()

"These", "are", "all", "strings" (as IEnumerable<string>)

allStrings.OfType<string>()

"These", "are", "all", "strings" (as IEnumerable<string>)

notAllStrings.Cast<string>()

Exception is thrown while iterating, at point of failing conversion

notAllStrings.OfType<string>()

"Number", "at", "the", "end" (as IEnumerable<string>)

numbers.ToArray()

0, 1, 2, 3, 4 (as int[])

numbers.ToList()

0, 1, 2, 3, 4 (as List<int>)

words.ToDictionary(w => w.Substring(0, 2))

Dictionary contents: "ze": "zero"
"on": "one"
"tw": "two"
"th": "three"
"fo": "four"

// Key is first character of word
words.ToLookup(word => word[0])

Lookup contents: 'z': "zero"
'o': "one"
't': "two", "three"
'f': "four"

words.ToDictionary(word => word[0])

Exception: can only have one entry per key, so fails on ’t’

ToArray and ToList are fairly self-explanatory: they read the whole sequence into memory, returning it either as an array or as a List<T>. Both use immediate execution.

Cast and OfType convert an untyped sequence into a typed one, either throwing an exception (for Cast) or ignoring (for OfType) elements of the input sequence that aren’t implicitly convertible to the output sequence element type using an unboxing or reference conversion. This may also be used to convert typed sequences into more specifically typed sequences, such as converting IEnumerable<object> to IEnumerable <string>. These use deferred execution and stream their input data.

ToDictionary and ToLookup both take delegates to obtain the key for any particular element. ToDictionary returns a dictionary mapping the key to the element type, whereas ToLookup returns an appropriately typed ILookup<,>. A lookup is like a dictionary where the value associated with a key isn’t one element but a sequence of elements. Lookups are generally used when duplicate keys are expected as part of normal operation, whereas a duplicate key will cause ToDictionary to throw an exception. More complicated overloads of both methods allow a custom IEqualityComparer<T> to be used to compare keys, and a conversion delegate to be applied to each element before it’s put into the dictionary or lookup. Both of these methods use immediate execution.

There are two additional operators that I haven’t provided examples for: AsEnumerable and AsQueryable. They don’t affect the results in an immediately obvious way, so they can’t really be demonstrated here. Instead, they affect the manner in which the query is executed.Queryable.AsQueryable is an extension method on IEnumerable that returns an IQueryable (both types being generic or nongeneric, depending on which overload you pick). If the IEnumerable you call it on is already an IQueryable, it returns the same reference; otherwise it creates a wrapper around the original sequence. The wrapper allows you to use all the normal Queryable extension methods, passing in expression trees, but when the query is executed the expression tree is compiled into normal IL and executed directly, using theLambdaExpression.Compile method shown in section 9.3.2.

Enumerable.AsEnumerable is an extension method on IEnumerable<T> and has a trivial implementation, simply returning the reference it was called on. No wrappers are involved—it just returns the same reference. This forces the Enumerable extension methods to be used in subsequent LINQ operators. Consider the following query expressions:

// Filter the users in the database with LIKE

from user in context.Users

where user.Name.StartsWith("Tim")

select user;

// Filter the users in memory

from user in context.Users.AsEnumerable()

where user.Name.StartsWith("Tim")

select user;

The second query expression forces the compile-time type of the source to be -IEnumerable<User> instead of IQueryable<User>, so all the processing is done in memory instead of at the database. The compiler will use the Enumerable extension methods (taking delegate parameters) instead of the Queryable extension methods (taking expression tree parameters). Normally you want to do as much processing as possible in SQL, but when there are transformations that require local code, you sometimes have to force LINQ to use the appropriate Enumerable extension methods. Of course, this isn’t specific to databases; the theme of forcing the tail of a query to use Enumerable is applicable for other providers too, if they’re based on IQueryable or something similar.

A.4. Element operators

This is another selection of query operators that are grouped in pairs (see table A.4). This time, the pairs all work the same way. There’s a simple version that picks a single element if it can or throws an exception if the specified element doesn’t exist, and a version with OrDefault at the end of the name. All of these operators use immediate execution.

Table A.4. Single element selection examples

Expression

Result

words.ElementAt(2)

"two"

words.ElementAtOrDefault(10)

null

words.First()

"zero"

words.First(w => w.Length == 3)

"one"

words.First(w => w.Length == 10)

Exception: no matching elements

words.FirstOrDefault
(w =>w.Length == 10)

null

words.Last()

"four"

words.Single()

Exception: more than one element

words.SingleOrDefault()

Exception: more than one element

words.Single(word => word.Length == 5)

"three"

words.Single(word => word.Length == 10)

Exception: no matching elements

words.SingleOrDefault
(w =>w.Length == 10)

null

The operator names are easily understood: First and Last return the first and last elements of the sequence, respectively, throwing an InvalidOperationException if the sequence is empty. Single returns the only element in a sequence, throwing an exception if the sequence is empty or has more than one element. ElementAt returns a specific element by index—the fifth element, for example. An ArgumentOutOfRangeException is thrown if the index is negative or too large for the actual number of elements in the collection. In addition, there’s an overload for all of the operators other than ElementAt to filter the sequence first—for example, First can return the first element that matches a given condition.

The OrDefault versions of these methods suppress the exceptions I’ve just described (returning the default value for the element type instead) except in one case: SingleOrDefault will return a default value if the sequence is empty, but if there’s more than one element, it’ll still throw an exception, just like Single. This is designed for situations where if everything’s correct, the sequence will have zero or one element. If you want to cope with sequences that may have more elements, use FirstOrDefault instead.

All of the overloads that don’t have a predicate parameter are optimized for instances of IList<T>, as they can access the correct element without iterating. There’s no optimization when a predicate is involved—it wouldn’t make sense for most calls, although it could make a big difference when finding the last matching element in a list, by moving backward from the end. At the time of writing, that case isn’t optimized, but it could change in a future version.

A.5. Equality

There’s only one standard equality operator: SequenceEqual (see table A.5). This just compares two sequences for element-by-element equality, including order. For instance, the sequence 0, 1, 2, 3, 4 isn’t equal to 4, 3, 2, 1, 0. An overload allows a specific IEqualityComparer<T> to be used when comparing elements. The return value is a Boolean, and it’s computed with immediate execution.

Table A.5. Sequence equality examples

Expression

Result

words.SequenceEqual
(new[]{"zero","one","two","three","four"})

True

words.SequenceEqual
(new[]{"ZERO","ONE","TWO","THREE","FOUR"})

False

words.SequenceEqual
(new[]{"ZERO","ONE","TWO","THREE","FOUR"},
StringComparer.OrdinalIgnoreCase)

True

Again, LINQ to Objects misses a trick here in terms of optimization: if both sequences have an efficient way of retrieving their counts, it would make sense to check whether those are equal before comparing the elements themselves. As it is, the implementation just walks through both sequences until it reaches the end or finds an inequality.

A.6. Generation

Of all the generation operators (see table A.6), only one acts on an existing sequence: DefaultIfEmpty. This returns either the original sequence if it’s not empty, or a sequence with a single element otherwise. The element is normally the default value for the sequence type, but an overload allows you to specify which value to use.

Table A.6. Generation examples

Expression

Result

numbers.DefaultIfEmpty()

0, 1, 2, 3, 4

new int[0].DefaultIfEmpty()

0 (within an IEnumerable<int>)

new int[0].DefaultIfEmpty(10)

10 (within an IEnumerable<int>)

Enumerable.Range(15, 2)

15, 16

Enumerable.Repeat(25, 2)

25, 25

Enumerable.Empty<int>()

An empty IEnumerable<int>

There are three other generation operators that are just static methods in Enumerable:

· Range generates a sequence of integers, with the parameters specifying the first value and how many values to generate.

· Repeat generates a sequence of any type by repeating a specified single value for a specified number of times.

· Empty generates an empty sequence of any type.

All of the generation operators use deferred execution and stream their output—in other words, they don’t just prepopulate a collection and return that. The exception is Empty, which returns an empty array of the correct type. An empty array is completely immutable, so the same array can be returned for every call for the same element type.

A.7. Grouping

There are two grouping operators, but one of them is ToLookup, which you’ve already seen in section A.3 as a conversion operator. That just leaves GroupBy, which we examined in section 11.6.1 in the form of the group ... by clause in query expressions. It uses deferred execution but buffers its results: when you start iterating over the resulting sequence of groups, the whole of the input is consumed.

The result of GroupBy is a sequence of appropriately typed IGrouping<,> elements. Each element has a key and a sequence of elements that match that key. In many ways, this is just a different way of looking at a lookup—instead of having random access to the groups by key, the groups are enumerated in turn. The order in which the groups are returned is the order in which their respective keys are discovered. Within a group, the order is the same as in the original sequence.

GroupBy has a daunting number of overloads, allowing you to specify not only how a key is derived from an element (which is always required) but also optionally the following:

· How to compare keys.

· A projection from an original element to the element within a group.

· A projection that takes both a key and a sequence of matching elements. The overall result in this case is a sequence of elements of the result type of the projection.

Table A.7 contains examples of the second and third options, as well as the simplest form. Custom key comparisons are slightly more long-winded to demonstrate, but they work in the obvious way.

Table A.7. GroupBy examples

Expression

Result

words.GroupBy(word => word.Length)

Key: 4; Sequence: "zero", "four" Key: 3; Sequence: "one", "two" Key: 5; Sequence: "three"

words.GroupBy
(word => word.Length, // Key
word => word.ToUpper() // Group element
)

Key: 4; Sequence: "zero", "four" Key: 3; Sequence: "one", "two" Key: 5; Sequence: "three"

// Project each (key, group) pair to stringwords.GroupBy
(word => word.Length,
(key, g) => key + ": " + g.Count())

"4: 2", "3: 2", "5: 1"

The option specified by the last bullet point is rarely used, in my experience.

A.8. Joins

Two operators are specified as join operators, Join and GroupJoin, both of which you saw in section 11.5 using the join and join ... into query expression clauses, respectively. Each method takes several parameters: two sequences, a key selector for each sequence, a projection to apply to each matching pair of elements, and optionally a key comparison.

For Join the projection takes one element from each sequence and produces a result; for GroupJoin the projection takes an element from the left sequence and a sequence of matching elements from the right sequence. Both use deferred execution and stream the left sequence but read the right sequence in its entirety when the first result is requested.

For the join examples in table A.8, we’ll match a sequence of names (Robin, Ruth, Bob, Emma) against a sequence of colors (Red, Blue, Beige, Green) by looking at the first character of both the name and the color, so Robin will join with Red and Bob will join with both Blue and Beige, for example.

Table A.8. Join examples

Expression

Result

names.Join // Left sequence

(colors, // Right sequence

name => name[0], // Left key selector

color=> color[0], // Right key selector

// Projection for result pairs

(name, color) => name + " - " + color

)

"Robin - Red",

"Ruth - Red",

"Bob - Blue",

"Bob - Beige"

names.GroupJoin

(colors,

name => name[0],

color => color[0],

// Projection for key/sequence pairs

(name, matches) => name + ": " +

string.Join("/", matches.ToArray())

)

"Robin: Red",

"Ruth: Red",

"Bob: Blue/Beige",

"Emma: "

Note that Emma doesn’t match any of the colors—the name doesn’t appear at all in the results of the first example, but it does appear in the second, with an empty sequence of colors.

A.9. Partitioning

The partitioning operators either skip an initial part of the sequence, returning only the rest, or take only the initial part of a sequence, ignoring the rest. In each case, you can either specify how many elements are in the first part of the sequence or specify a condition—the first part of the sequence continues until the condition fails. After the condition fails for the first time, it isn’t tested again—it doesn’t matter whether later elements in the sequence match. All of the partitioning operators use deferred execution and stream their data.

Partitioning effectively divides the sequence into two distinct parts, either by position or by predicate. In each case, if you concatenate the results of Take or TakeWhile with the results of the corresponding Skip or SkipWhile, providing the same argument to both calls, you’ll end up with the original sequence: each element will occur exactly once, in the original order. This is demonstrated by the calls in table A.9.

Table A.9. Partitioning examples

Expression

Result

words.Take(2)

"zero", "one"

words.Skip(2)

"two", "three", "four"

words.TakeWhile(word => word.Length <= 4)

"zero", "one", "two"

words.SkipWhile(word => word.Length <= 4)

"three", "four"

A.10. Projection

You’ve seen two projection operators (Select and SelectMany) in chapter 11. Select is a simple one-to-one projection from a source element to a result element. SelectMany is used when there are multiple from clauses in a query expression; each element in the original sequence is used to generate a new sequence. Both projection operators (see table A.10) use deferred execution.

Table A.10. Projection examples

Expression

Result

words.Select(word => word.Length)

4, 3, 3, 5, 4

words.Select
((word, index) =>
index.ToString() + ": " +word)

"0: zero", "1: one", "2: two",
"3: three", "4: four"

words.SelectMany
(word => word.ToCharArray())

'z', 'e', 'r', 'o', 'o', 'n', 'e', 't',
'w', 'o', 't', 'h', 'r', 'e', 'e', 'f',
'o', 'u', 'r'

words.SelectMany
((word, index) =>
Enumerable.Repeat(word, index))

"one", "two", "two",
"three", "three", "three",
"four", "four", "four", "four"

There are additional overloads you didn’t see in chapter 11. Both methods have overloads that allow the index within the original sequence to be used in the projection, and SelectMany either flattens all of the generated sequences into a single sequence without including the original element at all, or it uses a projection to generate a result element for each pair of elements. Multiple from clauses always use the overload that takes a projection. (Examples of this are long-winded and not included here. See chapter 11 for more details.)

.NET 4 introduced a new operator called Zip. This isn’t officially a standard query operator according to MSDN, but it’s worth knowing about anyway. It takes two sequences and applies the specified projection to each pair: the first element from each sequence, then the second element from each sequence, and so on. The resulting sequence finishes when either of the source sequences does. Table A.11 shows two examples of Zip, using the names and colors from section A.8. Zip uses deferred execution and streams its data.

Table A.11. Zip examples

Expression

Result

names.Zip(colors, (x, y) => x + "-" + y)

"Robin-Red",
"Ruth-Blue",
"Bob-Beige",
"Emma-Green"

// Second sequence stops early
names.Zip(colors.Take(3),
(x, y) => x + "-" + y)

"Robin-Red",
"Ruth-Blue",
"Bob-Beige"

A.11. Quantifiers

The quantifier operators shown in table A.12 all return a Boolean value, using immediate execution:

· All checks whether all the elements in the sequence satisfy the given predicate.

· Any checks whether any of the elements in the sequence satisfy the given predicate, or whether there are any elements at all for the parameterless overload.

· Contains checks whether the sequence contains a particular element, optionally specifying a comparison to use.

Table A.12. Quantifier examples

Expression

Result

words.All(word => word.Length > 3)

false ("one" and "two" have exactly three letters)

words.All(word => word.Length > 2)

true

words.Any()

true (the sequence isn’t empty)

words.Any(word => word.Length == 6)

false (no six-letter words)

words.Any(word => word.Length == 5)

true ("three" satisfies the condition)

words.Contains("FOUR")

false

words.Contains("FOUR",
StringComparer.OrdinalIgnoreCase)

true

Any is a particularly useful operator that’s often forgotten. If you’re trying to find out whether a sequence contains any items (or any items matching a predicate), it’s much better to use source.Any(...) than source.Count(...) > 0. They should give the same results, but Any can stop as soon as it’s found the first item, whereas Count has to count all the items, even though you only need to know whether the result is nonzero.

The overload for Contains that doesn’t specify a custom comparison is optimized if the source implements ICollection<T> by delegating to the interface implementation. This means Enumerable.Contains() will still be fast when called on a HashSet<T>, for example.

A.12. Filtering

The two filtering operators are OfType and Where. For details and examples of the OfType operator, see section A.3. The Where operator returns a sequence containing all the elements matching the given predicate. It has an overload to allow the predicate to take account of the element’s index. It’s unusual to require the index, and the where clause in query expressions doesn’t use this overload. Where always uses deferred execution and streams its data. Table A.13 demonstrates both overloads.

Table A.13. Filtering examples

Expression

Result

words.Where(word => word.Length > 3)

"zero", "three", "four"

words.Where
((word, index) => index < word.Length)

"zero", // index=0, length=4
"one", // index=1, length=3
"two", // index=2, length=2
"three", // index=3, length=5
// Not "four", index=4, length=4

A.13. Set-based operators

It’s natural to be able to consider two sequences as sets of elements. The four set-based operators all have two overloads, one using the default equality comparison for the element type, and one where the comparison is specified in an extra parameter. All of them use deferred execution.

The Distinct operator is the simplest—it acts on a single sequence and just returns a new sequence of all the distinct elements, discarding duplicates. The other operators also make sure they only return distinct values, but they act on two sequences:

· Intersect returns elements that appear in both sequences.

· Union returns the elements that are in either sequence.

· Except returns elements that are in the first sequence but not in the second. (Elements that are in the second sequence but not the first are not returned.)

The examples of these operators in table A.14 use two new sequences: abbc ("a", "b", "b", "c") and cd ("c", "d").

Table A.14. Set-based examples

Expression

Result

abbc.Distinct()

"a", "b", "c"

abbc.Intersect(cd)

"c"

abbc.Union(cd)

"a", "b", "c", "d"

abbc.Except(cd)

"a", "b"

cd.Except(abbc)

"d"

All of these operators use deferred execution, but the buffering/streaming distinction is slightly more complicated. Distinct and Union both stream their input sequences, whereas Intersect and Except read the whole of the right input sequence to start with, but then stream the left input sequence in a way similar to the join operators. All these operators keep a set of the elements they’ve already returned so as not to return duplicates. This means that even Distinct and Union are unsuitable for sequences that are too large to fit into memory, unless you know that there will be a limited set of distinct elements.

A.14. Sorting

You’ve seen all the sorting operators before: OrderBy and OrderByDescending provide a primary ordering, whereas ThenBy and ThenByDescending provide subsequent orderings for elements that aren’t differentiated by the primary one. In each case a projection is specified from an element to its sorting key, and a comparison (between keys) can also be specified. Unlike some other sorting algorithms in the framework (such as List<T>.Sort), the LINQ orderings are stable—in other words, if two elements are regarded as equal in terms of their sorting key, they’ll be returned in the order they appeared in the original sequence.

The final sorting operator is Reverse, which simply reverses the order of the sequence. All of the sorting operators (see table A.15) use deferred execution, but buffer their data.

Table A.15. Sorting examples

Expression

Result

words.OrderBy(word => word)

"four", "one", "three",
"two", "zero"

// Order words by second character
words.OrderBy(word => word[1])

"zero", "three", "one",
"four", "two"

// Order words by length;
// equal lengths returned in original
// order
words.OrderBy(word => word.Length)

"one", "two", "zero",
"four", "three"

words.OrderByDescending
(word => word.Length)

"three", "zero",s
"four", "one", "two"

// Order words by length and then
// alphabetically
words.OrderBy(word => word.Length)
.ThenBy(word => word)

"one", "two", "four",
"zero", "three"

// Order words by length and then
// alphabetically backwards
words.OrderBy(word => word.Length)
.ThenByDescending(word => word)

"two", "one", "zero",
"four", "three"

words.Reverse()

"four", "three", "two",
"one", "zero"