LINQ with Query Expressions - Essential C# 6.0 (2016)

15. LINQ with Query Expressions

The end of Chapter 14 showed a query using standard query operators for GroupJoin(), SelectMany(), and Distinct(), in addition to the creation of two anonymous types. The result was a statement that spanned multiple lines and was more complex and harder to comprehend than statements typically written using only features of earlier versions of C#. Modern programs that manipulate rich data sets often require such complex queries; it would therefore be nice if the language made them easier to read. Domain-specific query languages such as SQL make it much easier to read and understand a query, but lack the full power of the C# language. That is why the C# language designers added query expression syntax to C# 3.0. With query expressions, many standard query operator expressions are transformed into more readable code, much like SQL.

In this chapter, we introduce query expressions and use them to express many of the queries from the preceding chapter.

Introducing Query Expressions

Two of the operations that developers most frequently perform are filtering the collection to eliminate unwanted items and projecting the collection so that the items take a different form. For example, given a collection of files, we could filter it to create a new collection of only the files with a “.cs” extension, or only the files larger than 1 million bytes. We could also project the file collection to create a new collection of paths to the directories where the files are located and the corresponding directory size. Query expressions provide straightforward syntaxes for both of these common operations. Listing 15.1 shows a query expression that filters a collection of strings; Output 15.1 shows the results.

LISTING 15.1: Simple Query Expression


using System;
using System.Collections.Generic;
using System.Linq;

// ...

static string[] Keywords = {
    "abstract", "add*", "alias*", "as", "ascending*",
    "async*", "await*", "base","bool", "break",
    "by*", "byte", "case", "catch", "char", "checked",
    "class", "const", "continue", "decimal", "default",
    "delegate", "descending*", "do", "double",
    "dynamic*", "else", "enum", "event", "equals*",
    "explicit", "extern", "false", "finally", "fixed",
    "from*", "float", "for", "foreach", "get*", "global*",
    "group*", "goto", "if", "implicit", "in", "int",
    "into*", "interface", "internal", "is", "lock", "long",
    "join*", "let*", "nameof*", "namespace", "new", "null",
    "object", "on*", "operator", "orderby*", "out",
    "override", "params", "partial*", "private", "protected",
    "public", "readonly", "ref", "remove*", "return", "sbyte",
    "sealed", "select*", "set*", "short", "sizeof",
    "stackalloc", "static", "string", "struct", "switch",
    "this", "throw", "true", "try", "typeof", "uint", "ulong",
    "unsafe", "ushort", "using", "value*", "var*", "virtual",
    "unchecked", "void", "volatile", "where*", "while", "yield*"};

private static void ShowContextualKeywords1()
{
    IEnumerable<string> selection =
        from word in Keywords
        where !word.Contains('*')
        select word;

    foreach (string keyword in selection)
    {
        Console.Write(keyword + " ");
    }
}

// ...


OUTPUT 15.1

abstract as base bool break byte case catch char checked class const
continue decimal default delegate do double else enum event explicit
extern false finally fixed float for foreach goto if implicit in int
interface internal is lock long namespace new null object operator out
override params private protected public readonly ref return sbyte
sealed short sizeof stackalloc static string struct switch this throw
true try typeof uint ulong unsafe ushort using virtual unchecked void
volatile while

In this query expression, selection is assigned the collection of C# reserved keywords. The query expression in this example includes a where clause that filters out the contextual keywords (those marked with an asterisk), leaving only the reserved keywords.

Query expressions always begin with a “from clause” and end with a “select clause” or a “group clause,” identified by the from, select, or group contextual keyword, respectively. The identifier word in the from clause is called a range variable; it represents each item in the collection, much as the loop variable in a foreach loop represents each item in a collection.

Developers familiar with SQL will notice that query expressions have a syntax that is similar to that of SQL. This design was deliberate—it was intended that LINQ should be easy to learn for programmers who already know SQL. However, there are some obvious differences. The first difference that most SQL-experienced developers will notice is that the C# query expression shown here has the clauses in the following order: from, then where, then select. The equivalent SQL query puts the SELECT clause first, then the FROM clause, and finally the WHERE clause.

One reason for this change in sequence is to enable use of IntelliSense, the feature of the IDE whereby the editor produces helpful user interface elements such as drop-down lists that describe the members of a given object. Because from appears first and identifies the string array Keywords as the data source, the code editor can deduce that the range variable word is of type string. When you are entering the code into the editor and reach the dot following word, the editor will display only the members of string.

If the from clause appeared after the select, as it does in SQL, as you were typing in the query the editor would not know what the data type of word was, so it would not be able to display a list of word’s members. In Listing 15.1, for example, it wouldn’t be possible to predict that Contains() was a possible member of word.

The C# query expression order also more closely matches the order in which operations are logically performed. When evaluating the query, you begin by identifying the collection (described by the from clause), then filter out the unwanted items (with the where clause), and finally describe the desired result (with the select clause).

Finally, the C# query expression order ensures that the rules governing where range variables are in scope are mostly consistent with the scoping rules for local variables. For example, a range variable must be declared by a clause (typically a from clause) before the variable can be used, much as a local variable must always be declared before it can be used.
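To make the comparison between a range variable and a foreach loop variable concrete, the following sketch places a query expression next to a hand-written loop that selects the same items. The class name RangeVariableDemo and the shortened SampleWords array are hypothetical stand-ins for the Keywords array of Listing 15.1; unlike the query, the loop runs eagerly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeVariableDemo
{
    // Hypothetical subset of the Keywords array from Listing 15.1.
    public static readonly string[] SampleWords =
        { "abstract", "add*", "as", "async*", "base" };

    // Query expression: "word" is the range variable.
    public static List<string> WithQuery()
    {
        IEnumerable<string> selection =
            from word in SampleWords
            where !word.Contains('*')
            select word;
        return selection.ToList();
    }

    // Hand-written loop: "word" is an ordinary loop variable.
    public static List<string> WithLoop()
    {
        var result = new List<string>();
        foreach (string word in SampleWords)
        {
            if (!word.Contains('*'))
            {
                result.Add(word);
            }
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", WithQuery())); // abstract as base
        Console.WriteLine(string.Join(" ", WithLoop()));  // abstract as base
    }
}
```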

Projection

The result of a query expression is a collection of type IEnumerable<T> or IQueryable<T> (see footnote 1). The actual type T is inferred from the select or group by clause. In Listing 15.1, for example, the compiler knows that Keywords is of type string[], which is convertible to IEnumerable<string>, and deduces that word is therefore of type string. The query ends with select word, which means the result of the query expression must be a collection of strings, so the type of the query expression is IEnumerable<string>.

1. The result of a query expression is, as a practical matter, almost always IEnumerable<T> or a type derived from it. It is legal, though somewhat perverse, to create an implementation of the query methods that return other types; there is no requirement in the language that the result of a query expression be convertible to IEnumerable<T>.

In this case, the “input” and the “output” of the query are both a collection of strings. However, the “output” type can be quite different from the “input” type if the expression in the select clause is of an entirely different type. Consider the query expression in Listing 15.2, and its corresponding output in Output 15.2.

LISTING 15.2: Projection Using Query Expressions


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...

static void List1(string rootDirectory, string searchPattern)
{
    IEnumerable<string> fileNames = Directory.GetFiles(
        rootDirectory, searchPattern);
    IEnumerable<FileInfo> fileInfos =
        from fileName in fileNames
        select new FileInfo(fileName);

    foreach (FileInfo fileInfo in fileInfos)
    {
        Console.WriteLine(
            $@"{ fileInfo.Name } ({
                fileInfo.LastWriteTime })");
    }
}

// ...


OUTPUT 15.2

Account.cs (11/22/2011 11:56:11 AM)
Bill.cs (8/10/2011 9:33:55 PM)
Contact.cs (8/19/2011 11:40:30 PM)
Customer.cs (11/17/2011 2:02:52 AM)
Employee.cs (8/17/2011 1:33:22 AM)
Person.cs (10/22/2011 10:00:03 PM)

This query expression results in an IEnumerable<FileInfo> rather than the IEnumerable<string> data type returned by Directory.GetFiles(). The select clause of the query expression can potentially project out a data type that is different from what was collected by the from clause expression.

In this example, the type FileInfo was chosen because it has the two relevant properties needed for the desired output: the filename and the last write time. There might not be such a convenient type if you needed other information not captured in the FileInfo object. Anonymous types provide a convenient and concise way to project the exact data you need without having to find or create an explicit type. (In fact, this scenario was the key motivator for adding anonymous types to the language.) Listing 15.3 provides output similar to that in Listing 15.2, but via anonymous types rather than FileInfo.

LISTING 15.3: Anonymous Types within Query Expressions


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...

static void List2(string rootDirectory, string searchPattern)
{
    var fileNames = Directory.EnumerateFiles(
        rootDirectory, searchPattern);
    var fileResults =
        from fileName in fileNames
        select new
        {
            Name = fileName,
            LastWriteTime = File.GetLastWriteTime(fileName)
        };

    foreach (var fileResult in fileResults)
    {
        Console.WriteLine(
            $@"{ fileResult.Name } ({
                fileResult.LastWriteTime })");
    }
}

// ...


In this example, the query projects out only the filename and its last file write time. A projection such as the one in Listing 15.3 makes little difference when working with something small, such as FileInfo. However, “horizontal” projection that filters down the amount of data associated with each item in the collection is extremely powerful when the amount of data is significant and retrieving it (perhaps from a different computer over the Internet) is expensive. Rather than retrieving all the data when a query executes, anonymous types make it possible to retrieve and store only the required data in the collection.

Imagine, for example, a large database that has tables with 30 or more columns. If there were no anonymous types, developers would be required either to use objects containing unnecessary information or to define small, specialized classes useful only for storing the specific data required. Instead, anonymous types enable support for types to be defined by the compiler—types that contain only the data needed for their immediate scenario. Other scenarios can have a different projection of only the properties needed for that scenario.
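As a small illustration of the idea, the sketch below projects a wider object down to two properties with an anonymous type. The Employee class and its property names are hypothetical, and because the data is in memory, nothing is saved on retrieval here; the example only shows the shape of such a projection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type standing in for a row with many columns.
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Notes { get; set; }
}

public static class ProjectionDemo
{
    public static List<string> Describe(IEnumerable<Employee> employees)
    {
        // Project each Employee down to the two values needed here.
        // The anonymous type gets properties named Name and Department.
        var slim =
            from employee in employees
            select new { employee.Name, employee.Department };

        var lines = new List<string>();
        foreach (var item in slim)
        {
            lines.Add($"{ item.Name } ({ item.Department })");
        }
        return lines;
    }

    public static void Main()
    {
        var employees = new[]
        {
            new Employee { Id = 1, Name = "Ana", Department = "Sales", Notes = "n/a" },
            new Employee { Id = 2, Name = "Raj", Department = "Support", Notes = "n/a" }
        };
        foreach (string line in Describe(employees))
        {
            Console.WriteLine(line);
        }
    }
}
```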


Beginner Topic: Deferred Execution with Query Expressions

Queries written using query expression notation exhibit deferred execution, just as the queries written in Chapter 14 did. Consider again the assignment of a query object to variable selection in Listing 15.1. The creation of the query and the assignment to the variable do not execute the query; rather, they simply build an object that represents the query. The method word.Contains('*') is not called when the query object is created. Rather, the query expression saves the selection criteria to be used when iterating over the collection identified by the selection variable.

To demonstrate this point, consider Listing 15.4 and the corresponding output (Output 15.3).

LISTING 15.4: Deferred Execution and Query Expressions (Example 1)


using System;
using System.Collections.Generic;
using System.Linq;

// ...

private static void ShowContextualKeywords2()
{
    IEnumerable<string> selection = from word in Keywords
                                    where IsKeyword(word)
                                    select word;
    Console.WriteLine("Query created.");
    foreach (string keyword in selection)
    {
        // No space output here.
        Console.Write(keyword);
    }
}

// The side effect of console output is included
// in the predicate to demonstrate deferred execution;
// predicates with side effects are a poor practice in
// production code.
private static bool IsKeyword(string word)
{
    if (word.Contains('*'))
    {
        Console.Write(" ");
        return true;
    }
    else
    {
        return false;
    }
}
// ...


OUTPUT 15.3

Query created.
add* alias* ascending* async* await* by* descending* dynamic*
equals* from* get* global* group* into* join* let* nameof* on*
orderby* partial* remove* select* set* value* var* where* yield*

In Listing 15.4, no space is output within the foreach loop. The side effect of printing a space when the predicate IsKeyword() is executed happens when the query is iterated over, not when the query is created. Thus, although selection is a collection (it is of type IEnumerable<T> after all), at the time of assignment everything following the from clause comprises the selection criteria. Not until we begin to iterate over selection are the criteria applied.

Now consider a second example (see Listing 15.5 and Output 15.4).

LISTING 15.5: Deferred Execution and Query Expressions (Example 2)


using System;
using System.Collections.Generic;
using System.Linq;

// ...

private static void CountContextualKeywords()
{
    int delegateInvocations = 0;
    Func<string, string> func =
        text =>
        {
            delegateInvocations++;
            return text;
        };

    IEnumerable<string> selection =
        from keyword in Keywords
        where keyword.Contains('*')
        select func(keyword);

    Console.WriteLine(
        $"1. delegateInvocations={ delegateInvocations }");

    // Executing count should invoke func once for
    // each item selected.
    Console.WriteLine(
        $"2. Contextual keyword count={ selection.Count() }");

    Console.WriteLine(
        $"3. delegateInvocations={ delegateInvocations }");

    // Executing count should invoke func once for
    // each item selected.
    Console.WriteLine(
        $"4. Contextual keyword count={ selection.Count() }");

    Console.WriteLine(
        $"5. delegateInvocations={ delegateInvocations }");

    // Cache the value so future counts will not trigger
    // another invocation of the query.
    List<string> selectionCache = selection.ToList();

    Console.WriteLine(
        $"6. delegateInvocations={ delegateInvocations }");

    // Retrieve the count from the cached collection.
    Console.WriteLine(
        $"7. selectionCache count={ selectionCache.Count() }");

    Console.WriteLine(
        $"8. delegateInvocations={ delegateInvocations }");
}

// ...


OUTPUT 15.4

1. delegateInvocations=0
2. Contextual keyword count=27
3. delegateInvocations=27
4. Contextual keyword count=27
5. delegateInvocations=54
6. delegateInvocations=81
7. selectionCache count=27
8. delegateInvocations=81

Rather than defining a separate method, Listing 15.5 uses a statement lambda that counts the number of times the method is called.

Three things in the output are remarkable. First, notice that after selection is assigned, delegateInvocations remains at zero. At the time of assignment to selection, no iteration over Keywords is performed. If Keywords were a property, the property call would run; in other words, the from clause expression is evaluated at the time of assignment. However, neither the projection, nor the filtering, nor anything after the from clause will execute until the code iterates over the values within selection. It is as though at the time of assignment, selection would more appropriately be called “query.”

Once we call Count(), however, a term such as selection or items that indicates a container or collection is appropriate because we begin to count the items within the collection. In other words, the variable selection serves a dual purpose of saving the query information and acting like a container from which the data is retrieved.

A second important characteristic to notice is that calling Count() twice causes func to again be invoked once on each item selected. Given that selection behaves both as a query and as a collection, requesting the count requires that the query be executed again by iterating over the IEnumerable<string> collection that selection refers to and counting the items. The C# compiler does not know whether anyone has modified the strings in the array such that the count would now be different, so the counting has to happen anew every time to ensure that the answer is correct and up-to-date. Similarly, a foreach loop over selection would trigger func to be called again for each item. The same is true of all the other extension methods provided via System.Linq.Enumerable.

A third characteristic is that calling ToList() executes the query once more (raising delegateInvocations from 54 to 81) and copies the results into a List<string>. Counting the items in that cached list does not invoke func again, which is why delegateInvocations is still 81 on line 8 of the output.
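The following sketch shows the same behavior from a different angle: because the query is re-executed on each enumeration, it observes changes made to the source after the query was created, whereas a list produced by ToList() does not. The class name DeferredExecutionDemo and the sample words are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static int[] Run()
    {
        var words = new List<string> { "add*", "as", "async*" };

        // Building the query does not enumerate "words".
        IEnumerable<string> query =
            from word in words
            where word.Contains('*')
            select word;

        int before = query.Count();              // executes the query: 2 matches
        List<string> snapshot = query.ToList();  // executes again and caches

        words.Add("await*");                     // change the source afterward

        int after = query.Count();               // re-executes: 3 matches
        int cached = snapshot.Count;             // cached list is unchanged: 2

        return new[] { before, after, cached };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(
            $"before={ counts[0] }, after={ counts[1] }, cached={ counts[2] }");
    }
}
```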



Advanced Topic: Implementing Deferred Execution

Deferred execution is implemented by using delegates and expression trees. A delegate provides the ability to create and manipulate a reference to a method that contains an expression that can be invoked later. An expression tree similarly provides the ability to create and manipulate information about an expression that can be examined and manipulated later.

In Listing 15.5, the predicate expressions of the where clauses and the projection expressions of the select clauses are transformed by the compiler into expression lambdas, and then the lambdas are transformed into delegate creations. The result of the query expression is an object that holds onto references to these delegates. Only when the query results are iterated over does the query object actually execute the delegates.
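As a rough illustration of the difference between the two mechanisms, the sketch below assigns the same lambda once to a delegate type and once to an expression tree type. The class and member names are hypothetical; Expression<TDelegate>, its Body property, and Compile() are part of System.Linq.Expressions.

```csharp
using System;
using System.Linq.Expressions;

public static class DelegateVersusExpressionTree
{
    // A delegate: a reference to compiled code that can only be invoked.
    public static readonly Func<string, bool> Predicate =
        word => word.Contains("*");

    // An expression tree: data describing the same lambda, which a
    // query provider can examine (or compile) later.
    public static readonly Expression<Func<string, bool>> Tree =
        word => word.Contains("*");

    public static void Main()
    {
        Console.WriteLine(Predicate("add*"));      // True
        Console.WriteLine(Tree.Body.NodeType);     // Call
        Console.WriteLine(Tree.Compile()("as"));   // False
    }
}
```

Queries over IEnumerable<T> use the delegate form; providers built on IQueryable<T> receive the expression tree form so that they can translate the query rather than simply run it.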


Filtering

In Listing 15.1, we include a where clause that filters out contextual keywords but not reserved keywords. This where clause filters the collection “vertically”; if you think of the collection as a vertical list of items, the where clause makes that vertical list shorter so that the collection holds fewer items. The filter criteria are expressed with a predicate, an expression that evaluates to a bool (the compiler turns it into a lambda), such as !word.Contains('*') (as in Listing 15.1) or File.GetLastWriteTime(file) < DateTime.Now.AddMonths(-1). The latter is shown in Listing 15.6, whose output appears in Output 15.5.

LISTING 15.6: Query Expression Filtering Using where


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...

static void FindMonthOldFiles(
    string rootDirectory, string searchPattern)
{
    IEnumerable<FileInfo> files =
        from fileName in Directory.EnumerateFiles(
            rootDirectory, searchPattern)
        where File.GetLastWriteTime(fileName) <
            DateTime.Now.AddMonths(-1)
        select new FileInfo(fileName);

    foreach (FileInfo file in files)
    {
        // As a simplification, rootDirectory is
        // assumed to be a subdirectory of the
        // current directory
        string relativePath = file.FullName.Substring(
            Environment.CurrentDirectory.Length);
        Console.WriteLine(
            $".{ relativePath } ({ file.LastWriteTime })");
    }
}

// ...


OUTPUT 15.5

.\TestData\Bill.cs (8/10/2011 9:33:55 PM)
.\TestData\Contact.cs (8/19/2011 11:40:30 PM)
.\TestData\Employee.cs (8/17/2011 1:33:22 AM)
.\TestData\Person.cs (10/22/2011 10:00:03 PM)

Sorting

To order the items using a query expression, you can use the orderby clause, as shown in Listing 15.7.

LISTING 15.7: Sorting Using a Query Expression with an orderby Clause


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...
static void ListByFileSize1(
    string rootDirectory, string searchPattern)
{
    IEnumerable<string> fileNames =
        from fileName in Directory.EnumerateFiles(
            rootDirectory, searchPattern)
        orderby (new FileInfo(fileName)).Length descending,
            fileName
        select fileName;

    foreach (string fileName in fileNames)
    {
        Console.WriteLine(fileName);
    }
}
// ...


Listing 15.7 uses the orderby clause to sort the files returned by Directory.EnumerateFiles() first by file size in descending order, and then by filename in ascending order. Multiple sort criteria are separated by commas, such that first the items are ordered by size, and then, if the size is the same, they are ordered by filename. ascending and descending are contextual keywords indicating the sort order direction. Specifying the order as ascending or descending is optional; if the direction is omitted (as it is here on filename), the default is ascending.
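For comparison, here is a sketch of an orderby clause with two sort criteria alongside the standard query operator calls it corresponds to: OrderByDescending() for the first criterion and ThenBy() for the second. It sorts a small in-memory array (hypothetical sample data) by word length rather than file size so that it runs without touching the file system.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByDemo
{
    public static readonly string[] Words =
        { "join", "as", "let", "by", "from" };

    // Query expression: longest words first, ties broken alphabetically.
    public static List<string> WithQuery()
    {
        return (from word in Words
                orderby word.Length descending, word
                select word).ToList();
    }

    // The equivalent method call form.
    public static List<string> WithMethods()
    {
        return Words
            .OrderByDescending(word => word.Length)
            .ThenBy(word => word)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", WithQuery()));   // from join let as by
        Console.WriteLine(string.Join(" ", WithMethods())); // from join let as by
    }
}
```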

The let Clause

Listing 15.8 includes a query that is very similar to the query in Listing 15.7, except that the type argument of IEnumerable<T> is FileInfo. Notice that there is a problem with this query: We have to create a FileInfo twice, in both the orderby clause and the select clause.

LISTING 15.8: Projecting a FileInfo Collection and Sorting by File Size


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...
static void ListByFileSize2(
    string rootDirectory, string searchPattern)
{
    IEnumerable<FileInfo> files =
        from fileName in Directory.EnumerateFiles(
            rootDirectory, searchPattern)
        orderby new FileInfo(fileName).Length, fileName
        select new FileInfo(fileName);

    foreach (FileInfo file in files)
    {
        // As a simplification, rootDirectory is
        // assumed to be a subdirectory of the
        // current directory
        string relativePath = file.FullName.Substring(
            Environment.CurrentDirectory.Length);
        Console.WriteLine(
            $".{ relativePath }({ file.Length })");
    }
}
// ...


Unfortunately, although the end result is correct, Listing 15.8 ends up instantiating a FileInfo object twice for each item in the source collection, which seems wasteful and unnecessary. To avoid this kind of unnecessary and potentially expensive overhead, you can use a let clause, as demonstrated in Listing 15.9.

LISTING 15.9: Ordering the Results in a Query Expression


// ...
IEnumerable<FileInfo> files =
    from fileName in Directory.EnumerateFiles(
        rootDirectory, searchPattern)
    let file = new FileInfo(fileName)
    orderby file.Length, fileName
    select file;
// ...


The let clause introduces a new range variable that can hold the value of an expression that is used throughout the remainder of the query expression. You can add as many let clauses as you like; simply add each as an additional clause to the query after the first from clause but before the final select/group by clause.
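The sketch below uses a let clause on in-memory data and, next to it, an approximation of how such a query can be written with standard query operators: the let value travels alongside the original range variable in an anonymous type. The names LetDemo, Words, and bare are hypothetical, and the method form is only a rough picture of the compiler's translation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetDemo
{
    public static readonly string[] Words =
        { "yield*", "as", "add*", "base" };

    public static List<string> WithLet()
    {
        IEnumerable<string> result =
            from word in Words
            let bare = word.Replace("*", "")  // computed once per item
            orderby bare.Length, bare
            select bare;
        return result.ToList();
    }

    // Approximate method form: carry "bare" in an anonymous type.
    public static List<string> WithMethods()
    {
        return Words
            .Select(word => new { word, bare = word.Replace("*", "") })
            .OrderBy(pair => pair.bare.Length)
            .ThenBy(pair => pair.bare)
            .Select(pair => pair.bare)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", WithLet()));     // as add base yield
        Console.WriteLine(string.Join(" ", WithMethods())); // as add base yield
    }
}
```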

Grouping

A common data manipulation scenario is the grouping of related items. In SQL, this generally involves aggregating the items to produce a summary or total or other aggregate value. LINQ, however, is notably more expressive. LINQ expressions allow for individual items to be grouped into a series of subcollections, and those groups can then be associated with items in the collection being queried. For example, Listing 15.10 and Output 15.6 demonstrate how to group together the contextual keywords and the regular keywords.

LISTING 15.10: Grouping Together Query Results


using System;
using System.Collections.Generic;
using System.Linq;

// ...

private static void GroupKeywords1()
{
    IEnumerable<IGrouping<bool, string>> selection =
        from word in Keywords
        group word by word.Contains('*');

    foreach (IGrouping<bool, string> wordGroup
        in selection)
    {
        Console.WriteLine(Environment.NewLine + "{0}:",
            wordGroup.Key ?
                "Contextual Keywords" : "Keywords");
        foreach (string keyword in wordGroup)
        {
            Console.Write(" " +
                (wordGroup.Key ?
                    keyword.Replace("*", null) : keyword));
        }
    }
}

// ...


OUTPUT 15.6

Keywords:
abstract as base bool break byte case catch char checked class
const continue decimal default delegate do double else enum event
explicit extern false finally fixed float for foreach goto if
implicit in int interface internal is lock long namespace new null
object operator out override params private protected public
readonly ref return sbyte sealed short sizeof stackalloc static
string struct switch this throw true try typeof uint ulong unsafe
ushort using virtual unchecked void volatile while
Contextual Keywords:
add alias ascending async await by descending dynamic equals from
get global group into join let nameof on orderby partial remove
select set value var where yield

There are several things to note in this listing. First, the query result is a sequence of elements of type IGrouping<bool, string>. The first type argument indicates that the “group key” expression following by was of type bool, and the second type argument indicates that the “group element” expression following group was of type string. That is, the query produces a sequence of groups where the Boolean key is the same for each string in the group.

Because a query with a group by clause produces a sequence of collections, the common pattern for iterating over the results is to create nested foreach loops. In Listing 15.10, the outer loop iterates over the groupings and prints out the type of keyword as a header. The nested foreach loop prints each keyword in the group as an item below the header.

The result of this query expression is itself a sequence, which you can then query like any other sequence. Listing 15.11 and Output 15.7 show how to create an additional query that adds a projection onto a query that produces a sequence of groups. (The next section, on query continuations, shows a preferable syntax for adding more query clauses to a complete query.)

LISTING 15.11: Selecting an Anonymous Type Following the group Clause


using System;
using System.Collections.Generic;
using System.Linq;

// ...

private static void GroupKeywords1()
{
    IEnumerable<IGrouping<bool, string>> keywordGroups =
        from word in Keywords
        group word by word.Contains('*');

    var selection =
        from groups in keywordGroups
        select new
        {
            IsContextualKeyword = groups.Key,
            Items = groups
        };

    foreach (var wordGroup in selection)
    {
        Console.WriteLine(Environment.NewLine + "{0}:",
            wordGroup.IsContextualKeyword ?
                "Contextual Keywords" : "Keywords");
        foreach (var keyword in wordGroup.Items)
        {
            Console.Write(" " +
                keyword.Replace("*", null));
        }
    }
}

// ...


OUTPUT 15.7

Keywords:
abstract as base bool break byte case catch char checked class
const continue decimal default delegate do double else enum
event explicit extern false finally fixed float for foreach goto if
implicit in int interface internal is lock long namespace new null
object operator out override params private protected public
readonly ref return sbyte sealed short sizeof stackalloc static
string struct switch this throw true try typeof uint ulong unsafe
ushort using virtual unchecked void volatile while
Contextual Keywords:
add alias ascending async await by descending dynamic equals from
get global group into join let nameof on orderby partial remove
select set value var where yield

The group clause results in a query that produces a collection of IGrouping<TKey, TElement> objects, just as the GroupBy() standard query operator did (see Chapter 14). The select clause in the subsequent query uses an anonymous type to effectively rename IGrouping<TKey, TElement>.Key to IsContextualKeyword and to name the subcollection property Items. With this change, the nested foreach loop uses wordGroup.Items rather than wordGroup directly, as it did in Listing 15.10. Another potential property to add to the anonymous type would be a count of the items within the subcollection. This functionality is already available through wordGroup.Items.Count(), so the benefit of adding it to the anonymous type directly is questionable.
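Group keys are not limited to bool. As a sketch, the following groups a few words by their first character, so the result is a sequence of IGrouping<char, string>, and reports each group's key, count, and members. The class name GroupingDemo and the sample words are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    public static readonly string[] Words =
        { "as", "async*", "base", "bool", "by*" };

    public static List<string> Summarize()
    {
        // The key expression after "by" is a char, so TKey is char.
        IEnumerable<IGrouping<char, string>> groups =
            from word in Words
            group word by word[0];

        var lines = new List<string>();
        foreach (IGrouping<char, string> letterGroup in groups)
        {
            int count = letterGroup.Count();
            string members = string.Join(", ", letterGroup);
            lines.Add($"{ letterGroup.Key }: { count } ({ members })");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Summarize())
        {
            Console.WriteLine(line);
        }
        // a: 2 (as, async*)
        // b: 3 (base, bool, by*)
    }
}
```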

Query Continuation with into

As we saw in Listing 15.11, you can use an existing query as the input to a second query. However, it is not necessary to write an entirely new query expression when you want to use the results of one query as the input to another. You can extend any query with a query continuation clause using the contextual keyword into. A query continuation is nothing more than syntactic sugar for creating two queries and using the first as the input to the second. The range variable introduced by the into clause (groups in Listing 15.12) becomes the range variable for the remainder of the query; any previous range variables are logically a part of the earlier query and cannot be used in the query continuation. Listing 15.12 shows how to rewrite the code of Listing 15.11 to use a query continuation instead of two queries.

LISTING 15.12: Selecting with the Query Continuation


using System;
using System.Collections.Generic;
using System.Linq;

// ...

private static void GroupKeywords1()
{
    var selection =
        from word in Keywords
        group word by word.Contains('*')
            into groups
            select new
            {
                IsContextualKeyword = groups.Key,
                Items = groups
            };

    // ...

}

// ...


The ability to run additional queries on the results of an existing query using into is not specific to queries ending with group clauses, but rather can be applied to all query expressions. Query continuation is simply a shorthand for writing query expressions that consume the results of other query expressions. You can think of into as a “pipeline operator,” because it “pipes” the results of the first query into the second query. You can arbitrarily chain together many queries in this way.
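To illustrate a continuation that does not involve grouping, the sketch below pipes the result of a select clause into further where and orderby clauses. After into, only the new range variable (here the hypothetical name bare) is in scope; word can no longer be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContinuationDemo
{
    public static readonly string[] Words =
        { "add*", "as", "async*", "by*" };

    public static List<string> Run()
    {
        IEnumerable<string> result =
            from word in Words
            where word.Contains('*')
            select word.Replace("*", "")
                into bare              // "word" is out of scope from here on
                where bare.Length > 2
                orderby bare descending
                select bare;
        return result.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run())); // async add
    }
}
```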

“Flattening” Sequences of Sequences with Multiple from Clauses

It is often desirable to “flatten” a sequence of sequences into a single sequence. For example, each member of a sequence of customers might have an associated sequence of orders, or each member of a sequence of directories might have an associated sequence of files. The SelectMany sequence operator (discussed in Chapter 14) concatenates together all the subsequences; to do the same thing with query expression syntax, you can use multiple from clauses, as shown in Listing 15.13.

LISTING 15.13: Multiple Selection



var selection =
    from word in Keywords
    from character in word
    select character;


The preceding query will produce the sequence of characters a, b, s, t, r, a, c, t, a, d, d, *, a, l, i, a, ....

Multiple from clauses can also be used to produce the Cartesian product—the set of all possible combinations of several sequences—as shown in Listing 15.14.

LISTING 15.14: Cartesian Product


var numbers = new[] { 1, 2, 3 };
var product =
    from word in Keywords
    from number in numbers
    select new {word, number};


This would produce a sequence of pairs (abstract, 1), (abstract, 2), (abstract, 3), (as, 1), (as, 2), ....
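The correspondence between multiple from clauses and SelectMany() can be sketched as follows. The method form uses the SelectMany() overload that takes a collection selector and a result selector, which is approximately what the compiler produces for two from clauses followed by a select. The class name FlattenDemo and the two-word array are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    public static readonly string[] Words = { "as", "by" };

    // Two from clauses: each word is itself a sequence of characters.
    public static List<char> WithQuery()
    {
        return (from word in Words
                from character in word
                select character).ToList();
    }

    // Approximate method form of the same query.
    public static List<char> WithMethods()
    {
        return Words
            .SelectMany(word => word, (word, character) => character)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithQuery()));   // a, s, b, y
        Console.WriteLine(string.Join(", ", WithMethods())); // a, s, b, y
    }
}
```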


Beginner Topic: Distinct Members

Often, it is desirable to return only distinct (that is, unique) items from within a collection, discarding any duplicates. Query expressions do not have explicit syntax for distinct members, but the functionality is available via the query operator Distinct(), which was introduced in Chapter 14. To apply a query operator to a query expression, the expression must be enclosed in parentheses so that the compiler does not think that the call to Distinct() is a part of the select clause. Listing 15.15 gives an example; Output 15.8 shows the results.

LISTING 15.15: Obtaining Distinct Members from a Query Expression


using System;
using System.Collections.Generic;
using System.Linq;

// ...

public static void ListMemberNames()
{
    IEnumerable<string> enumerableMethodNames = (
        from method in typeof(Enumerable).GetMembers(
            System.Reflection.BindingFlags.Static |
                System.Reflection.BindingFlags.Public)
        orderby method.Name
        select method.Name).Distinct();
    foreach (string method in enumerableMethodNames)
    {
        Console.Write($"{ method }, ");
    }
}

// ...


OUTPUT 15.8

Aggregate, All, Any, AsEnumerable, Average, Cast, Concat, Contains,
Count, DefaultIfEmpty, Distinct, ElementAt, ElementAtOrDefault,
Empty, Except, First, FirstOrDefault, GroupBy, GroupJoin,
Intersect, Join, Last, LastOrDefault, LongCount, Max, Min, OfType,
OrderBy, OrderByDescending, Range, Repeat, Reverse, Select,
SelectMany, SequenceEqual, Single, SingleOrDefault, Skip,
SkipWhile, Sum, Take, TakeWhile, ThenBy, ThenByDescending, ToArray,
ToDictionary, ToList, ToLookup, Union, Where, Zip,

In this example, typeof(Enumerable).GetMembers() returns a list of all the members (methods, properties, and so on) on System.Linq.Enumerable. However, many of these members are overloaded, sometimes more than once. Rather than displaying the same member multiple times, Distinct() is called from the query expression. This eliminates the duplicate names from the list. (We cover the details of typeof() and reflection [where methods like GetMembers() are available] in Chapter 17.)


Query Expressions Are Just Method Invocations

Somewhat surprisingly, adding query expressions to C# 3.0 required no changes to the CLR or to the CIL language. Rather, the C# compiler simply translates query expressions into a series of method calls. Consider, for example, a query expression like the one from Listing 15.1 (here without the negation in the where clause), shown in Listing 15.16.

LISTING 15.16: Simple Query Expression


private static void ShowContextualKeywords1()
{
    IEnumerable<string> selection =
        from word in Keywords
        where word.Contains('*')
        select word;
    // ...
}

// ...


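After compilation, the expression from Listing 15.16 is converted to an IEnumerable<T> extension method call from System.Linq.Enumerable, as shown in Listing 15.17. Because select word simply returns the range variable unchanged, no Select() call is needed after Where().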

LISTING 15.17: Query Expression Translated to Standard Query Operator Syntax


private static void ShowContextualKeywords3()
{
    IEnumerable<string> selection =
        Keywords.Where(word => word.Contains('*'));

    // ...
}

// ...


As discussed in Chapter 14, the lambda expression is then itself translated by the compiler to emit a method with the body of the lambda, and the usage of it becomes allocation of a delegate to that method.

Every query expression can (and must) be translated into method calls, but not every sequence of method calls has a corresponding query expression. For example, there is no query expression equivalent for the extension method TakeWhile<T>(Func<T, bool> predicate), which repeatedly returns items from the collection as long as the predicate returns true.
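When an operator such as TakeWhile() has no query expression form, the two syntaxes can be combined by parenthesizing the query expression and calling the method on its result. The following is a minimal sketch with hypothetical sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MixedSyntaxDemo
{
    public static readonly int[] Numbers = { 1, 2, 3, 10, 4 };

    public static List<int> Run()
    {
        // The query expression doubles each number; TakeWhile() then
        // returns items until the first one that fails the predicate.
        return (from number in Numbers
                select number * 2)
            .TakeWhile(value => value < 10)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run())); // 2 4 6
    }
}
```

Note that the trailing 8 is not returned: TakeWhile() stops at 20, the first value that fails the predicate, rather than filtering the whole sequence as where would.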

For those queries that do have both a method call form and a query expression form, which is better? This is a judgment call; some queries are better suited for query expressions, whereas others are more readable as method invocations.


Guidelines

DO use query expression syntax to make queries easier to read, particularly if they involve complex from, let, join, or group clauses.

CONSIDER using the standard query operators (method call form) if the query involves operations that do not have a query expression syntax, such as Count(), TakeWhile(), or Distinct().


Summary

This chapter introduced a new syntax—namely, query expressions. Readers familiar with SQL will immediately see the similarities between query expressions and SQL. However, query expressions also introduce additional functionality, such as grouping into a hierarchical set of new objects, which is unavailable with SQL. All of the functionality of query expressions was already available via standard query operators, but query expressions frequently provide a simpler syntax for expressing such a query. Whether through standard query operators or query expression syntax, however, the end result is a significant improvement in the way developers can code against collection APIs—an improvement that ultimately provides a paradigm shift in the way object-oriented languages are able to interface with relational databases.

In the next chapter, we continue our discussion of collections, by investigating some of the .NET Framework collection types and exploring how to define custom collections.
