LINQ to Objects - Advanced C# Programming - C# 6.0 and the .NET 4.6 Framework (2015)

PART IV

Advanced C# Programming

CHAPTER 12

LINQ to Objects

Regardless of the type of application you are creating using the .NET platform, your program will certainly need to access some form of data as it executes. To be sure, data can be found in numerous locations, including XML files, relational databases, in-memory collections, and primitive arrays. Historically speaking, based on the location of said data, programmers needed to make use of different and unrelated APIs. The Language Integrated Query (LINQ) technology set, introduced initially in .NET 3.5, provides a concise, symmetrical, and strongly typed manner to access a wide variety of data stores. In this chapter, you will begin your investigation of LINQ by focusing on LINQ to Objects.

Before you dive into LINQ to Objects proper, the first part of this chapter quickly reviews the key C# programming constructs that enable LINQ. As you work through this chapter, you will find that implicitly typed local variables, object initialization syntax, lambda expressions, extension methods, and anonymous types will be quite useful (if not occasionally mandatory).

After this supporting infrastructure is reviewed, the remainder of the chapter will introduce you to the LINQ programming model and its role in the .NET platform. Here, you will come to learn the role of query operators and query expressions, which allow you to define statements that will interrogate a data source to yield the requested result set. Along the way, you will build numerous LINQ examples that interact with data contained within arrays as well as various collection types (both generic and nongeneric) and understand the assemblies, namespaces, and types that represent the LINQ to Objects API.

Note The information in this chapter is the foundation for future chapters of the book that examine additional LINQ technologies, including LINQ to XML (Chapter 24), Parallel LINQ (Chapter 19), and LINQ to Entities (Chapter 23).

LINQ-Specific Programming Constructs

From a high level, LINQ can be understood as a strongly typed query language, embedded directly into the grammar of C#. Using LINQ, you can build any number of expressions that have a look and feel similar to that of a database SQL query. However, a LINQ query can be applied to any number of data stores, including stores that have nothing to do with a literal relational database.

Note Although LINQ queries look similar to SQL queries, the syntax is not identical. In fact, many LINQ queries seem to be the exact opposite format of a similar database query! If you attempt to map LINQ directly to SQL, you will surely become frustrated. To keep your sanity, I recommend you try your best to regard LINQ queries as unique statements, which just “happen to look” similar to SQL.

When LINQ was first introduced to the .NET platform in version 3.5, the C# and VB languages were each expanded with a large number of new programming constructs used to support the LINQ technology set. Specifically, the C# language uses the following core LINQ-centric features:

· Implicitly typed local variables

· Object/collection initialization syntax

· Lambda expressions

· Extension methods

· Anonymous types

These features have already been explored in detail within various chapters of the text. However, to get the ball rolling, let’s quickly review each feature in turn, just to make sure we are all in the proper mind-set.

Note Because the following sections are reviews of material covered elsewhere in the book, I have not included a C# code project for this content.

Implicit Typing of Local Variables

In Chapter 3, you learned about the var keyword of C#. This keyword allows you to define a local variable without explicitly specifying the underlying data type. The variable, however, is strongly typed, as the compiler will determine the correct data type based on the initial assignment. Recall the following code example from Chapter 3:

static void DeclareImplicitVars()
{
    // Implicitly typed local variables.
    var myInt = 0;
    var myBool = true;
    var myString = "Time, marches on...";

    // Print out the underlying type.
    Console.WriteLine("myInt is a: {0}", myInt.GetType().Name);
    Console.WriteLine("myBool is a: {0}", myBool.GetType().Name);
    Console.WriteLine("myString is a: {0}", myString.GetType().Name);
}

This language feature is helpful, and often mandatory, when using LINQ. As you will see during this chapter, many LINQ queries return a sequence of a type that the compiler generates on your behalf (an anonymous type, for example), which has no name you can write in your source code. Given that you cannot name the underlying data type, you obviously can’t declare such a variable explicitly!

Object and Collection Initialization Syntax

Chapter 5 explored the role of object initialization syntax, which allows you to create a class or structure variable and to set any number of its public properties in one fell swoop. The end result is a compact (yet still easy on the eyes) syntax that can be used to get your objects ready for use. Also recall from Chapter 9, the C# language allows you to use a similar syntax to initialize collections of objects. Consider the following code snippet, which uses collection initialization syntax to fill a List<T> of Rectangle objects, each of which maintains two Point objects to represent an (x,y) position:

List<Rectangle> myListOfRects = new List<Rectangle>
{
    new Rectangle {TopLeft = new Point { X = 10, Y = 10 },
                   BottomRight = new Point { X = 200, Y = 200}},
    new Rectangle {TopLeft = new Point { X = 2, Y = 2 },
                   BottomRight = new Point { X = 100, Y = 100}},
    new Rectangle {TopLeft = new Point { X = 5, Y = 5 },
                   BottomRight = new Point { X = 90, Y = 75}}
};

While you are never required to use collection/object initialization syntax, doing so results in a more compact code base. Furthermore, this syntax, when combined with implicit typing of local variables, allows you to declare an anonymous type, which is useful when creating a LINQ projection. You’ll learn about LINQ projections later in this chapter.

Lambda Expressions

The C# lambda operator (=>) was fully explored in Chapter 10. Recall that this operator allows you to build a lambda expression, which can be used any time you invoke a method that requires a strongly typed delegate as an argument. Lambdas greatly simplify how you work with .NET delegates, in that they reduce the amount of code you have to author by hand. Recall that a lambda expression can be broken down into the following usage:

( ArgumentsToProcess ) => { StatementsToProcessThem }

In Chapter 10, I walked you through how to interact with the FindAll() method of the generic List<T> class using three different approaches. After working with the raw Predicate<T> delegate and a C# anonymous method, you eventually arrived at the following (extremely concise) iteration, which uses a lambda expression:

static void LambdaExpressionSyntax()
{
    // Make a list of integers.
    List<int> list = new List<int>();
    list.AddRange(new int[] { 20, 1, 4, 8, 9, 44 });

    // C# lambda expression.
    List<int> evenNumbers = list.FindAll(i => (i % 2) == 0);

    Console.WriteLine("Here are your even numbers:");
    foreach (int evenNumber in evenNumbers)
    {
        Console.Write("{0}\t", evenNumber);
    }
    Console.WriteLine();
}

Lambdas will be useful when working with the underlying object model of LINQ. As you will soon find out, the C# LINQ query operators are simply a shorthand notation for calling true-blue methods on a class named System.Linq.Enumerable. These methods typically require delegates (the Func<> delegate in particular) as parameters, which are used to process your data to yield the correct result set. Using lambdas, you can streamline your code and allow the compiler to infer the underlying delegate.
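To make the link between lambdas and the Func<> delegate a bit more concrete, here is a minimal sketch. The class and method names (FuncDelegateSketch, IsEven, and so on) are my own assumptions for illustration and are not part of any sample project. It calls Where() twice: once with a Func<int, bool> built by hand and once with a lambda, from which the compiler infers the same delegate type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; every name here is for illustration only.
public static class FuncDelegateSketch
{
    // A named method that matches the Func<int, bool> signature.
    private static bool IsEven(int i)
    {
        return (i % 2) == 0;
    }

    // Long form: create the delegate object by hand and pass it to Where().
    public static List<int> EvensWithExplicitDelegate(int[] numbers)
    {
        Func<int, bool> filter = new Func<int, bool>(IsEven);
        return numbers.Where(filter).ToList();
    }

    // Short form: the compiler infers Func<int, bool> from the lambda.
    public static List<int> EvensWithLambda(int[] numbers)
    {
        return numbers.Where(i => (i % 2) == 0).ToList();
    }

    public static void Main()
    {
        int[] numbers = { 20, 1, 4, 8, 9, 44 };
        Console.WriteLine(string.Join(", ", EvensWithExplicitDelegate(numbers)));
        Console.WriteLine(string.Join(", ", EvensWithLambda(numbers)));
    }
}
```

Both calls should produce the same list (20, 4, 8, 44 for this input); the lambda version simply leaves the delegate plumbing to the compiler.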

Extension Methods

C# extension methods allow you to tack on new functionality to existing classes without the need to subclass. As well, extension methods allow you to add new functionality to sealed classes and structures, which could never be subclassed in the first place. Recall from Chapter 11, when you author an extension method, the first parameter is qualified with the this keyword and marks the type being extended. Also recall that extension methods must always be defined within a static class and must, therefore, also be declared using the static keyword. Here’s an example:

using System;
using System.Reflection;

namespace MyExtensions
{
    static class ObjectExtensions
    {
        // Define an extension method to System.Object.
        public static void DisplayDefiningAssembly(this object obj)
        {
            Console.WriteLine("{0} lives here:\n\t->{1}\n", obj.GetType().Name,
                Assembly.GetAssembly(obj.GetType()));
        }
    }
}

To use this extension, an application must import the namespace defining the extension (and possibly add a reference to the external assembly). Once that is done, you can call the method on any object as though it were an instance member.

static void Main(string[] args)
{
    // Since everything extends System.Object, all classes and structures
    // can use this extension.
    int myInt = 12345678;
    myInt.DisplayDefiningAssembly();

    System.Data.DataSet d = new System.Data.DataSet();
    d.DisplayDefiningAssembly();
    Console.ReadLine();
}

When you are working with LINQ, you will seldom, if ever, be required to manually build your own extension methods. However, as you create LINQ query expressions, you will actually be making use of numerous extension methods already defined by Microsoft. In fact, each C# LINQ query operator is a shorthand notation for making a manual call on an underlying extension method, typically defined by the System.Linq.Enumerable utility class.
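As a rough sketch of that shorthand, the following three methods express the same query in three ways: as a query expression, as a chain of extension method calls, and as direct calls to the static methods of Enumerable. The class and method names are hypothetical, and the second and third forms are my approximation of what the compiler generates rather than its literal output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are assumptions for illustration.
public static class QueryOperatorSketch
{
    // 1) Query expression syntax.
    public static IEnumerable<string> WithQueryExpression(string[] games)
    {
        return from g in games where g.Contains(" ") orderby g select g;
    }

    // 2) The same query as a chain of extension method calls.
    public static IEnumerable<string> WithExtensionMethods(string[] games)
    {
        return games.Where(g => g.Contains(" ")).OrderBy(g => g);
    }

    // 3) The same query again, calling the static methods of Enumerable
    //    directly and passing the source as the first argument.
    public static IEnumerable<string> WithStaticCalls(string[] games)
    {
        return Enumerable.OrderBy(
            Enumerable.Where(games, g => g.Contains(" ")),
            g => g);
    }

    public static void Main()
    {
        string[] games = { "Morrowind", "Uncharted 2",
            "Fallout 3", "Daxter", "System Shock 2" };

        foreach (string s in WithQueryExpression(games))
            Console.WriteLine("Query expression: {0}", s);
        foreach (string s in WithExtensionMethods(games))
            Console.WriteLine("Extension methods: {0}", s);
        foreach (string s in WithStaticCalls(games))
            Console.WriteLine("Static calls: {0}", s);
    }
}
```

All three loops should list the same items in the same order. The extension method form is only a more convenient way to write the static call, with the this parameter moved in front of the dot.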

Anonymous Types

The final C# language feature I’d like to quickly review is that of anonymous types, which was explored in Chapter 11. This feature can be used to quickly model the “shape” of data by allowing the compiler to generate a new class definition at compile time, based on a supplied set of name-value pairs. Recall that this type will be composed using value-based semantics, and each virtual method of System.Object will be overridden accordingly. To define an anonymous type, declare an implicitly typed variable and specify the data’s shape using object initialization syntax.

// Make an anonymous type that is composed of another.
var purchaseItem = new {
    TimeBought = DateTime.Now,
    ItemBought = new {Color = "Red", Make = "Saab", CurrentSpeed = 55},
    Price = 34.000};

LINQ makes frequent use of anonymous types when you want to project new forms of data on the fly. For example, assume you have a collection of Person objects and want to use LINQ to obtain information on the age and Social Security number of each. Using a LINQ projection, you can allow the compiler to generate a new anonymous type that contains your information.
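Here is a minimal sketch of such a projection. The Person class, its property names, and the sample data are all assumptions made up for this illustration. The query selects a new anonymous type holding only the age and Social Security number, which is why the result must be captured with var.

```csharp
using System;
using System.Linq;

// Hypothetical Person type, assumed here only for illustration.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string SSN { get; set; }
}

public static class ProjectionSketch
{
    // Projects each Person to an anonymous type, then formats each item as
    // a string so the result can leave the method as a string[].
    public static string[] DescribeAgeAndSsn(Person[] people)
    {
        // The element type has no name you can write, so var is required.
        var ageAndSsn = from p in people
                        select new { p.Age, p.SSN };

        return ageAndSsn
            .Select(item => string.Format("Age: {0}, SSN: {1}", item.Age, item.SSN))
            .ToArray();
    }

    public static void Main()
    {
        // Placeholder data; the SSN values are obviously fake.
        Person[] people =
        {
            new Person { Name = "Homer", Age = 47, SSN = "000-00-0001" },
            new Person { Name = "Marge", Age = 45, SSN = "000-00-0002" }
        };

        foreach (string line in DescribeAgeAndSsn(people))
            Console.WriteLine(line);
    }
}
```

Note that the anonymous type never escapes DescribeAgeAndSsn(); the method converts it to strings first, because an anonymous type cannot be named in a return type.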

Understanding the Role of LINQ

That wraps up the quick review of the C# language features that allow LINQ to work its magic. However, why have LINQ in the first place? Well, as software developers, it is hard to deny that the vast majority of our programming time is spent obtaining and manipulating data. When speaking of “data,” it is easy to immediately envision information contained within relational databases. However, another popular location for data is within XML documents or simple text files.

Data can be found in numerous places beyond these two common homes for information. For instance, say you have an array or generic List<T> type containing 300 integers and you want to obtain a subset that meets a given criterion (e.g., only the odd or even members in the container, only prime numbers, only nonrepeating numbers greater than 50). Or perhaps you are making use of the reflection APIs and need to obtain only metadata descriptions for each class deriving from a particular parent class within an array of Types. Indeed, data is everywhere.

Prior to .NET 3.5, interacting with a particular flavor of data required programmers to use very diverse APIs. Consider, for example, Table 12-1, which illustrates several common APIs used to access various types of data (I’m sure you can think of many other examples).

Table 12-1. Ways to Manipulate Various Types of Data

| The Data You Want | How to Obtain It |
| --- | --- |
| Relational data | System.Data.dll, System.Data.SqlClient.dll, and so on |
| XML document data | System.Xml.dll |
| Metadata tables | The System.Reflection namespace |
| Collections of objects | System.Array and the System.Collections/System.Collections.Generic namespaces |

Of course, nothing is wrong with these approaches to data manipulation. In fact, you can (and will) certainly make direct use of ADO.NET, the XML namespaces, reflection services, and the various collection types. However, the basic problem is that each of these APIs is an island unto itself, which offers little in the way of integration. True, it is possible (for example) to save an ADO.NET DataSet as XML and then manipulate it via the System.Xml namespaces, but nonetheless, data manipulation remains rather asymmetrical.

The LINQ API is an attempt to provide a consistent, symmetrical manner in which programmers can obtain and manipulate “data” in the broad sense of the term. Using LINQ, you are able to create directly within the C# programming language constructs called query expressions. These query expressions are based on numerous query operators that have been intentionally designed to look and feel similar (but not quite identical) to a SQL expression.

The twist, however, is that a query expression can be used to interact with numerous types of data—even data that has nothing to do with a relational database. Strictly speaking, “LINQ” is the term used to describe this overall approach to data access. However, based on where you are applying your LINQ queries, you will encounter various terms, such as the following:

· LINQ to Objects: This term refers to the act of applying LINQ queries to arrays and collections.

· LINQ to XML: This term refers to the act of using LINQ to manipulate and query XML documents.

· LINQ to DataSet: This term refers to the act of applying LINQ queries to ADO.NET DataSet objects.

· LINQ to Entities: This aspect of LINQ allows you to make use of LINQ queries within the ADO.NET Entity Framework (EF) API.

· Parallel LINQ (aka PLINQ): This allows for parallel processing of data returned from a LINQ query.

Today, LINQ is an integral part of the .NET base class libraries, managed languages, and Visual Studio itself.

LINQ Expressions Are Strongly Typed

It is also important to point out that a LINQ query expression (unlike a traditional SQL statement) is strongly typed. Therefore, the C# compiler will keep you honest and make sure that these expressions are both syntactically well-formed and type-safe. Tools such as Visual Studio can use metadata for useful features such as IntelliSense, autocompletion, and so forth.

The Core LINQ Assemblies

As mentioned in Chapter 2, the New Project dialog of Visual Studio has the option of selecting which version of the .NET platform you want to compile against. When you opt to compile against .NET 3.5 or higher, each of the project templates will automatically reference the key LINQ assemblies, which can be viewed using the Solution Explorer. Table 12-2 documents the role of the key LINQ assemblies. However, you will encounter additional LINQ libraries over the remainder of this book.

Table 12-2. Core LINQ-Centric Assemblies

| Assembly | Meaning in Life |
| --- | --- |
| System.Core.dll | Defines the types that represent the core LINQ API. This is the one assembly you must have access to if you want to use any LINQ API, including LINQ to Objects. |
| System.Data.DataSetExtensions.dll | Defines a handful of types to integrate ADO.NET types into the LINQ programming paradigm (LINQ to DataSet). |
| System.Xml.Linq.dll | Provides functionality for using LINQ with XML document data (LINQ to XML). |

To work with LINQ to Objects, you must make sure that every C# code file that contains LINQ queries imports the System.Linq namespace (primarily defined within System.Core.dll). If you do not do so, you will run into a number of problems. As a good rule of thumb, if you see a compiler error looking similar to this:

Error 1 Could not find an implementation of the query pattern for source type 'int[]'. 'Where' not found. Are you missing a reference to 'System.Core.dll' or a using directive for 'System.Linq'?

the chances are extremely good that your C# file does not have the following using directive:

using System.Linq;

Applying LINQ Queries to Primitive Arrays

To begin examining LINQ to Objects, let’s build an application that will apply LINQ queries to various array objects. Create a Console Application project named LinqOverArray, and define a static helper method within the Program class named QueryOverStrings(). In this method, create a string array containing five or so items of your liking (here I listed a batch of video games in my library). Make sure to have at least two entries that contain numerical values and a few that have embedded spaces.

static void QueryOverStrings()
{
    // Assume we have an array of strings.
    string[] currentVideoGames = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};
}

Now, update Main() to invoke QueryOverStrings().

static void Main(string[] args)
{
    Console.WriteLine("***** Fun with LINQ to Objects *****\n");
    QueryOverStrings();
    Console.ReadLine();
}

When you have any array of data, it is common to extract a subset of items based on a given requirement. Maybe you want to obtain only the subitems that contain a number (e.g., System Shock 2, Uncharted 2, and Fallout 3), have more or less than some number of characters, or don’t contain embedded spaces (e.g., Morrowind or Daxter). While you could certainly perform such tasks using members of the System.Array type and a bit of elbow grease, LINQ query expressions can greatly simplify the process.

Going on the assumption that you want to obtain from the array only items that contain an embedded blank space and you want these items listed in alphabetical order, you could build the following LINQ query expression:

static void QueryOverStrings()
{
    // Assume we have an array of strings.
    string[] currentVideoGames = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};

    // Build a query expression to find the items in the array
    // that have an embedded space.
    IEnumerable<string> subset = from g in currentVideoGames
                                 where g.Contains(" ") orderby g select g;

    // Print out the results.
    foreach (string s in subset)
        Console.WriteLine("Item: {0}", s);
}

Notice that the query expression created here makes use of the from, in, where, orderby, and select LINQ query operators. You will dig into the formalities of query expression syntax later in this chapter. However, even now you should be able to read this statement roughly as “Give me the items inside of currentVideoGames that contain a space, ordered alphabetically.”

Here, the range variable that stands in for each item of the array as the query is evaluated has been given the name g (as in “game”); however, any valid C# variable name would do:

IEnumerable<string> subset = from game in currentVideoGames
where game.Contains(" ") orderby
game select game;

Notice that the returned sequence is held in a variable named subset, declared as the generic IEnumerable<T> interface, where T is of type System.String (after all, you are querying an array of strings). After you obtain the result set, you then simply print out each item using a standard foreach construct. If you run your application, you will find the following output:

***** Fun with LINQ to Objects *****
Item: Fallout 3
Item: System Shock 2
Item: Uncharted 2

Once Again, Without LINQ

To be sure, LINQ is never mandatory. If you so choose, you could have found the same result set by forgoing LINQ altogether and making use of programming primitives such as if statements and for loops. Here is a method that yields the same result as the QueryOverStrings() method but in a much more verbose manner:

static void QueryOverStringsLongHand()
{
    // Assume we have an array of strings.
    string[] currentVideoGames = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};

    // Size the result array from the source rather than hard-coding 5.
    // Slots for nonmatching items stay null.
    string[] gamesWithSpaces = new string[currentVideoGames.Length];

    for (int i = 0; i < currentVideoGames.Length; i++)
    {
        if (currentVideoGames[i].Contains(" "))
            gamesWithSpaces[i] = currentVideoGames[i];
    }

    // Now sort them (null entries sort first).
    Array.Sort(gamesWithSpaces);

    // Print out the results, skipping the unused (null) slots.
    foreach (string s in gamesWithSpaces)
    {
        if (s != null)
            Console.WriteLine("Item: {0}", s);
    }
    Console.WriteLine();
}

While I am sure you can think of ways to tweak the previous method, the fact remains that LINQ queries can be used to radically simplify the process of extracting new subsets of data from a source. Rather than building nested loops, complex if/else logic, temporary data types, and so on, the C# compiler will perform the dirty work on your behalf, once you create a fitting LINQ query.

Reflecting over a LINQ Result Set

Now, assume the Program class defines an additional helper function named ReflectOverQueryResults() that will print out various details of the LINQ result set (note the parameter is a System.Object, to account for multiple types of result sets).

static void ReflectOverQueryResults(object resultSet)
{
    Console.WriteLine("***** Info about your query *****");
    Console.WriteLine("resultSet is of type: {0}", resultSet.GetType().Name);
    Console.WriteLine("resultSet location: {0}",
        resultSet.GetType().Assembly.GetName().Name);
}

Assuming you have called this method within QueryOverStrings() directly after printing out the obtained subset, if you run the application, you will see the subset is really an instance of the generic OrderedEnumerable<TElement, TKey> type (represented in terms of CIL code as OrderedEnumerable`2), which is an internal type residing in the System.Core.dll assembly.

***** Info about your query *****

resultSet is of type: OrderedEnumerable`2
resultSet location: System.Core

Note Many of the types that represent a LINQ result are hidden by the Visual Studio object browser. These are low-level types not intended for direct use in your applications.

LINQ and Implicitly Typed Local Variables

While the current sample program makes it relatively easy to determine that the result set can be captured as an enumeration of string objects (e.g., IEnumerable<string>), I would guess that it is not clear that subset is really of type OrderedEnumerable<TElement, TKey>.

Given that LINQ result sets can be represented using a good number of types in various LINQ-centric namespaces, it would be tedious to define the proper type to hold a result set, because in many cases the underlying type may not be obvious or even directly accessible from your code base (and as you will see, in some cases the type is generated at compile time).

To further accentuate this point, consider the following additional helper method defined within the Program class (which I assume you will invoke from within the Main() method):

static void QueryOverInts()
{
    int[] numbers = {10, 20, 30, 40, 1, 2, 3, 8};

    // Print only items less than 10.
    IEnumerable<int> subset = from i in numbers where i < 10 select i;

    foreach (int i in subset)
        Console.WriteLine("Item: {0}", i);
    ReflectOverQueryResults(subset);
}

In this case, the subset variable is a completely different underlying type. This time, the type implementing the IEnumerable<int> interface is a low-level class named WhereArrayIterator<T>.

Item: 1
Item: 2
Item: 3
Item: 8

***** Info about your query *****
resultSet is of type: WhereArrayIterator`1
resultSet location: System.Core

Given that the exact underlying type of a LINQ query is certainly not obvious, these first examples have represented the query results as an IEnumerable<T> variable, where T is the type of data in the returned sequence (string, int, etc.). However, this is still rather cumbersome. To add insult to injury, given that IEnumerable<T> extends the nongeneric IEnumerable interface, it would also be permissible to capture the result of a LINQ query as follows:

System.Collections.IEnumerable subset =
from i in numbers where i < 10 select i;

Thankfully, implicit typing cleans things up considerably when working with LINQ queries.

static void QueryOverInts()
{
    int[] numbers = {10, 20, 30, 40, 1, 2, 3, 8};

    // Use implicit typing here...
    var subset = from i in numbers where i < 10 select i;

    // ...and here.
    foreach (var i in subset)
        Console.WriteLine("Item: {0} ", i);
    ReflectOverQueryResults(subset);
}

As a rule of thumb, you will always want to make use of implicit typing when capturing the results of a LINQ query. Just remember, however, that (in a vast majority of cases) the real return value is a type implementing the generic IEnumerable<T> interface.

Exactly what this type is under the covers (OrderedEnumerable<TElement, TKey>, WhereArrayIterator<T>, etc.) is irrelevant and not necessary to discover. As seen in the previous code example, you can simply use the var keyword within a foreach construct to iterate over the fetched data.

LINQ and Extension Methods

Although the current example does not have you author any extension methods directly, you are in fact using them seamlessly in the background. LINQ query expressions can be used to iterate over data containers that implement the generic IEnumerable<T> interface. However, the declaration of the .NET System.Array class type (used to represent the array of strings and array of integers) does not list this contract.

// The System.Array type does not seem to implement the correct
// infrastructure for query expressions!
public abstract class Array : ICloneable, IList, ICollection,
    IEnumerable, IStructuralComparable, IStructuralEquatable
{
    ...
}

While the declaration of System.Array does not list the IEnumerable<T> interface, the runtime supplies it for single-dimensional arrays (which is why a string[] can be treated as an IEnumerable<string>). The LINQ-centric members themselves come from the static System.Linq.Enumerable class type.

This utility class defines a good number of generic extension methods (such as Aggregate<T>(), First<T>(), Max<T>(), etc.), which System.Array (and other types) acquire in the background. Thus, if you apply the dot operator on the currentVideoGames local variable, you will find a good number of members not found within the formal definition of System.Array (see Figure 12-1).

Figure 12-1. The System.Array type has been extended with members of System.Linq.Enumerable
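As a small sketch of this point, the following code assigns an array to an IEnumerable<string> variable and then calls two members defined by System.Linq.Enumerable, not by System.Array. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are assumptions for illustration.
public static class ArrayExtensionSketch
{
    public static int LongestTitleLength(string[] games)
    {
        // A single-dimensional array can be assigned to IEnumerable<T>.
        IEnumerable<string> sequence = games;

        // Max() is an extension method supplied by Enumerable.
        return sequence.Max(g => g.Length);
    }

    public static string FirstTitle(string[] games)
    {
        // Extension syntax; equivalent to Enumerable.First(games).
        return games.First();
    }

    public static void Main()
    {
        string[] currentVideoGames = { "Morrowind", "Uncharted 2",
            "Fallout 3", "Daxter", "System Shock 2" };

        Console.WriteLine("Longest title length: {0}",
            LongestTitleLength(currentVideoGames));
        Console.WriteLine("First title: {0}", FirstTitle(currentVideoGames));
    }
}
```

If you remove the using System.Linq; directive from this file, neither Max() nor First() should compile, which is a quick way to confirm that these members are extensions rather than part of the array type.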

The Role of Deferred Execution

Another important point regarding LINQ query expressions is that they are not actually evaluated until you iterate over the sequence. Formally speaking, this is termed deferred execution. The benefit of this approach is that you are able to apply the same LINQ query multiple times to the same container and rest assured you are obtaining the latest and greatest results. Consider the following update to the QueryOverInts() method:

static void QueryOverInts()
{
    int[] numbers = { 10, 20, 30, 40, 1, 2, 3, 8 };

    // Get numbers less than ten.
    var subset = from i in numbers where i < 10 select i;

    // LINQ statement evaluated here!
    foreach (var i in subset)
        Console.WriteLine("{0} < 10", i);
    Console.WriteLine();

    // Change some data in the array.
    numbers[0] = 4;

    // Evaluated again!
    foreach (var j in subset)
        Console.WriteLine("{0} < 10", j);

    Console.WriteLine();
    ReflectOverQueryResults(subset);
}

If you were to execute the program yet again, you would find the following output. Notice that the second time you iterate over the requested sequence, you find an additional member, as you set the first item in the array to be a value less than ten.

1 < 10
2 < 10
3 < 10
8 < 10

4 < 10
1 < 10
2 < 10
3 < 10
8 < 10

One useful aspect of Visual Studio is that if you set a breakpoint before the evaluation of a LINQ query, you are able to view the contents during a debugging session. Simply locate your mouse cursor above the LINQ result set variable (subset in Figure 12-2). When you do, you will be given the option of evaluating the query at that time by expanding the Results View option.

Figure 12-2. Debugging LINQ expressions
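One way to see deferred execution for yourself is to count how often the filter actually runs. The following sketch moves the "less than ten" test into a helper method that increments a counter; the class, method, and field names are assumptions of mine for illustration. The counter is sampled after the query is defined and after each of two foreach loops.

```csharp
using System;
using System.Linq;

// Hypothetical demo class; the names are assumptions for illustration.
public static class DeferredExecutionSketch
{
    private static int predicateCalls;

    // The filter used by the query; it counts every invocation.
    private static bool IsLessThanTen(int i)
    {
        predicateCalls++;
        return i < 10;
    }

    // Returns the counter sampled at three points: after the query is
    // defined, after the first foreach, and after the second foreach.
    public static int[] SamplePredicateCalls()
    {
        predicateCalls = 0;
        int[] numbers = { 10, 20, 30, 40, 1, 2, 3, 8 };

        // Defining the query does not run the filter.
        var subset = from i in numbers where IsLessThanTen(i) select i;
        int afterDefinition = predicateCalls;

        // Each iteration over the sequence runs the filter on every element.
        foreach (var n in subset) { }
        int afterFirstPass = predicateCalls;

        foreach (var m in subset) { }
        int afterSecondPass = predicateCalls;

        return new[] { afterDefinition, afterFirstPass, afterSecondPass };
    }

    public static void Main()
    {
        Console.WriteLine("Predicate calls: {0}",
            string.Join(", ", SamplePredicateCalls()));
    }
}
```

With the eight-element array shown, the three samples should be 0, 8, and 16: nothing runs when the query is defined, and the whole array is filtered again on each pass.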

The Role of Immediate Execution

When you need to evaluate a LINQ expression from outside the confines of foreach logic, you are able to call any number of extension methods defined by the Enumerable type such as ToArray<T>(), ToDictionary<TSource,TKey>(), and ToList<T>(). These methods will cause a LINQ query to execute at the exact moment you call them, to obtain a snapshot of the data. After you have done so, the snapshot of data may be independently manipulated.

static void ImmediateExecution()
{
    int[] numbers = { 10, 20, 30, 40, 1, 2, 3, 8 };

    // Get data RIGHT NOW as int[].
    int[] subsetAsIntArray =
        (from i in numbers where i < 10 select i).ToArray<int>();

    // Get data RIGHT NOW as List<int>.
    List<int> subsetAsListOfInts =
        (from i in numbers where i < 10 select i).ToList<int>();
}

Notice that the entire LINQ expression is wrapped within parentheses. The parentheses do not perform a cast; they group the query so that the extension methods of Enumerable are called on the result of the whole expression rather than on the final i in the select clause.

Also recall from Chapter 9 that when the C# compiler can unambiguously determine the type parameter of a generic, you are not required to specify the type parameter. Thus, you could also call ToArray<T>() (or ToList<T>() for that matter) as follows:

int[] subsetAsIntArray =
(from i in numbers where i < 10 select i).ToArray();

The usefulness of immediate execution is obvious when you need to return the results of a LINQ query to an external caller. And, as luck would have it, this happens to be the next topic of this chapter.
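To see the "snapshot" behavior in isolation, consider this minimal sketch (the class and method names are hypothetical). It keeps both a deferred query and a List<int> produced by ToList(), changes the source array, and then compares how many items each one reports.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are assumptions for illustration.
public static class SnapshotSketch
{
    // Returns { snapshot count, deferred count } after the array changes.
    public static int[] CompareAfterChange()
    {
        int[] numbers = { 10, 20, 30, 40, 1, 2, 3, 8 };

        // Deferred: nothing is evaluated yet.
        var deferred = from i in numbers where i < 10 select i;

        // Immediate: ToList() runs the query now and copies the results.
        List<int> snapshot = deferred.ToList();

        // Change the source after the snapshot was taken.
        numbers[0] = 4;

        // The snapshot is unaffected; the deferred query sees the new value.
        return new[] { snapshot.Count, deferred.Count() };
    }

    public static void Main()
    {
        int[] counts = CompareAfterChange();
        Console.WriteLine("Snapshot count: {0}", counts[0]);
        Console.WriteLine("Deferred count: {0}", counts[1]);
    }
}
```

The snapshot should still hold four items, while the deferred query, evaluated after the change, should report five.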

Source Code The LinqOverArray project can be found in the Chapter 12 subdirectory.

Returning the Result of a LINQ Query

It is possible to define a field within a class (or structure) whose value is the result of a LINQ query. To do so, however, you cannot make use of implicit typing (as the var keyword cannot be used for fields), and the target of the LINQ query cannot be instance-level data; therefore, it must be static. Given these limitations, you will seldom need to author code like the following:

class LINQBasedFieldsAreClunky
{
    private static string[] currentVideoGames = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};

    // Can’t use implicit typing here! Must know type of subset!
    private IEnumerable<string> subset = from g in currentVideoGames
        where g.Contains(" ") orderby g select g;

    public void PrintGames()
    {
        foreach (var item in subset)
        {
            Console.WriteLine(item);
        }
    }
}

More often than not, LINQ queries are defined within the scope of a method or property. Moreover, to simplify your programming, the result set will typically be held in an implicitly typed local variable using the var keyword. Now, recall from Chapter 3 that implicitly typed variables cannot be used to define parameters, return values, or fields of a class or structure.

Given this point, you might wonder exactly how you could return a query result to an external caller. The answer is, it depends. If you have a result set consisting of strongly typed data, such as an array of strings or a List<T> of Cars, you could abandon the use of the var keyword and use a proper IEnumerable<T> or IEnumerable type (again, as IEnumerable<T> extends IEnumerable). Consider the following example for a new Console Application named LinqRetValues:

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("***** LINQ Return Values *****\n");
        IEnumerable<string> subset = GetStringSubset();

        foreach (string item in subset)
        {
            Console.WriteLine(item);
        }

        Console.ReadLine();
    }

    static IEnumerable<string> GetStringSubset()
    {
        string[] colors = {"Light Red", "Green",
            "Yellow", "Dark Red", "Red", "Purple"};

        // Note theRedColors is an IEnumerable<string>-compatible object.
        IEnumerable<string> theRedColors = from c in colors
            where c.Contains("Red") select c;

        return theRedColors;
    }
}

The results are as expected:

Light Red
Dark Red
Red

Returning LINQ Results via Immediate Execution

This example works as expected only because both the return value of GetStringSubset() and the LINQ query variable within this method have been strongly typed. If you used the var keyword to define the query variable inside the method, it would be permissible to return the value only if the method is still prototyped to return IEnumerable<string> (and if the implicitly typed local variable is in fact compatible with the specified return type).

Because it is a bit inconvenient to operate on IEnumerable<T>, you could make use of immediate execution. For example, rather than returning IEnumerable<string>, you could simply return a string[], provided that you transform the sequence to a strongly typed array. Consider this new method of the Program class, which does this very thing:

static string[] GetStringSubsetAsArray()
{
string[] colors = {"Light Red", "Green",
"Yellow", "Dark Red", "Red", "Purple"};

var theRedColors = from c in colors
where c.Contains("Red") select c;

// Map results into an array.
return theRedColors.ToArray();
}

With this, the caller can be blissfully unaware that their result came from a LINQ query and simply work with the array of strings as expected. Here’s an example:

foreach (string item in GetStringSubsetAsArray())
{
Console.WriteLine(item);
}
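One practical difference between the two return styles is when the query actually runs. The following sketch changes the source collection after the query has been defined, and then compares what a deferred IEnumerable<string> and a ToArray() snapshot report. The ExecutionDemo class and its method are hypothetical and are not part of the LinqRetValues project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
static class ExecutionDemo
{
    // Returns { deferred count, snapshot length } after the source changes.
    public static int[] CountsAfterSourceChange()
    {
        List<string> colors = new List<string> { "Light Red", "Green", "Dark Red" };

        // Deferred: the query is re-evaluated every time it is enumerated.
        IEnumerable<string> deferred = from c in colors
                                       where c.Contains("Red") select c;

        // Immediate: ToArray() runs the query now and copies the results.
        string[] snapshot = deferred.ToArray();

        // Change the source after both variables have been assigned.
        colors.Add("Red");

        return new[] { deferred.Count(), snapshot.Length };
    }
}

class Program
{
    static void Main()
    {
        int[] counts = ExecutionDemo.CountsAfterSourceChange();
        Console.WriteLine("Deferred query sees {0} items.", counts[0]);   // 3
        Console.WriteLine("Array snapshot holds {0} items.", counts[1]);  // 2
    }
}
```

Which behavior is preferable depends on the caller: a snapshot is predictable, while a deferred sequence stays in step with its source.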

Immediate execution is also critical when attempting to return to the caller the results of a LINQ projection. You’ll examine this topic a bit later in the chapter. Next up, let’s look at how to apply LINQ queries to generic and nongeneric collection objects.

Source Code The LinqRetValues project can be found in the Chapter 12 subdirectory.

Applying LINQ Queries to Collection Objects

Beyond pulling results from a simple array of data, LINQ query expressions can also manipulate data within members of the System.Collections.Generic namespace, such as the List<T> type. Create a new Console Application project named LinqOverCollections, and define a basic Car class that maintains a current speed, color, make, and pet name, as shown in the following code:

class Car
{
public string PetName {get; set;} = "";
public string Color {get; set;} = "";
public int Speed {get; set;}
public string Make {get; set;} = "";
}

Now, within your Main() method, define a local List<T> variable of type Car, and make use of object initialization syntax to fill the list with a handful of new Car objects.

static void Main(string[] args)
{
Console.WriteLine("***** LINQ over Generic Collections *****\n");

// Make a List<> of Car objects.
List<Car> myCars = new List<Car>() {
new Car{ PetName = "Henry", Color = "Silver", Speed = 100, Make = "BMW"},
new Car{ PetName = "Daisy", Color = "Tan", Speed = 90, Make = "BMW"},
new Car{ PetName = "Mary", Color = "Black", Speed = 55, Make = "VW"},
new Car{ PetName = "Clunker", Color = "Rust", Speed = 5, Make = "Yugo"},
new Car{ PetName = "Melvin", Color = "White", Speed = 43, Make = "Ford"}
};

Console.ReadLine();
}

Accessing Contained Subobjects

Applying a LINQ query to a generic container is no different from doing so with a simple array, as LINQ to Objects can be used on any type implementing IEnumerable<T>. This time, your goal is to build a query expression to select only the Car objects within the myCars list, where the speed is greater than 55.

After you get the subset, you will print out the name of each Car object by calling the PetName property. Assume you have the following helper method (taking a List<Car> parameter), which is called from within Main():

static void GetFastCars(List<Car> myCars)
{
// Find all Car objects in the List<>, where the Speed is
// greater than 55.
var fastCars = from c in myCars where c.Speed > 55 select c;

foreach (var car in fastCars)
{
Console.WriteLine("{0} is going too fast!", car.PetName);
}
}

Notice that your query expression is grabbing only those items from the List<T> where the Speed property is greater than 55. If you run the application, you will find that Henry and Daisy are the only two items that match the search criteria.

If you want to build a more complex query, you might want to find only the BMWs that have a Speed value above 90. To do so, simply build a compound Boolean statement using the C# && operator.

static void GetFastBMWs(List<Car> myCars)
{
// Find the fast BMWs!
var fastCars = from c in myCars where c.Speed > 90 && c.Make == "BMW" select c;
foreach (var car in fastCars)
{
Console.WriteLine("{0} is going too fast!", car.PetName);
}
}

In this case, the only pet name printed out is Henry.

Applying LINQ Queries to Nongeneric Collections

Recall that the query operators of LINQ are designed to work with any type implementing IEnumerable<T> (either directly or via extension methods). Given that System.Array has been provided with such necessary infrastructure, it might surprise you that the legacy (nongeneric) containers within System.Collections have not. Thankfully, it is still possible to iterate over data contained within nongeneric collections using the generic Enumerable.OfType<T>() extension method.

When calling OfType<T>() from a nongeneric collection object (such as the ArrayList), simply specify the type of item within the container to extract a compatible IEnumerable<T> object. In code, you can store this data point using an implicitly typed variable.

Consider the following new method, which fills an ArrayList with a set of Car objects (be sure to import the System.Collections namespace into your Program.cs file):

static void LINQOverArrayList()
{
Console.WriteLine("***** LINQ over ArrayList *****");

// Here is a nongeneric collection of cars.
ArrayList myCars = new ArrayList() {
new Car{ PetName = "Henry", Color = "Silver", Speed = 100, Make = "BMW"},
new Car{ PetName = "Daisy", Color = "Tan", Speed = 90, Make = "BMW"},
new Car{ PetName = "Mary", Color = "Black", Speed = 55, Make = "VW"},
new Car{ PetName = "Clunker", Color = "Rust", Speed = 5, Make = "Yugo"},
new Car{ PetName = "Melvin", Color = "White", Speed = 43, Make = "Ford"}
};

// Transform ArrayList into an IEnumerable<T>-compatible type.
var myCarsEnum = myCars.OfType<Car>();

// Create a query expression targeting the compatible type.
var fastCars = from c in myCarsEnum where c.Speed > 55 select c;

foreach (var car in fastCars)
{
Console.WriteLine("{0} is going too fast!", car.PetName);
}
}

Similar to the previous examples, this method, when called from Main(), will display only the names Henry and Daisy, based on the format of the LINQ query.

Filtering Data Using OfType<T>( )

As you know, nongeneric types are capable of containing any combination of items, as the members of these containers (again, such as the ArrayList) are prototyped to receive System.Objects. For example, assume an ArrayList contains a variety of items, only a subset of which are numerical. If you want to obtain a subset that contains only numerical data, you can do so using OfType<T>() since it filters out each element whose type is different from the given type during the iterations.

static void OfTypeAsFilter()
{
// Extract the ints from the ArrayList.
ArrayList myStuff = new ArrayList();
myStuff.AddRange(new object[] { 10, 400, 8, false, new Car(), "string data" });
var myInts = myStuff.OfType<int>();

// Prints out 10, 400, and 8.
foreach (int i in myInts)
{
Console.WriteLine("Int value: {0}", i);
}
}

At this point, you have had a chance to apply LINQ queries to arrays, generic collections, and nongeneric collections. These containers held both C# primitive types (integers, string data) as well as custom classes. The next task is to learn about many additional LINQ operators that can be used to build more complex and useful queries.

Source Code The LinqOverCollections project can be found in the Chapter 12 subdirectory.

Investigating the C# LINQ Query Operators

C# defines a good number of query operators out of the box. Table 12-3 documents some of the more commonly used query operators.

Note The .NET Framework SDK documentation provides full details regarding each of the C# LINQ operators. Look up the topic “LINQ General Programming Guide” for more information.

In addition to the partial list of operators shown in Table 12-3, the System.Linq.Enumerable class provides a set of methods that do not have a direct C# query operator shorthand notation but are instead exposed as extension methods. These generic methods can be called to transform a result set in various manners (Reverse<>(), ToArray<>(), ToList<>(), etc.). Some are used to extract singletons from a result set, others perform various set operations (Distinct<>(), Union<>(), Intersect<>(), etc.), and still others aggregate results (Count<>(), Sum<>(), Min<>(), Max<>(), etc.).

Table 12-3. Common LINQ Query Operators

| Query Operators | Meaning in Life |
| --- | --- |
| from, in | Used to define the backbone for any LINQ expression, which allows you to extract a subset of data from a fitting container. |
| where | Used to define a restriction for which items to extract from a container. |
| select | Used to select a sequence from the container. |
| join, on, equals, into | Performs joins based on specified key. Remember, these “joins” do not need to have anything to do with data in a relational database. |
| orderby, ascending, descending | Allows the resulting subset to be ordered in ascending or descending order. |
| group, by | Yields a subset with data grouped by a specified value. |
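Two rows of the table, group/by and join/on/equals, do not get a dedicated example in the sections that follow, so here is a small, hedged sketch of each. The Maker class, the OperatorTour helper, and the country values are sample data invented for this illustration; only the Car properties mirror the chapter's earlier class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Car
{
    public string PetName { get; set; }
    public string Make { get; set; }
}

// Hypothetical lookup type, not part of the chapter's projects.
class Maker
{
    public string Make { get; set; }
    public string Country { get; set; }
}

static class OperatorTour
{
    static readonly Car[] cars =
    {
        new Car { PetName = "Henry", Make = "BMW" },
        new Car { PetName = "Daisy", Make = "BMW" },
        new Car { PetName = "Mary", Make = "VW" },
        new Car { PetName = "Melvin", Make = "Ford" }
    };

    static readonly Maker[] makers =
    {
        new Maker { Make = "BMW", Country = "Germany" },
        new Maker { Make = "VW", Country = "Germany" },
        new Maker { Make = "Ford", Country = "USA" }
    };

    // group...by: one group per Make; each group's Key is the Make value.
    public static IEnumerable<string> CountByMake()
    {
        return from c in cars
               group c by c.Make into g
               orderby g.Key
               select string.Format("{0}: {1}", g.Key, g.Count());
    }

    // join...on...equals: pair each car with the maker sharing its Make.
    public static IEnumerable<string> CarsWithCountry()
    {
        return from c in cars
               join m in makers on c.Make equals m.Make
               select string.Format("{0} ({1})", c.PetName, m.Country);
    }
}

class Program
{
    static void Main()
    {
        // BMW: 2, Ford: 1, VW: 1
        foreach (string line in OperatorTour.CountByMake())
            Console.WriteLine(line);

        // Henry (Germany), Daisy (Germany), Mary (Germany), Melvin (USA)
        foreach (string line in OperatorTour.CarsWithCountry())
            Console.WriteLine(line);
    }
}
```

Note that the join here works purely on in-memory arrays, which is the point the table makes about joins not requiring a relational database.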

To begin digging into more intricate LINQ queries, create a new Console Application project named FunWithLinqExpressions. Next, you need to define an array or collection of some sample data. For this project, you will make an array of ProductInfo objects, defined in the following code:

class ProductInfo
{
public string Name {get; set;} = "";
public string Description {get; set;} = "";
public int NumberInStock {get; set;} = 0;

public override string ToString()
{
return string.Format("Name={0}, Description={1}, Number in Stock={2}",
Name, Description, NumberInStock);
}
}

Now populate an array with a batch of ProductInfo objects within your Main() method.

static void Main(string[] args)
{
Console.WriteLine("***** Fun with Query Expressions *****\n");

// This array will be the basis of our testing...
ProductInfo[] itemsInStock = new[] {
new ProductInfo{ Name = "Mac’s Coffee",
Description = "Coffee with TEETH",
NumberInStock = 24},
new ProductInfo{ Name = "Milk Maid Milk",
Description = "Milk cow’s love",
NumberInStock = 100},
new ProductInfo{ Name = "Pure Silk Tofu",
Description = "Bland as Possible",
NumberInStock = 120},
new ProductInfo{ Name = "Cruchy Pops",
Description = "Cheezy, peppery goodness",
NumberInStock = 2},
new ProductInfo{ Name = "RipOff Water",
Description = "From the tap to your wallet",
NumberInStock = 100},
new ProductInfo{ Name = "Classic Valpo Pizza",
Description = "Everyone loves pizza!",
NumberInStock = 73}
};

// We will call various methods here!
Console.ReadLine();
}

Basic Selection Syntax

Because the syntactical correctness of a LINQ query expression is validated at compile time, you need to remember that the ordering of these operators is critical. In the simplest terms, every LINQ query expression is built using the from, in, and select operators. Here is the general template to follow:

var result = from matchingItem in container select matchingItem;

The item after the from operator represents an item that matches the LINQ query criteria, which can be named anything you choose. The item after the in operator represents the data container to search (an array, collection, XML document, etc.).

Here is a simple query, doing nothing more than selecting every item in the container (similar in behavior to a database Select * SQL statement). Consider the following:

static void SelectEverything(ProductInfo[] products)
{
// Get everything!
Console.WriteLine("All product details:");
var allProducts = from p in products select p;

foreach (var prod in allProducts)
{
Console.WriteLine(prod.ToString());
}
}

To be honest, this query expression is not entirely useful, given that your subset is identical to that of the data in the incoming parameter. If you want, you could extract only the Name values of each product using the following selection syntax:

static void ListProductNames(ProductInfo[] products)
{
// Now get only the names of the products.
Console.WriteLine("Only product names:");
var names = from p in products select p.Name;

foreach (var n in names)
{
Console.WriteLine("Name: {0}", n);
}
}

Obtaining Subsets of Data

To obtain a specific subset from a container, you can use the where operator. When doing so, the general template now becomes the following code:

var result = from item in container where BooleanExpression select item;

Notice that the where operator expects an expression that resolves to a Boolean. For example, to extract from the ProductInfo[] argument only the items that have more than 25 items on hand, you could author the following code:

static void GetOverstock(ProductInfo[] products)
{
Console.WriteLine("The overstock items!");

// Get only the items where we have more than
// 25 in stock.
var overstock = from p in products where p.NumberInStock > 25 select p;

foreach (ProductInfo c in overstock)
{
Console.WriteLine(c.ToString());
}
}

As shown earlier in this chapter, when you are building a where clause, it is permissible to make use of any valid C# operators to build complex expressions. For example, consider a variation of the earlier query that extracts only the BMWs going at least 100 mph:

// Get BMWs going at least 100 mph.
var onlyFastBMWs = from c in myCars
where c.Make == "BMW" && c.Speed >= 100 select c;
foreach (Car c in onlyFastBMWs)
{
Console.WriteLine("{0} is going {1} MPH", c.PetName, c.Speed);
}

Projecting New Data Types

It is also possible to project new forms of data from an existing data source. Let’s assume you want to take the incoming ProductInfo[] parameter and obtain a result set that accounts only for the name and description of each item. To do so, you can define a select statement that dynamically yields a new anonymous type.

static void GetNamesAndDescriptions(ProductInfo[] products)
{
Console.WriteLine("Names and Descriptions:");
var nameDesc = from p in products select new { p.Name, p.Description };

foreach (var item in nameDesc)
{
// Could also use Name and Description properties directly.
Console.WriteLine(item.ToString());
}
}

Always remember that when you have a LINQ query that makes use of a projection, you have no way of naming the underlying data type in your code, as the anonymous type is generated by the compiler at compile time. In these cases, the var keyword is mandatory. As well, recall that you cannot create methods with implicitly typed return values. Therefore, the following method would not compile:

static var GetProjectedSubset(ProductInfo[] products)
{
var nameDesc = from p in products select new { p.Name, p.Description };
return nameDesc; // Nope!
}

When you need to return projected data to a caller, one approach is to transform the query result into a .NET System.Array object using the ToArray() extension method. Thus, if you were to update your query expression as follows:

// Return value is now an Array.
static Array GetProjectedSubset(ProductInfo[] products)
{
var nameDesc = from p in products select new { p.Name, p.Description };

// Map set of anonymous objects to an Array object.
return nameDesc.ToArray();
}

you could invoke and process the data from Main() as follows:

Array objs = GetProjectedSubset(itemsInStock);
foreach (object o in objs)
{
Console.WriteLine(o); // Calls ToString() on each anonymous object.
}

Note that you must use a literal System.Array object and cannot make use of the C# array declaration syntax, given that you cannot name the underlying type because you are operating on a compiler-generated anonymous class! Also note that you are not specifying the type parameter to the generic ToArray<T>() method; once again, the anonymous type has no name you can write in code, so the compiler infers the type argument for you.

The obvious problem is that you lose any strong typing, as each item in the Array object is assumed to be of type Object. Nevertheless, when you need to return a LINQ result set that is the result of a projection operation, transforming the data into an Array type (or another suitable container via other members of the Enumerable type) is mandatory.
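If losing strong typing is a concern, one alternative (not the approach this section takes) is to project into a small named class rather than an anonymous type, so that the method can declare a typed return value. The sketch below assumes a hypothetical ProductSummary class and Projections helper; neither is part of the FunWithLinqExpressions project.

```csharp
using System;
using System.Linq;

class ProductInfo
{
    public string Name { get; set; } = "";
    public string Description { get; set; } = "";
    public int NumberInStock { get; set; } = 0;
}

// Hypothetical type standing in for the anonymous projection.
class ProductSummary
{
    public string Name { get; set; }
    public string Description { get; set; }
}

static class Projections
{
    // The element type now has a name, so the return type can be typed.
    public static ProductSummary[] GetSummaries(ProductInfo[] products)
    {
        var summaries = from p in products
                        select new ProductSummary
                        {
                            Name = p.Name,
                            Description = p.Description
                        };
        return summaries.ToArray();
    }
}

class Program
{
    static void Main()
    {
        ProductInfo[] itemsInStock =
        {
            new ProductInfo { Name = "Milk Maid Milk",
                Description = "Milk cow's love", NumberInStock = 100 },
            new ProductInfo { Name = "Pure Silk Tofu",
                Description = "Bland as Possible", NumberInStock = 120 }
        };

        foreach (ProductSummary s in Projections.GetSummaries(itemsInStock))
        {
            // Properties are available without casting from object.
            Console.WriteLine("{0}: {1}", s.Name, s.Description);
        }
    }
}
```

The trade-off is one extra class to maintain in exchange for callers that can read Name and Description directly.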

Obtaining Counts Using Enumerable

When you are projecting new batches of data, you may need to discover exactly how many items have been returned into the sequence. Any time you need to determine the number of items returned from a LINQ query expression, simply use the Count() extension method of the Enumerable class. For example, the following method will find all string objects in a local array that have a length greater than six characters:

static void GetCountFromQuery()
{
string[] currentVideoGames = {"Morrowind", "Uncharted 2",
"Fallout 3", "Daxter", "System Shock 2"};

// Get count from the query.
int numb =
(from g in currentVideoGames where g.Length > 6 select g).Count();

// Print out the number of items.
Console.WriteLine("{0} items honor the LINQ query.", numb);
}
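Count() also has an overload that accepts the filter directly, which avoids wrapping a query expression in parentheses. The following sketch places the two forms side by side over the same array; the CountDemo class and its method names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
static class CountDemo
{
    static readonly string[] currentVideoGames = { "Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2" };

    // Query expression wrapped in parentheses, followed by Count().
    public static int CountWithQuery()
    {
        return (from g in currentVideoGames where g.Length > 6 select g).Count();
    }

    // Count(predicate) overload: the filter is passed as a lambda.
    public static int CountWithPredicate()
    {
        return currentVideoGames.Count(g => g.Length > 6);
    }
}

class Program
{
    static void Main()
    {
        // Both calls report 4 ("Daxter" has exactly six characters).
        Console.WriteLine(CountDemo.CountWithQuery());
        Console.WriteLine(CountDemo.CountWithPredicate());
    }
}
```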

Reversing Result Sets

You can reverse the items within a result set quite simply using the Reverse<>() extension method of the Enumerable class. For example, the following method selects all items from the incoming ProductInfo[] parameter, in reverse:

static void ReverseEverything(ProductInfo[] products)
{
Console.WriteLine("Product in reverse:");
var allProducts = from p in products select p;
foreach (var prod in allProducts.Reverse())
{
Console.WriteLine(prod.ToString());
}
}

Sorting Expressions

As you have seen over this chapter’s initial examples, a query expression can take an orderby operator to sort items in the subset by a specific value. By default, the order will be ascending; thus, ordering by a string would be alphabetical, ordering by numerical data would be lowest to highest, and so forth. If you need to view the results in a descending order, simply include the descending operator. Ponder the following method:

static void AlphabetizeProductNames(ProductInfo[] products)
{
// Get names of products, alphabetized.
var subset = from p in products orderby p.Name select p;

Console.WriteLine("Ordered by Name:");
foreach (var p in subset)
{
Console.WriteLine(p.ToString());
}
}

Although ascending order is the default, you are able to make your intentions clear by using the ascending operator.

var subset = from p in products orderby p.Name ascending select p;

If you want to get the items in descending order, you can do so via the descending operator.

var subset = from p in products orderby p.Name descending select p;
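An orderby clause can also list more than one key, separated by commas, with each key carrying its own direction. The sketch below sorts by stock level (highest first) and breaks ties by name; the SortingDemo helper is a hypothetical name, and the sample data is a subset of the chapter's itemsInStock array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ProductInfo
{
    public string Name { get; set; } = "";
    public int NumberInStock { get; set; } = 0;
}

// Hypothetical helper, for illustration only.
static class SortingDemo
{
    // Primary key: NumberInStock, descending. Secondary key: Name, ascending.
    public static IEnumerable<string> ByStockThenName(ProductInfo[] products)
    {
        return from p in products
               orderby p.NumberInStock descending, p.Name
               select p.Name;
    }
}

class Program
{
    static void Main()
    {
        ProductInfo[] products =
        {
            new ProductInfo { Name = "Mac's Coffee", NumberInStock = 24 },
            new ProductInfo { Name = "RipOff Water", NumberInStock = 100 },
            new ProductInfo { Name = "Pure Silk Tofu", NumberInStock = 120 },
            new ProductInfo { Name = "Milk Maid Milk", NumberInStock = 100 }
        };

        // Pure Silk Tofu, Milk Maid Milk, RipOff Water, Mac's Coffee
        foreach (string name in SortingDemo.ByStockThenName(products))
            Console.WriteLine(name);
    }
}
```

The two products with 100 items in stock are the only ones affected by the secondary key.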

LINQ As a Better Venn Diagramming Tool

The Enumerable class supports a set of extension methods that allows you to use two (or more) LINQ queries as the basis to find unions, differences, concatenations, and intersections of data. First, consider the Except() extension method, which will return a LINQ result set that contains the items of the first container that do not appear in the second; in this case, that is the value Yugo:

static void DisplayDiff()
{
List<string> myCars = new List<String> {"Yugo", "Aztec", "BMW"};
List<string> yourCars = new List<String>{"BMW", "Saab", "Aztec" };

var carDiff =(from c in myCars select c)
.Except(from c2 in yourCars select c2);

Console.WriteLine("Here is what you don’t have, but I do:");
foreach (string s in carDiff)
Console.WriteLine(s); // Prints Yugo.
}

The Intersect() method will return a result set that contains the common data items in a set of containers. For example, the following method returns the sequence Aztec and BMW:

static void DisplayIntersection()
{
List<string> myCars = new List<String> { "Yugo", "Aztec", "BMW" };
List<string> yourCars = new List<String> { "BMW", "Saab", "Aztec" };

// Get the common members.
var carIntersect = (from c in myCars select c)
.Intersect(from c2 in yourCars select c2);

Console.WriteLine("Here is what we have in common:");
foreach (string s in carIntersect)
Console.WriteLine(s); // Prints Aztec and BMW.
}

The Union() method, as you would guess, returns a result set that includes all members of a batch of LINQ queries. Like any proper union, you will not find repeating values if a common member appears more than once. Therefore, the following method will print out the values Yugo, Aztec, BMW, and Saab:

static void DisplayUnion()
{
List<string> myCars = new List<String> { "Yugo", "Aztec", "BMW" };
List<string> yourCars = new List<String> { "BMW", "Saab", "Aztec" };

// Get the union of these containers.
var carUnion = (from c in myCars select c)
.Union(from c2 in yourCars select c2);

Console.WriteLine("Here is everything:");
foreach (string s in carUnion)
Console.WriteLine(s); // Prints Yugo, Aztec, BMW, and Saab.
}

Finally, the Concat() extension method returns a result set that is a direct concatenation of LINQ result sets. For example, the following method prints out the results Yugo, Aztec, BMW, BMW, Saab, and Aztec:

static void DisplayConcat()
{
List<string> myCars = new List<String> { "Yugo", "Aztec", "BMW" };
List<string> yourCars = new List<String> { "BMW", "Saab", "Aztec" };

var carConcat = (from c in myCars select c)
.Concat(from c2 in yourCars select c2);

// Prints:
// Yugo Aztec BMW BMW Saab Aztec.
foreach (string s in carConcat)
Console.WriteLine(s);
}

Removing Duplicates

When you call the Concat() extension method, you could very well end up with redundant entries in the fetched result, which could be exactly what you want in some cases. However, in other cases, you might want to remove duplicate entries in your data. To do so, simply call the Distinct() extension method, as shown here:

static void DisplayConcatNoDups()
{
List<string> myCars = new List<String> { "Yugo", "Aztec", "BMW" };
List<string> yourCars = new List<String> { "BMW", "Saab", "Aztec" };

var carConcat = (from c in myCars select c)
.Concat(from c2 in yourCars select c2);

// Prints:
// Yugo Aztec BMW Saab.
foreach (string s in carConcat.Distinct())
Console.WriteLine(s);
}
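For these two lists, concatenating and then removing duplicates yields the same sequence that Union() produces on its own. The following sketch checks that side by side; the SetDemo class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
static class SetDemo
{
    static readonly List<string> myCars =
        new List<string> { "Yugo", "Aztec", "BMW" };
    static readonly List<string> yourCars =
        new List<string> { "BMW", "Saab", "Aztec" };

    // Concatenate both lists, then strip the repeats.
    public static List<string> ConcatThenDistinct()
    {
        return myCars.Concat(yourCars).Distinct().ToList();
    }

    // Union() removes the repeats as part of the operation.
    public static List<string> UnionOnly()
    {
        return myCars.Union(yourCars).ToList();
    }
}

class Program
{
    static void Main()
    {
        // Both lines print: Yugo, Aztec, BMW, Saab
        Console.WriteLine(string.Join(", ", SetDemo.ConcatThenDistinct()));
        Console.WriteLine(string.Join(", ", SetDemo.UnionOnly()));
    }
}
```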

LINQ Aggregation Operations

LINQ queries can also be designed to perform various aggregation operations on the result set. The Count() extension method is one such aggregation example. Other possibilities include obtaining an average, maximum, minimum, or sum of values using the Max(), Min(), Average(), or Sum() members of the Enumerable class. Here is a simple example:

static void AggregateOps()
{
double[] winterTemps = { 2.0, -21.3, 8, -4, 0, 8.2 };

// Various aggregation examples.
Console.WriteLine("Max temp: {0}",
(from t in winterTemps select t).Max());

Console.WriteLine("Min temp: {0}",
(from t in winterTemps select t).Min());

Console.WriteLine("Average temp: {0}",
(from t in winterTemps select t).Average());

Console.WriteLine("Sum of all temps: {0}",
(from t in winterTemps select t).Sum());
}
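When the source holds objects rather than raw numbers, these same methods offer overloads that take a selector, so you can aggregate over a single property. The sketch below assumes a trimmed-down ProductInfo class and a hypothetical StockStats helper; the stock figures are taken from the chapter's sample data.

```csharp
using System;
using System.Linq;

class ProductInfo
{
    public string Name { get; set; } = "";
    public int NumberInStock { get; set; } = 0;
}

// Hypothetical helper, for illustration only.
static class StockStats
{
    public static readonly ProductInfo[] Items =
    {
        new ProductInfo { Name = "Mac's Coffee", NumberInStock = 24 },
        new ProductInfo { Name = "Milk Maid Milk", NumberInStock = 100 },
        new ProductInfo { Name = "Pure Silk Tofu", NumberInStock = 120 },
        new ProductInfo { Name = "RipOff Water", NumberInStock = 100 }
    };

    // Each method passes a selector that picks the value to aggregate.
    public static int Total() { return Items.Sum(p => p.NumberInStock); }
    public static int Largest() { return Items.Max(p => p.NumberInStock); }
    public static int Smallest() { return Items.Min(p => p.NumberInStock); }
    public static double Mean() { return Items.Average(p => p.NumberInStock); }
}

class Program
{
    static void Main()
    {
        Console.WriteLine("Total in stock: {0}", StockStats.Total());      // 344
        Console.WriteLine("Largest count: {0}", StockStats.Largest());     // 120
        Console.WriteLine("Smallest count: {0}", StockStats.Smallest());   // 24
        Console.WriteLine("Average count: {0}", StockStats.Mean());        // 86
    }
}
```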

These examples should give you enough knowledge to feel comfortable with the process of building LINQ query expressions. While there are additional operators you have not yet examined, you will see further examples later in this text when you learn about related LINQ technologies. To wrap up your first look at LINQ, the remainder of this chapter will dive into the details of the relationship between the C# LINQ query operators and the underlying object model.

Source Code The FunWithLinqExpressions project can be found in the Chapter 12 subdirectory.

The Internal Representation of LINQ Query Statements

At this point, you have been introduced to the process of building query expressions using various C# query operators (such as from, in, where, orderby, and select). Also, you discovered that some functionality of the LINQ to Objects API can be accessed only when calling extension methods of the Enumerable class. The truth of the matter, however, is that when compiled, the C# compiler actually translates all C# LINQ operators into calls on methods of the Enumerable class.

A great many of the methods of Enumerable have been prototyped to take delegates as arguments. In particular, many methods require a generic delegate named Func<>, which was introduced to you during your examination of generic delegates in Chapter 9. Consider the Where() method of Enumerable, which is called on your behalf when you use the C# where LINQ query operator.

// Overloaded versions of the Enumerable.Where<T>() method.
// Note the second parameter is of type System.Func<>.
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source,
System.Func<TSource,int,bool> predicate)

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source,
System.Func<TSource,bool> predicate)
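To make the difference between the two overloads concrete, here is a short sketch that calls each one. The C# where query operator maps to the element-only form; the overload that also receives the zero-based position of each item is reached by calling Where() directly. The WhereOverloads class and its method names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
static class WhereOverloads
{
    static readonly string[] games = { "Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2" };

    // Func<string, bool>: the predicate sees only the element.
    public static string[] WithSpaces()
    {
        return games.Where(g => g.Contains(" ")).ToArray();
    }

    // Func<string, int, bool>: the predicate also sees the zero-based index.
    public static string[] AtEvenPositions()
    {
        return games.Where((g, index) => index % 2 == 0).ToArray();
    }
}

class Program
{
    static void Main()
    {
        // Uncharted 2, Fallout 3, System Shock 2
        Console.WriteLine(string.Join(", ", WhereOverloads.WithSpaces()));

        // Morrowind, Fallout 3, System Shock 2
        Console.WriteLine(string.Join(", ", WhereOverloads.AtEvenPositions()));
    }
}
```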

The Func<> delegate (as the name implies) represents a pattern for a given function with a set of up to 16 arguments and a return value. If you were to examine this type using the Visual Studio object browser, you would notice various forms of the Func<> delegate. Here’s an example:

// The various formats of the Func<> delegate.
public delegate TResult Func<T1,T2,T3,T4,TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)

public delegate TResult Func<T1,T2,T3,TResult>(T1 arg1, T2 arg2, T3 arg3)

public delegate TResult Func<T1,T2,TResult>(T1 arg1, T2 arg2)

public delegate TResult Func<T1,TResult>(T1 arg1)

public delegate TResult Func<TResult>()
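In each form, the final type parameter is always the return type and any preceding type parameters describe the inputs. The following sketch declares three Func<> variables of different shapes and invokes them; the FuncDemo class and its member names are hypothetical.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class FuncDemo
{
    // Func<TResult>: no inputs, returns a string.
    public static readonly Func<string> Greeting = () => "Hello";

    // Func<T1, TResult>: one string in, a bool out
    // (the shape the simpler Where() overload expects).
    public static readonly Func<string, bool> HasSpace = s => s.Contains(" ");

    // Func<T1, T2, TResult>: two ints in, an int out.
    public static readonly Func<int, int, int> Add = (x, y) => x + y;
}

class Program
{
    static void Main()
    {
        Console.WriteLine(FuncDemo.Greeting());             // Hello
        Console.WriteLine(FuncDemo.HasSpace("Fallout 3"));  // True
        Console.WriteLine(FuncDemo.Add(2, 3));              // 5
    }
}
```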

Given that many members of System.Linq.Enumerable demand a delegate as input, when invoking them, you can either manually create a new delegate type and author the necessary target methods, make use of a C# anonymous method, or define a proper lambda expression. Regardless of which approach you take, the end result is identical.

While it is true that making use of C# LINQ query operators is far and away the simplest way to build a LINQ query expression, let’s walk through each of these possible approaches, just so you can see the connection between the C# query operators and the underlying Enumerable type.

Building Query Expressions with Query Operators (Revisited)

To begin, create a new Console Application project named LinqUsingEnumerable. The Program class will define a series of static helper methods (each of which is called within the Main() method) to illustrate the various manners in which you can build LINQ query expressions.

The first method, QueryStringsWithOperators(), offers the most straightforward way to build a query expression and is identical to the code shown in the LinqOverArray example earlier in this chapter.

static void QueryStringsWithOperators()
{
Console.WriteLine("***** Using Query Operators *****");

string[] currentVideoGames = {"Morrowind", "Uncharted 2",
"Fallout 3", "Daxter", "System Shock 2"};

var subset = from game in currentVideoGames
where game.Contains(" ") orderby game select game;

foreach (string s in subset)
Console.WriteLine("Item: {0}", s);
}

The obvious benefit of using C# query operators to build query expressions is that the Func<> delegates and calls on the Enumerable type are out of sight and out of mind, as it is the job of the C# compiler to perform this translation. To be sure, building LINQ expressions using various query operators (from, in, where, or orderby) is the most common and straightforward approach.

Building Query Expressions Using the Enumerable Type and Lambda Expressions

Keep in mind that the LINQ query operators used here are simply shorthand versions for calling various extension methods defined by the Enumerable type. Consider the following QueryStringsWithEnumerableAndLambdas() method, which is processing the local string array now making direct use of the Enumerable extension methods:

static void QueryStringsWithEnumerableAndLambdas()
{
Console.WriteLine("***** Using Enumerable / Lambda Expressions *****");

string[] currentVideoGames = {"Morrowind", "Uncharted 2",
"Fallout 3", "Daxter", "System Shock 2"};

// Build a query expression using extension methods
// granted to the Array via the Enumerable type.
var subset = currentVideoGames.Where(game => game.Contains(" "))
.OrderBy(game => game).Select(game => game);

// Print out the results.
foreach (var game in subset)
Console.WriteLine("Item: {0}", game);
Console.WriteLine();
}

Here, you begin by calling the Where() extension method on the currentVideoGames string array. Recall that the Array class receives this via an extension method granted by Enumerable. The Enumerable.Where() method requires a System.Func<T1, TResult> delegate parameter. The first type parameter of this delegate represents the type of each item in the IEnumerable<T>-compatible data to process (a string in this case), while the second type parameter represents the method result data, which for Where() is a bool obtained from the single expression fed into the lambda expression.

The return value of the Where() method is hidden from view in this code example, but under the covers you are operating on an IEnumerable<string>-compatible object. From this object, you call the generic OrderBy() method, which also requires a Func<> delegate parameter. This time, you are simply passing each item in turn via a fitting lambda expression. The end result of calling OrderBy() is a new ordered sequence of the initial data (an IOrderedEnumerable<string>-compatible object).

Last but not least, you call the Select() method off the sequence returned from OrderBy(), which results in the final set of data that is stored in an implicitly typed variable named subset.

To be sure, this “longhand” LINQ query is a bit more complex to tease apart than the previous C# LINQ query operator example. Part of the complexity is, no doubt, due to the chaining together of calls using the dot operator. Here is the same query, with each step broken into discrete chunks (as you might guess, you could break down the overall query in various manners):

static void QueryStringsWithEnumerableAndLambdas2()
{
Console.WriteLine("***** Using Enumerable / Lambda Expressions *****");

string[] currentVideoGames = {"Morrowind", "Uncharted 2",
"Fallout 3", "Daxter", "System Shock 2"};

// Break it down!
var gamesWithSpaces = currentVideoGames.Where(game => game.Contains(" "));
var orderedGames = gamesWithSpaces.OrderBy(game => game);
var subset = orderedGames.Select(game => game);

foreach (var game in subset)
Console.WriteLine("Item: {0}", game);
Console.WriteLine();
}

As you might agree, building a LINQ query expression using the methods of the Enumerable class directly is much more verbose than making use of the C# query operators. As well, given that the methods of Enumerable require delegates as parameters, you will typically need to author lambda expressions to allow the input data to be processed by the underlying delegate target.
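Because extension methods are ordinary static methods, the same chain can also be written as explicit calls on the Enumerable class, which makes the link to the underlying type easy to see. The sketch below compares the two call styles; the CallStyles class and its method names are hypothetical, and the trailing Select(game => game) is left out because it returns each item unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
static class CallStyles
{
    // Extension-method syntax: calls appear to hang off the array.
    public static IEnumerable<string> ExtensionSyntax(string[] games)
    {
        return games.Where(g => g.Contains(" ")).OrderBy(g => g);
    }

    // Static-method syntax: the source is passed as the first argument.
    public static IEnumerable<string> StaticSyntax(string[] games)
    {
        return Enumerable.OrderBy(
            Enumerable.Where(games, g => g.Contains(" ")),
            g => g);
    }
}

class Program
{
    static void Main()
    {
        string[] currentVideoGames = { "Morrowind", "Uncharted 2",
            "Fallout 3", "Daxter", "System Shock 2" };

        var first = CallStyles.ExtensionSyntax(currentVideoGames);
        var second = CallStyles.StaticSyntax(currentVideoGames);

        // Prints True: both styles yield the same items in the same order.
        Console.WriteLine(first.SequenceEqual(second));
    }
}
```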

Building Query Expressions Using the Enumerable Type and Anonymous Methods

Given that C# lambda expressions are simply shorthand notations for working with anonymous methods, consider the third query expression created within the QueryStringsWithAnonymousMethods() helper function:

static void QueryStringsWithAnonymousMethods()
{
Console.WriteLine("***** Using Anonymous Methods *****");

string[] currentVideoGames = {"Morrowind", "Uncharted 2",
"Fallout 3", "Daxter", "System Shock 2"};

// Build the necessary Func<> delegates using anonymous methods.
Func<string, bool> searchFilter =
delegate(string game) { return game.Contains(" "); };
Func<string, string> itemToProcess = delegate(string s) { return s; };

// Pass the delegates into the methods of Enumerable.
var subset = currentVideoGames.Where(searchFilter)
.OrderBy(itemToProcess).Select(itemToProcess);

// Print out the results.
foreach (var game in subset)
Console.WriteLine("Item: {0}", game);
Console.WriteLine();
}

This iteration of the query expression is even more verbose, because you are manually creating the Func<> delegates used by the Where(), OrderBy(), and Select() methods of the Enumerable class. On the plus side, the anonymous method syntax does keep all the delegate processing contained within a single method definition. Nevertheless, this method is functionally equivalent to the QueryStringsWithEnumerableAndLambdas() and QueryStringsWithOperators() methods created in the previous sections.

Building Query Expressions Using the Enumerable Type and Raw Delegates

Finally, if you want to build a query expression using the really verbose approach, you could avoid lambda and anonymous method syntax altogether and directly create delegate targets for each Func<> type. Here is the final iteration of your query expression, modeled within a new class type named VeryComplexQueryExpression:

class VeryComplexQueryExpression
{
  public static void QueryStringsWithRawDelegates()
  {
    Console.WriteLine("***** Using Raw Delegates *****");

    string[] currentVideoGames = {"Morrowind", "Uncharted 2",
      "Fallout 3", "Daxter", "System Shock 2"};

    // Build the necessary Func<> delegates.
    Func<string, bool> searchFilter = new Func<string, bool>(Filter);
    Func<string, string> itemToProcess = new Func<string, string>(ProcessItem);

    // Pass the delegates into the methods of Enumerable.
    var subset = currentVideoGames
      .Where(searchFilter).OrderBy(itemToProcess).Select(itemToProcess);

    // Print out the results.
    foreach (var game in subset)
      Console.WriteLine("Item: {0}", game);
    Console.WriteLine();
  }

  // Delegate targets.
  public static bool Filter(string game) { return game.Contains(" "); }
  public static string ProcessItem(string game) { return game; }
}

You can test this iteration of your string processing logic by calling this method within the Main() method of the Program class, as follows:

VeryComplexQueryExpression.QueryStringsWithRawDelegates();

If you were to now run the application to test each possible approach, it should not be too surprising that the output is identical, regardless of the path taken. Keep the following points in mind regarding how LINQ query expressions are represented under the covers:

· Query expressions are created using various C# query operators.

· Query operators are simply shorthand notations for invoking extension methods defined by the System.Linq.Enumerable type.

· Many methods of Enumerable require delegates (Func<> in particular) as parameters.

· Any method requiring a delegate parameter can instead be passed a lambda expression.

· Lambda expressions are simply anonymous methods in disguise (which greatly improve readability).

· Anonymous methods are shorthand notations for allocating a raw delegate and manually building a delegate target method.
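To make the two ends of this chain concrete, here is a rough sketch that places the most compact form (C# query operators) next to the most explicit one (ordinary static calls on System.Linq.Enumerable, fed with raw delegates). The OperatorChain class and its member names are hypothetical and exist only for this illustration. Because extension methods are ordinary static methods, Enumerable.Where(source, predicate) is the same call as source.Where(predicate).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class (an assumption for this sketch).
static class OperatorChain
{
    static readonly string[] Games = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};

    // Most compact form: C# query operators.
    public static List<string> QueryExpression()
    {
        var subset = from game in Games
                     where game.Contains(" ")
                     orderby game
                     select game;
        return subset.ToList();
    }

    // Most explicit form: static calls on Enumerable with raw delegates.
    public static List<string> StaticCalls()
    {
        Func<string, bool> searchFilter = new Func<string, bool>(Filter);
        Func<string, string> itemToProcess = new Func<string, string>(ProcessItem);

        IEnumerable<string> filtered = Enumerable.Where(Games, searchFilter);
        IEnumerable<string> ordered = Enumerable.OrderBy(filtered, itemToProcess);
        IEnumerable<string> projected = Enumerable.Select(ordered, itemToProcess);
        return Enumerable.ToList(projected);
    }

    // Delegate targets.
    static bool Filter(string game) { return game.Contains(" "); }
    static string ProcessItem(string game) { return game; }
}

class Program
{
    static void Main()
    {
        foreach (var game in OperatorChain.StaticCalls())
            Console.WriteLine("Item: {0}", game);

        Console.WriteLine("Matches query expression: {0}",
            OperatorChain.QueryExpression()
                .SequenceEqual(OperatorChain.StaticCalls()));
    }
}
```

Nobody would write StaticCalls() in production code; the point is only that each layer of shorthand in the list above can be peeled away without changing the result.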

Whew! That might have been a bit deeper under the hood than you wanted to have gone, but I hope this discussion has helped you understand what the user-friendly C# query operators are actually doing behind the scenes.

Note: The LinqUsingEnumerable project can be found in the Chapter 12 subdirectory.

Summary

LINQ is a set of related technologies that attempts to provide a single, symmetrical manner to interact with diverse forms of data. As explained over the course of this chapter, LINQ can interact with any type implementing the IEnumerable<T> interface, including simple arrays and generic collections. Nongeneric collections can be queried as well, once they have been converted to an IEnumerable<T>-compatible sequence (for example, by calling OfType<T>()).
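As a brief reminder of the nongeneric case, the sketch below queries an ArrayList, which exposes only the nongeneric IEnumerable interface. The NonGenericSketch class name and the sample data are assumptions made for this illustration. OfType<string>() produces an IEnumerable<string> and silently skips any item that is not a string.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class (an assumption for this sketch).
static class NonGenericSketch
{
    public static List<string> GamesWithSpaces()
    {
        // ArrayList implements IEnumerable, but not IEnumerable<T>.
        ArrayList items = new ArrayList { "Morrowind", "Uncharted 2", 42, "Fallout 3" };

        // OfType<string>() filters out the int and yields IEnumerable<string>.
        IEnumerable<string> games = items.OfType<string>();

        var subset = from game in games
                     where game.Contains(" ")
                     orderby game
                     select game;
        return subset.ToList();
    }
}

class Program
{
    static void Main()
    {
        foreach (var game in NonGenericSketch.GamesWithSpaces())
            Console.WriteLine("Item: {0}", game);
    }
}
```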

As you have seen, working with LINQ technologies is accomplished using several C# language features. For example, given that the concrete type returned by a LINQ query expression depends on the query itself (and may involve an anonymous type), it is common to make use of the var keyword to represent the underlying data type. As well, lambda expressions, object initialization syntax, and anonymous types can all be used to build functional and compact LINQ queries.

More importantly, you have seen how the C# LINQ query operators are simply shorthand notations for making calls on static members of the System.Linq.Enumerable type. As shown, many members of Enumerable operate on Func<> delegate types, which can be given the name of an existing method, an anonymous method, or a lambda expression as input to evaluate the query.