Querying In-Memory Data by Using Query Expressions - Microsoft® Visual C#® 2012 Step by Step (2012)

Chapter 21. Querying In-Memory Data by Using Query Expressions

After completing this chapter, you will be able to

§ Define Language-Integrated Query (LINQ) queries to examine the contents of enumerable collections.

§ Use LINQ extension methods and query operators.

§ Explain how LINQ defers evaluation of a query and how you can force immediate execution and cache the results of a LINQ query.

You have now met most of the features of the C# language. However, so far I have glossed over one important aspect of the language that is likely to be used by many applications: the support that C# provides for querying data. You have seen that you can define structures and classes for modeling data and that you can use collections and arrays for temporarily storing data in memory. However, how do you perform common tasks such as searching for items in a collection that match a specific set of criteria? For example, if you have a collection of Customer objects, how do you find all customers that are located in London, or how can you find out which town has the most customers that have procured your services? You can write your own code to iterate through a collection and examine the fields in each object, but these types of tasks occur so often that the designers of C# decided to include features in the language to minimize the amount of code you need to write. In this chapter, you will learn how to use these advanced C# language features to query and manipulate data.

What Is Language-Integrated Query?

All but the most trivial of applications need to process data. Historically, most applications provided their own logic for performing these operations. However, this strategy can lead to the code in an application becoming very tightly coupled with the structure of the data that it processes. If the data structures change, you might need to make a significant number of changes to the code that handles the data. The designers of the Microsoft .NET Framework thought long and hard about these issues and decided to make the life of an application developer easier by providing features that abstract the mechanism that an application uses to query data from application code itself. These features are called Language-Integrated Query, or LINQ.

The creators of LINQ took an unabashed look at the way in which relational database management systems, such as Microsoft SQL Server, separate the language used to query a database from the internal format of the data in the database. Developers accessing a SQL Server database issue Structured Query Language (SQL) statements to the database management system. SQL provides a high-level description of the data that the developer wants to retrieve but does not indicate exactly how the database management system should retrieve this data. These details are controlled by the database management system itself. Consequently, an application that invokes SQL statements does not care how the database management system physically stores or retrieves data. The format used by the database management system can change (for example, if a new version is released) without the application developer needing to modify the SQL statements used by the application.

LINQ provides syntax and semantics very reminiscent of SQL, and with many of the same advantages. You can change the underlying structure of the data being queried without needing to change the code that actually performs the queries. You should be aware that although LINQ looks similar to SQL, it is far more flexible and can handle a wider variety of logical data structures. For example, LINQ can handle data organized hierarchically, such as that found in an XML document. However, this chapter concentrates on using LINQ in a relational manner.

Using LINQ in a C# Application

Perhaps the easiest way to explain how to use the C# features that support LINQ is to work through some simple examples based on the following sets of customer and address information:

Customer Information

CustomerID  FirstName  LastName     CompanyName
1           Kim        Abercrombie  Alpine Ski House
2           Jeff       Hay          Coho Winery
3           Charlie    Herb         Alpine Ski House
4           Chris      Preston      Trey Research
5           Dave       Barnett      Wingtip Toys
6           Ann        Beebe        Coho Winery
7           John       Kane         Wingtip Toys
8           David      Simpson      Trey Research
9           Greg       Chapman      Wingtip Toys
10          Tim        Litton       Wide World Importers

Address Information

CompanyName           City           Country
Alpine Ski House      Berne          Switzerland
Coho Winery           San Francisco  United States
Trey Research         New York       United States
Wingtip Toys          London         United Kingdom
Wide World Importers  Tetbury        United Kingdom

LINQ requires the data to be stored in a data structure that implements the IEnumerable or IEnumerable<T> interface, as described in Chapter 19. It does not matter what structure you use (an array, a HashSet<T>, a Queue<T>, or any of the other collection types, or even one that you define yourself) as long as it is enumerable. However, to keep things straightforward, the examples in this chapter assume that the customer and address information is held in the customers and addresses arrays shown in the following code example.

NOTE

In a real-world application, you would populate these arrays by reading the data from a file or a database.

var customers = new[] {
    new { CustomerID = 1, FirstName = "Kim", LastName = "Abercrombie",
          CompanyName = "Alpine Ski House" },
    new { CustomerID = 2, FirstName = "Jeff", LastName = "Hay",
          CompanyName = "Coho Winery" },
    new { CustomerID = 3, FirstName = "Charlie", LastName = "Herb",
          CompanyName = "Alpine Ski House" },
    new { CustomerID = 4, FirstName = "Chris", LastName = "Preston",
          CompanyName = "Trey Research" },
    new { CustomerID = 5, FirstName = "Dave", LastName = "Barnett",
          CompanyName = "Wingtip Toys" },
    new { CustomerID = 6, FirstName = "Ann", LastName = "Beebe",
          CompanyName = "Coho Winery" },
    new { CustomerID = 7, FirstName = "John", LastName = "Kane",
          CompanyName = "Wingtip Toys" },
    new { CustomerID = 8, FirstName = "David", LastName = "Simpson",
          CompanyName = "Trey Research" },
    new { CustomerID = 9, FirstName = "Greg", LastName = "Chapman",
          CompanyName = "Wingtip Toys" },
    new { CustomerID = 10, FirstName = "Tim", LastName = "Litton",
          CompanyName = "Wide World Importers" }
};

var addresses = new[] {
    new { CompanyName = "Alpine Ski House", City = "Berne",
          Country = "Switzerland"},
    new { CompanyName = "Coho Winery", City = "San Francisco",
          Country = "United States"},
    new { CompanyName = "Trey Research", City = "New York",
          Country = "United States"},
    new { CompanyName = "Wingtip Toys", City = "London",
          Country = "United Kingdom"},
    new { CompanyName = "Wide World Importers", City = "Tetbury",
          Country = "United Kingdom"}
};

NOTE

The remaining sections in this chapter show you the basic capabilities and syntax for querying data by using LINQ methods. The syntax can become a little complex at times, and you will see when you reach the section Using Query Operators that it is not actually necessary to remember how all the syntax works. However, it is useful for you to at least take a look at the following sections so that you can fully appreciate how the query operators provided with C# perform their tasks.

Selecting Data

Suppose you want to display a list consisting of the first name of each customer in the customers array. You can achieve this task with the following code:

IEnumerable<string> customerFirstNames =
    customers.Select(cust => cust.FirstName);

foreach (string name in customerFirstNames)
{
    Console.WriteLine(name);
}

Although this block of code is quite short, it does a lot, and it requires a degree of explanation, starting with the use of the Select method of the customers array.

The Select method enables you to retrieve specific data from the array—in this case, just the value in the FirstName field of each item in the array. How does it work? The parameter to the Select method is actually another method that takes a row from the customers array and returns the selected data from that row. You can define your own custom method to perform this task, but the simplest mechanism is to use a lambda expression to define an anonymous method, as shown in the preceding example. There are three important things that you need to understand at this point:

§ The variable cust is the parameter passed in to the method. You can think of cust as an alias for each row in the customers array. The compiler deduces this from the fact that you are calling the Select method on the customers array. You can use any legal C# identifier in place of cust.

§ The Select method does not actually retrieve the data at this time; it simply returns an enumerable object that will fetch the data identified by the Select method when you iterate over it later. We will return to this aspect of LINQ in the section LINQ and Deferred Evaluation later in this chapter.

§ The Select method is not actually a method of the Array type. It is an extension method of the Enumerable class. The Enumerable class is located in the System.Linq namespace and provides a substantial set of static methods for querying objects that implement the generic IEnumerable<T> interface.
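The second point in the list above, that Select does not run the selector until you iterate over the result, can be observed with a minimal sketch. The DeferredSelectDemo class, the SelectorCalls counter, and the BuildQuery helper are illustrative assumptions and are not part of the chapter's sample code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredSelectDemo
{
    // Hypothetical counter: records how many times the selector lambda has run.
    public static int SelectorCalls = 0;

    // Builds the query only; nothing is fetched from the array here.
    public static IEnumerable<string> BuildQuery(string[] names)
    {
        return names.Select(n => { SelectorCalls++; return n.ToUpperInvariant(); });
    }

    static void Main()
    {
        string[] names = { "Kim", "Jeff", "Charlie" };
        IEnumerable<string> query = BuildQuery(names);

        Console.WriteLine("Calls after Select: {0}", SelectorCalls);   // Calls after Select: 0

        foreach (string name in query)
        {
            Console.WriteLine(name);
        }

        Console.WriteLine("Calls after foreach: {0}", SelectorCalls);  // Calls after foreach: 3
    }
}
```

The counter stays at zero until the foreach loop starts pulling items, which is the behavior the list describes.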

The preceding example uses the Select method of the customers array to generate an IEnumerable<string> object named customerFirstNames. (It is of type IEnumerable<string> because the Select method returns an enumerable collection of customer first names, which are strings.) The foreach statement iterates through this collection of strings, printing out the first name of each customer in the following sequence:

Kim

Jeff

Charlie

Chris

Dave

Ann

John

David

Greg

Tim

You can now display the first name of each customer. How do you fetch the first and last name of each customer? This task is slightly trickier. If you examine the definition of the Enumerable.Select method in the System.Linq namespace in the documentation supplied with Microsoft Visual Studio 2012, you will see that it looks like this:

public static IEnumerable<TResult> Select<TSource, TResult> (
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector
)

What this actually says is that Select is a generic method that takes two type parameters named TSource and TResult, as well as two ordinary parameters named source and selector. TSource is the type of the items in the collection that you are generating an enumerable set of results for (customer objects in this example), and TResult is the type of the data in the enumerable set of results (string objects in this example). Remember that Select is an extension method, so the source parameter is actually a reference to an instance of the type being extended (a collection of customer objects that implements the IEnumerable<T> interface in the example). The selector parameter specifies a generic method that identifies the fields to be retrieved. (Remember that Func is the name of a generic delegate type in the .NET Framework that you can use for encapsulating a generic method that returns a result.) The method referred to by the selector parameter takes a TSource (in this case, customer) parameter and yields a TResult (in this case, string) object. The value returned by the Select method is an enumerable collection of TResult (again string) objects.

NOTE

You can review how extension methods work and the role of the first parameter to an extension method by revisiting Chapter 12.
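To make the roles of TSource, TResult, and the selector delegate concrete, the following sketch passes a named method to Select instead of a lambda expression and spells out both type arguments. The Customer class, the SelectorDemo class, and the GetFirstName and FirstNames methods are hypothetical stand-ins; the chapter's own examples use anonymous types, which cannot be named in a method signature:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical named type standing in for the anonymous customer objects.
class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

static class SelectorDemo
{
    // A named method whose shape matches Func<Customer, string>.
    public static string GetFirstName(Customer cust)
    {
        return cust.FirstName;
    }

    public static List<string> FirstNames(IEnumerable<Customer> customers)
    {
        // Wrap the named method in the Func delegate that Select expects.
        Func<Customer, string> selector = GetFirstName;

        // TSource is Customer and TResult is string; normally the compiler infers both.
        return customers.Select<Customer, string>(selector).ToList();
    }

    static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { FirstName = "Kim", LastName = "Abercrombie" },
            new Customer { FirstName = "Jeff", LastName = "Hay" }
        };

        foreach (string name in FirstNames(customers))
        {
            Console.WriteLine(name);   // Kim, then Jeff
        }
    }
}
```

In everyday code you would usually let the compiler infer the type arguments and pass a lambda expression; the explicit form is shown only to map each piece onto the method signature.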

The important point to understand from the preceding paragraph is that the Select method returns an enumerable collection based on a single type. If you want the enumerator to return multiple items of data, such as the first and last name of each customer, you have at least two options:

§ You can concatenate the first and last names together into a single string in the Select method, like this:

IEnumerable<string> customerNames =
    customers.Select(cust => String.Format("{0} {1}", cust.FirstName, cust.LastName));

§ You can define a new type that wraps the first and last names, and use the Select method to construct instances of this type, like this:

class FullName
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
...
IEnumerable<FullName> customerNames =
    customers.Select(cust => new FullName
    {
        FirstName = cust.FirstName,
        LastName = cust.LastName
    } );

The second option is arguably preferable, but if this is the only use that your application makes of the FullName type, you might prefer to use an anonymous type instead of defining a new type specifically for a single operation, like this:

var customerNames =

customers.Select(cust => new { FirstName = cust.FirstName, LastName = cust.LastName } );

Notice the use of the var keyword here to define the type of the enumerable collection. The type of objects in the collection is anonymous, so you do not know the specific type for the objects in the collection.

Filtering Data

The Select method enables you to specify, or project, the fields that you want to include in the enumerable collection. However, you might also want to restrict the rows that the enumerable collection contains. For example, suppose you want to list the names of all companies in the addresses array that are located in the United States only. To do this, you can use the Where method, as follows:

IEnumerable<string> usCompanies =
    addresses.Where(addr => String.Equals(addr.Country, "United States"))
             .Select(usComp => usComp.CompanyName);

foreach (string name in usCompanies)
{
    Console.WriteLine(name);
}

Syntactically, the Where method is similar to Select. It expects a parameter that defines a method that filters the data according to whatever criteria you specify. This example makes use of another lambda expression. The variable addr is an alias for a row in the addresses array, and the lambda expression returns true for each row where the Country field matches the string “United States”, so only those rows pass the filter. The Where method returns an enumerable collection of rows containing every field from the original collection. The Select method is then applied to these rows to project only the CompanyName field from this enumerable collection to return another enumerable collection of string objects. (The variable usComp is an alias for each row in the enumerable collection returned by the Where method.) The type of the result of this complete expression is therefore IEnumerable<string>. It is important to understand this sequence of operations: the Where method is applied first to filter the rows, followed by the Select method to specify the fields. The foreach statement that iterates through this collection displays the following companies:

Coho Winery

Trey Research

Ordering, Grouping, and Aggregating Data

If you are familiar with SQL, you are aware that SQL enables you to perform a wide variety of relational operations besides simple projection and filtering. For example, you can specify that you want data to be returned in a specific order, you can group the rows returned according to one or more key fields, and you can calculate summary values based on the rows in each group. LINQ provides the same functionality.

To retrieve data in a particular order, you can use the OrderBy method. Like the Select and Where methods, OrderBy expects a method as its argument. This method identifies the expressions that you want to use to sort the data. For example, you can display the name of each company in the addresses array in ascending order, like this:

IEnumerable<string> companyNames =
    addresses.OrderBy(addr => addr.CompanyName).Select(comp => comp.CompanyName);

foreach (string name in companyNames)
{
    Console.WriteLine(name);
}

This block of code displays the companies in the addresses array in alphabetical order:

Alpine Ski House

Coho Winery

Trey Research

Wide World Importers

Wingtip Toys

If you want to enumerate the data in descending order, you can use the OrderByDescending method instead. If you want to order by more than one key value, you can use the ThenBy or ThenByDescending method after OrderBy or OrderByDescending.
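As a minimal sketch of combining these methods, the following code sorts a few of the chapter's customers by company name in descending order and then, within each company, by last name in ascending order. The OrderingDemo class and the SortedCustomers helper are names invented for this illustration, and only a subset of the customer data is used:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderingDemo
{
    // Returns "CompanyName: LastName" strings, sorted by company (descending)
    // and then by last name (ascending) within each company.
    public static List<string> SortedCustomers()
    {
        var customers = new[] {
            new { LastName = "Abercrombie", CompanyName = "Alpine Ski House" },
            new { LastName = "Hay",         CompanyName = "Coho Winery" },
            new { LastName = "Herb",        CompanyName = "Alpine Ski House" },
            new { LastName = "Beebe",       CompanyName = "Coho Winery" }
        };

        return customers
            .OrderByDescending(c => c.CompanyName)   // primary sort key
            .ThenBy(c => c.LastName)                 // secondary sort key
            .Select(c => c.CompanyName + ": " + c.LastName)
            .ToList();
    }

    static void Main()
    {
        foreach (string line in SortedCustomers())
        {
            Console.WriteLine(line);
        }
        // Coho Winery: Beebe
        // Coho Winery: Hay
        // Alpine Ski House: Abercrombie
        // Alpine Ski House: Herb
    }
}
```

Note that ThenBy must follow OrderBy or OrderByDescending; calling OrderBy a second time would replace the first ordering rather than refine it.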

To group data according to common values in one or more fields, you can use the GroupBy method. The next example shows how to group the companies in the addresses array by country:

var companiesGroupedByCountry =
    addresses.GroupBy(addrs => addrs.Country);

foreach (var companiesPerCountry in companiesGroupedByCountry)
{
    Console.WriteLine("Country: {0}\t{1} companies",
        companiesPerCountry.Key, companiesPerCountry.Count());

    foreach (var companies in companiesPerCountry)
    {
        Console.WriteLine("\t{0}", companies.CompanyName);
    }
}

By now, you should recognize the pattern. The GroupBy method expects a method that specifies the fields to group the data by. There are some subtle differences between the GroupBy method and the other methods that you have seen so far, though.

The main point of interest is that you don’t need to use the Select method to project the fields to the result. The enumerable set returned by GroupBy contains all the fields in the original source collection, but the rows are ordered into a set of enumerable collections based on the field identified by the method specified by GroupBy. In other words, the result of the GroupBy method is an enumerable set of groups, each of which is an enumerable set of rows. In the example just shown, the enumerable set companiesGroupedByCountry is a set of countries. The items in this set are themselves enumerable collections containing the companies for each country in turn. The code that displays the companies in each country uses a foreach loop to iterate through the companiesGroupedByCountry set to yield and display each country in turn, and then it uses a nested foreach loop to iterate through the set of companies in each country. Notice in the outer foreach loop that you can access the value you are grouping by using the Key property of each item, and you can also calculate summary data for each group by using methods such as Count, Max, Min, and many others. The output generated by the example code looks like this:

Country: Switzerland    1 companies
        Alpine Ski House
Country: United States  2 companies
        Coho Winery
        Trey Research
Country: United Kingdom 2 companies
        Wingtip Toys
        Wide World Importers
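The per-group summary methods mentioned above can also be applied inside a projection. The following sketch groups a subset of the chapter's customers by company and reports the number of customers and the highest CustomerID in each group. The GroupSummaryDemo class and the SummarizeByCompany helper are hypothetical names introduced for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupSummaryDemo
{
    // Produces one summary line per company: customer count and highest CustomerID.
    public static List<string> SummarizeByCompany()
    {
        var customers = new[] {
            new { CustomerID = 1, CompanyName = "Alpine Ski House" },
            new { CustomerID = 2, CompanyName = "Coho Winery" },
            new { CustomerID = 3, CompanyName = "Alpine Ski House" },
            new { CustomerID = 5, CompanyName = "Wingtip Toys" },
            new { CustomerID = 7, CompanyName = "Wingtip Toys" },
            new { CustomerID = 9, CompanyName = "Wingtip Toys" }
        };

        return customers
            .GroupBy(c => c.CompanyName)
            .Select(g => String.Format("{0}: {1} customer(s), highest ID {2}",
                g.Key, g.Count(), g.Max(c => c.CustomerID)))
            .ToList();
    }

    static void Main()
    {
        foreach (string line in SummarizeByCompany())
        {
            Console.WriteLine(line);
        }
        // Alpine Ski House: 2 customer(s), highest ID 3
        // Coho Winery: 1 customer(s), highest ID 2
        // Wingtip Toys: 3 customer(s), highest ID 9
    }
}
```

Each group g exposes the grouping value through Key and is itself enumerable, so Count and Max operate on just the rows in that group.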

You can use many of the summary methods such as Count, Max, and Min directly over the results of the Select method. If you want to know how many companies there are in the addresses array, you can use a block of code such as this:

int numberOfCompanies = addresses.Select(addr => addr.CompanyName).Count();

Console.WriteLine("Number of companies: {0}", numberOfCompanies);

Notice that the result of these methods is a single scalar value rather than an enumerable collection. The output from the preceding block of code looks like this:

Number of companies: 5

I should utter a word of caution at this point. These summary methods do not distinguish between rows in the underlying set that contain duplicate values in the fields you are projecting. This means that, strictly speaking, the preceding example shows you only how many rows in the addresses array contain a value in the CompanyName field. If you wanted to find out how many different countries are mentioned in this array, you might be tempted to try this:

int numberOfCountries = addresses.Select(addr => addr.Country).Count();

Console.WriteLine("Number of countries: {0}", numberOfCountries);

The output looks like this:

Number of countries: 5

In fact, there are only three different countries in the addresses array—it just so happens that United States and United Kingdom both occur twice. You can eliminate duplicates from the calculation by using the Distinct method, like this:

int numberOfCountries =

addresses.Select(addr => addr.Country).Distinct().Count();

Console.WriteLine("Number of countries: {0}", numberOfCountries);

The Console.WriteLine statement will now output the expected result:

Number of countries: 3

Joining Data

Just like SQL, LINQ enables you to join multiple sets of data together over one or more common key fields. The following example shows how to display the first and last names of each customer, together with the name of the country where the customer is located:

var companiesAndCustomers = customers
    .Select(c => new { c.FirstName, c.LastName, c.CompanyName })
    .Join(addresses, custs => custs.CompanyName, addrs => addrs.CompanyName,
        (custs, addrs) => new {custs.FirstName, custs.LastName, addrs.Country });

foreach (var row in companiesAndCustomers)
{
    Console.WriteLine(row);
}

The customers’ first and last names are available in the customers array, but the country for each company that customers work for is stored in the addresses array. The common key between the customers array and the addresses array is the company name. The Select method specifies the fields of interest in the customers array (FirstName and LastName), together with the field containing the common key (CompanyName). You use the Join method to join the data identified by the Select method with another enumerable collection. The parameters to the Join method are as follows:

§ The enumerable collection with which to join

§ A method that identifies the common key fields from the data identified by the Select method

§ A method that identifies the common key fields on which to join the selected data

§ A method that specifies the columns you require in the enumerable result set returned by the Join method

In this example, the Join method joins the enumerable collection containing the FirstName, LastName, and CompanyName fields from the customers array with the rows in the addresses array. The two sets of data are joined where the value in the CompanyName field in the customers array matches the value in the CompanyName field in the addresses array. The result set comprises rows containing the FirstName and LastName fields from the customers array with the Country field from the addresses array. The code that outputs the data from the companiesAndCustomers collection displays the following information:

{ FirstName = Kim, LastName = Abercrombie, Country = Switzerland }

{ FirstName = Jeff, LastName = Hay, Country = United States }

{ FirstName = Charlie, LastName = Herb, Country = Switzerland }

{ FirstName = Chris, LastName = Preston, Country = United States }

{ FirstName = Dave, LastName = Barnett, Country = United Kingdom }

{ FirstName = Ann, LastName = Beebe, Country = United States }

{ FirstName = John, LastName = Kane, Country = United Kingdom }

{ FirstName = David, LastName = Simpson, Country = United States }

{ FirstName = Greg, LastName = Chapman, Country = United Kingdom }

{ FirstName = Tim, LastName = Litton, Country = United Kingdom }

NOTE

Remember that collections in memory are not the same as tables in a relational database, and the data they contain is not subject to the same data integrity constraints. In a relational database, it could be acceptable to assume that every customer has a corresponding company and that each company has its own unique address. Collections do not enforce the same level of data integrity, meaning that you can quite easily have a customer referencing a company that does not exist in the addresses array, and you might even have the same company occurring more than once in the addresses array. In these situations, the results that you obtain might be accurate but unexpected. Join operations work best when you fully understand the relationships between the data you are joining.
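The effect described in the preceding note can be seen in a small sketch. The company name "Fabrikam" and the second Coho Winery address in Canada are invented for this illustration and do not appear in the chapter's data; JoinIntegrityDemo and JoinCustomersToCountries are likewise hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinIntegrityDemo
{
    public static List<string> JoinCustomersToCountries()
    {
        var customers = new[] {
            new { FirstName = "Kim",  CompanyName = "Alpine Ski House" },
            new { FirstName = "Jeff", CompanyName = "Coho Winery" },
            new { FirstName = "Tim",  CompanyName = "Fabrikam" }          // hypothetical: no matching address
        };

        var addresses = new[] {
            new { CompanyName = "Alpine Ski House", Country = "Switzerland" },
            new { CompanyName = "Coho Winery",      Country = "United States" },
            new { CompanyName = "Coho Winery",      Country = "Canada" }  // hypothetical duplicate company
        };

        // Join is an inner join: unmatched customers vanish, and each
        // duplicate address produces an extra row for the same customer.
        return customers
            .Join(addresses, c => c.CompanyName, a => a.CompanyName,
                (c, a) => c.FirstName + ": " + a.Country)
            .ToList();
    }

    static void Main()
    {
        foreach (string row in JoinCustomersToCountries())
        {
            Console.WriteLine(row);
        }
        // Kim: Switzerland
        // Jeff: United States
        // Jeff: Canada
    }
}
```

Tim is silently missing from the output and Jeff appears twice. Neither result is an error as far as the Join method is concerned, which is why it pays to understand the data before joining it.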

Using Query Operators

The preceding sections have shown you many of the features available for querying in-memory data by using the extension methods for the Enumerable class defined in the System.Linq namespace. The syntax makes use of several advanced C# language features, and the resultant code can sometimes be quite hard to understand and maintain. To relieve you of some of this burden, the designers of C# added query operators to the language to enable you to employ LINQ features by using a syntax more akin to SQL.

As you saw in the examples shown earlier in this chapter, you can retrieve the first name for each customer like this:

IEnumerable<string> customerFirstNames =

customers.Select(cust => cust.FirstName);

You can rephrase this statement by using the from and select query operators, like this:

var customerFirstNames = from cust in customers
                         select cust.FirstName;

At compile time, the C# compiler resolves this expression into the corresponding Select method. The from operator defines an alias for the source collection, and the select operator specifies the fields to retrieve by using this alias. The result is an enumerable collection of customer first names. If you are familiar with SQL, notice that the from operator occurs before the select operator.
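One way to convince yourself that the two forms are interchangeable is to run both and compare the results. This is only a sketch under the assumption of a two-customer array; the QuerySyntaxDemo class and the FormsMatch helper are names invented for the illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QuerySyntaxDemo
{
    // Returns true if the query-operator form and the extension-method form
    // yield the same first names in the same order.
    public static bool FormsMatch()
    {
        var customers = new[] {
            new { FirstName = "Kim",  LastName = "Abercrombie" },
            new { FirstName = "Jeff", LastName = "Hay" }
        };

        IEnumerable<string> viaOperators = from cust in customers
                                           select cust.FirstName;

        IEnumerable<string> viaMethod = customers.Select(cust => cust.FirstName);

        return viaOperators.SequenceEqual(viaMethod);
    }

    static void Main()
    {
        Console.WriteLine(FormsMatch());   // True
    }
}
```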

Continuing in the same vein, to retrieve the first and last names for each customer, you can use the following statement. (You might want to refer to the earlier example of the same statement based on the Select extension method.)

var customerNames = from cust in customers
                    select new { cust.FirstName, cust.LastName };

You use the where operator to filter data. The following example shows how to return the names of the companies based in the United States from the addresses array:

var usCompanies = from a in addresses
                  where String.Equals(a.Country, "United States")
                  select a.CompanyName;

To order data, use the orderby operator, like this:

var companyNames = from a in addresses
                   orderby a.CompanyName
                   select a.CompanyName;

You can group data by using the group operator:

var companiesGroupedByCountry = from a in addresses
                                group a by a.Country;

Notice that, as with the earlier example showing how to group data, you do not provide the select operator, and you can iterate through the results by using exactly the same code as the earlier example, like this:

foreach (var companiesPerCountry in companiesGroupedByCountry)
{
    Console.WriteLine("Country: {0}\t{1} companies",
        companiesPerCountry.Key, companiesPerCountry.Count());

    foreach (var companies in companiesPerCountry)
    {
        Console.WriteLine("\t{0}", companies.CompanyName);
    }
}

You can invoke the summary functions, such as Count, over the enumerable collection returned by a query expression, like this:

int numberOfCompanies = (from a in addresses
                         select a.CompanyName).Count();

Notice that you wrap the expression in parentheses. If you want to ignore duplicate values, use the Distinct method:

int numberOfCountries = (from a in addresses
                         select a.Country).Distinct().Count();

TIP

In many cases, you probably want to count just the number of rows in a collection rather than the number of values in a field across all the rows in the collection. In this case, you can invoke the Count method directly over the original collection, like this:

int numberOfCompanies = addresses.Count();

You can use the join operator to combine two collections across a common key. The following example shows the query returning customers and addresses over the CompanyName column in each collection, this time rephrased using the join operator. You use the on clause with the equals operator to specify how the two collections are related.

NOTE

LINQ currently supports equi-joins (joins based on equality) only. Database developers who are used to SQL may be familiar with joins based on other operators such as > and <, but LINQ does not provide these features.

var citiesAndCustomers = from a in addresses
                         join c in customers
                         on a.CompanyName equals c.CompanyName
                         select new { c.FirstName, c.LastName, a.Country };

NOTE

In contrast with SQL, the order of the expressions in the on clause of a LINQ expression is important. You must place the item you are joining from (referencing the data in the collection in the from clause) to the left of the equals operator and the item you are joining with (referencing the data in the collection in the join clause) to the right.

LINQ provides a large number of other methods for summarizing information, joining, grouping, and searching through data. This section has covered just the most common features. For example, LINQ provides the Intersect and Union methods, which you can use to perform set-wide operations. It also provides methods such as Any and All that you can use to determine whether at least one item in a collection or every item in a collection matches a specified predicate. You can partition the values in an enumerable collection by using the Take and Skip methods. For more information, see the material in the LINQ section of the documentation provided with Visual Studio 2012.
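The following sketch exercises the methods named in the preceding paragraph. The CustomerCountries values come from the addresses array used in this chapter; the EuropeanCountries array (including "France") and the OtherOperatorsDemo class are assumptions added purely to give the set operations a second collection to work with:

```csharp
using System;
using System.Linq;

static class OtherOperatorsDemo
{
    // Distinct countries from the chapter's addresses array.
    public static readonly string[] CustomerCountries =
        { "Switzerland", "United States", "United Kingdom" };

    // Hypothetical second collection for the set operations.
    public static readonly string[] EuropeanCountries =
        { "Switzerland", "United Kingdom", "France" };

    static void Main()
    {
        // Intersect: items that appear in both collections.
        Console.WriteLine(String.Join(", ", CustomerCountries.Intersect(EuropeanCountries)));
        // Switzerland, United Kingdom

        // Union: items from either collection, without duplicates.
        Console.WriteLine(String.Join(", ", CustomerCountries.Union(EuropeanCountries)));
        // Switzerland, United States, United Kingdom, France

        // Any: does at least one item match? All: does every item match?
        Console.WriteLine(CustomerCountries.Any(c => c.StartsWith("United", StringComparison.Ordinal)));  // True
        Console.WriteLine(CustomerCountries.All(c => c.StartsWith("United", StringComparison.Ordinal)));  // False

        // Skip and Take: ignore the first item, then keep the next two.
        Console.WriteLine(String.Join(", ", CustomerCountries.Skip(1).Take(2)));
        // United States, United Kingdom
    }
}
```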

Querying Data in Tree<TItem> Objects

The examples you’ve seen so far in this chapter have shown how to query the data in an array. You can use exactly the same techniques for any collection class that implements the generic IEnumerable<T> interface. In the following exercise, you will define a new class for modeling employees for a company. You will create a BinaryTree object containing a collection of Employee objects, and then you will use LINQ to query this information. You will initially call the LINQ extension methods directly, but then you will modify your code to use query operators.

Retrieve data from a BinaryTree by using the extension methods

1. Start Visual Studio 2012 if it is not already running.

2. Open the QueryBinaryTree solution, located in the \Microsoft Press\Visual CSharp Step By Step\Chapter 21\Windows X\QueryBinaryTree folder in your Documents folder. The project contains the Program.cs file, which defines the Program class with the Main and doWork methods that you saw in previous exercises.

3. In Solution Explorer, right-click the QueryBinaryTree project, point to Add, and then click Class. In the Add New Item - QueryBinaryTree dialog box, type Employee.cs in the Name text box, and then click Add.

4. Add the automatic properties shown below in bold to the Employee class:

5. class Employee

6. {

7. public string FirstName { get; set; }

8. public string LastName { get; set; }

9. public string Department { get; set; }

10. public int Id { get; set; }

}

11.Add the ToString method shown in bold to the Employee class. Types in the .NET Framework use this method when converting the object to a string representation, such as when displaying it by using the Console.WriteLine statement.

12.class Employee

13.{

14. ...

15. public override string ToString()

16. {

17. return String.Format("Id: {0}, Name: {1} {2}, Dept: {3}",

18. this.Id, this.FirstName, this.LastName,

19. this.Department);

20. }

}

21.Modify the definition of the Employee class to implement the IComparable<Employee> interface, as shown here:

22.class Employee : IComparable<Employee>

23.{

}

This step is necessary because the BinaryTree class specifies that its elements must be “comparable.”

24.Right-click the IComparable<Employee> interface in the class definition, point to Implement Interface, and then click Implement Interface Explicitly.

This action generates a default implementation of the CompareTo method. Remember that the BinaryTree class calls this method when it needs to compare elements when inserting them into the tree.

25.Replace the body of the CompareTo method with the code shown below in bold. This implementation of the CompareTo method compares Employee objects based on the value of the Id field.

26.int IComparable<Employee>.CompareTo(Employee other)

27.{

28. if (other == null)

29. {

30. return 1;

31. }

32.

33. if (this.Id > other.Id)

34. {

35. return 1;

36. }

37.

38. if (this.Id < other.Id)

39. {

40. return -1;

41. }

42.

43. return 0;

}

NOTE

For a description of the IComparable<T> interface, refer to Chapter 19.

9. In Solution Explorer, right-click the QueryBinaryTree solution, point to Add, and then click Existing Project. In the Add Existing Project dialog box, move to the folder Microsoft Press\Visual CSharp Step By Step\Chapter 21\Windows X\BinaryTree in your Documents folder, click the BinaryTree project, and then click Open.

The BinaryTree project contains a copy of the enumerable BinaryTree class that you implemented in Chapter 19.

10. In Solution Explorer, right-click the QueryBinaryTree project, and then click Add Reference. In the Reference Manager - QueryBinaryTree dialog box, in the left pane click Solution, in the middle pane select the BinaryTree project, and then click OK.

11. Display the Program.cs file for the QueryBinaryTree project in the Code and Text Editor window, and verify that the list of using directives at the top of the file includes the following line of code:

using System.Linq;

12. Add the following using directive to the list at the top of the Program.cs file to bring the BinaryTree namespace into scope:

using BinaryTree;

13. In the doWork method in the Program class, remove the // TODO: comment and add the following statements shown in bold type to construct and populate an instance of the BinaryTree class:

static void doWork()
{
    Tree<Employee> empTree = new Tree<Employee>(new Employee {
        Id = 1, FirstName = "Kim", LastName = "Abercrombie", Department = "IT"});
    empTree.Insert(new Employee {
        Id = 2, FirstName = "Jeff", LastName = "Hay", Department = "Marketing"});
    empTree.Insert(new Employee {
        Id = 4, FirstName = "Charlie", LastName = "Herb", Department = "IT"});
    empTree.Insert(new Employee {
        Id = 6, FirstName = "Chris", LastName = "Preston", Department = "Sales"});
    empTree.Insert(new Employee {
        Id = 3, FirstName = "Dave", LastName = "Barnett", Department = "Sales"});
    empTree.Insert(new Employee {
        Id = 5, FirstName = "Tim", LastName = "Litton", Department = "Marketing"});
}

14. Add the following statements shown in bold to the end of the doWork method. This code invokes the Select method to list the departments found in the binary tree.

static void doWork()
{
    ...
    Console.WriteLine("List of departments");
    var depts = empTree.Select(d => d.Department);

    foreach (var dept in depts)
    {
        Console.WriteLine("Department: {0}", dept);
    }
}

15. On the DEBUG menu, click Start Without Debugging.

The application should output the following list of departments:

List of departments

Department: IT

Department: Marketing

Department: Sales

Department: IT

Department: Marketing

Department: Sales

Each department occurs twice because there are two employees in each department. The order of the departments is determined by the CompareTo method of the Employee class, which uses the Id property of each employee to sort the data. The first department is for the employee with the Id value 1, the second department is for the employee with the Id value 2, and so on.

16. Press Enter to return to Visual Studio 2012.

17. Modify the statement that creates the enumerable collection of departments as shown below in bold:

var depts = empTree.Select(d => d.Department).Distinct();

The Distinct method removes duplicate rows from the enumerable collection.

18. On the DEBUG menu, click Start Without Debugging.

Verify that the application now displays each department only once, like this:

List of departments

Department: IT

Department: Marketing

Department: Sales

19. Press Enter to return to Visual Studio 2012.

20. Add the following statements shown in bold to the end of the doWork method. This block of code uses the Where method to filter the employees and return only those in the IT department. The Select method returns the entire row rather than projecting specific columns.

static void doWork()
{
    ...
    Console.WriteLine("\nEmployees in the IT department");
    var ITEmployees =
        empTree.Where(e => String.Equals(e.Department, "IT"))
        .Select(emp => emp);

    foreach (var emp in ITEmployees)
    {
        Console.WriteLine(emp);
    }
}

21. Add the code shown below in bold to the end of the doWork method, after the code from the preceding step. This code uses the GroupBy method to group the employees found in the binary tree by department. The outer foreach statement iterates through each group, displaying the name of the department. The inner foreach statement displays the names of the employees in each department.

static void doWork()
{
    ...
    Console.WriteLine("\nAll employees grouped by department");
    var employeesByDept = empTree.GroupBy(e => e.Department);

    foreach (var dept in employeesByDept)
    {
        Console.WriteLine("Department: {0}", dept.Key);
        foreach (var emp in dept)
        {
            Console.WriteLine("\t{0} {1}", emp.FirstName, emp.LastName);
        }
    }
}

22. On the DEBUG menu, click Start Without Debugging. Verify that the output of the application looks like this:

List of departments
Department: IT
Department: Marketing
Department: Sales

Employees in the IT department
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 4, Name: Charlie Herb, Dept: IT

All employees grouped by department
Department: IT
    Kim Abercrombie
    Charlie Herb
Department: Marketing
    Jeff Hay
    Tim Litton
Department: Sales
    Dave Barnett
    Chris Preston

23. Press Enter to return to Visual Studio 2012.
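
The extension-method calls used in this exercise can also be tried outside the QueryBinaryTree solution. The following is a minimal sketch, not the book's sample code: it assumes a sorted List<Employee> as a stand-in for the Tree<Employee> class from Chapter 19, and the ExtensionMethodQueries class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Employee class in the exercise (ToString omitted for brevity).
class Employee : IComparable<Employee>
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Department { get; set; }
    public int Id { get; set; }

    // Explicit interface implementation: orders employees by Id.
    int IComparable<Employee>.CompareTo(Employee other)
    {
        return other == null ? 1 : this.Id.CompareTo(other.Id);
    }
}

// Hypothetical helper class; not part of the book's sample code.
static class ExtensionMethodQueries
{
    // Assumption: a sorted List<Employee> stands in for Tree<Employee>.
    public static List<Employee> BuildEmployees()
    {
        var employees = new List<Employee>
        {
            new Employee { Id = 1, FirstName = "Kim", LastName = "Abercrombie", Department = "IT" },
            new Employee { Id = 2, FirstName = "Jeff", LastName = "Hay", Department = "Marketing" },
            new Employee { Id = 4, FirstName = "Charlie", LastName = "Herb", Department = "IT" },
            new Employee { Id = 6, FirstName = "Chris", LastName = "Preston", Department = "Sales" },
            new Employee { Id = 3, FirstName = "Dave", LastName = "Barnett", Department = "Sales" },
            new Employee { Id = 5, FirstName = "Tim", LastName = "Litton", Department = "Marketing" }
        };
        employees.Sort(); // Uses CompareTo, so the list ends up in Id order, like the tree.
        return employees;
    }

    // Select projects the Department property; Distinct drops repeated values.
    public static List<string> DistinctDepartments(IEnumerable<Employee> source)
    {
        return source.Select(e => e.Department).Distinct().ToList();
    }

    // Where filters by department; Select projects the Id of each match.
    public static List<int> IdsInDepartment(IEnumerable<Employee> source, string department)
    {
        return source.Where(e => String.Equals(e.Department, department))
                     .Select(e => e.Id)
                     .ToList();
    }

    // GroupBy produces one group per department; each group is itself enumerable.
    public static Dictionary<string, int> HeadCountByDepartment(IEnumerable<Employee> source)
    {
        return source.GroupBy(e => e.Department)
                     .ToDictionary(g => g.Key, g => g.Count());
    }

    static void Main()
    {
        List<Employee> employees = BuildEmployees();
        Console.WriteLine(String.Join(", ", DistinctDepartments(employees)));
        Console.WriteLine(String.Join(", ", IdsInDepartment(employees, "IT")));
        foreach (var pair in HeadCountByDepartment(employees))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

Because CompareTo is implemented explicitly, it is reachable only through an IComparable<Employee> reference. List<T>.Sort does exactly that through the default comparer, which is why the sketch can sort without calling CompareTo directly.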

Retrieve data from a BinaryTree by using query operators

1. In the doWork method, comment out the statement that generates the enumerable collection of departments, and replace it with the equivalent statement shown in bold, using the from and select query operators:

//var depts = empTree.Select(d => d.Department).Distinct();
var depts = (from d in empTree
             select d.Department).Distinct();

2. Comment out the statement that generates the enumerable collection of employees in the IT department, and replace it with the following code shown in bold:

//var ITEmployees =
//    empTree.Where(e => String.Equals(e.Department, "IT"))
//    .Select(emp => emp);
var ITEmployees = from e in empTree
                  where String.Equals(e.Department, "IT")
                  select e;

3. Comment out the statement that generates the enumerable collection grouping employees by department, and replace it with the statement shown here in bold:

//var employeesByDept = empTree.GroupBy(e => e.Department);
var employeesByDept = from e in empTree
                      group e by e.Department;

4. On the DEBUG menu, click Start Without Debugging. Verify that the program displays the same results as before.

List of departments
Department: IT
Department: Marketing
Department: Sales

Employees in the IT department
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 4, Name: Charlie Herb, Dept: IT

All employees grouped by department
Department: IT
    Kim Abercrombie
    Charlie Herb
Department: Marketing
    Jeff Hay
    Tim Litton
Department: Sales
    Dave Barnett
    Chris Preston

5. Press Enter to return to Visual Studio 2012.
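
To see that the two notations describe the same queries, the sketch below pairs each extension-method call from the exercise with its query-operator form and compares the results. It is an illustration under stated assumptions: a plain array stands in for the binary tree, the Employee class is trimmed to three properties, and the QuerySyntaxEquivalence class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down Employee, enough for the three queries below.
class Employee
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string Department { get; set; }
}

// Hypothetical helper class; an array stands in for the binary tree.
static class QuerySyntaxEquivalence
{
    public static readonly Employee[] Employees =
    {
        new Employee { Id = 1, FirstName = "Kim", Department = "IT" },
        new Employee { Id = 2, FirstName = "Jeff", Department = "Marketing" },
        new Employee { Id = 3, FirstName = "Dave", Department = "Sales" },
        new Employee { Id = 4, FirstName = "Charlie", Department = "IT" },
        new Employee { Id = 5, FirstName = "Tim", Department = "Marketing" },
        new Employee { Id = 6, FirstName = "Chris", Department = "Sales" }
    };

    public static IEnumerable<string> DepartmentsByMethod()
    {
        return Employees.Select(d => d.Department).Distinct();
    }

    public static IEnumerable<string> DepartmentsByQuery()
    {
        // Distinct has no query-operator form, so it is applied to the parenthesized query.
        return (from d in Employees
                select d.Department).Distinct();
    }

    public static IEnumerable<Employee> ITByMethod()
    {
        return Employees.Where(e => String.Equals(e.Department, "IT"));
    }

    public static IEnumerable<Employee> ITByQuery()
    {
        return from e in Employees
               where String.Equals(e.Department, "IT")
               select e;
    }

    public static IEnumerable<IGrouping<string, Employee>> GroupsByMethod()
    {
        return Employees.GroupBy(e => e.Department);
    }

    public static IEnumerable<IGrouping<string, Employee>> GroupsByQuery()
    {
        return from e in Employees
               group e by e.Department;
    }

    static void Main()
    {
        // Each comparison prints True.
        Console.WriteLine(DepartmentsByMethod().SequenceEqual(DepartmentsByQuery()));
        Console.WriteLine(ITByMethod().SequenceEqual(ITByQuery()));
        Console.WriteLine(GroupsByMethod().Select(g => g.Key)
            .SequenceEqual(GroupsByQuery().Select(g => g.Key)));
    }
}
```

Which form to use is largely a matter of readability. The compiler translates query operators into the corresponding extension-method calls, so neither form does more work than the other.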

LINQ and Deferred Evaluation

When you use LINQ to define an enumerable collection, either by using the LINQ extension methods or by using query operators, you should remember that the application does not actually build the collection at the time that the LINQ extension method is executed; the collection is enumerated only when you iterate over the collection. This means that the data in the original collection can change between executing a LINQ query and retrieving the data that the query identifies; you will always fetch the most up-to-date data. For example, the following query (which you saw earlier) defines an enumerable collection of U.S. companies:

var usCompanies = from a in addresses
                  where String.Equals(a.Country, "United States")
                  select a.CompanyName;

The data in the addresses array is not retrieved, and any conditions specified in the Where filter are not evaluated until you iterate through the usCompanies collection:

foreach (string name in usCompanies)
{
    Console.WriteLine(name);
}

If you modify the data in the addresses array between defining the usCompanies collection and iterating through the collection (for example, if you add a new company based in the United States), you will see this new data. This strategy is referred to as deferred evaluation.
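
Deferred evaluation can be observed directly by counting how often the filter condition runs. The following is a minimal sketch under stated assumptions: the Address class, the sample company data, the IsUsCompany helper, and the DeferredEvaluationDemo class are all illustrative, and a List<Address> is used instead of an array so that an item can be added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data class; only the two properties used by the query.
class Address
{
    public string CompanyName { get; set; }
    public string Country { get; set; }
}

// Hypothetical demo class.
static class DeferredEvaluationDemo
{
    // Number of times the where condition has been evaluated so far.
    public static int FilterCalls;

    static bool IsUsCompany(Address a)
    {
        FilterCalls++;
        return String.Equals(a.Country, "United States");
    }

    // Returns a description of what was observed at each of the three stages.
    public static string[] Run()
    {
        FilterCalls = 0;
        var addresses = new List<Address>
        {
            new Address { CompanyName = "Coho Winery", Country = "United States" },
            new Address { CompanyName = "Alpine Ski House", Country = "Switzerland" }
        };

        // Defining the query does not run the filter.
        var usCompanies = from a in addresses
                          where IsUsCompany(a)
                          select a.CompanyName;
        string afterDefinition = "calls=" + FilterCalls;

        // Enumerating the query (here through String.Join) runs the filter once per address.
        string firstNames = String.Join(", ", usCompanies);
        string firstPass = firstNames + " (calls=" + FilterCalls + ")";

        // Add data after the query was defined, then enumerate again.
        addresses.Add(new Address { CompanyName = "Trey Research", Country = "United States" });
        string secondNames = String.Join(", ", usCompanies);
        string secondPass = secondNames + " (calls=" + FilterCalls + ")";

        return new[] { afterDefinition, firstPass, secondPass };
    }

    static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The counter stays at zero until the first enumeration, and the second enumeration runs the filter again over all three addresses, which is why the newly added company appears.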

You can force evaluation when a LINQ query is defined and generate a static, cached collection. This collection is a copy of the original data and does not change if items are later added to or removed from the original collection. LINQ provides the ToList method to build a static List<T> object containing a cached copy of the data. You use it like this:

var usCompanies = from a in addresses.ToList()
                  where String.Equals(a.Country, "United States")
                  select a.CompanyName;

This time, the list of addresses that the query examines is fixed when you create the query. If you add more U.S. companies to the addresses collection, you will not see them when you iterate through the usCompanies collection. Note that in this form only the source data is cached; the where and select clauses still run each time you iterate. To cache the results of the query itself, enclose the entire query expression in parentheses and apply ToList to that. LINQ also provides the ToArray method, which stores the cached collection as an array.
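
Where ToList is applied makes a difference that is easy to miss. The sketch below contrasts a query over a snapshot of the source with a snapshot of the query results. It is illustrative only: the Address class, the sample data, and the SnapshotDemo class are assumptions, and a List<Address> is used so that the source can be changed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data class.
class Address
{
    public string CompanyName { get; set; }
    public string Country { get; set; }
}

// Hypothetical demo class.
static class SnapshotDemo
{
    // Returns { names seen by the query over a source snapshot, names in the cached results }.
    public static string[] Run()
    {
        var addresses = new List<Address>
        {
            new Address { CompanyName = "Coho Winery", Country = "United States" },
            new Address { CompanyName = "Alpine Ski House", Country = "Switzerland" }
        };

        // ToList on the source: the set of Address objects is fixed now,
        // but the where and select clauses still run on every enumeration.
        var overSnapshot = from a in addresses.ToList()
                           where String.Equals(a.Country, "United States")
                           select a.CompanyName;

        // ToList on the whole query: the filter runs now and the names are stored.
        List<string> cachedResults = (from a in addresses
                                      where String.Equals(a.Country, "United States")
                                      select a.CompanyName).ToList();

        // Change the source after both queries have been defined.
        addresses.Add(new Address { CompanyName = "Trey Research", Country = "United States" });
        addresses[1].Country = "United States"; // modify an existing object in place

        return new[]
        {
            String.Join(", ", overSnapshot),
            String.Join(", ", cachedResults)
        };
    }

    static void Main()
    {
        string[] results = Run();
        Console.WriteLine("Over source snapshot: " + results[0]);
        Console.WriteLine("Cached results:       " + results[1]);
    }
}
```

Neither collection sees Trey Research, because it was added after the snapshots were taken. The query over the source snapshot does report Alpine Ski House, however: the snapshot holds references to the same Address objects, and its filter is evaluated only when the query is enumerated.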

In the final exercise in this chapter, you will compare the effects of using deferred evaluation of a LINQ query to generating a cached collection.

Examine the effects of deferred and cached evaluation of a LINQ query

1. Return to Visual Studio 2012, display the QueryBinaryTree project, and edit the Program.cs file.

2. Comment out the contents of the doWork method apart from the statements that construct the empTree binary tree, as shown here:

static void doWork()
{
    Tree<Employee> empTree = new Tree<Employee>(new Employee {
        Id = 1, FirstName = "Kim", LastName = "Abercrombie", Department = "IT"});
    empTree.Insert(new Employee {
        Id = 2, FirstName = "Jeff", LastName = "Hay", Department = "Marketing"});
    empTree.Insert(new Employee {
        Id = 4, FirstName = "Charlie", LastName = "Herb", Department = "IT"});
    empTree.Insert(new Employee {
        Id = 6, FirstName = "Chris", LastName = "Preston", Department = "Sales"});
    empTree.Insert(new Employee {
        Id = 3, FirstName = "Dave", LastName = "Barnett", Department = "Sales"});
    empTree.Insert(new Employee {
        Id = 5, FirstName = "Tim", LastName = "Litton", Department = "Marketing"});

    // comment out the rest of the method
    ...
}

TIP

You can comment out a block of code by selecting the entire block in the Code and Text Editor window and then clicking the Comment Out the Selected Lines button on the toolbar or by pressing Ctrl+E and then pressing C.

3. Add the following statements shown in bold to the doWork method, after the code that creates and populates the empTree binary tree:

static void doWork()
{
    ...
    Console.WriteLine("All employees");
    var allEmployees = from e in empTree
                       select e;

    foreach (var emp in allEmployees)
    {
        Console.WriteLine(emp);
    }
}

This code generates an enumerable collection of employees named allEmployees and then iterates through this collection, displaying the details of each employee.

4. Add the following code immediately after the statements you typed in the preceding step:

static void doWork()
{
    ...
    empTree.Insert(new Employee
    {
        Id = 7,
        FirstName = "David",
        LastName = "Simpson",
        Department = "IT"
    });
    Console.WriteLine("\nEmployee added");

    Console.WriteLine("All employees");
    foreach (var emp in allEmployees)
    {
        Console.WriteLine(emp);
    }
}

These statements add a new employee to the empTree tree and then iterate through the allEmployees collection again.

5. On the DEBUG menu, click Start Without Debugging. Verify that the output of the application looks like this:

All employees
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 2, Name: Jeff Hay, Dept: Marketing
Id: 3, Name: Dave Barnett, Dept: Sales
Id: 4, Name: Charlie Herb, Dept: IT
Id: 5, Name: Tim Litton, Dept: Marketing
Id: 6, Name: Chris Preston, Dept: Sales

Employee added
All employees
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 2, Name: Jeff Hay, Dept: Marketing
Id: 3, Name: Dave Barnett, Dept: Sales
Id: 4, Name: Charlie Herb, Dept: IT
Id: 5, Name: Tim Litton, Dept: Marketing
Id: 6, Name: Chris Preston, Dept: Sales
Id: 7, Name: David Simpson, Dept: IT

Notice that the second time the application iterates through the allEmployees collection, the list displayed includes David Simpson, even though this employee was added only after the allEmployees collection was defined.

6. Press Enter to return to Visual Studio 2012.

7. In the doWork method, change the statement that generates the allEmployees collection to identify and cache the data immediately, as shown here in bold:

var allEmployees = from e in empTree.ToList<Employee>()
                   select e;

ToList and ToArray are generic extension methods. You can state the type argument explicitly, as in this step, or omit it and write empTree.ToList(), in which case the compiler infers Employee from the type of the collection. Either way, the call copies the contents of the tree into a List<Employee> at the moment the query is defined, and the query then iterates over that copy rather than over the tree itself.

8. On the DEBUG menu, click Start Without Debugging. Verify that the output of the application looks like this:

All employees
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 2, Name: Jeff Hay, Dept: Marketing
Id: 3, Name: Dave Barnett, Dept: Sales
Id: 4, Name: Charlie Herb, Dept: IT
Id: 5, Name: Tim Litton, Dept: Marketing
Id: 6, Name: Chris Preston, Dept: Sales

Employee added
All employees
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 2, Name: Jeff Hay, Dept: Marketing
Id: 3, Name: Dave Barnett, Dept: Sales
Id: 4, Name: Charlie Herb, Dept: IT
Id: 5, Name: Tim Litton, Dept: Marketing
Id: 6, Name: Chris Preston, Dept: Sales

Notice that the second time the application iterates through the allEmployees collection, the list displayed does not include David Simpson. This is because the query is evaluated and the results are cached before David Simpson is added to the empTree binary tree.

9. Press Enter to return to Visual Studio 2012.
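
The behavior seen in this exercise does not depend on the binary tree itself; any enumerable collection that can grow behaves the same way. The following sketch reproduces the experiment with a hypothetical SortedBag<T> class standing in for Tree<TItem> and with plain integer Ids instead of Employee objects. All class and method names here are assumptions made for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Tree<TItem>: keeps items in sorted order and is enumerable.
class SortedBag<T> : IEnumerable<T> where T : IComparable<T>
{
    private readonly List<T> items = new List<T>();

    public void Insert(T item)
    {
        // BinarySearch returns a negative number when the item is absent;
        // its bitwise complement is the index at which to insert.
        int index = items.BinarySearch(item);
        items.Insert(index < 0 ? ~index : index, item);
    }

    public IEnumerator<T> GetEnumerator()
    {
        return items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

// Hypothetical demo class.
static class DeferredVersusCached
{
    // Returns the item counts seen by { deferred query, ToList query, ToArray query }.
    public static int[] Run()
    {
        var ids = new SortedBag<int>();
        foreach (int id in new[] { 1, 2, 4, 6, 3, 5 })
        {
            ids.Insert(id);
        }

        // Deferred: reads the bag each time the query is enumerated.
        var deferred = from id in ids
                       select id;

        // Cached: ToList and ToArray copy the bag when the query is defined
        // (the type argument is inferred as int).
        var viaList = from id in ids.ToList()
                      select id;
        var viaArray = from id in ids.ToArray()
                       select id;

        ids.Insert(7); // added after all three queries were defined

        return new[] { deferred.Count(), viaList.Count(), viaArray.Count() };
    }

    static void Main()
    {
        Console.WriteLine(String.Join(", ", Run()));
    }
}
```

Only the deferred query counts the item inserted after the queries were defined; the other two report the six items that were present when ToList and ToArray ran.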

Summary

In this chapter, you learned how LINQ uses the IEnumerable<T> interface and extension methods to provide a mechanism for querying data. You also saw how these features support the query expression syntax in C#.

§ If you want to continue to the next chapter

Keep Visual Studio 2012 running, and turn to Chapter 22.

§ If you want to exit Visual Studio 2012 now

On the FILE menu, click Exit. If you see a Save dialog box, click Yes and save the project.

Chapter 21 Quick Reference

To

Do this

Project specified fields from an enumerable collection

Use the Select method, and specify a lambda expression that identifies the fields to project. For example:

var customerFirstNames = customers.Select(cust => cust.FirstName);

Or use the from and select query operators. For example:

var customerFirstNames =
    from cust in customers
    select cust.FirstName;

Filter rows from an enumerable collection

Use the Where method, and specify a lambda expression containing the criteria that rows should match. For example:

var usCompanies =
    addresses.Where(addr =>
        String.Equals(addr.Country, "United States"))
    .Select(usComp => usComp.CompanyName);

Or use the where query operator. For example:

var usCompanies =
    from a in addresses
    where String.Equals(a.Country, "United States")
    select a.CompanyName;

Enumerate data in a specific order

Use the OrderBy method, and specify a lambda expression identifying the field to use to order rows. For example:

var companyNames =
    addresses.OrderBy(addr => addr.CompanyName)
    .Select(comp => comp.CompanyName);

Or use the orderby query operator. For example:

var companyNames =
    from a in addresses
    orderby a.CompanyName
    select a.CompanyName;

Group data by the values in a field

Use the GroupBy method, and specify a lambda expression identifying the field to use to group rows. For example:

var companiesGroupedByCountry =
    addresses.GroupBy(addrs => addrs.Country);

Or use the group by query operator. For example:

var companiesGroupedByCountry =
    from a in addresses
    group a by a.Country;

Join data held in two different collections

Use the Join method, specifying the collection to join with, the join criteria, and the fields for the result. For example:

var citiesAndCustomers =
    customers
    .Select(c => new { c.FirstName, c.LastName, c.CompanyName })
    .Join(addresses, custs => custs.CompanyName,
        addrs => addrs.CompanyName,
        (custs, addrs) => new {custs.FirstName,
            custs.LastName, addrs.Country });

Or use the join query operator. For example:

var citiesAndCustomers =
    from a in addresses
    join c in customers
    on a.CompanyName equals c.CompanyName
    select new { c.FirstName, c.LastName, a.Country };

Force immediate generation of the results for a LINQ query

Use the ToList or ToArray method to generate a list or an array containing the results. For example:

var allEmployees =
    from e in empTree.ToList<Employee>()
    select e;
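
The join entry above is the one most easily mistyped, so a runnable version may help. This is a minimal sketch with assumed data: the Customer and Address classes, the sample rows, and the JoinDemo class are illustrative rather than taken from the chapter's sample code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data classes.
class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string CompanyName { get; set; }
}

class Address
{
    public string CompanyName { get; set; }
    public string Country { get; set; }
}

// Hypothetical demo class.
static class JoinDemo
{
    static readonly Customer[] customers =
    {
        new Customer { FirstName = "Kim", LastName = "Abercrombie", CompanyName = "Alpine Ski House" },
        new Customer { FirstName = "Jeff", LastName = "Hay", CompanyName = "Coho Winery" },
        new Customer { FirstName = "Dave", LastName = "Barnett", CompanyName = "Wingtip Toys" }
    };

    static readonly Address[] addresses =
    {
        new Address { CompanyName = "Coho Winery", Country = "United States" },
        new Address { CompanyName = "Alpine Ski House", Country = "Switzerland" }
    };

    // Join method: customers is the outer collection, so results follow customer order.
    public static List<string> JoinByMethod()
    {
        return customers
            .Join(addresses,
                  cust => cust.CompanyName,
                  addr => addr.CompanyName,
                  (cust, addr) => cust.FirstName + " " + cust.LastName + ": " + addr.Country)
            .ToList();
    }

    // join operator: addresses is the outer collection, so results follow address order.
    public static List<string> JoinByQuery()
    {
        var query = from a in addresses
                    join c in customers
                    on a.CompanyName equals c.CompanyName
                    select c.FirstName + " " + c.LastName + ": " + a.Country;
        return query.ToList();
    }

    static void Main()
    {
        Console.WriteLine(String.Join("; ", JoinByMethod()));
        Console.WriteLine(String.Join("; ", JoinByQuery()));
    }
}
```

Both forms perform an inner join, so the customer whose company has no matching address is left out of the results. The two forms return the same pairs, but in a different order, because each follows the order of its own outer collection.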