C# 24-Hour Trainer (2015)

Section VII

Specialized Topics

Lesson 36

LINQ to Objects

Lessons 34 and 35 explain how you can use Visual Studio's wizards to build simple database programs. They show one of many ways to connect a program to a data source.

Language-Integrated Query (LINQ) provides another method for bridging the gap between a program and data. Instead of simply providing another way to access data in a database, however, LINQ can help a program access data stored in many places. LINQ lets a program use the same techniques to access data stored in databases, arrays, collections, or files.

LINQ provides four basic technologies that give you access to data stored in various places:

· LINQ to SQL—Data stored in SQL Server databases

· LINQ to Dataset—Data stored in other databases

· LINQ to XML—Data stored in XML (eXtensible Markup Language) files

· LINQ to Objects—Data stored in collections, lists, arrays, strings, files, and so forth

In this lesson you learn how to use LINQ to Objects. You learn how to extract data from lists, collections, and arrays and how to process the results.

LINQ Basics

Using LINQ to process data takes three steps:

1. Create a data source.

2. Build a LINQ query to select data from the data source.

3. Execute the query and process the result.

You might expect the third step to be two separate steps, “Execute the query” and “Process the result.” In practice, however, LINQ doesn't actually execute the query until it must—when the program tries to access the results. This is called deferred execution.

For example, the following code displays the even numbers between 0 and 99:

// Display the even numbers between 0 and 99.

private void Form1_Load(object sender, EventArgs e)

{

// 1. Create the data source.

int[] numbers = new int[100];

for (int i = 0; i < 100; i++) numbers[i] = i;

// 2. Build a query to select data from the data source.

var evenQuery =

from int num in numbers

where (num % 2 == 0)

select num;

// 3. Execute the query and process the result.

foreach (int num in evenQuery) Console.WriteLine(num.ToString());

}

The program starts by creating the data source: an array containing the numbers 0 through 99. In this example the data source is quite simple, but in other programs it could be much more complex. Instead of an array of numbers, it could be a list of Customer objects or an array of Order objects that contain lists of OrderItem objects.

Next the program builds a query to select the even numbers from the list. I explain queries in more detail later, but the following list describes the key pieces of this query:

· var—This is the data type of whatever is returned by the query. In this example the result will be an IEnumerable<int> but in general the results of LINQ queries can have some very strange data types. Rather than trying to figure out what a query will return, most developers use the implicit data type var. The var keyword tells the C# compiler to figure out what the data type is and use that so you don't need to use a specific data type.

· evenQuery—This is the name the code is giving to the query. You can think of it as a variable that represents the result that LINQ will later produce.

· from int num in numbers—This means the query will select data from the numbers array. It will use the int variable num to range over the values in the array. Because num ranges over the values, it is called the query's range variable. (If you omit the int data type, the compiler will implicitly figure out its data type.)

· where (num % 2 == 0)—This is the query's where clause. It determines which items are selected from the array. This example selects the even numbers (where num mod 2 is 0).

· select num—This tells the query what to return. In this case the query returns whatever is in the range variable num for the values that are selected. Often you will want to return the value of the range variable but you could return something else such as 2 * num or a new object created with a constructor that takes num as a parameter.

NOTE

I don't recommend using var for variables in general if you can figure out a more specific data type. When you use var, you can't be sure what data type the compiler will use. That can lead to confusion if the compiler picks different data types for variables that must later work together.

For example, in the following code the third statement is allowed because you can store an int value in a double but the fourth statement is not allowed because a double may not fit in an int:

var x = 1.2; // double.

var y = 1; // int.

x = y; // Allowed.

y = x; // Not allowed.

If you do know the data type, just use that instead of var.

In the final step of performing the query, the code loops through the result produced by LINQ and displays each int value in the Console window. The query is actually executed only when the program tries to iterate over its results.
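Deferred execution is easier to believe when you watch it happen. The following console sketch is not part of the lesson's download; the class and method names are made up for illustration. It builds a query, changes the data source, and only then enumerates the query, so the result includes the value added after the query was defined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Builds a query, changes the data source, and only then runs the query.
    public static List<int> RunQueryAfterChange()
    {
        List<int> numbers = new List<int> { 1, 2, 3, 4 };

        // Building the query does not examine the list yet.
        IEnumerable<int> evenQuery =
            from num in numbers
            where num % 2 == 0
            select num;

        // Change the data source after the query is defined.
        numbers.Add(6);

        // ToList enumerates the query, so it runs now and sees the new value.
        return evenQuery.ToList();
    }

    public static void Main()
    {
        // Prints 2, 4, and 6 on separate lines.
        foreach (int num in RunQueryAfterChange()) Console.WriteLine(num);
    }
}
```

If you want a snapshot that later changes to the source cannot affect, call ToArray or ToList right after building the query.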

The following sections provide more detailed descriptions of some of the key pieces of a LINQ query: where clauses, order by clauses, and select clauses.

where Clauses

Probably the most common reason to use LINQ is to filter the data with a where clause. The where clause can include normal boolean expressions that use &&, ||, >, and other boolean operators. It can use the range variable and any properties or methods that it provides (if it's an object). It can even perform calculations and invoke functions.

NOTE

The where clause is optional. If you omit it, the query selects all of the items in its range.

For example, the following query is similar to the earlier one that selects even numbers, except this one's where clause uses the IsPrime method to select only prime numbers. (How the IsPrime function works isn't important to this discussion, so it isn't shown here. You can see it in the Find Primes program in this lesson's download.)

var primeQuery =

from int num in numbers

where (IsPrime(num))

select num;
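If you want to run the query without the download, the following sketch supplies a stand-in for IsPrime. It uses simple trial division and is only an assumption about what such a method might look like, not the version in the Find Primes program. The PrimeDemo class and PrimesBelow method are likewise hypothetical names.

```csharp
using System;
using System.Linq;

public static class PrimeDemo
{
    // Hypothetical stand-in for the lesson's IsPrime method (trial division).
    public static bool IsPrime(int number)
    {
        if (number < 2) return false;
        for (int divisor = 2; (long)divisor * divisor <= number; divisor++)
        {
            if (number % divisor == 0) return false;
        }
        return true;
    }

    // Returns the primes in the range 0 through limit - 1.
    public static int[] PrimesBelow(int limit)
    {
        int[] numbers = Enumerable.Range(0, limit).ToArray();

        // The where clause can call any method that returns a bool.
        var primeQuery =
            from int num in numbers
            where IsPrime(num)
            select num;

        return primeQuery.ToArray();
    }

    public static void Main()
    {
        // Prints: 2 3 5 7 11 13 17 19 23 29
        Console.WriteLine(string.Join(" ", PrimesBelow(30)));
    }
}
```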

The Find Customers example program shown in Figure 36.1 (and available in this lesson's code download on the website) demonstrates several where clauses.

Screenshot of Find Customers (4/1/2020) window presenting four panels, namely, All Customers, Negative Balance, Overdue, and Owes more than $50, with each panel having its own list.

Figure 36.1

The following code shows the Customer class used by the Find Customers program. It includes some auto-implemented properties and an overridden ToString method that displays the Customer's values:

class Customer

{

public string FirstName { get; set; }

public string LastName { get; set; }

public decimal Balance { get; set; }

public DateTime DueDate { get; set; }

public override string ToString()

{

return FirstName + " " + LastName + "\t" +

Balance.ToString("C") + "\t" + DueDate.ToString("d");

}

}

The following code shows how the Find Customers program displays the same customer data selected with different where clauses:

// Display customers selected in various ways.

private void Form1_Load(object sender, EventArgs e)

{

DateTime today = new DateTime(2020, 4, 1);

//DateTime today = DateTime.Today;

this.Text = "Find Customers (" + today.ToString("d") + ")";

// Make the customers.

Customer[] customers =

{

new Customer() { FirstName="Ann", LastName="Ashler",

Balance = 100, DueDate = new DateTime(2020, 3, 10)},

new Customer() { FirstName="Bob", LastName="Boggart",

Balance = 150, DueDate = new DateTime(2020, 2, 5)},

// … Other Customers omitted …

};

// Display all customers.

allListBox.DataSource = customers;

// Display customers with negative balances.

var negativeQuery =

from Customer cust in customers

where cust.Balance < 0

select cust;

negativeListBox.DataSource = negativeQuery.ToArray();

// Display customers who owe at least $50.

var owes50Query =

from Customer cust in customers

where cust.Balance <= -50

select cust;

owes50listBox.DataSource = owes50Query.ToArray();

// Display customers who owe at least $50

// and are more than 30 days overdue.

var overdueQuery =

from Customer cust in customers

where (cust.Balance <= -50) &&

(today.Subtract(cust.DueDate).TotalDays > 30)

select cust;

overdueListBox.DataSource = overdueQuery.ToArray();

}

The program starts by creating a DateTime named today and setting it equal to April 1, 2020. In a real application you would probably use the current date (commented out), but this program uses that specific date so it works well with the sample data. The program then displays the date in its title bar (so you can compare it to the Customers' due dates) and creates an array of Customer objects.

Next the code sets the allListBox control's DataSource property to the array so that ListBox displays all of the Customer objects. The Customer class's overridden ToString method makes it display each Customer's name, balance, and due date.

The program then executes the following LINQ query:

// Display customers with negative balances.

var negativeQuery =

from Customer cust in customers

where cust.Balance < 0

select cust;

negativeListBox.DataSource = negativeQuery.ToArray();

This query's where clause selects Customers with Balance properties less than 0. The query returns an IEnumerable, but a ListBox's DataSource property requires an IList or IListSource and IEnumerable doesn't satisfy either of those interfaces. To handle that problem, the program calls the result's ToArray method to convert it into an array that the DataSource property can handle.

After displaying this result, the program executes two other LINQ queries and displays their results similarly. The first query selects Customers who owe at least $50. The final query selects Customers who owe at least $50 and who have a DueDate more than 30 days in the past.

Order By Clauses

Often the result of a query is easier to read if you sort the selected values. You can do this by inserting an order by clause between the where clause and the select clause.

The order by clause begins with the keyword orderby followed by one or more values separated by commas that determine how the results are ordered.

Optionally you can follow a value by the keyword ascending (the default) or descending to determine whether the results are ordered in ascending (1-2-3 or A-B-C) or descending (3-2-1 or C-B-A) order.

For example, the following query selects Customers with negative balances and orders them so those with the smallest (most negative) values come first:

var negativeQuery =

from Customer cust in customers

where cust.Balance < 0

orderby cust.Balance ascending

select cust;

The following version orders the results first by balance and then, if two customers have the same balance, by last name:

var negativeQuery =

from Customer cust in customers

where cust.Balance < 0

orderby cust.Balance, cust.LastName

select cust;
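The following self-contained sketch combines the descending keyword with a secondary sort value. The sample customers are invented for illustration, and the Customer class here is a cut-down version that keeps only the two properties the query needs.

```csharp
using System;
using System.Linq;

public class Customer
{
    public string LastName { get; set; }
    public decimal Balance { get; set; }
}

public static class OrderByDemo
{
    // Returns last names of customers with negative balances, ordered so the
    // balances closest to zero come first and ties are broken by last name.
    public static string[] NegativeByBalanceThenName()
    {
        // Made-up sample data.
        Customer[] customers =
        {
            new Customer { LastName = "Cortez", Balance = -75m },
            new Customer { LastName = "Ashler", Balance = -75m },
            new Customer { LastName = "Boggart", Balance = -20m },
            new Customer { LastName = "Dent", Balance = 40m },
        };

        var negativeQuery =
            from cust in customers
            where cust.Balance < 0
            orderby cust.Balance descending, cust.LastName
            select cust.LastName;

        return negativeQuery.ToArray();
    }

    public static void Main()
    {
        // Prints Boggart, Ashler, and Cortez on separate lines.
        foreach (string name in NegativeByBalanceThenName()) Console.WriteLine(name);
    }
}
```

Each value in the orderby clause gets its own direction, so here only Balance is sorted in descending order while LastName uses the default ascending order.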

Select Clauses

The select clause determines what data is pulled from the data source and stored in the result. All of the previous examples select the data over which they are ranging. For example, the Find Customers example program ranges over an array of Customer objects and selects certain Customer objects.

Instead of selecting the objects in the query's range, a program can select only some properties of those objects, a result calculated from those properties, or even completely new objects. Selecting a new kind of data from the existing data is called transforming or projecting the data.

The Find Students example program shown in Figure 36.2 (and available in this lesson's code download on the website) uses the following simple Student class:

class Student

{

public string FirstName { get; set; }

public string LastName { get; set; }

public List<int> TestScores { get; set; }

}

Screenshot of FindStudents: Class Average = 76.95 window presenting four panels, namely, All Students and Average, Below Average Students, Passing, and Failing, with each panel having its own list.

Figure 36.2

The program uses the following query to select all of the students' names and test averages ordered by name:

// Select all students and their test averages ordered by name.

var allStudents =

from Student student in students

orderby student.LastName, student.FirstName

select String.Format("{0} {1}\t{2:0.00}",

student.FirstName, student.LastName,

student.TestScores.Average());

allListBox.DataSource = allStudents.ToArray();

This query's select clause does not select the range variable student. Instead it selects a string that holds the student's first and last names and the student's test score average. (Notice how the code calls the TestScores list's Average method to get the average of the test scores.) The query therefore produces a sequence of strings (an IEnumerable<string>) rather than a sequence of Student objects.

The program next uses the following code to list the students who have averages of at least 60, giving them passing grades:

// Select passing students ordered by name.

var passingStudents =

from Student student in students

where student.TestScores.Average() >= 60

orderby student.LastName, student.FirstName

select student.FirstName + " " + student.LastName;

passingListBox.DataSource = passingStudents.ToArray();

This code again selects a string instead of a Student object. The code that selects failing students is similar, so it isn't shown here.

The program uses the following code to select students with averages below the class average:

// Select all scores and compute a class average.

var allAverages =

from Student student in students

select student.TestScores.Average();

double classAverage = allAverages.Average();

// Display the average.

this.Text = "FindStudents: Class Average = " +

classAverage.ToString("0.00");

// Select students with average below the class average ordered by average.

var belowAverageStudents =

from Student student in students

where student.TestScores.Average() < classAverage

orderby student.TestScores.Average()

select new {Name = student.FirstName + " " + student.LastName,

Average = student.TestScores.Average()};

foreach (var info in belowAverageStudents)

belowAverageListBox.Items.Add(info.Name + "\t" + info.Average);

This snippet starts by selecting all of the students' test score averages. The query produces an IEnumerable<double>. The program calls that sequence's Average method to get the class average.

Next the code queries the student data again, this time selecting students with averages below the class average.

This query demonstrates a new kind of select clause that creates a sequence of new objects. The new objects have two properties, Name and Average, that are given values by the select clause. The data type of these new objects is created automatically and isn't given an explicit name, so it is known as an anonymous type.

After creating the query, the code loops through its results, using each object's Name and Average property to display the below average students in a ListBox. Notice that the code gives the looping variable info the implicit data type var so it doesn't need to figure out what data type it really has.

NOTE

Objects with anonymous data types actually have a true data type, just not one that you want to have to figure out. For example, you can add the following statement inside the previous code's foreach loop to see what data type the objects actually have:

Console.WriteLine(info.GetType().ToString());

If you look in the Output window, you'll see that these objects have the ungainly data type:

<>f__AnonymousType0`2[System.String,System.Double]

Although you can sort of see what's going on here (the object contains a string and a double), you probably wouldn't want to type this mess into your code even if you could. In this case, the var type is a lot easier to read.
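The below-average query can be tried outside of a form with the following console sketch. The student data is invented, and the AnonymousTypeDemo class is a hypothetical name. The sketch also uses a let clause, which this lesson does not cover, to store each student's average in a variable so the query computes it only once per student. Treat that as one possible refinement and not as the Find Students program's approach.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Student
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<int> TestScores { get; set; }
}

public static class AnonymousTypeDemo
{
    // Returns "name<tab>average" for students below the class average.
    public static List<string> BelowAverage()
    {
        // Made-up sample data.
        Student[] students =
        {
            new Student { FirstName = "Ann", LastName = "Archer",
                TestScores = new List<int> { 90, 80 } },
            new Student { FirstName = "Bob", LastName = "Baker",
                TestScores = new List<int> { 50, 60 } },
            new Student { FirstName = "Cat", LastName = "Cole",
                TestScores = new List<int> { 70, 60 } },
        };

        // Compute the class average from the individual averages.
        var allAverages =
            from student in students
            select student.TestScores.Average();
        double classAverage = allAverages.Average();

        // Project each selected student into an anonymous type.
        var belowAverageStudents =
            from student in students
            let average = student.TestScores.Average()
            where average < classAverage
            orderby average
            select new
            {
                Name = student.FirstName + " " + student.LastName,
                Average = average
            };

        // var is required here because the anonymous type has no usable name.
        List<string> lines = new List<string>();
        foreach (var info in belowAverageStudents)
        {
            lines.Add(info.Name + "\t" +
                info.Average.ToString("0.00", CultureInfo.InvariantCulture));
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in BelowAverage()) Console.WriteLine(line);
    }
}
```

With this data the class average is about 68.33, so the sketch lists Bob Baker (55.00) and then Cat Cole (65.00).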

LINQ provides plenty of other features that won't fit in this lesson. It lets you:

· Group results to produce lists that contain other lists

· Take only a certain number of results or take results while a certain condition is true

· Skip a certain number of results or skip results while a certain condition is true

· Join results selected from multiple data sources

· Use aggregate functions such as Average (which you've already seen), Count, Min, Max, and Sum
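As a brief taste of several of those features, the following sketch applies them to the numbers 1 through 10. It is only an illustration with a made-up class name. Take, Skip, TakeWhile, and SkipWhile have no query keywords in C#, so the code calls them as extension methods and passes lambda expressions for the conditions. Joins are not shown.

```csharp
using System;
using System.Linq;

public static class MoreLinqDemo
{
    // The numbers 1 through 10.
    public static readonly int[] Numbers = Enumerable.Range(1, 10).ToArray();

    // Groups the numbers by their remainder when divided by 3.
    public static string[] DescribeGroups()
    {
        var groups =
            from num in Numbers
            group num by num % 3;

        // Each group has a Key and can be enumerated like a list.
        return groups
            .Select(g => g.Key + ": " + string.Join(" ", g))
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Numbers.Take(3)));               // 1 2 3
        Console.WriteLine(string.Join(" ", Numbers.Skip(7)));               // 8 9 10
        Console.WriteLine(string.Join(" ", Numbers.TakeWhile(n => n < 5))); // 1 2 3 4
        Console.WriteLine(string.Join(" ", Numbers.SkipWhile(n => n < 8))); // 8 9 10

        // Aggregate functions.
        Console.WriteLine(Numbers.Count());   // 10
        Console.WriteLine(Numbers.Min());     // 1
        Console.WriteLine(Numbers.Max());     // 10
        Console.WriteLine(Numbers.Sum());     // 55

        // Prints "1: 1 4 7 10", "2: 2 5 8", and "0: 3 6 9".
        foreach (string line in DescribeGroups()) Console.WriteLine(line);
    }
}
```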

Microsoft's “Language-Integrated Query (LINQ)” page at msdn.microsoft.com/library/bb397926.aspx provides a good starting point for learning more about LINQ.

Try It

In Lesson 29's Try It, you built a program that used the DirectoryInfo class's GetFiles method to search for files matching a pattern and containing a target string. For example, the program could search the directory hierarchy starting at C:\C#Projects to find files with the .cs extension and containing the string “DirectoryInfo.”

In this Try It, you modify that program to perform the same search with LINQ. Instead of writing code to loop through the files returned by GetFiles and examining each, you make LINQ examine the files for you.

Lesson Requirements

In this lesson, you:

· Copy the program you built for Lesson 29's Try It (or download Lesson 29's version from the book's website) and modify the code to use LINQ to search for files.

NOTE

You can download the code and resources for this lesson from the website at www.wrox.com/go/csharp24hourtrainer2e.

Hints

· Use the DirectoryInfo object's GetFiles method in the query's from clause.

· In the query's where clause, use the File class's ReadAllText method to get the file's contents. Convert it to lowercase and use Contains to see if the file holds the target string.

Step-by-Step

· Copy the program you built for Lesson 29's Try It (or download Lesson 29's version from the book's website) and modify the code to use LINQ to search for files.

1. Copying the program is reasonably straightforward.

2. To use LINQ to search for files, modify the Search button's Click event handler so it looks like the following. The lines in bold show the modified code:

// Search for files matching the pattern

// and containing the target string.

private void searchButton_Click(object sender, EventArgs e)

{

// Get the file pattern and target string.

string pattern = patternComboBox.Text;

string target = targetTextBox.Text.ToLower();

// Search for the files.

DirectoryInfo dirinfo =

new DirectoryInfo(directoryTextBox.Text);

var fileQuery =

from FileInfo fileinfo

in dirinfo.GetFiles(pattern,

SearchOption.AllDirectories)

where

File.ReadAllText(fileinfo.FullName).ToLower().Contains(target)

select fileinfo.FullName;

// Display the result.

fileListBox.DataSource = fileQuery.ToArray();

}

If you compare this code to the version used by the Try It in Lesson 29, you'll see that this version is much shorter.
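To experiment with the same query outside of a Windows Forms project, you could wrap it in a method like the one in the following console sketch. The FileSearchDemo class, the FindFiles method, and the temporary test files are all assumptions made for illustration. The added orderby clause is also not part of the Try It solution; it only makes the output order predictable.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileSearchDemo
{
    // Returns the full names of files under directory that match pattern
    // and contain target (ignoring case).
    public static string[] FindFiles(string directory, string pattern, string target)
    {
        string lowerTarget = target.ToLower();
        DirectoryInfo dirinfo = new DirectoryInfo(directory);

        var fileQuery =
            from FileInfo fileinfo
                in dirinfo.GetFiles(pattern, SearchOption.AllDirectories)
            where File.ReadAllText(fileinfo.FullName).ToLower().Contains(lowerTarget)
            orderby fileinfo.FullName
            select fileinfo.FullName;

        return fileQuery.ToArray();
    }

    public static void Main()
    {
        // Build a small throwaway directory tree to search.
        string root = Path.Combine(Path.GetTempPath(),
            "LinqFileSearchDemo_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(Path.Combine(root, "sub"));
        File.WriteAllText(Path.Combine(root, "a.cs"), "uses DirectoryInfo here");
        File.WriteAllText(Path.Combine(root, "sub", "b.cs"), "nothing interesting");
        File.WriteAllText(Path.Combine(root, "c.txt"), "DirectoryInfo, wrong extension");

        try
        {
            // Only a.cs matches both the pattern and the target string.
            foreach (string name in FindFiles(root, "*.cs", "DirectoryInfo"))
                Console.WriteLine(name);
        }
        finally
        {
            // Clean up the temporary files.
            Directory.Delete(root, true);
        }
    }
}
```

Note that the where clause reads each matching file completely into memory, which is fine for small source files but could be slow for a large directory tree.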

Exercises

1. Build a program that lists the names of the files in a directory together with their sizes, ordered with the biggest files first.

2. Copy the program you built for Exercise 1 and modify it so it searches for files in the directory hierarchy starting at the specified directory.

3. Make a program that lists the perfect squares between 0 and 999. (Hint: Use the Enumerable class's Range method to initialize the source data.)

For Exercises 4 through 8, download the Customer Orders program. This program defines the following classes:

class Person

{

public string Name { get; set; }

}

class OrderItem

{

public string Description { get; set; }

public int Quantity { get; set; }

public decimal UnitPrice { get; set; }

}

class Order

{

public int OrderId { get; set; }

public Person Customer { get; set; }

public List<OrderItem> OrderItems { get; set; }

}

The program's Form_Load event handler creates an array of Order objects. The program's buttons, which are shown in Figure 36.3, let the user display the data in various ways although initially they don't contain any code. In Exercises 4 through 8, you add that code to give the program its features.

Screenshot of Customer Orders window displaying buttons: All Orders, Order By Cost, Customer, and Greater Than. The Customer button has a drop-down box beside it and the Greater Than button has a text field beside it.

Figure 36.3

4. The Customer Orders program creates several Order objects, but it doesn't fill in those objects' TotalCost properties. Use LINQ to do that. (Hints: Use a foreach loop to loop through the objects. For each object, use a LINQ query to go through the order's OrderItems list and select each OrderItem's UnitPrice times its Quantity. After you define the query, call its Sum function to get the total cost for the order.)

5. Copy the program you built for Exercise 4 and add code behind the All Orders button. That code should use a LINQ query to select the orders' IDs, customer names, and total costs. Display the results in the resultListBox by setting that control's DataSource property to the query's results. (Hint: Call the query's ToArray method first, because DataSource requires an IList or IListSource.)

6. Copy the program you built for Exercise 5 and add code behind the Order By Cost button. That code should use a query similar to the one used by Exercise 5, but it should order the results by cost so the orders with the largest costs are listed first.

7. Copy the program you built for Exercise 6 and add code behind the Customer button. That code should use a LINQ query to list orders placed by the customer selected in the ComboBox. (If no name is selected, don't do anything.)

8. Copy the program you built for Exercise 7 and add code behind the Greater Than button. That code should use a LINQ query to list orders with total costs greater than the value entered in the TextBox.

NOTE

Please select the videos for Lesson 36 online at www.wrox.com/go/csharp24hourtrainer2evideos.