Working with Language Integrated Query (LINQ) - MCSD Certification Toolkit (Exam 70-483): Programming in C# (2013)

Chapter 10

Working with Language Integrated Query (LINQ)

What You Will Learn in this Chapter

· Understanding query expressions

· Understanding method-based LINQ queries

· Utilizing LINQ to XML

WROX.COM CODE DOWNLOADS FOR THIS CHAPTER

You can find the code downloads for this chapter at www.wrox.com/remtitle.cgi?isbn=1118612094 on the Download Code tab. The code is in the chapter10 download and individually named according to the names throughout the chapter.

Language Integrated Query (LINQ) is a language feature in the .NET Framework that enables you to use common syntax to query data in a collection, an XML document, a database, or any type that supports the IEnumerable<T> or IQueryable<T> interface. Prior to LINQ, a developer needed to learn different syntax depending on the source of the data. If the source were a database, you needed to learn SQL. If the source were an XML document, you needed to learn XQuery. If the source were an array or a collection, you would write a looping structure, such as a foreach loop, that would enumerate through the items in the collection and filter them appropriately. LINQ enables you to use common syntax regardless of what the source is.

In this chapter, you learn two different styles of syntax for LINQ. Query expressions are the first style, and method-based queries are the second. They are functionally equivalent, which can sometimes be confusing when you first learn LINQ because you can write the code two different ways and it does the exact same thing. The last section discusses LINQ to XML, which enables you to create XML documents without having to write all the tags that normally would be required when working with XML.

Table 10-1 introduces you to the exam objectives covered in this chapter.

Table 10-1: 70-483 Exam Objectives Covered in This Chapter

Objective | Content Covered

Query and manipulate data and objects using LINQ | Writing query expressions and method-based queries using LINQ. Topics covered include projection, joining and grouping collections, the Take and Skip methods, and aggregate methods. You will also learn how to create and modify data structures by using LINQ to XML.

Understanding Query Expressions

As you saw in the introduction to this chapter, Language Integrated Query is a language feature in the .NET Framework that enables you to use common query syntax to query data in a collection, an XML document, a database, or any type that supports the IEnumerable<T> or IQueryable<T> interface. There are two forms of syntax for writing LINQ queries. The first is the query expression, which is discussed in this section. The second is the method-based query, which is discussed in the next section. Functionally, they do the exact same thing; the only difference is the syntax. You must decide which syntax you prefer, but the compiler does not care.

ADVICE FROM THE EXPERTS: Query Expressions versus Method-Based Queries

The compiler converts query expressions to method calls when your assembly is compiled. Query expressions are sometimes easier to read, so you may choose to standardize on that syntax, but be aware that some operations available as methods, such as Count or Sum, have no query expression keyword. To use them, you call the method on the result of the query expression.

Query expressions search through data stored in an array, a collection, or any type that supports the IEnumerable<T> or IQueryable<T> interfaces. The syntax for a query expression is similar to the syntax when working with SQL. Before getting into the details about query expressions, you first take a look at the code needed to search data in objects prior to LINQ. For example, suppose you have an array with the numbers 1 through 10. If you want to retrieve all the even numbers from the array, your code would look something like the following:

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

int[] evenNumbers = new int[5];

int evenIndex = 0;

foreach (int i in myArray)

{

if (i % 2 == 0)

{

evenNumbers[evenIndex] = i;

evenIndex++;

}

}

foreach (int i in evenNumbers)

{

Debug.WriteLine(i);

}

LINQ query expressions enable you to perform “queries” against the array with syntax similar to SQL except it is C# syntax and the order of the elements is different. The benefit of a query expression is less coding and more readability. The following code performs a LINQ query against the array to return only even numbers:

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var evenNumbers = from i in myArray

where i % 2 == 0

select i;

foreach (int i in evenNumbers)

{

Debug.WriteLine(i);

}

A few things need explaining here. First, the evenNumbers variable is declared as a var. A variable defined as var is called an implicitly typed variable. This means that the compiler determines the type of the variable based on the expression on the right side of the initialization statement. In this sample the compiler infers that evenNumbers is an IEnumerable<int>, so the items it yields are int values. It is common practice to declare the results of a LINQ query expression as a var variable.

The second thing to notice is the from clause syntax. In this example, the from clause contains i in myArray. The compiler implicitly knows what type i is based on the type of myArray. This is functionally equivalent to the foreach(int i in myArray) statement. The variable i represents an item in the array as it is enumerated.

The third thing to notice is that the where clause is second and the select clause is third. This is the opposite of SQL syntax. When writing LINQ queries, even if you query a database, the where clause precedes the select clause. The fourth thing is that the where clause contains C# syntax for filtering the data, which means you use the equality operator (==) instead of the single equals sign (=) that SQL uses. This essentially tells the compiler to evaluate each element in the array and return the items that meet this condition.

EXAM TIPS AND TRICKS: Query Expression Syntax

You may see a question on the exam that features the different clauses in a query expression, and you will be asked to put the clauses in the correct order.

Finally, notice that the select clause returns the variable i. This means that as the code enumerates through the array, it returns every element that meets the where condition. If you were to step through this code, you would notice something different from what you might expect. When you get to the foreach statement, execution keeps transferring from the foreach statement back to the where clause in the LINQ query. This is because the LINQ query isn’t actually executed until the evenNumbers variable is enumerated. This is called deferred execution. If you were to add a watch on the evenNumbers variable, you would notice that it doesn’t actually store the results. The query executes any time the elements are enumerated. If the source of the data changed and you enumerated the elements again, it would pick up the changes. For example, examine the following code:

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var evenNumbers = from i in myArray

where i % 2 == 0

select i;

foreach (int i in evenNumbers)

{

Debug.WriteLine(i);

}

myArray[1] = 12;

foreach (int i in evenNumbers)

{

Debug.WriteLine(i);

}

In the preceding example, the second element in the array, which contains the number 2, is replaced with the number 12 after the first foreach loop. When the evenNumbers variable is enumerated the second time, the number 12 is written to the Output window along with the other even numbers.
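If you want the results to stay fixed after the source changes, one common approach is to force the query to run immediately by calling ToList (or ToArray) on it. The following is a minimal sketch of that idea rather than code from the chapter download; the class and method names are hypothetical, and Console.WriteLine is used in place of Debug.WriteLine so that the output appears in a console application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionDemo
{
    // Returns { count from the deferred query, count from the ToList snapshot }
    // after the source array has been modified.
    public static int[] CountsAfterChange()
    {
        int[] myArray = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

        // Deferred: the query is re-evaluated every time it is enumerated.
        IEnumerable<int> deferred = from i in myArray
                                    where i % 2 == 0
                                    select i;

        // Immediate: ToList runs the query now and stores the results.
        List<int> snapshot = deferred.ToList();

        // Replace the odd number 1 with an even number.
        myArray[0] = 12;

        return new[] { deferred.Count(), snapshot.Count };
    }

    static void Main()
    {
        int[] counts = CountsAfterChange();
        Console.WriteLine("Deferred query: " + counts[0]);  // 6
        Console.WriteLine("ToList snapshot: " + counts[1]); // 5
    }
}
```

The deferred query sees the new even number because it reads the array again when Count enumerates it, whereas the list holds only the five values that were even when ToList was called.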

Filtering

Filtering data is done by using the where clause in a query expression. Because you are writing in C#, you must use the and (&&) and or (||) operators when making complex statements. In the previous examples, the where clause contained the boolean expression i % 2 == 0. This is referred to as a predicate. The predicate is the comparison statement that is executed against each element in the sequence. The following example returns all even numbers that are greater than 5:

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var evenNumbers = from i in myArray

where i % 2 == 0 && i > 5

select i;

foreach (int i in evenNumbers)

{

Debug.WriteLine(i);

}

You can have multiple where clauses in your query expression. This is the same as having multiple expressions in the where clause using the && operator. The following code produces the same result as the preceding code:

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var evenNumbers = from i in myArray

where i % 2 == 0

where i > 5

select i;

foreach (int i in evenNumbers)

{

Debug.WriteLine(i);

}

EXAM TIPS AND TRICKS: Multiple where Clauses

You typically would not use multiple where clauses in your code. Instead, you would just separate your conditions with the && operator. However, for the test you may see a question using this syntax, so you need to be aware that multiple where clauses are the equivalent of using the && operator.

If you had a complex filter condition that needed precedence operators, you would use parentheses, (), just as you would in a regular if statement. But be aware that if your query expression contains two where clauses, each is evaluated separately and the two are combined as an and expression.
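To illustrate how parentheses change the meaning of a where clause, here is a short sketch. The class and method names are hypothetical and the conditions were chosen only for demonstration.

```csharp
using System;
using System.Linq;

static class WherePrecedenceDemo
{
    // Even numbers greater than 5, or the number 1.
    public static int[] WithParentheses(int[] numbers)
    {
        var query = from i in numbers
                    where (i % 2 == 0 && i > 5) || i == 1
                    select i;
        return query.ToArray();
    }

    // Same operators, grouped differently:
    // even numbers that are either greater than 5 or equal to 2.
    public static int[] RegroupedCondition(int[] numbers)
    {
        var query = from i in numbers
                    where i % 2 == 0 && (i > 5 || i == 2)
                    select i;
        return query.ToArray();
    }

    static void Main()
    {
        int[] myArray = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        Console.WriteLine(string.Join(" ", WithParentheses(myArray)));    // 1 6 8 10
        Console.WriteLine(string.Join(" ", RegroupedCondition(myArray))); // 2 6 8 10
    }
}
```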

Also be aware that you can call a function in your statement to make your code more readable. The following code sample calls a method called IsEvenAndGT5 and passes in the current element while enumerating through the array:

static void RetrieveEvenNumberGT5V3()

{

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var evenNumbers = from i in myArray

where IsEvenAndGT5(i)

select i;

foreach (int i in evenNumbers)

{

Debug.WriteLine(i);

}

}

static bool IsEvenAndGT5(int i)

{

return (i % 2 == 0 && i > 5);

}

The last point to be aware of regarding the where clause is that it can appear anywhere in your query expression as long as it is not the first or last clause.

Ordering

You can sort the results of your query by using the orderby clause in your query. You can order ascending or descending just as you would in a SQL statement. The following code sorts the even elements in descending order:

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var evenNumbers = from i in myArray

where i % 2 == 0

orderby i descending

select i;

foreach (int i in evenNumbers)

{

Debug.WriteLine(i);

}

You can also order by more than one property by separating your conditions with a comma. The following example uses a class that contains a City and State property. The query returns the elements sorted first by state and then by city alphabetically.

class Hometown

{

public string City { get; set; }

public string State { get; set; }

}

static void OrderByStateThenCity()

{

List<Hometown> hometowns = new List<Hometown>()

{

new Hometown() { City = "Philadelphia", State = "PA" },

new Hometown() { City = "Ewing", State = "NJ" },

new Hometown() { City = "Havertown", State = "PA" },

new Hometown() { City = "Fort Washington", State = "PA" },

new Hometown() { City = "Trenton", State = "NJ" }

};

var orderedHometowns = from h in hometowns

orderby h.State ascending, h.City ascending

select h;

foreach (Hometown hometown in orderedHometowns)

{

Debug.WriteLine(hometown.City + ", " + hometown.State);

}

}

The preceding code produces the following results:

Ewing, NJ

Trenton, NJ

Fort Washington, PA

Havertown, PA

Philadelphia, PA

The default order in an orderby clause is ascending, and you can omit this keyword.
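Each key in an orderby clause can have its own direction. The following sketch, which assumes a trimmed-down copy of the Hometown class and a hypothetical method name, sorts by state descending and then by city using the default ascending order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Hometown
{
    public string City { get; set; }
    public string State { get; set; }
}

static class MixedOrderingDemo
{
    // State is sorted descending; City falls back to the default (ascending).
    public static List<string> StateDescendingThenCity(IEnumerable<Hometown> hometowns)
    {
        var ordered = from h in hometowns
                      orderby h.State descending, h.City
                      select h.City + ", " + h.State;
        return ordered.ToList();
    }

    static void Main()
    {
        var hometowns = new List<Hometown>
        {
            new Hometown { City = "Philadelphia", State = "PA" },
            new Hometown { City = "Ewing", State = "NJ" },
            new Hometown { City = "Havertown", State = "PA" },
            new Hometown { City = "Trenton", State = "NJ" }
        };

        // Prints: Havertown, PA / Philadelphia, PA / Ewing, NJ / Trenton, NJ
        foreach (string line in StateDescendingThenCity(hometowns))
        {
            Console.WriteLine(line);
        }
    }
}
```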

Projection

The select clause can return the object in the sequence or return a limited number of properties from the object in the sequence. Selecting a limited number of properties or transforming the result into a different type is referred to as projection. For example, assume you have a Person class declared with the following properties:

class Person

{

public string FirstName { get; set; }

public string LastName { get; set; }

public string Address1 { get; set; }

public string City { get; set; }

public string State { get; set; }

public string Zip { get; set; }

}

Now suppose you need to write a query that only returns the LastName of each Person in a List of Person objects:

List<Person> people = new List<Person>()

{

new Person()

{

FirstName = "John",

LastName = "Smith",

Address1 = "First St",

City = "Havertown",

State = "PA",

Zip = "19084"

},

new Person()

{

FirstName = "Jane",

LastName = "Doe",

Address1 = "Second St",

City = "Ewing",

State = "NJ",

Zip = "08560"

},

new Person()

{

FirstName = "Jack",

LastName = "Jones",

Address1 = "Third St",

City = "Ft Washington",

State = "PA",

Zip = "19034"

}

};

var lastNames = from p in people

select p.LastName;

foreach (string lastName in lastNames)

{

Debug.WriteLine(lastName);

}

The select clause selects p.LastName instead of the entire p object. The compiler determines that the result should be a sequence of strings, based on the type of the property selected. This is one example of projection. You selected a single property and returned a sequence of strings from the query.

Now suppose that you needed to return the FirstName and LastName properties. The following query creates an anonymous type that contains just a FirstName and LastName property. An anonymous type is an object with read-only properties that is not explicitly declared.

var names = from p in people

select new { p.FirstName, p.LastName };

foreach (var name in names)

{

Debug.WriteLine(name.FirstName + ", " + name.LastName);

}

You can also explicitly name the properties of an anonymous type using the following syntax:

var names = from p in people

select new { First = p.FirstName, Last = p.LastName };

foreach (var name in names)

{

Debug.WriteLine(name.First + ", " + name.Last);

}

In the preceding example, the properties of the anonymous type are named First and Last. Visual Studio’s IntelliSense can recognize these properties, which appear in the drop-down lists when you use the anonymous type.
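Projection is not limited to picking properties; the select clause can also compute a new value of a different type. The sketch below assumes a reduced Person class with only two properties, and the method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

static class ProjectionDemo
{
    // Projects each Person into a formatted string.
    public static List<string> DisplayNames(IEnumerable<Person> people)
    {
        var query = from p in people
                    select p.LastName.ToUpperInvariant() + ", " + p.FirstName;
        return query.ToList();
    }

    // Projects each Person into an int (the length of the last name).
    public static List<int> LastNameLengths(IEnumerable<Person> people)
    {
        var query = from p in people
                    select p.LastName.Length;
        return query.ToList();
    }

    static void Main()
    {
        var people = new List<Person>
        {
            new Person { FirstName = "John", LastName = "Smith" },
            new Person { FirstName = "Jane", LastName = "Doe" }
        };

        Console.WriteLine(string.Join("; ", DisplayNames(people)));    // SMITH, John; DOE, Jane
        Console.WriteLine(string.Join(", ", LastNameLengths(people))); // 5, 3
    }
}
```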

Joining

You can use the join clause to combine two or more sequences of objects similar to how you join tables in a SQL statement. The following sample joins two separate lists on a common property called StateId:

class Employee

{

public string FirstName { get; set; }

public string LastName { get; set; }

public int StateId { get; set; }

}

class State

{

public int StateId { get; set; }

public string StateName { get; set; }

}

static void Join()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

StateId = 1

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

StateId = 2

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones",

StateId = 1

}

};

List<State> states = new List<State>()

{

new State()

{

StateId = 1,

StateName = "PA"

},

new State()

{

StateId = 2,

StateName = "NJ"

}

};

var employeeByState = from e in employees

join s in states

on e.StateId equals s.StateId

select new { e.LastName, s.StateName };

foreach (var employee in employeeByState)

{

Debug.WriteLine(employee.LastName + ", " + employee.StateName);

}

}

The join clause uses the equals keyword instead of =. This is because you can join fields only based on equivalence unlike SQL where you can use > or < signs. Using the keyword equals is supposed to make it clearer that the operation is equivalence.

EXAM TIPS AND TRICKS: Equals versus “=”

You may see a question on the exam that displays four different join clauses, and you will be asked which is correct. Be sure to remember that a join clause must use the equals keyword and not the = operator.

Outer Join

Now suppose you need to perform an outer join. An outer join selects all elements from one sequence even if there is not a matching element in the second sequence. In SQL this is referred to as a RIGHT OUTER JOIN or a LEFT OUTER JOIN. In SQL, if you want all the rows from the table on the right side of the JOIN clause, you use a RIGHT OUTER JOIN; if you want all the rows from the table on the left side of the join, you use LEFT OUTER JOIN. This scenario happens often when writing database queries. For example, if you have a table that contains a foreign key that is nullable and you want to join to the table with the primary key, you would use an OUTER JOIN clause to ensure you select all records, even if the column is NULL.

To accomplish this same functionality in a query expression, you need to use a group join, which is a join clause that ends with the into keyword, together with the DefaultIfEmpty method. A group join enables you to combine two sequences into a third object. For example, suppose you added another Employee object to the employees list in the previous example, but the StateId for your new object does not exist in the states list.

new Employee()

{

FirstName = "Sue",

LastName = "Smith",

StateId = 3

}

The following query selects all the elements from the employees list even if there is not a match in the states List:

var employeeByState = from e in employees

join s in states

on e.StateId equals s.StateId into employeeGroup

from item in employeeGroup.DefaultIfEmpty(new State { StateId = 0, StateName = "" })

select new { e.LastName, item.StateName };

foreach (var employee in employeeByState)

{

Debug.WriteLine(employee.LastName + ", " + employee.StateName);

}

The joined lists are combined into an object called employeeGroup. This is done by using the into keyword. The second from clause enumerates employeeGroup, and the DefaultIfEmpty method supplies a new State object with a StateId of 0 and a StateName of "" when no match is found on StateId. If you use an into clause, you can no longer reference the range variable declared in the join clause (s in this example) in your select clause. You instead use the variable that enumerates the values of the new sequence, which in this example is the item variable. That is why the select clause uses item.StateName rather than s.StateName to get the state name.

COMMON MISTAKES: Left Joins Only

For query expressions, you can perform only left joins, so the order of the sequences is important in your from clause.
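A variation on the same left join is to call DefaultIfEmpty with no argument and test for null in the select clause. This is only a sketch of an alternative, not the chapter's own listing; the classes are reduced copies, and the method name and the "(no state)" placeholder text are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public string LastName { get; set; }
    public int StateId { get; set; }
}

class State
{
    public int StateId { get; set; }
    public string StateName { get; set; }
}

static class OuterJoinNullCheckDemo
{
    public static List<string> EmployeesWithStates(
        IEnumerable<Employee> employees, IEnumerable<State> states)
    {
        var query = from e in employees
                    join s in states
                    on e.StateId equals s.StateId into employeeGroup
                    // With no argument, DefaultIfEmpty yields null for an empty group.
                    from item in employeeGroup.DefaultIfEmpty()
                    select e.LastName + ", " +
                           (item == null ? "(no state)" : item.StateName);
        return query.ToList();
    }

    static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { LastName = "Smith", StateId = 1 },
            new Employee { LastName = "Doe", StateId = 2 },
            new Employee { LastName = "Jones", StateId = 3 } // no matching state
        };
        var states = new List<State>
        {
            new State { StateId = 1, StateName = "PA" },
            new State { StateId = 2, StateName = "NJ" }
        };

        // Prints: Smith, PA / Doe, NJ / Jones, (no state)
        foreach (string line in EmployeesWithStates(employees, states))
        {
            Console.WriteLine(line);
        }
    }
}
```

The null check avoids creating a placeholder State object, at the cost of a conditional expression in the select clause.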

Composite Keys

There may be instances where you need to perform your join on a composite key. A composite key contains multiple properties that you need for the purpose of a join. To accomplish this, you create two anonymous types with the same properties and compare the anonymous types. For example, change the Hometown class to have a CityCode property, and change the Employee class to contain the City and State and remove the StateId:

class Hometown

{

public string City { get; set; }

public string State { get; set; }

public string CityCode { get; set; }

}

class Employee

{

public string FirstName { get; set; }

public string LastName { get; set; }

public string City { get; set; }

public string State { get; set; }

}

The following query joins a List of Hometown objects and Employee objects using their City and State properties:

static void CompositeKey()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

City = "Havertown",

State = "PA"

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

City = "Ewing",

State = "NJ"

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones",

City = "Fort Washington",

State = "PA"

}

};

List<Hometown> hometowns = new List<Hometown>()

{

new Hometown()

{

City = "Havertown",

State = "PA",

CityCode = "1234"

},

new Hometown()

{

City = "Ewing",

State = "NJ",

CityCode = "5678"

},

new Hometown()

{

City = "Fort Washington",

State = "PA",

CityCode = "9012"

}

};

var employeeByState = from e in employees

join h in hometowns

on new { City = e.City, State = e.State } equals

new { City = h.City, State = h.State }

select new { e.LastName, h.CityCode };

foreach (var employee in employeeByState)

{

Debug.WriteLine(employee.LastName + ", " + employee.CityCode);

}

}

The join creates two anonymous types with the same properties. The equivalence is determined by matching all properties of the anonymous type.
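The composite key works because two anonymous type instances with the same property names, types, and order are instances of the same type, and their Equals method compares property values. The short sketch below (hypothetical class and method names) shows that behavior in isolation.

```csharp
using System;

static class AnonymousTypeEqualityDemo
{
    // Same properties in the same order with the same values: equal.
    public static bool SamePropertiesSameValues()
    {
        var employeeKey = new { City = "Ewing", State = "NJ" };
        var hometownKey = new { City = "Ewing", State = "NJ" };
        return employeeKey.Equals(hometownKey);
    }

    // Same properties but a different value: not equal.
    public static bool SamePropertiesDifferentValues()
    {
        var employeeKey = new { City = "Ewing", State = "NJ" };
        var hometownKey = new { City = "Trenton", State = "NJ" };
        return employeeKey.Equals(hometownKey);
    }

    // Properties declared in a different order produce a different type: not equal.
    public static bool DifferentPropertyOrder()
    {
        var employeeKey = new { City = "Ewing", State = "NJ" };
        var hometownKey = new { State = "NJ", City = "Ewing" };
        return employeeKey.Equals(hometownKey);
    }

    static void Main()
    {
        Console.WriteLine(SamePropertiesSameValues());      // True
        Console.WriteLine(SamePropertiesDifferentValues()); // False
        Console.WriteLine(DifferentPropertyOrder());        // False
    }
}
```

In a join clause the two key expressions must also be the same type to compile, so keep the property names and their order identical on both sides of equals.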

Grouping

Often, you need to group items to determine the count of elements or the sum of a particular property when working with a sequence of objects. For example, you may need to produce a report that displays the count of employees by state. You can use the group clause in a query expression to group by a particular property to accomplish this requirement. For example, the following code creates a List of Employee objects and then executes a query to group them by State. The count of the employees by state can then be written to the Output window.

static void Group()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

City = "Havertown",

State = "PA"

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

City = "Ewing",

State = "NJ"

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones",

City = "Fort Washington",

State = "PA"

}

};

var employeesByState = from e in employees

group e by e.State;

foreach (var employeeGroup in employeesByState)

{

Debug.WriteLine(employeeGroup.Key + ": " + employeeGroup.Count());

foreach (var employee in employeeGroup)

{

Debug.WriteLine(employee.LastName + ", " + employee.State);

}

}

}

In this sample there isn’t a select clause. This is because a query expression can end with either a select clause or a group clause, and a group clause returns a sequence of IGrouping<TKey, TElement> objects. Each IGrouping object is a collection that also has a Key property holding the value the elements are grouped by. There are two foreach loops in this sample. The outer loop enumerates the groups and writes the Key property and the Count of elements for each State to the Output window. The inner foreach loop writes the elements that make up the group to the Output window. The output for the previous sample is as follows:

PA: 2

Smith, PA

Jones, PA

NJ: 1

Doe, NJ

You can add logic in your group clause to group by anything. The following example groups even and odd numbers and then prints the sum of each group, followed by its members, to the Output window:

static void GroupV2()

{

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var groupedNumbers = from i in myArray

group i by (i % 2 == 0 ? "Even" : "Odd");

foreach (var groupNumber in groupedNumbers)

{

Debug.WriteLine(groupNumber.Key + ": " + groupNumber.Sum());

foreach(var number in groupNumber)

{

Debug.WriteLine(number);

}

}

}

In the preceding example, the group by clause contains a conditional statement that returns the string "Even" or "Odd" and groups the number appropriately. The preceding code produces the following result:

Odd: 25

1

3

5

7

9

Even: 30

2

4

6

8

10

You can use a select clause when grouping sequences, but you must include an into clause in your group clause. Suppose in the last example you wanted to select the key and the sum of even or odd numbers in the query. The following code can accomplish this:

static void GroupV3()

{

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var groupedNumbers = from i in myArray

group i by (i % 2 == 0 ? "Even" : "Odd") into g

select new { Key = g.Key, SumOfNumbers = g.Sum() };

foreach (var groupNumber in groupedNumbers)

{

Debug.WriteLine(groupNumber.Key + ": " + groupNumber.SumOfNumbers);

}

}

The variable g is of type IGrouping<TKey, TElement> and you can use that in your select clause. The select clause creates an anonymous type with two properties: Key and SumOfNumbers. The preceding code produces the following output:
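The same into pattern can produce the count of employees by state mentioned at the start of this section. The following is a sketch under the assumption of a reduced Employee class; the method name and the orderby on the key are additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public string LastName { get; set; }
    public string State { get; set; }
}

static class GroupCountDemo
{
    // Groups employees by State, then projects each group to "State: count".
    public static List<string> CountByState(IEnumerable<Employee> employees)
    {
        var query = from e in employees
                    group e by e.State into g
                    orderby g.Key
                    select g.Key + ": " + g.Count();
        return query.ToList();
    }

    static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { LastName = "Smith", State = "PA" },
            new Employee { LastName = "Doe", State = "NJ" },
            new Employee { LastName = "Jones", State = "PA" }
        };

        // Prints: NJ: 1 / PA: 2
        foreach (string line in CountByState(employees))
        {
            Console.WriteLine(line);
        }
    }
}
```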

Odd: 25

Even: 30

Understanding Method-Based LINQ Queries

The previous section discusses query expressions, which is the syntax used to perform LINQ queries using a shorthand query syntax. You can also perform the same queries using method-based LINQ queries. They are functionally equivalent; the only difference is the syntax.

Method-based queries are actually extension methods found in the System.Linq namespace. These methods extend any type that implements the IEnumerable<T> or IQueryable<T> interface. Method-based queries take a lambda expression as a parameter, which represents the logic to be performed while enumerating through the sequence. Recall the first example for the query expression syntax:

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var evenNumbers = from i in myArray

where i % 2 == 0

select i;

This code selects all the even numbers in an array. The equivalent method-based query follows:

int[] myArray = new int[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var evenNumbers = myArray.Where(i => i % 2 == 0);

The variable i represents the element in the array, and the code on the right of the goes to operator (=>) represents the logic to be performed while enumerating through the array. These two code examples produce the exact same results; again, you must decide which syntax you are more comfortable with.
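One way to convince yourself that the two syntaxes are equivalent is to compare their results with SequenceEqual. The sketch below does that and also shows how to apply a method such as Count, which has no query expression keyword, to the result of a query expression. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

static class SyntaxEquivalenceDemo
{
    // Runs the same filter in both syntaxes and compares the results.
    public static bool SameResults(int[] numbers)
    {
        var queryExpression = from i in numbers
                              where i % 2 == 0
                              select i;
        var methodBased = numbers.Where(i => i % 2 == 0);
        return queryExpression.SequenceEqual(methodBased);
    }

    // Wraps a query expression in parentheses and calls Count on the result.
    public static int CountEven(int[] numbers)
    {
        return (from i in numbers
                where i % 2 == 0
                select i).Count();
    }

    static void Main()
    {
        int[] myArray = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        Console.WriteLine(SameResults(myArray)); // True
        Console.WriteLine(CountEven(myArray));   // 5
    }
}
```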

Filtering

Filtering is done by using the Where method as you’ve seen in the previous examples. You pass a lambda expression to the Where method that returns a boolean value to return only the elements that meet the true condition.

myArray.Where(i => i % 2 == 0)

You still use the and (&&) and or (||) operators for complex conditions because the lambda expression is C# syntax.

var evenNumbers = myArray.Where(i => i % 2 == 0 && i > 5);

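You can also chain calls to the Where method. This is equivalent to having multiple where clauses in a query expression: an element must satisfy every condition to be returned. If you need to control precedence within a single condition, use parentheses inside the lambda expression.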

var evenNumbers = myArray.Where(i => i % 2 == 0).Where(i => i > 5);

And you can also call a function that returns a boolean.

var evenNumbers = myArray.Where(i => IsEvenAndGT5(i));

Ordering

You can order the elements in a sequence by using the OrderBy or the OrderByDescending methods. The following code orders all the even numbers in an array descending:

var evenNumbers = myArray.Where(i => i % 2 == 0).OrderByDescending(i => i);

If you need to order by more than one field, you chain the methods using a ThenBy or ThenByDescending method. The following example orders by State and then by City:

static void MethodBasedOrderByStateThenCity()

{

List<Hometown> hometowns = new List<Hometown>()

{

new Hometown() { City = "Philadelphia", State = "PA" },

new Hometown() { City = "Ewing", State = "NJ" },

new Hometown() { City = "Havertown", State = "PA" },

new Hometown() { City = "Fort Washington", State = "PA" },

new Hometown() { City = "Trenton", State = "NJ" }

};

var orderedHometowns = hometowns.OrderBy(h => h.State).ThenBy(h => h.City);

foreach (Hometown hometown in orderedHometowns)

{

Debug.WriteLine(hometown.City + ", " + hometown.State);

}

}
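The descending variants chain the same way. As a small sketch (using a hypothetical array of words rather than the Hometown list, and an ordinal comparer so the result does not depend on culture), the following orders by length ascending and then alphabetically descending:

```csharp
using System;
using System.Linq;

static class ThenByDescendingDemo
{
    // Shortest words first; words of equal length in reverse alphabetical order.
    public static string[] ByLengthThenReverseAlphabetical(string[] words)
    {
        return words
            .OrderBy(w => w.Length)
            .ThenByDescending(w => w, StringComparer.Ordinal)
            .ToArray();
    }

    static void Main()
    {
        string[] words = { "pear", "fig", "apple", "kiwi", "plum" };

        // Prints: fig plum pear kiwi apple
        Console.WriteLine(string.Join(" ", ByLengthThenReverseAlphabetical(words)));
    }
}
```

Calling OrderBy a second time instead of ThenBy would discard the first ordering as the primary key, which is why the secondary key uses ThenBy or ThenByDescending.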

Projection

You can project the result by using the Select method. The following code selects only the LastName property of a list of Person objects:

static void MethodBasedProjectionV1()

{

List<Person> people = new List<Person>()

{

new Person()

{

FirstName = "John",

LastName = "Smith",

Address1 = "First St",

City = "Havertown",

State = "PA",

Zip = "19084"

},

new Person()

{

FirstName = "Jane",

LastName = "Doe",

Address1 = "Second St",

City = "Ewing",

State = "NJ",

Zip = "08560"

},

new Person()

{

FirstName = "Jack",

LastName = "Jones",

Address1 = "Third St",

City = "Ft Washington",

State = "PA",

Zip = "19034"

}

};

var lastNames = people.Select(p => p.LastName);

foreach (string lastName in lastNames)

{

Debug.WriteLine(lastName);

}

}

You can create an anonymous type similar to how you do it with a query expression. The only difference is you use a lambda expression. The following creates an anonymous type with just the FirstName and LastName properties:

static void MethodBasedProjectionV2()

{

List<Person> people = new List<Person>()

{

new Person()

{

FirstName = "John",

LastName = "Smith",

Address1 = "First St",

City = "Havertown",

State = "PA",

Zip = "19084"

},

new Person()

{

FirstName = "Jane",

LastName = "Doe",

Address1 = "Second St",

City = "Ewing",

State = "NJ",

Zip = "08560"

},

new Person()

{

FirstName = "Jack",

LastName = "Jones",

Address1 = "Third St",

City = "Ft Washington",

State = "PA",

Zip = "19034"

}

};

var names = people.Select(p => new { p.FirstName, p.LastName });

foreach (var name in names)

{

Debug.WriteLine(name.FirstName + ", " + name.LastName);

}

}

You can also explicitly name the anonymous type properties by using the following syntax:

var names = people.Select(p => new { First = p.FirstName, Last = p.LastName });

The preceding sample created an anonymous type with a First and Last property rather than FirstName and LastName.

There is also a SelectMany method that you can use to flatten two sequences into one sequence similar to how a join works. The following flattens a list of Employees and a list of States and returns the combination of the two:

static void MethodBasedProjectionV4()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

StateId = 1

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

StateId = 2

},

new Employee()

{

FirstName = "John",

LastName = "Smith",

StateId = 1

}

};

List<State> states = new List<State>()

{

new State()

{

StateId = 1,

StateName = "PA"

},

new State()

{

StateId = 2,

StateName = "NJ"

}

};

var employeeByState = employees.SelectMany(e => states.Where(s =>

e.StateId == s.StateId).Select(s => new { e.LastName, s.StateName }));

foreach (var employee in employeeByState)

{

Debug.WriteLine(employee.LastName + ", " + employee.StateName);

}

}
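SelectMany is also commonly used to flatten a collection whose elements each contain their own collection. The following sketch assumes a hypothetical Department class that holds a list of employee names; it is not part of the chapter's sample code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only for this illustration.
class Department
{
    public string Name { get; set; }
    public List<string> Employees { get; set; }
}

static class SelectManyDemo
{
    // Flattens every department's Employees list into one sequence of names.
    public static List<string> AllEmployees(IEnumerable<Department> departments)
    {
        return departments.SelectMany(d => d.Employees).ToList();
    }

    // The second lambda receives the parent and each child, so both can be projected.
    public static List<string> EmployeesWithDepartment(IEnumerable<Department> departments)
    {
        return departments
            .SelectMany(d => d.Employees, (d, e) => d.Name + ": " + e)
            .ToList();
    }

    static void Main()
    {
        var departments = new List<Department>
        {
            new Department { Name = "Sales", Employees = new List<string> { "Smith", "Doe" } },
            new Department { Name = "IT", Employees = new List<string> { "Jones" } }
        };

        Console.WriteLine(string.Join(", ", AllEmployees(departments)));
        // Smith, Doe, Jones
        Console.WriteLine(string.Join(", ", EmployeesWithDepartment(departments)));
        // Sales: Smith, Sales: Doe, IT: Jones
    }
}
```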

Joining

The Join method enables you to join two sequences together using a common property or set of properties. The following code joins a List of Employee and State objects using the StateId property:

static void MethodBasedJoin()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

StateId = 1

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

StateId = 2

},

new Employee()

{

FirstName = "John",

LastName = "Smith",

StateId = 1

}

};

List<State> states = new List<State>()

{

new State()

{

StateId = 1,

StateName = "PA"

},

new State()

{

StateId = 2,

StateName = "NJ"

}

};

var employeeByState = employees.Join(states,

e => e.StateId,

s => s.StateId,

(e, s) => new { e.LastName, s.StateName });

foreach (var employee in employeeByState)

{

Debug.WriteLine(employee.LastName + ", " + employee.StateName);

}

}

The employees list is considered the outer sequence. The first parameter to the Join method is the sequence you want to join to: states. This is referred to as the inner sequence. The second parameter is a lambda expression that selects the key of the outer sequence. The third parameter is a lambda expression that selects the key of the inner sequence. By default an equality comparison is used to match the keys of the two sequences. The fourth parameter is a lambda expression that creates the result for each matching pair. In this sample you create an anonymous type with LastName and StateName properties. When joining two sequences it can be more readable to use a query expression rather than the Join method.
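The Join method can also match on a composite key by returning anonymous types from both key lambdas, mirroring the query expression technique shown earlier. This is a sketch that assumes reduced versions of the Employee and Hometown classes with City and State properties; the method name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public string LastName { get; set; }
    public string City { get; set; }
    public string State { get; set; }
}

class Hometown
{
    public string City { get; set; }
    public string State { get; set; }
    public string CityCode { get; set; }
}

static class MethodBasedCompositeKeyDemo
{
    public static List<string> CityCodesByEmployee(
        IEnumerable<Employee> employees, IEnumerable<Hometown> hometowns)
    {
        return employees.Join(hometowns,
            e => new { e.City, e.State },   // outer key
            h => new { h.City, h.State },   // inner key: same names, types, and order
            (e, h) => e.LastName + ", " + h.CityCode)
            .ToList();
    }

    static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { LastName = "Smith", City = "Havertown", State = "PA" },
            new Employee { LastName = "Doe", City = "Ewing", State = "NJ" }
        };
        var hometowns = new List<Hometown>
        {
            new Hometown { City = "Ewing", State = "NJ", CityCode = "5678" },
            new Hometown { City = "Havertown", State = "PA", CityCode = "1234" }
        };

        // Prints: Smith, 1234 / Doe, 5678
        foreach (string line in CityCodesByEmployee(employees, hometowns))
        {
            Console.WriteLine(line);
        }
    }
}
```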

Outer Join

An outer join is created by using the GroupJoin method. The following sample performs a left join using a List of Employee and State objects. If no matching state is found in the State list, the StateName will be blank.

static void MethodBasedOuterJoin()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

StateId = 1

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

StateId = 2

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones",

StateId = 1

},

new Employee()

{

FirstName = "Sue",

LastName = "Smith",

StateId = 3

}

};

List<State> states = new List<State>()

{

new State()

{

StateId = 1,

StateName = "PA"

},

new State()

{

StateId = 2,

StateName = "NJ"

}

};

var employeeByState = employees.GroupJoin(states,

e => e.StateId,

s => s.StateId,

(e, employeeGroup) => employeeGroup.Select(s => new

{

LastName = e.LastName, StateName = s.StateName

}).DefaultIfEmpty(new

{

LastName = e.LastName,StateName = ""

})).SelectMany(employeeGroup => employeeGroup);

foreach (var employee in employeeByState)

{

Debug.WriteLine(employee.LastName + ", " + employee.StateName);

}

}

The employees list is the outer sequence and the states list is the inner sequence. The first parameter to the GroupJoin method is the inner sequence. The second parameter is the Key property for the outer sequence, and the third parameter is the Key property for the inner sequence. The fourth parameter is where it is tricky. Recall in the query expression section that when creating an outer join, you needed to include the into keyword to create a variable that would contain the results of the join:

var employeeByState = from e in employees

join s in states

on e.StateId equals s.StateId into employeeGroup

from item in employeeGroup.DefaultIfEmpty( new State

{StateId = 0,

StateName = ""})

select new { e.LastName, item.StateName };

When using the GroupJoin method, you simply name the variable when creating the lambda expression in the fourth parameter:

var employeeByState = employees.GroupJoin(states,

e => e.StateId,

s => s.StateId,

(e, employeeGroup) => employeeGroup.Select(s => new

{

LastName = e.LastName, StateName = s.StateName

}).DefaultIfEmpty(new

{

LastName = e.LastName,StateName = ""

})).SelectMany(e => e);

You can then use the Select method to enumerate through the values in the employeeGroup object and use the DefaultIfEmpty method when no match is found between the two sequences. Finally, you need to call the SelectMany method to return the sequence of objects. This can be quite confusing when dealing with complex structures so the query expression syntax might be more to your liking than the method-based syntax. The preceding code produces the following results:

Smith, PA

Doe, NJ

Jones, PA

Smith,
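If the nested Select and DefaultIfEmpty calls are hard to follow, one alternative arrangement is to let GroupJoin pair each employee with its matches and move the flattening into a two-argument SelectMany. The sketch below is not the chapter's sample; the LeftJoinSketch class is a hypothetical helper and the Employee and State classes are assumed stand-ins.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the chapter's classes (assumed shape).
public class Employee
{
    public string LastName { get; set; }
    public int StateId { get; set; }
}

public class State
{
    public int StateId { get; set; }
    public string StateName { get; set; }
}

// Hypothetical helper class for illustration only.
public static class LeftJoinSketch
{
    public static List<string> EmployeesWithState(
        IEnumerable<Employee> employees, IEnumerable<State> states)
    {
        return employees
            // Step 1: pair every employee with the (possibly empty) group of matching states.
            .GroupJoin(states,
                e => e.StateId,
                s => s.StateId,
                (e, matches) => new { Employee = e, Matches = matches })
            // Step 2: flatten. DefaultIfEmpty() yields a single null State when nothing matched,
            // so every employee produces at least one row.
            .SelectMany(
                pair => pair.Matches.DefaultIfEmpty(),
                (pair, s) => pair.Employee.LastName + ", " + (s == null ? "" : s.StateName))
            .ToList();
    }
}
```

With the four employees and two states from the sample above, this should produce the same four rows, including the employee whose StateId of 3 has no matching state.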

Composite Keys

You can use a composite key by creating anonymous types when defining your keys in the Join parameters. The following code joins on two fields to match the data between two sequences:

static void MethodBasedCompositeKey()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

City = "Havertown",

State = "PA"

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

City = "Ewing",

State = "NJ"

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones",

City = "Fort Washington",

State = "PA"

}

};

List<Hometown> hometowns = new List<Hometown>()

{

new Hometown()

{

City = "Havertown",

State = "PA",

CityCode = "1234"

},

new Hometown()

{

City = "Ewing",

State = "NJ",

CityCode = "5678"

},

new Hometown()

{

City = "Fort Washington",

State = "PA",

CityCode = "9012"

}

};

var employeeByState = employees.Join(hometowns,

e => new { City = e.City, State = e.State },

h => new { City = h.City, State = h.State },

(e, h) => new { e.LastName, h.CityCode });

foreach (var employee in employeeByState)

{

Debug.WriteLine(employee.LastName + ", " + employee.CityCode);

}

}

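The second and third parameters each create an anonymous type with two properties. Anonymous types compare by value, so LINQ compares all properties of the two key objects when doing its equivalence test. For the keys to match, the property names, types, and order must be the same in both anonymous types.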
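The value-based comparison of anonymous types can be checked in isolation. The following sketch uses a hypothetical CompositeKeySketch class; it only demonstrates how two anonymous objects compare and is not part of the chapter's sample code.

```csharp
// Hypothetical helper class for illustration only.
public static class CompositeKeySketch
{
    // Same property names, types, and order: the compiler reuses one anonymous type,
    // and Equals compares the property values.
    public static bool SameShape()
    {
        var employeeKey = new { City = "Ewing", State = "NJ" };
        var hometownKey = new { City = "Ewing", State = "NJ" };
        return employeeKey.Equals(hometownKey);
    }

    // Same values but the properties are declared in a different order:
    // these are two different anonymous types, so they are never equal.
    public static bool DifferentOrder()
    {
        var employeeKey = new { City = "Ewing", State = "NJ" };
        var reorderedKey = new { State = "NJ", City = "Ewing" };
        return employeeKey.Equals(reorderedKey);
    }
}
```

SameShape returns true and DifferentOrder returns false. In a Join call the mismatch usually shows up earlier, because the two key lambdas must produce the same key type for the call to compile.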

Grouping

The GroupBy method can be used to group by one or more fields. This is equivalent to using the group keyword when creating a query expression. The following code groups a List of Employee objects by State:

static void MethodBasedGroupV1()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

City = "Havertown",

State = "PA"

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

City = "Ewing",

State = "NJ"

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones",

City = "Fort Washington",

State = "PA"

}

};

var employeesByState = employees.GroupBy(e => e.State);

foreach (var employeeGroup in employeesByState)

{

Debug.WriteLine(employeeGroup.Key + ": " + employeeGroup.Count());

foreach (var employee in employeeGroup)

{

Debug.WriteLine(employee.LastName + ", " + employee.State);

}

}

}

The GroupBy method returns a sequence of IGrouping<TKey, TElement> objects. Each IGrouping exposes a Key property and can itself be enumerated, so you can perform aggregate functions on a group or enumerate through the elements in each group.

If you need to group by more than one field, you would create an anonymous type as a parameter to the GroupBy method. For example, the following code will group by the City then the State properties:

var employeesByState = employees.GroupBy(e => new { e.City, e.State });

The following results are printed to the Output window:

{ City = Havertown, State = PA }: 1

Smith, PA

{ City = Ewing, State = NJ }: 1

Doe, NJ

{ City = Fort Washington, State = PA }: 1

Jones, PA

The { City = Havertown, State = PA } is printed to the Output window because that is now the key. The key was created as an anonymous type with two properties.
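Because each group has a Key and is itself a sequence, a common follow-up is to reduce every group to a single value. The sketch below counts employees per state with both syntaxes. The GroupSummary class and its method names are hypothetical, and the Employee class is an assumed stand-in for the one used above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the chapter's class (assumed shape).
public class Employee
{
    public string LastName { get; set; }
    public string City { get; set; }
    public string State { get; set; }
}

// Hypothetical helper class for illustration only.
public static class GroupSummary
{
    // Method-based: group by State, then turn each group into a key/count pair.
    public static Dictionary<string, int> CountByState(IEnumerable<Employee> employees)
    {
        return employees
            .GroupBy(e => e.State)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Query expression: "into g" names the group so it can be used in the select clause.
    public static Dictionary<string, int> CountByStateQuery(IEnumerable<Employee> employees)
    {
        var query = from e in employees
                    group e by e.State into g
                    select new { State = g.Key, Count = g.Count() };
        return query.ToDictionary(x => x.State, x => x.Count);
    }
}
```

For the three sample employees above, both methods should report two employees for PA and one for NJ.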

Aggregate Functions

Aggregate functions enable you to quickly compute the average, sum, count, max, or min of a sequence. For example, if you had a list of items that represent line items on an invoice, you could quickly compute the total for the invoice by using the Sum method. These functions are available only as methods; C# has no query expression keywords for them, but you can call them on the result of a query expression. The following code samples show the query expression syntax and the equivalent method-based syntax for the aggregate functions.

count

Query expression:

int count = (from i in myArray

where i % 2 == 0

select i).Count();

Method-based query:

int count = myArray.Where(i => i % 2 == 0).Count();

Alternatively, you could write the query expression as follows if you want to defer the execution of the query:

var evenNumbers = from i in myArray

where i % 2 == 0

select i;

int count = evenNumbers.Count();

If you were to step through the code for the query expression, you would notice that when the int count = evenNumbers.Count() statement executes, execution jumps to the where i % 2 == 0 clause five times before the value of the count variable is set. This way of coding the query could be useful if you want to return the count of items and also enumerate through the query results in a later statement. Be aware that the query is executed every time you execute the aggregate function or enumerate through the result.
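One way to observe this re-execution without a debugger is to count how often the predicate runs. The sketch below is an illustration under stated assumptions: the five-element array, the DeferredExecutionSketch class, and the call counter are all introduced here and are not part of the chapter's samples. It also shows ToList, which runs the query once and keeps the results in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class DeferredExecutionSketch
{
    // Returns the predicate call count at three points in time.
    public static int[] CountPredicateCalls()
    {
        int[] myArray = { 1, 2, 3, 4, 5 };   // assumed sample data
        int calls = 0;

        // Defining the query does not run the predicate.
        IEnumerable<int> evenNumbers = myArray.Where(i => { calls++; return i % 2 == 0; });
        int afterDefinition = calls;

        // Each aggregate enumerates the source again.
        int count = evenNumbers.Count();
        int sum = evenNumbers.Sum();
        int afterTwoAggregates = calls;

        // ToList executes the query one more time and caches the results,
        // so later work on the list no longer calls the predicate.
        List<int> cached = evenNumbers.ToList();
        int cachedCount = cached.Count;
        int cachedSum = cached.Sum();
        int afterCachedUse = calls;

        return new[] { afterDefinition, afterTwoAggregates, afterCachedUse };
    }
}
```

With five elements the method returns 0, 10, and 15: nothing runs at definition, Count and Sum each test all five elements, and ToList tests them a third time, after which the cached list is reused without further predicate calls.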

average

Query expression:

double average = (from i in myArray

where i % 2 == 0

select i).Average();

Method-based query:

double average = myArray.Where(i => i % 2 == 0).Average();

sum

Query expression:

int sum = (from i in myArray

where i % 2 == 0

select i).Sum();

Method-based query:

int sum = myArray.Where(i => i % 2 == 0).Sum();

min

Query expression:

int min = (from i in myArray

where i % 2 == 0

select i).Min();

Method-based query:

int min = myArray.Where(i => i % 2 == 0).Min();

max

Query expression:

int max = (from i in myArray

where i % 2 == 0

select i).Max();

Method-based query:

int max = myArray.Where(i => i % 2 == 0).Max();

first and last

There are two other functions that enable you to find the first or last element in your sequence. Like the aggregate functions, these are available only as methods but can be called on the result of a query expression. They can be helpful when you want to find the first or last element in a sequence that meets a specific condition, such as the first even number in an array. The syntax for the First and Last methods is shown in the following examples.

first

Query expression:

int first = (from i in myArray

where i % 2 == 0

select i).First();

Method-based query:

int first = myArray.Where(i => i % 2 == 0).First();

last

Query expression:

int last = (from i in myArray

where i % 2 == 0

select i).Last();

Method-based query:

int last = myArray.Where(i => i % 2 == 0).Last();
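Count, First, and Last also have overloads that accept the predicate directly, which lets you drop the separate Where call. The sketch below uses a hypothetical PredicateOverloads class. It also shows that First throws an InvalidOperationException when no element satisfies the condition, which is worth guarding against when a match is not guaranteed.

```csharp
using System;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class PredicateOverloads
{
    // Returns { count, first, last } of the even numbers in the array.
    public static int[] SummarizeEvens(int[] myArray)
    {
        int count = myArray.Count(i => i % 2 == 0);   // same as Where(...).Count()
        int first = myArray.First(i => i % 2 == 0);   // same as Where(...).First()
        int last = myArray.Last(i => i % 2 == 0);     // same as Where(...).Last()
        return new[] { count, first, last };
    }

    // First throws when nothing matches the predicate.
    public static bool FirstThrowsWhenNoMatch(int[] myArray)
    {
        try
        {
            myArray.First(i => i < 0);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

For the array { 1, 2, 3, 4, 5 }, SummarizeEvens returns 2, 2, and 4, and FirstThrowsWhenNoMatch returns true because the array contains no negative numbers.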

Concatenation

The Concat method enables you to concatenate two sequences into one. This is similar to how a UNION ALL clause works in a SQL statement: every element from both sequences is returned, including duplicates. The following example combines two Lists of Employee objects and prints the combined sequence to the Output window:

static void Concat()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith"

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe"

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones"

}

};

List<Employee> employees2 = new List<Employee>()

{

new Employee()

{

FirstName = "Bill",

LastName = "Peters"

},

new Employee()

{

FirstName = "Bob",

LastName = "Donalds"

},

new Employee()

{

FirstName = "Chris",

LastName = "Jacobs"

}

};

var combinedEmployees = employees.Concat(employees2);

foreach (var employee in combinedEmployees)

{

Debug.WriteLine(employee.LastName);

}

}

The preceding code prints all six last names to the Output window. Be aware that the two source lists do not need to hold the same type. Concat itself requires both sequences to have the same element type, but you can combine different types by selecting an anonymous type from each sequence that contains the same properties. The following code combines a List of Employee and Person objects by projecting each into an anonymous type that contains just a Name property:

static void ConcatV2()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith"

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe"

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones"

}

};

List<Person> people = new List<Person>()

{

new Person()

{

FirstName = "Bill",

LastName = "Peters"

},

new Person()

{

FirstName = "Bob",

LastName = "Donalds"

},

new Person()

{

FirstName = "Chris",

LastName = "Jacobs"

}

};

var combinedEmployees = employees.Select(e => new { Name =

e.LastName }).Concat(people.Select(p => new { Name = p.LastName }));

foreach (var employee in combinedEmployees)

{

Debug.WriteLine(employee.Name);

}

}
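Concat keeps every element from both sequences, duplicates included. When duplicates should be removed, the Union method is the usual alternative. Union is not covered in this section, so the following sketch is offered only as a contrast; the ConcatVersusUnion class and the integer data are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class ConcatVersusUnion
{
    private static readonly int[] First = { 1, 2, 3 };
    private static readonly int[] Second = { 3, 4, 5 };

    // Concat appends the second sequence to the first and keeps duplicates.
    public static List<int> Concatenated()
    {
        return First.Concat(Second).ToList();
    }

    // Union returns each distinct value once.
    public static List<int> Unioned()
    {
        return First.Union(Second).ToList();
    }
}
```

Concatenated returns six elements, with 3 appearing twice, while Unioned returns the five distinct values.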

Skip and Take

You can partition sequences by using the Skip or Take methods. The Skip method takes a number, bypasses that many elements from the start of the sequence, and returns the remaining elements. For example, the following code skips the first element in the Employee List.

var newEmployees = employees.Skip(1);

You can use the Take method to limit the number of elements returned from the sequence. The following code returns only the top two elements from the Employee List object.

var newEmployees = employees.Take(2);

These two methods can be useful when paging through the results of a query and displaying them on a screen one page at a time. If you show 10 elements at a time and want to display the third page, you would use the following syntax.

var newEmployees = employees.Skip(20).Take(10);
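The paging arithmetic generalizes to any page number: skip (page number minus 1) times the page size, then take one page. A minimal sketch follows, assuming 1-based page numbers; the Paging class and GetPage method are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class Paging
{
    // pageNumber is 1-based: page 1 skips nothing, page 3 skips two full pages.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        return source
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

For example, Paging.GetPage(Enumerable.Range(1, 25), 3, 10) returns the five values 21 through 25. Take returns fewer elements than requested when the sequence runs out, so the last page needs no special handling.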

Distinct

The Distinct method returns the distinct list of values in the returned sequence. This is useful when you want to remove duplicates from a sequence. The following code returns the distinct list of numbers from an array:

int[] myArray = new int[] { 1, 2, 3, 1, 2, 3, 1, 2, 3 };

var distinctArray = myArray.Distinct();

foreach (int i in distinctArray)

{

Debug.WriteLine(i);

}

The preceding code prints 1, 2, 3 to the Output window.

Code Lab: Distinct with custom classes

The Distinct method behaves differently when the underlying object is a custom class. Consider the following code:

class State : IEquatable<State>

{

public int StateId { get; set; }

public string StateName { get; set; }

public bool Equals(State other)

{

if (Object.ReferenceEquals(this, other))

{

return true;

}

else

{

if (StateId == other.StateId && StateName == other.StateName)

{

return true;

}

else

{

return false;

}

}

}

public override int GetHashCode()

{

return StateId.GetHashCode() ^ StateName.GetHashCode();

}

}

static void DistinctCodeLab()

{

List<State> states = new List<State>()

{

new State(){ StateId = 1, StateName = "PA"},

new State() { StateId = 2, StateName = "NJ"},

new State() { StateId = 1, StateName = "PA" },

new State() { StateId = 3, StateName = "NY"}

};

var distinctStates = states.Distinct();

foreach (State state in distinctStates)

{

Debug.WriteLine(state.StateName);

}

}

Code Lab Analysis

The State class must implement the IEquatable<T> interface, because the Distinct method uses the default equality comparer when determining if two objects are equivalent. The IEquatable<T> interface has one method you must implement, called Equals. In this example, the StateId and StateName properties are being used to determine equivalence. You must also override the GetHashCode method and return a hash code based on the same properties. When you execute the DistinctCodeLab method, only three states are printed to the Output window. If the State class didn’t implement the IEquatable<T> interface, the objects would be compared by reference and all four states would be printed to the window, which you wouldn’t expect.
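When you cannot or do not want to change the class itself, Distinct also has an overload that accepts an IEqualityComparer<T>. The sketch below assumes a plain State class that does not implement IEquatable<T>; the StateComparer and DistinctWithComparer names are hypothetical and are not part of the Code Lab.

```csharp
using System.Collections.Generic;
using System.Linq;

// Plain class with no IEquatable<T> implementation (assumed shape).
public class State
{
    public int StateId { get; set; }
    public string StateName { get; set; }
}

// Hypothetical comparer that defines equivalence outside the State class.
public class StateComparer : IEqualityComparer<State>
{
    public bool Equals(State x, State y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.StateId == y.StateId && x.StateName == y.StateName;
    }

    // Must be consistent with Equals: equal objects return equal hash codes.
    public int GetHashCode(State state)
    {
        if (state == null) return 0;
        int nameHash = state.StateName == null ? 0 : state.StateName.GetHashCode();
        return state.StateId.GetHashCode() ^ nameHash;
    }
}

// Hypothetical helper class for illustration only.
public static class DistinctWithComparer
{
    public static List<string> DistinctNames(IEnumerable<State> states)
    {
        return states
            .Distinct(new StateComparer())   // comparer supplied explicitly
            .Select(s => s.StateName)
            .ToList();
    }
}
```

Passing the four states from the Code Lab to DistinctNames should return three names, because the two PA entries are treated as equivalent by the comparer.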

Utilizing LINQ to XML

LINQ to XML enables you to easily convert a sequence into an XML document. Remember, you can use LINQ to query any sequence regardless of the source. As long as the sequence supports the IEnumerable<T> or IQueryable<T> interface, you can use a LINQ query expression to convert the sequence to XML. This can be useful when transferring data between two systems.

The following example converts a List of Employee objects to XML:

static void LINQToXML()

{

List<Employee> employees = new List<Employee>()

{

new Employee()

{

FirstName = "John",

LastName = "Smith",

StateId = 1

},

new Employee()

{

FirstName = "Jane",

LastName = "Doe",

StateId = 2

},

new Employee()

{

FirstName = "Jack",

LastName = "Jones",

StateId = 1

}

};

var xmlEmployees = new XElement("Root", from e in employees

select new XElement("Employee", new XElement("FirstName", e.FirstName),

new XElement("LastName", e.LastName)));

Debug.WriteLine(xmlEmployees);

}

The output of the preceding code follows:

<Root>

<Employee>

<FirstName>John</FirstName>

<LastName>Smith</LastName>

</Employee>

<Employee>

<FirstName>Jane</FirstName>

<LastName>Doe</LastName>

</Employee>

<Employee>

<FirstName>Jack</FirstName>

<LastName>Jones</LastName>

</Employee>

</Root>

The XElement class is found in the System.Xml.Linq namespace. The first parameter to the constructor is the name of the element. The second parameter is a parameter array (declared with the params keyword in C#; Visual Basic calls this a ParamArray), which means you can pass a variable number of arguments to the constructor. In this example, you pass a LINQ query expression that returns one Employee element for each employee in the list.
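The same classes work in the other direction: once you have an XElement, you can query it with a LINQ query expression. The following sketch builds a small document and reads it back. The StateId attribute, the XmlQuerySketch class, and its method names are assumptions added for illustration; the chapter's sample writes only the FirstName and LastName elements.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class for illustration only.
public static class XmlQuerySketch
{
    // Builds a document shaped like the output above, plus an assumed StateId attribute.
    public static XElement BuildDocument()
    {
        return new XElement("Root",
            new XElement("Employee",
                new XAttribute("StateId", 1),
                new XElement("FirstName", "John"),
                new XElement("LastName", "Smith")),
            new XElement("Employee",
                new XAttribute("StateId", 2),
                new XElement("FirstName", "Jane"),
                new XElement("LastName", "Doe")),
            new XElement("Employee",
                new XAttribute("StateId", 1),
                new XElement("FirstName", "Jack"),
                new XElement("LastName", "Jones")));
    }

    // Queries the XML: Elements returns the child elements with the given name,
    // and the explicit casts convert attribute and element values to .NET types.
    public static List<string> LastNamesInState(XElement root, int stateId)
    {
        var query = from emp in root.Elements("Employee")
                    where (int)emp.Attribute("StateId") == stateId
                    select (string)emp.Element("LastName");
        return query.ToList();
    }
}
```

Calling XmlQuerySketch.LastNamesInState(XmlQuerySketch.BuildDocument(), 1) returns Smith and Jones, in document order.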

Summary

Language Integrated Query (LINQ) is a feature in the .NET Framework that enables you to query different data sources, such as a collection or a database, with common syntax. It may take a bit of time to understand the different forms of syntax in LINQ, but once learned it will make you a more efficient programmer. The two forms of syntax are query expressions and method-based queries. As stated throughout this chapter, both syntaxes are functionally equivalent. Query expressions can be more readable, but they do not offer all of the capabilities of method-based queries. Method-based queries require you to understand lambda expressions, which sometimes can be harder to read.

LINQ to Objects, LINQ to SQL, and LINQ to XML all refer to the ability to query a data source such as a collection, a database, or an XML document. You can also use LINQ to query data in an ADO.NET Entity Framework model. LINQ replaces the need for you to learn specific SQL or XQuery syntax and allows you to use a handful of keywords to manipulate your data. You are sure to see some questions in the exam regarding LINQ or questions that use LINQ syntax, so be sure to understand all of the keywords described in this chapter.

Chapter Test Questions

The following questions are similar to the types of questions you will find on Exam 70-483. Read each question carefully and select the answer or answers that represent the best solution to the problem. You can find the answers in Appendix A, “Answers to Chapter Test Questions.”

1. Which answer has the correct order of keywords for a LINQ query expression?

a. select, from, where

b. where, from, select

c. from, where, select

d. from, select, where

2. Which where clause can select all integers in the myList object that are even numbers given the following from clause?

from i in myList

a. where myList.Contains(i % 2)

b. where i % 2 = 0

c. where i % 2 == 0

d. where i % 2 equals 0

3. Which line of code executes the LINQ query?

[1] var result = from i in myArray

[2] orderby i

[3] select i;

[4] foreach(int i in result)

[5] { …}

a. Line 1

b. Line 2

c. Line 3

d. Line 4

4. Which method can you use to find the minimum value in a sequence?

a. (from i in myArray select i).Min()

b. from i in myArray select Min(i)

c. from Min(i) in myArray select i

d. from i in Min(myArray) select i

5. Which methods can you use to find the first item in a sequence?

a. Min

b. First

c. Skip

d. Take

6. Which where clause returns all integers between 10 and 20?

a. where i >= 10 and i <= 20

b. where i >= 10 && i <= 20

c. where i gt 10 and i lt 20

d. where i gt 10 && i lt 20

7. Which clause orders the state and then the city?

a. orderby h.State

orderby h.City

b. orderby h.State thenby h.City

c. orderby h.State, h.City

d. orderby h.State, thenby h.City

8. Which statement selects an anonymous type?

a. select { h.City, h.State }

b. select h

c. select new { h.City, h.State }

d. select h.City, h.State

9. Which on statement joins two sequences on the StateId property?

a. on e.StateId equals s.StateId

b. on e.StateId = s.StateId

c. on e.StateId == s.StateId

d. on e.StateId.Equals(s.StateId)

10. Which two keywords must you use in a join clause to create an outer join?

a. groupby, into

b. into, DefaultIfEmpty

c. new, DefaultIfEmpty

d. into, groupby

11. Which join clause uses a composite key?

a. on new { City = e.City, State = e.State } equals new { City = h.City, State = h.State }

b. on e.City = h.City && e.State = h.State

c. on e.City = h.City and e.State = h.State

d. on e.City equals h.City and e.State equals h.State

12. Which statement groups a sequence by the State property?

a. groupby e.State

b. group e.State

c. group e by e.State

d. groupby e.State in states

13. Which answers return the count of all even numbers?

a. myArray.Where(i => i % 2 == 0).Count()

b. myArray.Count(i => i % 2 == 0)

c. myArray.Count(i =>).Where(i % 2 == 0)

d. myArray.Count(Where(i => i % 2))

Additional Reading and Resources

Here are some additional useful resources to help you understand the topics presented in this chapter:

Microsoft LINQ Official Documentation http://msdn.microsoft.com/en-us/library/vstudio/bb397926.aspx

LINQPad http://www.linqpad.net/

LINQ Wiki http://en.wikipedia.org/wiki/Language_Integrated_Query

LINQ on Code Project http://www.codeproject.com/KB/linq/

Cheat Sheet

This cheat sheet is designed as a way for you to quickly study the key points of this chapter.

Language Integrated Query (LINQ)

· Any object that implements the IEnumerable<T> or IQueryable<T> interface can be queried using LINQ.

· The results of a LINQ query are normally assigned to a variable declared with the var keyword, which makes it an implicitly typed variable.

Query expression

· A query expression begins with a from clause, ends with a select or group clause, and can contain where, orderby, and join clauses.

· Joins are always equivalence based for LINQ queries.

· The execution of a query does not occur until the result is enumerated. You can force execution of the query by using an aggregate function.

· The code in the where clause of a query expression is the predicate.

· Multiple where clauses are combined as a logical AND, as if the conditions were joined with the && operator.

· The orderby clause is used in query expressions to sort the results on one or more properties.

· You can create a new type on the fly in the select clause of a query expression with a limited number of properties from the original object. This is referred to as projection.

· You use the keyword equals in a join clause.

· To create an outer join, you include an into clause in your join, and also call the DefaultIfEmpty method to set the properties on the object when no match was found between the two sequences.

· A join clause can contain an anonymous type to create a composite key.

· The group by clause returns a sequence of IGrouping<TKey, TElement> objects.

Method-based queries

· Method-based queries and query expressions are interchangeable and produce the same results. The only difference is the syntax.

· Method-based queries use lambda expressions as parameters to the methods.

· You can use the SelectMany method to flatten two sequences into one sequence similar to how a join works.

· You can use the GroupJoin method to create outer joins when using method-based queries.

· You can concatenate two sequences by using the Concat method.

· You can use the Skip method to skip a specific number of elements in a sequence.

· You can use the Take method to return a limited number of elements from a sequence.

· You can use the Distinct method to return the distinct list of elements from a sequence.

LINQ to XML

· You can use the XElement class in a LINQ to XML query to return the result of a query in XML.

Review of Key Terms

anonymous type A type created with read-only properties without having to write the code to declare the class.

composite keys Contains multiple properties that you need for the purpose of a join.

deferred execution Execution of a LINQ query is deferred until the result is enumerated or until a function that must produce a value, such as an aggregate function, is called on the result.

Goes To operator The Goes To operator is the => token in a lambda expression.

implicitly typed variable A variable that has its type determined by the expression on the right side of the initialization statement. Use the keyword var to declare an implicitly typed variable.

inner sequence When using the method-based Join function, this refers to the sequence passed into the Join method as a parameter.

Language Integrated Query (LINQ) A set of features that extends powerful query capabilities to C#.

method-based query A feature of LINQ that uses extension methods on types that implement the IEnumerable<T> or IQueryable<T> interface to query the data.

outer join Selects all elements from one sequence when joined to another sequence even if there is not a match on the joined property.

outer sequence When using the method-based Join function, this refers to the sequence calling the Join method.

ParamArray A parameter to a method that enables you to pass a variable number of arguments to the method. In C#, it is declared with the params keyword.

predicate The code executed in a where clause for a query expression.

projection Selecting a subset of properties from a type that creates a new anonymous type.

query expression A feature of LINQ that enables you to query any type that implements the IEnumerable<T> or IQueryable<T> interface by using syntax that is easy to comprehend.

EXAM TIPS AND TRICKS

The Review of Key Terms and the Cheat Sheet for this chapter can be printed to help you study. You can find these files in the ZIP file for this chapter at www.wrox.com/remtitle.cgi?isbn=1118612094 on the Download Code tab.