Sams Teach Yourself C# 5.0 in 24 Hours (2013)
Part II: Programming in C#
Hour 13. Understanding Query Expressions
What You’ll Learn in This Hour
• Introducing LINQ
• Using LINQ to manipulate data
• Standard query operator methods
• Deferred execution
In Hour 10, “Working with Arrays and Collections,” you learned how applications could work with data stored in collections. Applications also need to work with data stored in other data sources, such as SQL databases or XML files, or even accessed through a web service. Traditionally, queries against these different data sources required different syntax and performed no type checking at compile time.
For example, consider a collection of customers. How would you search that collection for all customers with a specific job title? Using what you have learned so far, you would need to write code that iterates over each item in the collection, examining the appropriate field and returning those items that match the job title for which you are searching. What would happen if the source of your customer data were to change and no longer be an in-memory collection but an XML file or data retrieved from a web service call? You would most likely need to rewrite your search logic to accommodate this new data source.
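To make the manual approach concrete, here is a minimal sketch of such a search written with only the techniques covered so far. The Customer class, its JobTitle property, and the FindByJobTitle method are hypothetical names chosen for illustration; they are not part of the book's sample code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical customer type used only for this illustration.
class Customer
{
    public string Name { get; set; }
    public string JobTitle { get; set; }
}

static class ManualSearch
{
    // Walks the collection and keeps only the customers with the requested job title.
    public static List<Customer> FindByJobTitle(IEnumerable<Customer> customers, string jobTitle)
    {
        var matches = new List<Customer>();
        foreach (Customer customer in customers)
        {
            if (customer.JobTitle == jobTitle)
            {
                matches.Add(customer);
            }
        }
        return matches;
    }

    static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { Name = "Ann", JobTitle = "Owner" },
            new Customer { Name = "Bob", JobTitle = "Sales Agent" },
            new Customer { Name = "Cy", JobTitle = "Owner" }
        };

        // Prints the names of the two customers whose job title is "Owner".
        foreach (Customer match in FindByJobTitle(customers, "Owner"))
        {
            Console.WriteLine(match.Name);
        }
    }
}
```

Notice that the loop is tied to an in-memory collection. If the customers came from an XML file instead, the iteration and comparison code would have to be rewritten.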
In this hour, you learn about Language Integrated Query (LINQ) and query expressions, which enable you to write a single query that works against any supported data source.
Introducing LINQ
Query expressions in the .NET Framework are part of a set of technologies called LINQ, which integrates query capabilities directly into the C# language. LINQ eliminates the language mismatch commonly found between working with data and working with objects by providing the same query language for the following data sources:
• SQL databases
• XML documents
• Web services
• ADO.NET Datasets
• Any collections that support the IEnumerable or IEnumerable<T> interfaces
This enables a query to be a first-class language construct, just like arithmetic operations and control flow statements are first-class concepts in C#.
Using LINQ to Manipulate Data
Query expressions in LINQ can query and transform data from any supported data source in a consistent fashion by working with the common operations performed rather than focusing on the structure. You can freely change the structure of the underlying data being queried without needing to change the query itself.
Listing 13.1 shows a query against a collection of Contact objects. Assume for the moment that the list has been populated as a result of calling GetContacts.
Listing 13.1. A LINQ Query
class Contact
{
    public int Id { get; set; }
    public string Company { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string StateProvince { get; set; }
    public DateTime DateOfBirth { get; set; } // used by Listing 13.3
}
IEnumerable<Contact> contacts = GetContacts();
var result =
    from contact in contacts
    select contact.FirstName;
foreach (var name in result)
{
    Console.WriteLine(name);
}
This simple query illustrates the declarative syntax, also called the query comprehension syntax, supported by the C# language. This syntax enables you to write queries using Structured Query Language (SQL)-like query syntax, providing a great deal of flexibility and expressiveness. Although all the variables in a query expression are strongly typed, in most cases you don’t need to provide the type explicitly because the compiler can infer it.
Note: Query Comprehension Syntax
If you are familiar with SQL, the query comprehension syntax used by LINQ will be familiar since it uses some of the same keywords and offers many of the same advantages. The most noticeable difference is that the from operator occurs before the select operator, rather than after it as it does in SQL. Although SQL is designed to handle relational data only, LINQ actually supports far more data structures.
Selecting Data
Although the code shown in Listing 13.1 might look simple, a lot is actually going on. The first thing you should notice is the use of an implicitly typed variable named result, which is actually of type IEnumerable<string>. The result of the query expression (the code on the right side of the assignment operator) is actually a query, not the result of the query. The select clause returns an object that represents the operation of projecting a result (the contact.FirstName values) from a sequence (the contacts list). Because the results are strings, result must be an enumerable collection of strings. It does not actually retrieve the data at this time; rather, it simply returns an enumerable collection that will fetch the data later.
This query literally says “select the FirstName field from each element, called contact, in the data source specified by contacts.” You can think of the contact variable specified in the from clause as being similar to the iteration variable of a foreach statement. It corresponds to a read-only local variable scoped only to the query expression. The in clause specifies the data source containing the elements to be queried, and the select clause says to select only the contact.FirstName field for each element during the iteration.
Although this syntax works well for selecting a single field, it is common to select multiple fields or even to transform the data in some way, such as combining fields. Fortunately, LINQ enables these scenarios as well, using similar syntax. You actually have several options for performing these types of selections.
The first is simply to concatenate the fields in the select statement, thereby still returning a single field, as shown in Listing 13.2.
Listing 13.2. A LINQ Query Concatenating Data
var result =
    from contact in contacts
    select contact.FirstName + " " + contact.LastName;
foreach (var name in result)
{
    Console.WriteLine(name);
}
Obviously, this form of selection works only in a limited number of cases. A more flexible approach is to return multiple fields, essentially returning a subset of data, as shown in Listing 13.3.
Listing 13.3. A LINQ Query Returning an Anonymous Type
var result =
    from contact in contacts
    select new
    {
        Name = contact.LastName + ", " + contact.FirstName,
        DateOfBirth = contact.DateOfBirth
    };
foreach (var contact in result)
{
    Console.WriteLine("{0} born on {1}", contact.Name, contact.DateOfBirth);
}
In this case, you are still returning an IEnumerable<T>, but what is its element type? If you look at the select clause in Listing 13.3, you should notice it is returning a new type built from the contact.LastName, contact.FirstName, and contact.DateOfBirth values. This new type is actually an anonymous type containing properties named Name and DateOfBirth. The type is anonymous because it doesn’t have a name that you can use in code. You did not explicitly declare a new type that corresponds to the returned value; the compiler generated it for you.
Note: Anonymous Types
The ability to create anonymous types in this manner is central to the way LINQ works and would not be possible without the type inference provided by var.
Try It Yourself: Selecting Data
To select data using select query statements, follow these steps. Keep Visual Studio open at the end of this exercise because you will use this application later.
1. Open the ConsoleApplication1 project in Visual Studio. This project can be found in the Hour 13\Try It Yourself\Selecting Data\Starting folder of the book downloads.
2. Open Program.cs and add a query that selects the LastName property for each contact in the contacts collection.
3. Write a foreach statement that prints the results of the query.
4. Run the application by pressing Ctrl+F5. The output should look similar to Figure 13.1.
Figure 13.1. Selecting data.
5. Write an additional query that selects the concatenation of LastName and FirstName into the Name property of a new anonymous type.
6. Write a foreach statement that prints the results of the query.
7. Run the application by pressing Ctrl+F5. The output should look similar to Figure 13.2.
Figure 13.2. Selecting data using anonymous types.
Filtering Data
Selecting data is important, but selecting data in this way provides no option to restrict what data is returned. Just as SQL provides a where clause, LINQ provides a where clause that returns an enumerable collection containing only the elements that match the specified criteria. Listing 13.4 applies a where clause to a query similar to the one in Listing 13.3, restricting the results to only those contacts where the value of StateProvince is equal to "FL".
Listing 13.4. A Filtered LINQ Query
var result =
    from contact in contacts
    where contact.StateProvince == "FL"
    select new { contact.FirstName, contact.LastName };
foreach (var name in result)
{
    Console.WriteLine(name.FirstName + " " + name.LastName);
}
The where clause is applied first, resulting in an enumerable collection to which the select clause is applied. The select clause in turn produces a sequence of anonymous type objects, each containing the FirstName and LastName properties.
Try It Yourself: Filtering Data
By following these steps, you learn how to write query statements that filter the resulting data. If you closed Visual Studio, repeat the previous exercise first. Be sure to keep Visual Studio open at the end of this exercise because you will use this application later.
1. Modify both of the queries you previously wrote to include a where clause that filters the result set to just contacts whose last name starts with “M”.
2. Run the application by pressing Ctrl+F5. The output should look similar to Figure 13.3.
Figure 13.3. Filtering data.
Grouping and Ordering Data
To support more complex scenarios, such as ordering or grouping the returned data, LINQ provides the orderby and group clauses. You can order data in either ascending (smallest to largest) or descending (largest to smallest) order. Because ascending is the default, you don’t need to specify it. Listing 13.5 shows the query from Listing 13.1 ordered by the LastName field.
Listing 13.5. A LINQ Query Using OrderBy
var result =
    from contact in contacts
    orderby contact.LastName
    select contact.FirstName;
foreach (var name in result)
{
    Console.WriteLine(name);
}
You can order by multiple fields and can mix ascending and descending to create rather sophisticated orderby statements, as shown in Listing 13.6.
Listing 13.6. A LINQ Query Using a Complex OrderBy
var result =
    from contact in contacts
    orderby
        contact.LastName ascending,
        contact.FirstName descending
    select contact.FirstName;
foreach (var name in result)
{
    Console.WriteLine(name);
}
Grouping data follows a similar pattern, but the group clause takes the place of the select clause. The difference when grouping data is that the result returned is an IEnumerable of IGrouping<TKey, TElement> objects, which you can think of as a list of lists. This requires two nested foreach statements to access the results.
Listing 13.7 shows the same query as in Listing 13.1, but this time groups by the first character of the last name.
Listing 13.7. A LINQ Query Using Group
var result =
    from contact in contacts
    group contact by contact.LastName[0];
foreach (var group in result)
{
    Console.WriteLine("Last names starting with {0}", group.Key);
    // Each element of the group is a Contact, so print its LastName.
    foreach (var contact in group)
    {
        Console.WriteLine(contact.LastName);
    }
    Console.WriteLine();
}
If you need to refer to the result of a grouping operation, you can create an identifier that can be queried further using the into keyword. This form of composability is a query continuation.
Listing 13.8 performs the same query as Listing 13.7 but returns only those groups that have more than two entries.
Listing 13.8. A LINQ Query Using Group and Into
var result =
    from contact in contacts
    group contact by contact.LastName[0] into namesGroup
    where namesGroup.Count() > 2
    select namesGroup;
foreach (var group in result)
{
    Console.WriteLine("Last names starting with {0}", group.Key);
    // Each element of the group is a Contact, so print its LastName.
    foreach (var contact in group)
    {
        Console.WriteLine(contact.LastName);
    }
    Console.WriteLine();
}
Try It Yourself: Grouping and Ordering Data
To write query statements that perform grouping, ordering, and other aggregating functions, follow these steps. If you closed Visual Studio, repeat the previous exercise first. Be sure to keep Visual Studio open at the end of this exercise because you will use this application later.
1. Write a new query that groups the contacts collection by the first character of the last name.
2. Write a foreach statement that prints the grouping key and includes a nested foreach statement that prints the last name of each contact in the group.
3. Run the application by pressing Ctrl+F5. The output should look similar to Figure 13.4.
Figure 13.4. Grouping data.
Joining Data
LINQ also enables you to combine multiple data sources by joining them together on one or more common fields. Joining data is important for queries against data sources where their relationship cannot be followed directly. Unlike SQL, which supports joins using many different operators, join operations in LINQ are based on the equality of their keys.
Expanding on the earlier examples that used only the Contact class, you need at least two classes to perform join operations. The Contact class is shown again in Listing 13.9, along with a new JournalEntry class. Continue the assumption that the contacts list has been populated as a result of calling GetContacts and that the journal list has been populated as a result of calling GetJournalEntries.
Listing 13.9. The Contact and JournalEntry Classes
class Contact
{
    public int Id { get; set; }
    public string Company { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string StateProvince { get; set; }
    public DateTime DateOfBirth { get; set; }
}
class JournalEntry
{
    public int Id { get; set; }
    public int ContactId { get; set; }
    public string Description { get; set; }
    public string EntryType { get; set; }
    public DateTime Date { get; set; }
}
IEnumerable<Contact> contacts = GetContacts();
IEnumerable<JournalEntry> journal = GetJournalEntries();
The simplest join query in LINQ is the functional equivalent of an inner join in SQL and uses the join clause. Unlike joins in SQL, which can use many different operators, joins in LINQ can use only an equality operator and are called equijoins.
Listing 13.10 shows a query against a list of Contact objects joined to a list of JournalEntry objects using the Contact.Id and JournalEntry.ContactId properties as the keys for the join.
Listing 13.10. A LINQ Query Using Join
var result =
    from contact in contacts
    join journalEntry in journal
        on contact.Id equals journalEntry.ContactId
    select new
    {
        contact.FirstName,
        contact.LastName,
        journalEntry.Date,
        journalEntry.EntryType,
        journalEntry.Description
    };
The join clause in Listing 13.10 creates a range variable named journalEntry, which is of type JournalEntry, and then uses the equals operator to join the two data sources.
LINQ also has the concept of a group join, which has no corresponding SQL query. A group join uses the into keyword and creates results that have a hierarchical structure. Just as you did with the group clause, you need nested foreach statements to access the results.
Caution: Order Is Important
When working with LINQ joins, order is important. The data source to be joined must be on the left side of the equals operator and the joining data source must be on the right. In this example, contacts is the data source to be joined and journal is the joining data source.
Fortunately, the compiler can catch these types of errors and generate a compiler error. If you were to swap the parameters in the join clause, you would get the following compiler error:
The name 'journalEntry' is not in scope on the left side of 'equals'. Consider
swapping the expressions on either side of 'equals'.
Another important thing to watch out for is that the join clause uses the equals operator, which is not the same as the equality (==) operator.
Listing 13.11 shows a query that joins contacts and journal and returns a result grouped by contact name. Each entry in the group has an enumerable collection of journal entries, represented by the JournalEntries property in the returned anonymous type.
Listing 13.11. A LINQ Query Using a Group Join
var result =
    from contact in contacts
    join journalEntry in journal
        on contact.Id equals journalEntry.ContactId
        into journalGroups
    select new
    {
        Name = contact.LastName + ", " + contact.FirstName,
        JournalEntries = journalGroups
    };
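Listing 13.11 stops at the query itself. The following sketch shows one way the hierarchical result might be consumed with nested foreach statements. It uses trimmed-down versions of the Contact and JournalEntry classes, and the GroupJoinDemo class and its Describe method are hypothetical helpers added only so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down versions of the classes from Listing 13.9.
class Contact
{
    public int Id { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
}

class JournalEntry
{
    public int ContactId { get; set; }
    public string Description { get; set; }
}

static class GroupJoinDemo
{
    // Returns one line per contact, followed by one indented line per journal entry.
    public static List<string> Describe(IEnumerable<Contact> contacts, IEnumerable<JournalEntry> journal)
    {
        var result =
            from contact in contacts
            join journalEntry in journal
                on contact.Id equals journalEntry.ContactId
                into journalGroups
            select new
            {
                Name = contact.LastName + ", " + contact.FirstName,
                JournalEntries = journalGroups
            };

        var lines = new List<string>();
        foreach (var item in result)               // outer loop: one item per contact
        {
            lines.Add(item.Name);
            foreach (var entry in item.JournalEntries) // inner loop: that contact's entries
            {
                lines.Add("  " + entry.Description);
            }
        }
        return lines;
    }

    static void Main()
    {
        var contacts = new List<Contact>
        {
            new Contact { Id = 1, FirstName = "Ann", LastName = "Lee" },
            new Contact { Id = 2, FirstName = "Bob", LastName = "Ray" }
        };
        var journal = new List<JournalEntry>
        {
            new JournalEntry { ContactId = 1, Description = "Called" },
            new JournalEntry { ContactId = 1, Description = "Emailed" }
        };
        foreach (string line in Describe(contacts, journal))
        {
            Console.WriteLine(line);
        }
    }
}
```

A contact with no matching journal entries still appears in the result of a group join; its JournalEntries collection is simply empty, so the inner loop does nothing for that contact.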
Flattening Data
Although selecting and joining data often return results in the right shape, that hierarchical shape can sometimes be cumbersome to work with. LINQ enables you to create queries that instead return the flattened data, much the same way you would when querying a SQL data source.
Suppose you were to change the Contact and JournalEntry classes so that a List<JournalEntry> field named Journal is added to the Contact class and the ContactId property is removed from the JournalEntry class, as shown in Listing 13.12.
Listing 13.12. Revised Contact and JournalEntry Classes
class Contact
{
    public int Id { get; set; }
    public string Company { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string StateProvince { get; set; }
    public DateTime DateOfBirth { get; set; }
    public List<JournalEntry> Journal;
}
class JournalEntry
{
    public int Id { get; set; }
    public string Description { get; set; }
    public string EntryType { get; set; }
    public DateTime Date { get; set; }
}
IEnumerable<Contact> contacts = GetContacts();
You could then query the contacts collection using the following query to retrieve the list of journal entries for a specific contact, as shown in Listing 13.13.
Listing 13.13. A LINQ Query Selecting an Enumerable Collection
var result =
    from contact in contacts
    where contact.Id == 1
    select contact.Journal;
foreach (var item in result)
{
    foreach (var journalEntry in item)
    {
        // Print a property; writing the object itself would print only its type name.
        Console.WriteLine(journalEntry.Description);
    }
}
Although this works and returns the results, it still requires nested foreach statements to generate the proper results. Fortunately, LINQ provides a query syntax that enables the data to be returned in a flattened manner by supporting a select from more than one data source. The code in Listing 13.14 shows how this query would be written so that only a single foreach statement is required by using multiple from clauses.
Listing 13.14. A LINQ Query Selecting Flattened Data
var result =
    from contact in contacts
    from journalEntry in contact.Journal
    where contact.Id == 1
    select journalEntry;
foreach (var journalEntry in result)
{
    // Print a property; writing the object itself would print only its type name.
    Console.WriteLine(journalEntry.Description);
}
Standard Query Operator Methods
All the queries you have just seen use declarative query syntax; however, they could have also been written using standard query operator method calls, which are actually extension methods defined by the Enumerable class in the System.Linq namespace. The compiler converts query expressions using the declarative syntax to the equivalent query operator method calls.
As long as you include the System.Linq namespace with a using statement, you can see the standard query operator methods on any classes that implement the IEnumerable<T> interface, as shown in Figure 13.5.
Figure 13.5. LINQ extension methods in IntelliSense.
Although the declarative query syntax supports almost all query operations, there are some, such as Count or Max, which have no equivalent query syntax and must be expressed as a method call. Because most query operator methods return an IEnumerable<T>, you can compose complex queries by chaining the method calls together. This is what the compiler does on your behalf when it compiles your declarative query expressions.
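As a brief sketch of how an operator without query syntax can be combined with a query expression, the following example wraps a query in parentheses and calls Count on it, and calls Max directly on a collection. The AggregateDemo class and its method names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AggregateDemo
{
    // Mixes both styles: a declarative query wrapped in parentheses, then a method call.
    public static int CountStartingWith(IEnumerable<string> lastNames, char initial)
    {
        return (from name in lastNames
                where name.Length > 0 && name[0] == initial
                select name).Count();
    }

    // Max has no query syntax equivalent, so it is written as a method call.
    // Note: this overload of Max throws InvalidOperationException for an empty sequence.
    public static int LongestLength(IEnumerable<string> lastNames)
    {
        return lastNames.Max(name => name.Length);
    }

    static void Main()
    {
        string[] lastNames = { "Maddox", "Moore", "Smith" };
        Console.WriteLine(CountStartingWith(lastNames, 'M')); // 2
        Console.WriteLine(LongestLength(lastNames));          // 6
    }
}
```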
Listing 13.15 shows the same query from Listing 13.4 using method syntax rather than declarative syntax, and the output from both will be identical. The Where method corresponds to the where clause, whereas the Select method corresponds to the select clause.
Note: Declarative or Method Syntax
The choice of using the declarative syntax or the method syntax is entirely personal and depends on which one you find easier to read. No matter which one you choose, the result of executing the query will be the same.
Listing 13.15. A LINQ Query Using Method Syntax
var result = contacts
    .Where(contact => contact.StateProvince == "FL")
    .Select(contact => new { contact.FirstName, contact.LastName });
foreach (var name in result)
{
    Console.WriteLine(name.FirstName + " " + name.LastName);
}
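Method syntax can express the flattening query from Listing 13.14 as well. Multiple from clauses correspond to the SelectMany standard query operator, so one possible equivalent looks like the following sketch. The classes are trimmed-down versions of those in Listing 13.12, and FlattenDemo and DescriptionsFor are hypothetical names used only here. The call chain produces the same results as the query expression, although it is not necessarily the exact translation the compiler generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down versions of the classes from Listing 13.12.
class JournalEntry
{
    public string Description { get; set; }
}

class Contact
{
    public int Id { get; set; }
    public List<JournalEntry> Journal = new List<JournalEntry>();
}

static class FlattenDemo
{
    // Filters to one contact, flattens that contact's journal, and projects the descriptions.
    public static List<string> DescriptionsFor(IEnumerable<Contact> contacts, int contactId)
    {
        return contacts
            .Where(contact => contact.Id == contactId)
            .SelectMany(contact => contact.Journal)
            .Select(journalEntry => journalEntry.Description)
            .ToList();
    }

    static void Main()
    {
        var first = new Contact { Id = 1 };
        first.Journal.Add(new JournalEntry { Description = "Called" });
        first.Journal.Add(new JournalEntry { Description = "Emailed" });
        var second = new Contact { Id = 2 };
        second.Journal.Add(new JournalEntry { Description = "Met" });

        // A single foreach statement is enough because the data is already flattened.
        foreach (string description in DescriptionsFor(new[] { first, second }, 1))
        {
            Console.WriteLine(description);
        }
    }
}
```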
Lambdas
In Listing 13.15, you might have noticed that the arguments passed to the Where and Select methods look different from what you have used before. These arguments actually contain code rather than data. In Hour 7, “Events and Event Handling,” you learned about delegates, which enable a method to be passed as an argument to other methods, and about anonymous methods, which enable you to write an unnamed inline statement block that can be executed in a delegate invocation.
The combination of these concepts is a lambda, which is an anonymous function that can contain expressions and statements. Lambdas enable you to write code normally written using an anonymous method or generic delegate in a more convenient and compact way.
Note: Lambdas and Delegates
Because lambdas are a more compact way to write a delegate, you can use them anywhere you would ordinarily have used a delegate. As a result, the lambda formal parameter types must match the corresponding delegate type exactly. The return type must also be implicitly convertible to the delegate’s return type.
Although lambdas have no type, they are implicitly convertible to any compatible delegate type. That implicit conversion is what enables you to pass them without explicit assignment.
Lambdas in C# use the lambda operator (=>). If you think about a lambda in the context of a method, the left side of the operator specifies the formal parameter list, and the right side of the operator contains the method body. All the restrictions that apply to anonymous methods also apply to lambdas.
The argument to the Where method shown in Listing 13.15, contact => contact.StateProvince == "FL", is read as “contact goes to contact.StateProvince equals FL.”
Tip: Captured and Defined Variables
Lambdas also have the capability to “capture” variables, which can be local variables or parameters of the containing method. This enables the body of the lambda to access the captured variable by name. If the captured variable is a local variable, it must be definitely assigned before it can be used in the lambda. Captured parameters cannot be ref or out parameters.
Be careful, however, because variables that are captured by a lambda will not be eligible for garbage collection until the delegate that references them becomes eligible for garbage collection.
Any variables introduced within the lambda are not visible in the outer containing method. This also applies to the input parameter names, so you can use the same identifiers for multiple lambdas.
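The following sketch illustrates both kinds of capture described in this tip: a local variable and a method parameter. CaptureDemo, MakeCounter, and StartsWith are hypothetical names invented for this example.

```csharp
using System;

static class CaptureDemo
{
    // The lambda captures the local variable count, which therefore
    // stays alive for as long as the returned delegate is reachable.
    public static Func<int> MakeCounter()
    {
        int count = 0; // definitely assigned before the lambda uses it
        return () => ++count;
    }

    // The lambda captures the parameter initial of the containing method.
    public static Func<string, bool> StartsWith(char initial)
    {
        return name => name.Length > 0 && name[0] == initial;
    }

    static void Main()
    {
        Func<int> next = MakeCounter();
        Console.WriteLine(next()); // 1
        Console.WriteLine(next()); // 2

        Func<string, bool> startsWithM = StartsWith('M');
        Console.WriteLine(startsWithM("Moore")); // True
        Console.WriteLine(startsWithM("Smith")); // False
    }
}
```

Each call to MakeCounter creates a new count variable, so two counters created this way do not share state.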
Expression Lambdas
When a lambda contains an expression on the right side of the operator, it is an expression lambda and returns the result of that expression. The basic form of an expression lambda is as follows:
(input parameters) => expression
If there is only one input parameter, the parentheses are optional. If you have any other number of input parameters, including none, the parentheses are required.
Just as generic methods can infer the type of their type parameter, lambdas can infer the type for their input parameters. If the compiler cannot infer the type, you can specify the type explicitly. Listing 13.16 shows different forms of expression lambdas.
Listing 13.16. Sample Expression Lambdas
x => Math.Pow(x, 2)
(x, y) => Math.Pow(x, y)
() => Math.Pow(2, 2)
(int x, string s) => s.Length < x
If you consider the expression portion of an expression lambda as the body of a method, an expression lambda contains an implicit return statement that returns the result of the expression.
Caution: Expression Lambdas Containing Method Calls
Although most of the examples in Listing 13.16 used methods on the right side of the operator, if you create lambdas that will be used in another domain, such as SQL Server, you should not use method calls because they have no meaning outside the boundary of the .NET Framework common language runtime.
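The lambdas in Listing 13.16 are shown on their own, but a lambda has to be converted to a delegate type before it can be invoked. As an illustration, the following sketch assigns each of those lambdas to a matching Func delegate. The ExpressionLambdaDemo class and its field names are hypothetical.

```csharp
using System;

static class ExpressionLambdaDemo
{
    // One input parameter: the parentheses are optional.
    public static readonly Func<double, double> Square = x => Math.Pow(x, 2);

    // Two input parameters: the parentheses are required.
    public static readonly Func<double, double, double> Power = (x, y) => Math.Pow(x, y);

    // No input parameters: empty parentheses are required.
    public static readonly Func<double> Four = () => Math.Pow(2, 2);

    // Explicitly typed input parameters.
    public static readonly Func<int, string, bool> ShorterThan = (int x, string s) => s.Length < x;

    static void Main()
    {
        Console.WriteLine(Square(3));             // 9
        Console.WriteLine(Power(2, 10));          // 1024
        Console.WriteLine(Four());                // 4
        Console.WriteLine(ShorterThan(5, "abc")); // True
    }
}
```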
Try It Yourself: Working with Expression Lambdas
By following these steps, you learn how to use expression lambdas with the LINQ query methods. If you closed Visual Studio, repeat the previous exercise first.
1. Modify the declarative query expressions you wrote in the previous exercises to use the corresponding standard query method.
2. Run the application by pressing Ctrl+F5. The output should match the output from the previous exercises.
Statement Lambdas
A lambda that has one or more statements enclosed by curly braces on the right side is a statement lambda. The basic form of a statement lambda is as follows:
(input parameters) => { statement; }
Like expression lambdas, if there is only one input parameter, the parentheses are optional; otherwise, they are required. Statement lambdas also follow the same rules of type inference.
Although expression lambdas contain an implicit return statement, statement lambdas do not. You must explicitly specify the return statement from a statement lambda. The return statement causes only the implicit method represented by the lambda to return, not the enclosing method. Listing 13.17 shows different forms of statement lambdas.
Listing 13.17. Sample Statement Lambdas
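// Assigned to a delegate so that the lambda can be invoked. The original
// x++ would return the value of x before the increment.
Func<int, int> increment = (x) => { return x + 1; };
CheckBox cb = new CheckBox();
cb.CheckedChanged += (sender, e) =>
{
    MessageBox.Show(cb.Checked.ToString());
};
Action<string> myDel = n =>
{
    string s = n + " " + "World";
    Console.WriteLine(s);
};
myDel("Hello");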
A statement lambda cannot contain a goto, break, or continue statement whose target is outside the scope of the lambda itself. Similarly, normal scoping rules prevent a branch into a nested lambda from an outer lambda.
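To see that a return statement inside a statement lambda leaves only the lambda, consider the following sketch. StatementLambdaDemo and Classify are hypothetical names used for illustration.

```csharp
using System;

static class StatementLambdaDemo
{
    public static string Classify(int value)
    {
        Func<int, string> describe = n =>
        {
            if (n < 0)
            {
                return "negative"; // returns from the lambda only, not from Classify
            }
            return n == 0 ? "zero" : "positive";
        };

        string description = describe(value);

        // This line always runs, because the lambda's return did not exit Classify.
        return "The value is " + description;
    }

    static void Main()
    {
        Console.WriteLine(Classify(-3)); // The value is negative
        Console.WriteLine(Classify(0));  // The value is zero
        Console.WriteLine(Classify(7));  // The value is positive
    }
}
```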
Predefined Delegates
Although lambdas are an integral component of LINQ, they can be used anywhere you can use a delegate. As a result, the .NET Framework provides many predefined delegates that can be used to represent a method that can be passed as a parameter without requiring you to first declare an explicit delegate type.
Because delegates that return a Boolean value are common, the .NET Framework defines a Predicate<in T> delegate, which is used by many of the methods in the Array and List<T> classes.
Although Predicate<T> defines a delegate that always returns a Boolean value, the Func family of delegates encapsulates a method that has the specified return value and 0 to 16 input parameters.
Whereas Predicate<T> and the Func delegates all have a return type, the family of Action delegates represents a method that has a void return type. Just like the Func delegates, the Action delegates also accept from 0 to 16 input parameters.
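The following sketch uses one delegate from each of the three families with lambdas. The DelegateDemo class, its Run method, and the sample names are hypothetical; List<T>.FindAll is one of the List<T> methods that accepts a Predicate<T>.

```csharp
using System;
using System.Collections.Generic;

static class DelegateDemo
{
    public static List<string> Run()
    {
        var output = new List<string>();
        var names = new List<string> { "Maddox", "Smith", "Moore" };

        // Predicate<T>: takes a T and always returns a Boolean value.
        Predicate<string> startsWithM = name => name.StartsWith("M", StringComparison.Ordinal);
        List<string> matches = names.FindAll(startsWithM);

        // Func<T, TResult>: the last type argument is the return type.
        Func<string, int> length = name => name.Length;

        // Action<T>: takes a T and returns nothing.
        Action<string> record = text => output.Add(text);

        foreach (string match in matches)
        {
            record(match + ":" + length(match));
        }
        return output;
    }

    static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line); // Maddox:6, then Moore:5
        }
    }
}
```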
Deferred Execution
Unlike many traditional data query techniques, a LINQ query is not evaluated until you actually iterate over it. One advantage of this approach, called lazy evaluation, is that it enables the data in the original collection to change between when the query is defined and when the data identified by the query is retrieved. Ultimately, this means the results reflect the data as it exists at the time you iterate over the query.
Even though LINQ prefers to use lazy evaluation, any queries that use operators returning a single value execute immediately. Aggregation functions, such as Count, Max, and Average, must first iterate over all the elements, whereas First needs to retrieve only the first element. In each case, the query executes without using an explicit foreach statement.
Tip: Deferred Execution and Chained Queries
Another advantage of deferred execution is that it enables queries to be efficiently chained together. Because query objects represent queries, not the results of those queries, they can easily be chained together or reused without causing potentially expensive data fetching operations.
You can also force immediate evaluation, sometimes called greedy evaluation, by placing the foreach statement immediately after the query expression or by calling the ToList or ToArray methods. You can also use either ToList or ToArray to cache the data in a single collection object.
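A short sketch can make the difference between deferred and immediate evaluation visible. In the following example, DeferredDemo and Run are hypothetical names. The query is defined once, a snapshot is taken with ToList, and then the source collection is changed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Returns { count in the cached snapshot, count when the query is evaluated again }.
    public static int[] Run()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Defining the query does not execute it.
        IEnumerable<int> evens =
            from n in numbers
            where n % 2 == 0
            select n;

        // ToList forces immediate evaluation and caches the results: only 2 so far.
        List<int> snapshot = evens.ToList();

        // Change the source after the query was defined.
        numbers.Add(4);

        // Count executes the query again, so it now sees both 2 and 4.
        int liveCount = evens.Count();

        return new[] { snapshot.Count, liveCount };
    }

    static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("Snapshot: {0}, live query: {1}", counts[0], counts[1]);
        // Prints "Snapshot: 1, live query: 2"
    }
}
```

The snapshot is unaffected by later changes to the source, whereas the query object reflects them each time it is evaluated. Keep in mind that every evaluation repeats the work of the query, which matters when the data source is expensive to read.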
Summary
LINQ takes the best ideas from functional languages such as Haskell and other research languages and brings them together to introduce a way to query data in a consistent manner, no matter what the original data source might be, using a simple declarative or method-based syntax. By enabling queries to be written in a source-agnostic fashion, LINQ enables access to a wide variety of data sources, including databases, XML files, and in-memory collections.
Using syntax similar to that used by SQL queries, the declarative syntax of LINQ enables a query to be a first-class language construct, just like arithmetic operations and control flow statements. LINQ is actually implemented as a set of extension methods on the IEnumerable<T> interface, which accept lambdas as a parameter. Lambdas, in the form of expression or statement lambdas, are a compact way to write anonymous delegates. When you first start with LINQ, you don’t need to use lambdas extensively, but as you become more familiar with them, you will find that they are extremely powerful.
Q&A
Q. What is LINQ?
A. LINQ is a set of technologies that integrates query capabilities directly into the C# language and eliminates the language mismatch commonly found between working with data and working with objects by providing the same query language for each supported data source.
Q. What is a lambda expression?
A. A lambda expression represents a compact and concise way to write an anonymous delegate and can be used anywhere a traditional delegate can be used.
Workshop
Quiz
1. Is there a difference between the declarative and method syntax for LINQ?
2. When is a LINQ query executed?
3. What is the underlying delegate type for lambda expressions?
Answers
1. The choice of using the declarative syntax or the method syntax is entirely personal and depends on which one you find easier to read. No matter which one you choose, the result of executing the query will be the same.
2. By default, LINQ utilizes deferred execution of queries. This means that the query is not actually executed until the result is iterated over, typically using a foreach statement, or until a method such as Count, ToList, or ToArray forces evaluation.
3. Lambda expressions are inherently typeless, so they have no underlying type; however, they can be implicitly converted to any compatible delegate type.
Exercises
There are no exercises for this hour.