Delegates and Lambda Expressions - Essential C# 6.0 (2016)

12. Delegates and Lambda Expressions

Previous chapters discussed extensively how to create classes to encapsulate data and operations on data. As you create more and more classes, you will see common patterns in the relationships among them. One common pattern is to pass an object to a method solely so that the method can, in turn, call a method on the object. For example, if you pass to a method a reference to IComparer<int>, odds are good that the called method will itself call the Compare() method on the object you provided. In this case, the interface is nothing more than a way to pass a reference to a single method that can be invoked. It seems unnecessary to have to define a new interface every time you want to pass a method around. In this chapter we describe how to create and use a special kind of class called a delegate that enables you to treat references to methods as you would any other data. We then show how to create custom delegates quickly and easily with lambda expressions.

Lambda expressions were added to the language in C# 3.0; the previous version, C# 2.0, supported a less elegant syntax for custom delegate creation called anonymous methods. Every version of C# after C# 2.0 supports anonymous methods for backward compatibility, but in new code they should be avoided in favor of lambda expressions. This chapter includes Advanced Topic blocks that describe how to use anonymous methods should you need to work with legacy C# 2.0 code; you can largely ignore these sections if you are working only with newer code.

We conclude the chapter with a discussion of expression trees, which enable you to use the compiler’s analysis of a lambda expression at execution time.

Introducing Delegates

Veteran C and C++ programmers have long used “function pointers” as a mechanism for passing a reference to one method as an argument to another method. C# achieves similar functionality by using delegates. Delegates allow you to capture a reference to a method and pass it around like any other object, and to call the captured method like any other method. Let’s consider an example illustrating how this technique might be useful.

Defining the Scenario

Although it is not very efficient, one of the simplest sort routines is the bubble sort. Listing 12.1 shows the BubbleSort() method.

LISTING 12.1: BubbleSort() Method


static class SimpleSort1
{
    public static void BubbleSort(int[] items)
    {
        int i;
        int j;
        int temp;

        if (items == null)
        {
            return;
        }

        for (i = items.Length - 1; i >= 0; i--)
        {
            for (j = 1; j <= i; j++)
            {
                if (items[j - 1] > items[j])
                {
                    temp = items[j - 1];
                    items[j - 1] = items[j];
                    items[j] = temp;
                }
            }
        }
    }
    // ...
}


This method will sort an array of integers in ascending order.

Suppose you need to sort the integers in Listing 12.1 in either ascending or descending order. You could duplicate the code and replace the greater-than operator with a less-than operator, but it seems like a bad idea to replicate several dozen lines of code merely to change a single operator. As a less verbose alternative, you could pass in an additional parameter indicating how to perform the sort, as shown in Listing 12.2.

LISTING 12.2: BubbleSort() Method, Ascending or Descending


class SimpleSort2
{
    public enum SortType
    {
        Ascending,
        Descending
    }

    public static void BubbleSort(int[] items, SortType sortOrder)
    {
        int i;
        int j;
        int temp;

        if (items == null)
        {
            return;
        }

        for (i = items.Length - 1; i >= 0; i--)
        {
            for (j = 1; j <= i; j++)
            {
                bool swap = false;
                switch (sortOrder)
                {
                    case SortType.Ascending:
                        swap = items[j - 1] > items[j];
                        break;

                    case SortType.Descending:
                        swap = items[j - 1] < items[j];
                        break;
                }
                if (swap)
                {
                    temp = items[j - 1];
                    items[j - 1] = items[j];
                    items[j] = temp;
                }
            }
        }
    }
    // ...
}


However, this code handles only two of the possible sort orders. If you wanted to sort them lexicographically (that is, 1, 10, 11, 12, 2, 20, ...), or order them via some other criterion, it would not take long before the number of SortType values and the corresponding switch cases would become cumbersome.

Delegate Data Types

To increase flexibility and reduce code duplication in the previous code listings, you can make the comparison method a parameter to the BubbleSort() method. To pass a method as an argument, a data type is required to represent that method; this data type is called a delegate because it “delegates” the call to the method referred to by the object. Listing 12.3 includes a modification to the BubbleSort() method that takes a delegate parameter. In this case, the delegate data type is ComparisonHandler.

LISTING 12.3: BubbleSort() with Delegate Parameter


class DelegateSample
{
    // ...

    public static void BubbleSort(
        int[] items, ComparisonHandler comparisonMethod)
    {
        int i;
        int j;
        int temp;

        if (comparisonMethod == null)
        {
            throw new ArgumentNullException("comparisonMethod");
        }

        if (items == null)
        {
            return;
        }

        for (i = items.Length - 1; i >= 0; i--)
        {
            for (j = 1; j <= i; j++)
            {
                if (comparisonMethod(items[j - 1], items[j]))
                {
                    temp = items[j - 1];
                    items[j - 1] = items[j];
                    items[j] = temp;
                }
            }
        }
    }
    // ...
}


The delegate type ComparisonHandler represents a method that compares two integers. Within the BubbleSort() method, you then use the instance of the ComparisonHandler, referred to by the comparisonMethod parameter, to determine which integer is greater. Since comparisonMethod represents a method, the syntax to invoke the method is identical to calling any other method. In this case, the ComparisonHandler delegate takes two integer parameters and returns a Boolean value that indicates whether the first integer is greater than the second one.

Note that the ComparisonHandler delegate is strongly typed to represent a method that returns a bool and accepts exactly two integer parameters. Just as with any other method call, the call to a delegate is strongly typed, and if the data types for the arguments are not compatible with the parameters, the C# compiler reports an error.

Declaring a Delegate Type

You just saw how to define a method that uses a delegate, and you learned how to invoke the delegate simply by treating the delegate variable as a method. However, you have yet to learn how to declare a delegate type. To declare a delegate type, you use the delegate keyword and follow it with what looks like a method declaration. The signature of that method is the signature of the method that the delegate can refer to, and the name of the delegate type appears where the name of the method would appear in a method declaration. Listing 12.4 shows how to declare the ComparisonHandler delegate type to require two integers and return a Boolean value.

LISTING 12.4: Declaring a Delegate Type


public delegate bool ComparisonHandler(
    int first, int second);


Just as classes can be nested in other classes, so delegates can also be nested in classes. If the delegate declaration appeared within another class, the delegate type would be a nested type, as shown in Listing 12.5.

LISTING 12.5: Declaring a Nested Delegate Type


class DelegateSample
{
    public delegate bool ComparisonHandler(
        int first, int second);
}


In this case, the data type would be DelegateSample.ComparisonHandler because it is defined as a nested type within DelegateSample.

Instantiating a Delegate

In this final step of implementing the BubbleSort() method with a delegate, you will learn how to call the method and pass a delegate instance, specifically an instance of type ComparisonHandler. To instantiate a delegate, you need a method with parameters and a return type that matches the signature of the delegate type itself. In the case of ComparisonHandler, that method takes two integers and returns a bool. The name of the method need not match the name of the delegate, but the rest of the method signature must be compatible with the delegate signature. Listing 12.6 shows the code for a greater-than method compatible with the delegate type.

LISTING 12.6: Declaring a ComparisonHandler-Compatible Method


public delegate bool ComparisonHandler(
    int first, int second);

class DelegateSample
{
    public static void BubbleSort(
        int[] items, ComparisonHandler comparisonMethod)
    {
        // ...
    }

    public static bool GreaterThan(int first, int second)
    {
        return first > second;
    }
    // ...
}


With this method defined, you can call BubbleSort() and supply as the argument the name of the method that is to be captured by the delegate, as shown in Listing 12.7.

LISTING 12.7: Using a Method Name As an Argument


public delegate bool ComparisonHandler(
    int first, int second);

class DelegateSample
{
    public static void BubbleSort(
        int[] items, ComparisonHandler comparisonMethod)
    {
        // ...
    }

    public static bool GreaterThan(int first, int second)
    {
        return first > second;
    }

    static void Main()
    {
        int i;
        int[] items = new int[5];

        for (i = 0; i < items.Length; i++)
        {
            Console.Write("Enter an integer: ");
            items[i] = int.Parse(Console.ReadLine());
        }

        // The method name GreaterThan is converted to a ComparisonHandler.
        BubbleSort(items, GreaterThan);

        // Reuse i; redeclaring it here would conflict with the local above.
        for (i = 0; i < items.Length; i++)
        {
            Console.WriteLine(items[i]);
        }
    }
}


Note that the ComparisonHandler delegate is a reference type, but you do not necessarily use new to instantiate it. The conversion from the method group—the expression that names the method—to the delegate type automatically creates a new delegate object in C# 2.0 and later.
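
As a minimal sketch of the point above, the following program stores the converted delegate in a variable rather than passing it straight to BubbleSort(). The class name MethodGroupDemo is an assumption made for this illustration; ComparisonHandler and GreaterThan follow Listing 12.7.

```csharp
using System;

public delegate bool ComparisonHandler(int first, int second);

public static class MethodGroupDemo   // hypothetical class name for this sketch
{
    public static bool GreaterThan(int first, int second)
    {
        return first > second;
    }

    public static void Main()
    {
        // Method group conversion: no 'new' is required (C# 2.0 and later).
        ComparisonHandler implicitHandler = GreaterThan;

        // Explicit instantiation with 'new' is still permitted.
        ComparisonHandler explicitHandler = new ComparisonHandler(GreaterThan);

        // A delegate variable is invoked with ordinary method-call syntax.
        Console.WriteLine(implicitHandler(5, 3));   // True
        Console.WriteLine(explicitHandler(3, 5));   // False

        // Both delegates refer to the same method, so they compare as equal.
        Console.WriteLine(implicitHandler.Equals(explicitHandler));   // True
    }
}
```

Either form produces a delegate object that can be stored, passed, and invoked; the shorter form simply lets the compiler supply the delegate creation.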


Advanced Topic: Delegate Instantiation in C# 1.0

In Listing 12.7, the delegate was instantiated by simply passing the name of the desired method, GreaterThan, as an argument to the call to the BubbleSort() method. The first version of C# required instantiation of the delegate, using the more verbose syntax shown in Listing 12.8.

LISTING 12.8: Passing a Delegate As a Parameter in C# 1.0


BubbleSort(items,
    new ComparisonHandler(GreaterThan));


Later versions support both syntaxes; throughout the remainder of the book we will show only the modern, concise syntax.



Advanced Topic: Delegate Internals

A delegate is actually a special kind of class. Although the C# standard does not specify exactly what the class hierarchy is, a delegate must always derive directly or indirectly from System.Delegate. In fact, in .NET, delegate types always derive from System.MulticastDelegate, which in turn derives from System.Delegate, as shown in Figure 12.1.

FIGURE 12.1: Delegate Types Object Model

System.Delegate exposes two properties of note. The first, Method, is of type System.Reflection.MethodInfo, which we cover in Chapter 17. MethodInfo describes the signature of a particular method, including its name, parameters, and return type. In addition to MethodInfo, a delegate needs the instance of the object containing the method to invoke. This is the purpose of the second property, Target. In the case of a static method, there is no instance to invoke the method on, so Target is null. The purpose of the MulticastDelegate class is the topic of the next chapter.

Note that all delegates are immutable; that is, you cannot change a delegate once you have created it. If you have a variable that contains a reference to a delegate and you want it to refer to a different method, you must create a new delegate and assign it to the variable.
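
The following sketch inspects the Method and Target properties of System.Delegate and shows that pointing a variable at a different method means assigning a new delegate object. In .NET, Target returns null when the delegate refers to a static method. The Threshold class and the class name DelegateInternalsDemo are hypothetical, introduced only for this illustration.

```csharp
using System;

public delegate bool ComparisonHandler(int first, int second);

// Hypothetical type with an instance method compatible with ComparisonHandler.
public class Threshold
{
    private readonly int _limit;

    public Threshold(int limit)
    {
        _limit = limit;
    }

    public bool BothAbove(int first, int second)
    {
        return first > _limit && second > _limit;
    }
}

public static class DelegateInternalsDemo
{
    public static bool GreaterThan(int first, int second)
    {
        return first > second;
    }

    public static void Main()
    {
        ComparisonHandler handler = GreaterThan;
        ComparisonHandler original = handler;

        Console.WriteLine(handler.Method.Name);      // GreaterThan
        Console.WriteLine(handler.Target == null);   // True (static method)

        // Assigning a different method creates a new delegate object;
        // the delegate still referenced by 'original' is not modified.
        Threshold threshold = new Threshold(10);
        handler = threshold.BothAbove;

        Console.WriteLine(handler.Method.Name);                              // BothAbove
        Console.WriteLine(object.ReferenceEquals(handler.Target, threshold)); // True
        Console.WriteLine(original.Method.Name);                             // GreaterThan
    }
}
```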

Although all delegate data types derive indirectly from System.Delegate, the C# compiler does not allow you to declare a class that derives directly or indirectly from System.Delegate or System.MulticastDelegate. As a consequence, the code shown in Listing 12.9 is not valid.

LISTING 12.9: System.Delegate Cannot Explicitly Be a Base Class


// ERROR: 'ComparisonHandler' cannot
// inherit from special class 'System.Delegate'
public class ComparisonHandler : System.Delegate
{
    // ...
}



Passing the delegate to specify the sort order is a significantly more flexible strategy than using the approach described at the beginning of this chapter. By passing a delegate you can change the sort order to be alphabetical simply by adding an alternative delegate to convert integers to strings as part of the comparison. Listing 12.10 shows a full listing that demonstrates alphabetical sorting, and Output 12.1 shows the results.

LISTING 12.10: Using a Different ComparisonHandler-Compatible Method


using System;

class DelegateSample
{
    public delegate bool ComparisonHandler(int first, int second);

    public static void BubbleSort(
        int[] items, ComparisonHandler comparisonMethod)
    {
        int i;
        int j;
        int temp;

        for (i = items.Length - 1; i >= 0; i--)
        {
            for (j = 1; j <= i; j++)
            {
                if (comparisonMethod(items[j - 1], items[j]))
                {
                    temp = items[j - 1];
                    items[j - 1] = items[j];
                    items[j] = temp;
                }
            }
        }
    }

    public static bool GreaterThan(int first, int second)
    {
        return first > second;
    }

    public static bool AlphabeticalGreaterThan(
        int first, int second)
    {
        int comparison;
        comparison = (first.ToString().CompareTo(
            second.ToString()));

        return comparison > 0;
    }

    static void Main(string[] args)
    {
        int i;
        int[] items = new int[5];

        for (i = 0; i < items.Length; i++)
        {
            Console.Write("Enter an integer: ");
            items[i] = int.Parse(Console.ReadLine());
        }

        BubbleSort(items, AlphabeticalGreaterThan);

        for (i = 0; i < items.Length; i++)
        {
            Console.WriteLine(items[i]);
        }
    }
}


OUTPUT 12.1

Enter an integer: 1
Enter an integer: 12
Enter an integer: 13
Enter an integer: 5
Enter an integer: 4
1
12
13
4
5

The alphabetic order is different from the numeric order. Even so, notice how simple it was to add this additional sort mechanism compared to the process used at the beginning of the chapter. The only changes to create the alphabetical sort order were the addition of the AlphabeticalGreaterThan method and then passing that method into the call to BubbleSort().

Begin 3.0

Lambda Expressions

In Listings 12.7 and 12.10, we saw that you can convert the expressions GreaterThan and AlphabeticalGreaterThan to a delegate type that is compatible with the parameter types and the return type of the named method. You might have noticed that the declaration of the GreaterThan method (the code that says it is a public, static, bool-returning method with two parameters of type int named first and second) was considerably larger than the body of the method, which simply compared its two parameters and returned the result. It is unfortunate that so much “ceremony” has to surround such a simple method merely so that it can be converted to a delegate type.

To address this concern, C# 2.0 introduced a far more compact syntax for creating a delegate, and C# 3.0 introduced several even more compact syntaxes than C# 2.0’s syntax. The C# 2.0 feature is called anonymous methods, while the C# 3.0 feature is called lambda expressions. When referring generally to either syntax, we’ll refer to them as anonymous functions. Both syntaxes are still legal, but for new code the lambda expression syntax is preferred over the anonymous method syntax. Throughout this book we will generally use the lambda expression syntax except when specifically describing C# 2.0 anonymous methods.

Lambda expressions are themselves divided into two types: statement lambdas and expression lambdas. Figure 12.2 shows the hierarchical relationship between these terms.

FIGURE 12.2: Anonymous Function Terminology

Statement Lambdas

The purpose of a lambda expression is to eliminate the hassle of declaring an entirely new member when you need to make a delegate from a very simple method. Several different forms of lambda expressions exist. A statement lambda, for example, consists of a formal parameter list, followed by the lambda operator =>, followed by a code block.

Listing 12.11 shows a call to BubbleSort like the one in Listing 12.7, except that Listing 12.11 uses a statement lambda to represent the comparison method, rather than creating a GreaterThan method. (Note that the lambda compares with the less-than operator rather than the greater-than operator, so this call sorts in descending order.) As you can see, much of the information that appeared in the GreaterThan method declaration is included in the statement lambda; the formal parameter declarations and the block take the same form, but the method name and its modifiers are missing.

LISTING 12.11: Creating a Delegate with a Statement Lambda


// ...

BubbleSort(items,
    (int first, int second) =>
    {
        return first < second;
    }
);

// ...


When reading code that includes a lambda operator, you would replace the lambda operator with the words go/goes to. For example, in Listing 12.11, you would read the second BubbleSort() parameter as “integers first and second go to returning the result of first less than second.”

As readers will observe, the syntax in Listing 12.11 is almost identical to that in Listing 12.7, apart from the fact that the comparison method is now found lexically where it is converted to the delegate type, rather than being found elsewhere and looked up by name. The name of the method is missing, which explains why such methods are called “anonymous functions.” The return type is missing, but the compiler can see that the lambda expression is being converted to a delegate whose signature requires the return type bool. The compiler verifies that the expressions of every return statement in the statement lambda’s block would be legal in a bool-returning method. The public modifier is missing; given that the method is no longer an accessible member of the containing class, there is no need to describe its accessibility. Similarly, the static modifier is no longer necessary. The amount of “ceremony” around the method is already greatly reduced.

The syntax is still needlessly verbose, however. We have deduced from the delegate type that the lambda expression must be bool-returning; we can similarly deduce that both parameters must be of type int, as shown in Listing 12.12.

LISTING 12.12: Omitting Parameter Types from Statement Lambdas


// ...

BubbleSort(items,
    (first, second) =>
    {
        return first < second;
    }
);

// ...


In general, explicitly declared parameter types are optional in all lambda expressions if the compiler can infer the types from the delegate that the lambda expression is being converted to. For situations when specifying the type makes code more readable, however, C# enables you to do so. In cases where inference is not possible, the C# language requires that the lambda parameter types be stated explicitly. If one lambda parameter type is specified explicitly, then all of them must be specified explicitly, and they must all match the delegate parameter types exactly.
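
The rules in the preceding paragraph can be sketched as follows. The class name LambdaParameterDemo and its field names are assumptions made for this illustration; the commented-out lambdas are forms the compiler rejects.

```csharp
using System;

public delegate bool ComparisonHandler(int first, int second);

public static class LambdaParameterDemo   // hypothetical class name
{
    // Both parameter types are inferred from ComparisonHandler.
    public static readonly ComparisonHandler Inferred =
        (first, second) => { return first < second; };

    // Both parameter types are stated explicitly and match the delegate exactly.
    public static readonly ComparisonHandler ExplicitlyTyped =
        (int first, int second) => { return first < second; };

    // Neither of the following compiles:
    //   (int first, second) => { return first < second; }
    //       mixes an explicit type with an inferred one
    //   ComparisonHandler bad =
    //       (long first, long second) => { return first < second; };
    //       explicit types differ from the delegate's parameter types

    public static void Main()
    {
        Console.WriteLine(Inferred(1, 2));          // True
        Console.WriteLine(ExplicitlyTyped(2, 1));   // False
    }
}
```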


Guidelines

CONSIDER omitting the types from lambda formal parameter lists when the types are obvious to the reader, or when they are an insignificant detail.


One other means of reducing the syntax is possible, as shown in Listing 12.13: A lambda expression that has exactly one parameter whose type is inferred may omit the parentheses around the parameter list. If there are zero parameters or more than one parameter, or if the single parameter is explicitly typed, the lambda must have parentheses around the parameter list.

LISTING 12.13: Statement Lambdas with a Single Input Parameter


using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
// ...
IEnumerable<Process> processes = Process.GetProcesses().Where(
    process => { return process.WorkingSet64 > 1000000000; });
// ...


In Listing 12.13, the Where() method returns a query for processes that have a physical memory utilization greater than 1 billion bytes. Contrast this with Listing 12.14, which has a parameterless statement lambda. The empty parameter list requires parentheses. Note also that in Listing 12.14, the body of the statement lambda includes multiple statements inside the statement block (via curly braces). Although a statement lambda can contain any number of statements, typically a statement lambda uses only two or three statements in its statement block. (The use of the generic Func delegate type is described in the section “General-Purpose Delegates: System.Func and System.Action” later in this chapter.)

LISTING 12.14: Parameterless Statement Lambdas


using System;
// ...
Func<string> getUserInput =
    () =>
    {
        string input;
        do
        {
            input = Console.ReadLine();
        }
        while (input.Trim().Length == 0);
        return input;
    };
// ...


Expression Lambdas

The statement lambda syntax is already much less verbose than the corresponding method declaration; as we’ve seen, it need not declare the method’s name, accessibility, return type, or parameter types. Nevertheless, we can get even less verbose by using an expression lambda. In Listings 12.11, 12.12, and 12.13, we saw statement lambdas whose blocks consisted of a single return statement. What if we eliminated the “ceremony” around that? The only relevant information in such a lambda block is the expression that is returned. An expression lambda contains only that returned expression, with no statement block at all. Listing 12.15 is the same as Listing 12.11, except that it uses an expression lambda rather than a statement lambda.

LISTING 12.15: Passing a Delegate with an Expression Lambda


// ...
BubbleSort(items, (first, second) => first < second );
// ...


Generally, you would read the lambda operator => in an expression lambda the same way as you would a statement lambda: as goes to, or becomes. When a lambda is used to return a bool, as it is in our BubbleSort() example, the lambda is called a predicate. In those cases it is common to read the lambda operator as such that or where. You might read the lambda in Listing 12.15 as “first and second such that first is less than second.”

Like the null literal, an anonymous function does not have any type associated with it; rather, its type is determined by the type it is being converted to. In other words, the lambda expressions we’ve seen so far are not intrinsically of the ComparisonHandler type, but they are compatible with that type and may be converted to it. As a result, you cannot use the typeof() operator (see Chapter 17) on an anonymous method, and calling GetType() is possible only after you convert the anonymous method to a particular type.
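
To illustrate, the sketch below converts the same lambda text to two unrelated delegate types and calls GetType() only on the converted results. The class and method names (LambdaTypeDemo, ConvertedTypeNames) are assumptions made for this example, and Func<int, int, bool> is one of the general-purpose delegate types described later in this chapter.

```csharp
using System;

public delegate bool ComparisonHandler(int first, int second);

public static class LambdaTypeDemo   // hypothetical class name
{
    public static string[] ConvertedTypeNames()
    {
        // Identical lambda text, converted to two different delegate types.
        ComparisonHandler handler = (first, second) => first < second;
        Func<int, int, bool> func = (first, second) => first < second;

        // The following does not compile, because object is not a delegate
        // type and so gives the lambda nothing to be converted to:
        //   object o = (first, second) => first < second;

        // GetType() is available only on the converted delegate objects.
        return new[] { handler.GetType().Name, func.GetType().Name };
    }

    public static void Main()
    {
        foreach (string name in ConvertedTypeNames())
        {
            Console.WriteLine(name);   // ComparisonHandler, then Func`3
        }
    }
}
```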

Table 12.1 provides additional lambda expression characteristics.

TABLE 12.1: Lambda Expression Notes and Examples

End 3.0

Begin 2.0

Anonymous Methods

Lambda expressions are not supported in C# 2.0. Instead, C# 2.0 uses a syntax called anonymous methods. An anonymous method is like a statement lambda, but without many of the features that make lambdas so compact. An anonymous method must explicitly type every parameter, and must have a statement block. Rather than using the lambda operator => between the parameter list and the code block, an anonymous method puts the keyword delegate before the parameter list, emphasizing that the anonymous method must be converted to a delegate type. Listing 12.16 shows the code from Listings 12.7, 12.12, and 12.15 rewritten to use an anonymous method.

LISTING 12.16: Passing an Anonymous Method in C# 2.0


// ...
BubbleSort(items,
    delegate(int first, int second)
    {
        return first < second;
    }
);
// ...


It is unfortunate that there are two very similar ways to define an anonymous function in C# 3.0 and later.


Guidelines

AVOID the anonymous method syntax in new code; prefer the more compact lambda expression syntax.


There is, however, one small feature that is supported in anonymous methods that is not supported in lambda expressions: Anonymous methods may omit their parameter list entirely in some circumstances.


Advanced Topic: Parameterless Anonymous Methods

Unlike lambda expressions, anonymous methods may omit the parameter list entirely provided that the anonymous method body does not use any parameter and the delegate type requires only “value” parameters (that is, it does not require the parameters to be marked as out or ref). For example, the anonymous method expression delegate { return Console.ReadLine() != ""; } is convertible to any delegate type that requires a return type of bool regardless of the number of parameters the delegate requires. This feature is not used frequently, but you might encounter it when reading legacy code.
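
As a small sketch of this feature, the same parameterless anonymous method body is converted below to three bool-returning delegate types that take two, one, and zero parameters. The class and method names are assumptions made for this illustration, and the body returns a constant so that the example does not wait for console input.

```csharp
using System;

public delegate bool ComparisonHandler(int first, int second);

public static class ParameterlessAnonymousMethodDemo   // hypothetical class name
{
    public static bool[] InvokeAll()
    {
        // No parameter list is written, so the body cannot use any parameters,
        // but the conversion succeeds for each of these delegate types.
        ComparisonHandler twoParameters = delegate { return true; };
        Predicate<string> oneParameter = delegate { return true; };
        Func<bool> noParameters = delegate { return true; };

        return new[]
        {
            twoParameters(1, 2),
            oneParameter("text"),
            noParameters()
        };
    }

    public static void Main()
    {
        foreach (bool result in InvokeAll())
        {
            Console.WriteLine(result);   // True, three times
        }
    }
}
```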


End 2.0


Advanced Topic: Why “Lambda” Expressions?

It is fairly obvious why anonymous methods are called “anonymous methods”: They look very similar to method declarations but do not have a declared name associated with them. But where did the “lambda” in “lambda expressions” come from?

The idea of lambda expressions comes from the work of the logician Alonzo Church, who in the 1930s invented a technique for studying functions called the “lambda calculus.” In Church’s notation, a function that takes a parameter x and results in an expression y is notated by prefixing the entire expression with a small Greek letter lambda, and separating the parameter from the value with a dot. The C# lambda expression x=>y would be notated λx.y in Church’s notation. Because it is inconvenient to use Greek letters in C# programs and because the dot already has many meanings in C#, the designers of C# chose to use the “fat arrow” notation rather than the original notation. The name “lambda expression” indicates that the theoretical underpinnings of the idea of anonymous functions are based on the lambda calculus, even though no letter lambda actually appears in the text.


Begin 3.0

Begin 4.0

General-Purpose Delegates: System.Func and System.Action

To reduce the need to define your own custom delegate types, the .NET 3.5 runtime library (which corresponds to C# 3.0) included a set of general-purpose delegates, most of them generic. The System.Func family of delegates is for referring to methods that return a value; the System.Action family of delegates is for referring to void-returning methods. The signatures for these delegates are shown in Listing 12.17 (although the in/out type modifiers were not added until C# 4.0, as discussed shortly).

LISTING 12.17: Func and Action Delegate Declarations


public delegate void Action();
public delegate void Action<in T>(T arg);
public delegate void Action<in T1, in T2>(
    T1 arg1, T2 arg2);
public delegate void Action<in T1, in T2, in T3>(
    T1 arg1, T2 arg2, T3 arg3);
public delegate void Action<in T1, in T2, in T3, in T4>(
    T1 arg1, T2 arg2, T3 arg3, T4 arg4);
...
public delegate void Action<
    in T1, in T2, in T3, in T4, in T5, in T6, in T7, in T8,
    in T9, in T10, in T11, in T12, in T13, in T14, in T15, in T16>(
    T1 arg1, T2 arg2, T3 arg3, T4 arg4,
    T5 arg5, T6 arg6, T7 arg7, T8 arg8,
    T9 arg9, T10 arg10, T11 arg11, T12 arg12,
    T13 arg13, T14 arg14, T15 arg15, T16 arg16);

public delegate TResult Func<out TResult>();
public delegate TResult Func<in T, out TResult>(T arg);
public delegate TResult Func<in T1, in T2, out TResult>(
    T1 arg1, T2 arg2);
public delegate TResult Func<in T1, in T2, in T3, out TResult>(
    T1 arg1, T2 arg2, T3 arg3);
public delegate TResult Func<in T1, in T2, in T3, in T4,
    out TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4);
...
public delegate TResult Func<
    in T1, in T2, in T3, in T4, in T5, in T6, in T7, in T8,
    in T9, in T10, in T11, in T12, in T13, in T14, in T15, in T16,
    out TResult>(
    T1 arg1, T2 arg2, T3 arg3, T4 arg4,
    T5 arg5, T6 arg6, T7 arg7, T8 arg8,
    T9 arg9, T10 arg10, T11 arg11, T12 arg12,
    T13 arg13, T14 arg14, T15 arg15, T16 arg16);


Because the delegate definitions in Listing 12.17 are generic, it is possible to use them instead of defining a custom delegate. The last type parameter of a Func delegate is always the return type of the delegate. The other type parameters correspond in sequence to the types of the delegate parameters. The BubbleSort method in Listing 12.3, for example, requires a delegate that returns bool and takes two int parameters. Thus, rather than declaring the ComparisonHandler delegate type and using it, we could have declared the BubbleSort method as follows:

void BubbleSort(int[] items,
    Func<int, int, bool> comparisonMethod) { ... }

In many cases, the inclusion of Func and Action delegates in the .NET Framework 3.5 entirely eliminates the need to define your own delegate types. However, you should consider declaring your own delegate types when doing so significantly increases the readability of the code. A delegate named ComparisonHandler provides an explicit indication of what the delegate is used for, whereas using Func<int, int, bool> provides a more explicit indication of the delegate’s formal parameters and return type.
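
A sketch of that alternative declaration follows. It uses the same algorithm as Listing 12.3, with only the parameter type changed; the class name FuncSortDemo and the sample data are assumptions made for this illustration.

```csharp
using System;

public static class FuncSortDemo   // hypothetical class name
{
    // Same algorithm as Listing 12.3, but the comparison is typed as the
    // predefined Func<int, int, bool> instead of a custom delegate type.
    public static void BubbleSort(
        int[] items, Func<int, int, bool> comparisonMethod)
    {
        if (comparisonMethod == null)
        {
            throw new ArgumentNullException("comparisonMethod");
        }

        if (items == null)
        {
            return;
        }

        for (int i = items.Length - 1; i >= 0; i--)
        {
            for (int j = 1; j <= i; j++)
            {
                // Swap whenever the comparison says the pair is out of order.
                if (comparisonMethod(items[j - 1], items[j]))
                {
                    int temp = items[j - 1];
                    items[j - 1] = items[j];
                    items[j] = temp;
                }
            }
        }
    }

    public static void Main()
    {
        int[] items = { 5, 1, 4, 2, 3 };
        BubbleSort(items, (first, second) => first > second);
        Console.WriteLine(string.Join(", ", items));   // 1, 2, 3, 4, 5
    }
}
```

Callers are unaffected by the change: a lambda or a method name such as GreaterThan converts to Func<int, int, bool> just as it converts to ComparisonHandler.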


Guidelines

CONSIDER whether the readability benefit of defining your own delegate type outweighs the convenience of using a predefined generic delegate type.


Delegates Do Not Have Structural Equality

Delegate types in .NET do not exhibit structural equality. That is, you cannot convert a reference to an object of one delegate type to an unrelated delegate type, even if the formal parameters and return types of both delegates are identical. For example, you cannot assign a reference to a ComparisonHandler to a variable of type Func<int, int, bool> even though both delegate types represent methods that take two int parameters and return a bool. Unfortunately, the only way to use a delegate of a given type when a delegate of a structurally identical but unrelated delegate type is needed is to create a new delegate that refers to the Invoke method of the old delegate. For example, if you have a variable c of type ComparisonHandler, and you need to assign its value to a variable f of type Func<int, int, bool>, you can say f = c.Invoke;.
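
The following sketch puts the f = c.Invoke workaround into a complete program. The class name DelegateConversionDemo and the helper method ToFunc are hypothetical, introduced only for this illustration.

```csharp
using System;

public delegate bool ComparisonHandler(int first, int second);

public static class DelegateConversionDemo   // hypothetical class name
{
    public static bool GreaterThan(int first, int second)
    {
        return first > second;
    }

    // Hypothetical helper: wraps a ComparisonHandler in a Func<int, int, bool>
    // by creating a new delegate that refers to the original's Invoke method.
    public static Func<int, int, bool> ToFunc(ComparisonHandler c)
    {
        return c.Invoke;
    }

    public static void Main()
    {
        ComparisonHandler c = GreaterThan;

        // Func<int, int, bool> f = c;   // does not compile: unrelated delegate types
        Func<int, int, bool> f = ToFunc(c);

        Console.WriteLine(f(2, 1));   // True
        Console.WriteLine(f(1, 2));   // False
    }
}
```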

However, thanks to the variance support added in C# 4.0, it is possible to make reference conversions between some delegate types. Consider the following contravariant example: Because void Action<in T>(T arg) has the in type parameter modifier, it is possible to assign a reference to a delegate of type Action<object> to a variable of type Action<string>.

Many people find delegate contravariance confusing; just remember that an action that can act on every object can be used as an action that acts on any string. But the opposite is not true: An action that can act only on strings cannot act on every object. Similarly, every type in the Func family of delegates is covariant in its return type, as indicated by the out type parameter modifier on TResult. Therefore it is possible to assign a reference to a delegate of type Func<string> to a variable of type Func<object>.

Listing 12.18 shows examples of delegate covariance and contravariance.

LISTING 12.18: Using Variance for Delegates


// Contravariance
Action<object> broadAction =
    (object data) =>
    {
        Console.WriteLine(data);
    };
Action<string> narrowAction = broadAction;

// Covariance
Func<string> narrowFunction =
    () => Console.ReadLine();
Func<object> broadFunction = narrowFunction;

// Contravariance and covariance combined
Func<object, string> func1 =
    (object data) => data.ToString();
Func<string, object> func2 = func1;


The last part of the listing combines both variance concepts into a single example, demonstrating how they can occur simultaneously if both in and out type parameters are involved.

Allowing reference conversions on generic delegate types was a key motivating scenario for adding covariant and contravariant conversions to C# 4.0. (The other was support for covariance to IEnumerable<out T>.)

End 4.0


Advanced Topic: Lambda Expression and Anonymous Method Internals

Lambda expressions (and anonymous methods) are not intrinsically “built in” to the CLR. Rather, when the compiler encounters an anonymous function, it translates it into special hidden classes, fields, and methods that implement the desired semantics. The C# compiler generates the implementation code for this pattern so that developers do not have to code this themselves. When given the code in Listing 12.11, 12.12, 12.15, or 12.16, the C# compiler generates CIL code that is similar to the code shown in Listing 12.19.

LISTING 12.19: C# Equivalent of CIL Generated by the Compiler for Lambda Expressions


class DelegateSample
{
    // ...
    static void Main(string[] args)
    {
        int i;
        int[] items = new int[5];

        for (i = 0; i < items.Length; i++)
        {
            Console.Write("Enter an integer:");
            items[i] = int.Parse(Console.ReadLine());
        }

        BubbleSort(items,
            DelegateSample.__AnonymousMethod_00000000);

        for (i = 0; i < items.Length; i++)
        {
            Console.WriteLine(items[i]);
        }
    }

    private static bool __AnonymousMethod_00000000(
        int first, int second)
    {
        return first < second;
    }
}


In this example, the compiler transforms an anonymous function into a separately declared static method, which is then instantiated as a delegate and passed as a parameter. Unsurprisingly, the compiler generates code that looks remarkably like the original code in Listing 12.7, which the anonymous function syntax was intended to streamline. However, the code transformation performed by the compiler can be considerably more complex than merely rewriting the anonymous function as a static method if “outer variables” are involved.


Outer Variables

Local variables declared outside a lambda expression (including parameters of the containing method) are called the outer variables of that lambda. (The this reference, though technically not a variable, is also considered to be an outer variable.) When a lambda body uses an outer variable, the variable is said to be captured (or, equivalently, closed over) by the lambda. In Listing 12.20, we use an outer variable to count how many times BubbleSort() performs a comparison. Output 12.2 shows the results of this listing.

LISTING 12.20: Using an Outer Variable in a Lambda Expression


class DelegateSample
{
    // ...

    static void Main(string[] args)
    {
        int i;
        int[] items = new int[5];
        int comparisonCount = 0;

        for (i = 0; i < items.Length; i++)
        {
            Console.Write("Enter an integer:");
            items[i] = int.Parse(Console.ReadLine());
        }

        BubbleSort(items,
            (int first, int second) =>
            {
                comparisonCount++;
                return first < second;
            }
        );

        for (i = 0; i < items.Length; i++)
        {
            Console.WriteLine(items[i]);
        }

        Console.WriteLine("Items were compared {0} times.",
            comparisonCount);
    }
}


OUTPUT 12.2

Enter an integer:5
Enter an integer:1
Enter an integer:4
Enter an integer:2
Enter an integer:3
5
4
3
2
1
Items were compared 10 times.

Note that comparisonCount appears outside the lambda expression and is incremented inside it. After calling the BubbleSort() method, comparisonCount is printed out to the console.

Normally the lifetime of a local variable is tied to its scope; when control leaves the scope, the storage location associated with the variable is no longer valid. But a delegate created from a lambda that captures an outer variable might have a longer (or shorter) lifetime than the local variable normally would, and the delegate must be able to safely access the outer variable every time the delegate is invoked. Therefore, the lifetime of a captured variable is extended; it is guaranteed to live at least as long as the longest-lived delegate object capturing it. (And it may live even longer than that—precisely how the compiler generates code that ensures outer variable lifetimes are extended is an implementation detail and subject to change.)
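The lifetime extension described above is easiest to observe when a delegate outlives the method that declared the captured variable. The following is a minimal sketch, not taken from the chapter's listings; the names CounterFactory and CreateCounter are hypothetical and chosen only for illustration.

```csharp
using System;

public static class CounterFactory
{
    // Hypothetical helper: returns a delegate that captures the local
    // variable count. The variable must stay alive after CreateCounter()
    // returns, because the delegate still refers to it.
    public static Func<int> CreateCounter()
    {
        int count = 0;
        return () => ++count;
    }
}

public static class LifetimeDemo
{
    public static void Main()
    {
        Func<int> counter = CounterFactory.CreateCounter();
        Console.WriteLine(counter()); // 1
        Console.WriteLine(counter()); // 2

        // A second call creates a second, independent variable.
        Func<int> other = CounterFactory.CreateCounter();
        Console.WriteLine(other());   // 1
    }
}
```

Each call to CreateCounter() produces its own count variable, so the two delegates do not interfere with each other.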

The C# compiler takes care of generating CIL code that shares comparisonCount between the anonymous method and the method that declares it.


Advanced Topic: Outer Variable CIL Implementation

The CIL code generated by the C# compiler for anonymous functions that capture outer variables is more complex than the code for a simple anonymous function that captures nothing. Listing 12.21 shows the C# equivalent of the CIL code used to implement outer variables for the code in Listing 12.20.

LISTING 12.21: C# Equivalent of CIL Code Generated by Compiler for Outer Variables


class DelegateSample
{
    // ...
    private sealed class __LocalsDisplayClass_00000001
    {
        public int comparisonCount;
        public bool __AnonymousMethod_00000000(
            int first, int second)
        {
            comparisonCount++;
            return first < second;
        }
    }
    // ...
    static void Main(string[] args)
    {
        int i;
        __LocalsDisplayClass_00000001 locals =
            new __LocalsDisplayClass_00000001();
        locals.comparisonCount = 0;
        int[] items = new int[5];

        for (i = 0; i < items.Length; i++)
        {
            Console.Write("Enter an integer:");
            items[i] = int.Parse(Console.ReadLine());
        }

        BubbleSort(items, locals.__AnonymousMethod_00000000);
        for (i = 0; i < items.Length; i++)
        {
            Console.WriteLine(items[i]);
        }

        Console.WriteLine("Items were compared {0} times.",
            locals.comparisonCount);
    }
}


Notice that the captured local variable is never “passed” anywhere and is never “copied” anywhere. Rather, the captured local variable (comparisonCount) is a single variable whose lifetime the compiler has extended by implementing it as an instance field rather than as a local variable. All usages of the local variable are rewritten to be usages of the field.
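Because the variable itself is shared rather than copied, an assignment made after a delegate is created is visible through that delegate, and two lambdas that capture the same variable see each other's changes. The sketch below illustrates this under the assumption of a small demo class; SharedCapture and Run are hypothetical names.

```csharp
using System;

public static class SharedCapture
{
    // Hypothetical demo method: returns the value observed through the
    // read delegate and the final value of the captured variable.
    public static int[] Run()
    {
        int value = 1;
        Func<int> read = () => value;       // captures the variable, not the value 1
        Action increment = () => value++;   // captures the same variable

        value = 10;                 // assigned after both delegates were created
        int seenByRead = read();    // observes 10
        increment();                // value becomes 11
        return new[] { seenByRead, value };
    }

    public static void Main()
    {
        int[] result = Run();
        Console.WriteLine(result[0]); // 10
        Console.WriteLine(result[1]); // 11
    }
}
```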

The generated class, __LocalsDisplayClass_00000001, is a closure: a data structure (a class in C#) that contains an expression and the variables (public fields in C#) necessary to evaluate the expression.


End 3.0

Begin 5.0


Advanced Topic: Accidentally Capturing Loop Variables

What do you think the output of Listing 12.22 should be?

LISTING 12.22: Capturing Loop Variables in C# 5.0


using System;
using System.Collections.Generic;

class CaptureLoop
{
    static void Main()
    {
        var items = new string[] { "Moe", "Larry", "Curly" };
        var actions = new List<Action>();
        foreach (string item in items)
        {
            actions.Add(() => { Console.WriteLine(item); });
        }
        foreach (Action action in actions)
        {
            action();
        }
    }
}


Most people expect that the output will be as shown in Output 12.3, and in C# 5.0 it is. In previous versions of C#, however, the output is as shown in Output 12.4.

OUTPUT 12.3: C# 5.0 Output

Moe
Larry
Curly

OUTPUT 12.4: C# 4.0 Output

Curly
Curly
Curly

A lambda expression captures a variable and always uses the latest value of the variable; it does not capture and preserve the value that the variable had when the delegate was created. This is normally what you want; after all, the whole point of capturing comparisonCount in Listing 12.20 was to ensure that its latest value would be used when it was incremented. Loop variables are no different; when you capture a loop variable, every delegate captures the same loop variable. When the loop variable changes, every delegate that captured this loop variable sees the change. The C# 4.0 behavior is therefore justified, but is almost never what the author of the code wants.

In C# 5.0, the C# language was changed so that the loop variable of a foreach loop is now considered to be a “fresh” variable every time the loop iterates; therefore, each delegate creation captures a different variable, rather than all iterations sharing the same variable. This change was not applied to the for loop, however: If you write similar code using a for loop, any loop variable declared in the header of the for statement will be considered a single outer variable when captured. If you need to write code that works the same in both C# 5.0 and previous C# versions, use the pattern shown in Listing 12.23.

LISTING 12.23: Loop Variable Capture Workaround before C# 5.0


using System;
using System.Collections.Generic;

class DoNotCaptureLoop
{
    static void Main()
    {
        var items = new string[] { "Moe", "Larry", "Curly" };
        var actions = new List<Action>();
        foreach (string item in items)
        {
            // Copy the loop variable into a fresh local for each iteration.
            string _item = item;
            actions.Add(
                () => { Console.WriteLine(_item); });
        }
        foreach (Action action in actions)
        {
            action();
        }
    }
}


Now there is clearly one fresh variable per loop iteration; each delegate is, in turn, closed over a different variable.
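The same copy-to-a-local pattern applies to for loops, whose header variable remains a single shared outer variable even in C# 5.0 and later. The following sketch contrasts the two cases; the class and method names (ForLoopCapture, CaptureShared, CaptureCopies) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ForLoopCapture
{
    // Every delegate captures the one variable i declared in the for header.
    public static List<Func<int>> CaptureShared()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            funcs.Add(() => i);
        }
        return funcs;
    }

    // Each delegate captures its own copy, declared inside the loop body.
    public static List<Func<int>> CaptureCopies()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs;
    }

    public static void Main()
    {
        foreach (Func<int> f in CaptureShared())
        {
            Console.WriteLine(f()); // 3, 3, 3: i is 3 once the loop has ended
        }
        foreach (Func<int> f in CaptureCopies())
        {
            Console.WriteLine(f()); // 0, 1, 2
        }
    }
}
```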


Guidelines

AVOID capturing loop variables in anonymous functions.



End 5.0

Expression Trees

Thus far we’ve seen that lambda expressions are a succinct syntax for declaring an “inline” method that can be converted to a delegate type. Expression lambdas (but not statement lambdas or anonymous methods) can also be converted to expression trees. A delegate is an object that enables you to pass around a method like any other object and invoke it at any time. An expression tree is an object that enables you to pass around the compiler’s analysis of the lambda body. But why would you ever need that capability? Obviously, the compiler’s analysis is useful to the compiler when generating the CIL, but why is it useful to the developer to have an object representing that analysis at execution time? Let’s take a look at an example.

Using Lambda Expressions As Data

Consider the lambda expression in the following code:

persons.Where(
person => person.Name.ToUpper() == "INIGO MONTOYA");

Suppose that persons is an array of Persons, and the formal parameter of the Where method that corresponds to the lambda expression argument is of delegate type Func<Person, bool>. The compiler emits a method that contains the code in the body of the lambda. It generates code that creates a delegate to the emitted method and passes the delegate to the Where method. The Where method returns a query object that, when executed, applies the delegate to each member of the array to determine the query results.

Now suppose that persons is not of type Person[], but rather is an object that represents a remote database table containing data on millions of people. Information about each row in the table can be streamed from the server to the client, and the client can then create a Person object corresponding to that row. The call to Where returns an object that represents the query. When the results of that query are requested on the client, how are the results determined?

One technique would be to transmit several million rows of data from the server to the client. You could create a Person object from each row, create a delegate from the lambda, and execute the delegate on every Person. This is conceptually no different from the array scenario, but it is far, far more expensive.

A second, much better technique is to somehow send the meaning of the lambda (“filter out every row that names a person other than Inigo Montoya”) to the server. Database servers are optimized to rapidly perform this sort of filtering. The server can then choose to stream only the tiny number of matching rows to the client; instead of creating millions of Person objects and rejecting almost all of them, the client creates only those objects that already match the query, as determined by the server. But how does the meaning of the lambda get sent to the server?

This scenario is the motivation for adding expression trees to the language. Lambda expressions converted to expression trees become objects that represent data that describes the lambda expression, rather than compiled code that implements an anonymous function. Since the expression tree represents data rather than compiled code, it is possible to analyze the lambda at execution time and use that information to construct a query that executes on a database, for example. The expression tree received by Where() might be converted into a SQL query that is passed to a database, as shown in Listing 12.24.

LISTING 12.24: Converting an Expression Tree to a SQL where Clause

Image

The expression tree passed to the Where() call says that the lambda argument consists of the following elements:

• A read of the Name property of a Person object

• A call to a string method called ToUpper()

• A constant value, "INIGO MONTOYA"

• An equality operator, ==

The Where() method takes this data and converts it to the SQL where clause by examining the data and building a SQL query string. However, SQL is just one possibility; you can build an expression tree evaluator that converts expressions to any query language.
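To make the idea of walking the four elements listed above concrete, here is a deliberately tiny translator. It is a sketch under stated assumptions, not how any real LINQ provider is implemented: the names WhereClauseSketch, ToWhereClause, and Translate are hypothetical, the Person class is a stand-in, only the node kinds that appear in this one lambda are handled, and the constant is embedded without escaping (a real provider would use query parameters).

```csharp
using System;
using System.Linq.Expressions;

public class Person
{
    public string Name { get; set; }
}

public static class WhereClauseSketch
{
    // Hypothetical entry point: turns a predicate's body into SQL-like text.
    public static string ToWhereClause<T>(
        Expression<Func<T, bool>> predicate)
    {
        return "where " + Translate(predicate.Body);
    }

    private static string Translate(Expression node)
    {
        switch (node.NodeType)
        {
            case ExpressionType.Equal:          // the == operator
                var binary = (BinaryExpression)node;
                return Translate(binary.Left) + " = " +
                    Translate(binary.Right);
            case ExpressionType.Call:           // the call to ToUpper()
                var call = (MethodCallExpression)node;
                if (call.Method.Name == "ToUpper" && call.Object != null)
                {
                    return "upper(" + Translate(call.Object) + ")";
                }
                break;
            case ExpressionType.MemberAccess:   // the read of Name
                return ((MemberExpression)node).Member.Name;
            case ExpressionType.Constant:       // "INIGO MONTOYA" (unescaped)
                return "'" + ((ConstantExpression)node).Value + "'";
        }
        throw new NotSupportedException(node.NodeType.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(ToWhereClause<Person>(
            person => person.Name.ToUpper() == "INIGO MONTOYA"));
        // where upper(Name) = 'INIGO MONTOYA'
    }
}
```

Any node kind outside these four cases throws NotSupportedException, which keeps the sketch honest about how little of the language it covers.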

Expression Trees Are Object Graphs

At execution time, a lambda converted to an expression tree becomes an object graph containing objects from the System.Linq.Expressions namespace. The “root” object in the graph represents the lambda itself. This object refers to objects representing the parameters, a return type, and body expression, as shown in Figure 12.3. The object graph contains all the information that the compiler deduced about the lambda. That information can then be used at execution time to create a query. Alternatively, the root lambda expression has a method, Compile, that generates CIL “on the fly” and creates a delegate that implements the described lambda.

Image

FIGURE 12.3: The Lambda Expression Tree Type
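The root object's parts, and its Compile method, can be inspected directly. The following is a small sketch that uses only members of LambdaExpression and Expression<TDelegate>; the class name CompileSketch and the field name GreaterThan are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class CompileSketch
{
    // Converting the lambda to Expression<...> yields an object graph
    // rather than a delegate.
    public static readonly Expression<Func<int, int, bool>> GreaterThan =
        (x, y) => x > y;

    public static void Main()
    {
        Console.WriteLine(GreaterThan.Parameters.Count); // 2
        Console.WriteLine(GreaterThan.ReturnType);       // System.Boolean
        Console.WriteLine(GreaterThan.Body);             // (x > y)

        // Compile() generates CIL at execution time and returns a delegate.
        Func<int, int, bool> compiled = GreaterThan.Compile();
        Console.WriteLine(compiled(3, 2)); // True
        Console.WriteLine(compiled(1, 2)); // False
    }
}
```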

Figure 12.4 shows the types found in object graphs for a unary and binary expression in the body of a lambda.

Image

FIGURE 12.4: Unary and Binary Expression Tree Types

A UnaryExpression represents an expression such as -count. It has a single child Operand of type Expression. A BinaryExpression has two child expressions, Left and Right. Both types have a NodeType property that identifies the specific operator, and both inherit from the base class Expression. There are another 30 or so expression types, such as NewExpression, ParameterExpression, MethodCallExpression, and LoopExpression, to represent (almost) every possible expression in C# and Visual Basic.
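As a quick check of these two node shapes, the sketch below converts one unary and one binary lambda to expression trees and reads the properties just described. The class and field names (NodeShapes, Negation, Sum) are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class NodeShapes
{
    public static readonly Expression<Func<int, int>> Negation =
        count => -count;
    public static readonly Expression<Func<int, int, int>> Sum =
        (a, b) => a + b;

    public static void Main()
    {
        // The body of the first lambda is a UnaryExpression.
        var unary = (UnaryExpression)Negation.Body;
        Console.WriteLine(unary.NodeType);  // Negate
        Console.WriteLine(unary.Operand);   // count

        // The body of the second lambda is a BinaryExpression.
        var binary = (BinaryExpression)Sum.Body;
        Console.WriteLine(binary.NodeType); // Add
        Console.WriteLine(binary.Left);     // a
        Console.WriteLine(binary.Right);    // b
    }
}
```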

Delegates versus Expression Trees

The validity of a lambda expression is verified at compile time with a full semantic analysis, whether it is converted to a delegate or an expression tree. A lambda that is converted to a delegate causes the compiler to emit the lambda as a method, and generates code that creates a delegate to that method at execution time. A lambda that is converted to an expression tree causes the compiler to generate code that creates an instance of LambdaExpression at execution time. But when using the Language Integrated Query (LINQ) API, how does the compiler know whether to generate a delegate, to execute a query locally, or to generate an expression tree so that information about the query can be sent to the remote database server?

The methods used to build LINQ queries, such as Where(), are extension methods. The versions of those methods that extend the IEnumerable<T> interface take delegate parameters; the methods that extend the IQueryable<T> interface take expression tree parameters. The compiler, therefore, can use the type of the collection that is being queried to determine whether to create delegates or expression trees from lambdas supplied as arguments.

Consider, for example, the Where() method in the following code:

persons.Where( person => person.Name.ToUpper() ==
"INIGO MONTOYA");

The extension method signature declared in the System.Linq.Enumerable class is

public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate);

The extension method signature declared in the System.Linq.Queryable class is

public static IQueryable<TSource> Where<TSource>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, bool>> predicate);

The compiler decides which extension method to use based on the compile-time type of persons; if it is a type convertible to IQueryable<Person>, the method from System.Linq.Queryable is chosen, and the compiler converts the lambda to an expression tree. At execution time, the object referred to by persons receives the expression tree data and might use that data to build a SQL query, which is then passed to the database when the results of the query are requested. The result of the call to Where is an object that, when asked for query results, sends the query to the database and produces the results.

If persons cannot be converted implicitly to IQueryable<Person> but can be converted implicitly to IEnumerable<Person>, the method from System.Linq.Enumerable is chosen, and the lambda is converted to a delegate. The result of the call to Where is an object that, when asked for query results, applies the generated delegate as a predicate to every member of the collection and produces the results that match the predicate.
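The effect of the compile-time type on overload selection can be seen without a database. The sketch below is an assumption-laden stand-in for the persons example: it uses a string array instead of Person objects and AsQueryable() to obtain an in-memory IQueryable<string>; the names OverloadChoice, FilterLocally, and FilterAsQuery are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class OverloadChoice
{
    private static readonly string[] Names =
        { "Inigo Montoya", "Vizzini" };

    // Compile-time type is string[]: Enumerable.Where is chosen and the
    // lambda becomes a delegate.
    public static IEnumerable<string> FilterLocally()
    {
        return Names.Where(
            name => name.ToUpper() == "INIGO MONTOYA");
    }

    // Compile-time type is IQueryable<string>: Queryable.Where is chosen
    // and the same lambda text becomes an expression tree.
    public static IQueryable<string> FilterAsQuery()
    {
        return Names.AsQueryable().Where(
            name => name.ToUpper() == "INIGO MONTOYA");
    }

    public static void Main()
    {
        // The query object records the call to Where as data.
        var call = (MethodCallExpression)FilterAsQuery().Expression;
        Console.WriteLine(call.Method.DeclaringType); // System.Linq.Queryable
        Console.WriteLine(call.Method.Name);          // Where

        foreach (string name in FilterLocally())
        {
            Console.WriteLine(name); // Inigo Montoya
        }
        foreach (string name in FilterAsQuery())
        {
            Console.WriteLine(name); // Inigo Montoya
        }
    }
}
```

Both queries produce the same result here because the in-memory queryable simply compiles and runs the tree locally; a database-backed provider would translate it instead.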

Examining an Expression Tree

As we’ve seen, converting a lambda expression to an Expression<TDelegate> creates an expression tree rather than a delegate. We have seen previously in this chapter how to convert a lambda such as (x,y)=>x>y to a delegate type such as Func<int, int, bool>. To turn this same lambda into an expression tree, we simply convert it to Expression<Func<int, int, bool>>, as shown in Listing 12.25. We can then examine the generated object and display information about its structure, as well as that of a more complex expression tree.

Note that passing an expression tree to Console.WriteLine() automatically converts it to a descriptive string form; the objects generated for expression trees all override ToString() so that you can see at a glance what the contents of an expression tree are when debugging.

LISTING 12.25: Examining an Expression Tree


using System;
using System.Linq.Expressions;

public class Program
{
    public static void Main()
    {
        Expression<Func<int, int, bool>> expression;
        expression = (x, y) => x > y;
        Console.WriteLine("------------- {0} -------------",
            expression);
        PrintNode(expression.Body, 0);
        Console.WriteLine();
        Console.WriteLine();
        expression = (x, y) => x * y > x + y;
        Console.WriteLine("------------- {0} -------------",
            expression);
        PrintNode(expression.Body, 0);
    }

    public static void PrintNode(Expression expression,
        int indent)
    {
        if (expression is BinaryExpression)
            PrintNode(expression as BinaryExpression, indent);
        else
            PrintSingle(expression, indent);
    }

    private static void PrintNode(BinaryExpression expression,
        int indent)
    {
        PrintNode(expression.Left, indent + 1);
        PrintSingle(expression, indent);
        PrintNode(expression.Right, indent + 1);
    }

    private static void PrintSingle(
        Expression expression, int indent)
    {
        Console.WriteLine("{0," + indent * 5 + "}{1}",
            "", NodeToString(expression));
    }

    private static string NodeToString(Expression expression)
    {
        switch (expression.NodeType)
        {
            case ExpressionType.Multiply:
                return "*";
            case ExpressionType.Add:
                return "+";
            case ExpressionType.Divide:
                return "/";
            case ExpressionType.Subtract:
                return "-";
            case ExpressionType.GreaterThan:
                return ">";
            case ExpressionType.LessThan:
                return "<";
            default:
                return expression.ToString() +
                    " (" + expression.NodeType.ToString() + ")";
        }
    }
}


In Output 12.5, we see that the Console.WriteLine() statements within Main() print out the body of the expression trees as text.

OUTPUT 12.5

------------- (x, y) => (x > y) -------------
     x (Parameter)
>
     y (Parameter)


------------- (x, y) => ((x * y) > (x + y)) -------------
          x (Parameter)
     *
          y (Parameter)
>
          x (Parameter)
     +
          y (Parameter)

The important point to note is that an expression tree is a collection of data, and by iterating over the data, it is possible to convert the data to another format; in this case we convert the expression tree to descriptive strings, but it could also be converted to expressions in another query language.

Using recursion, the PrintNode() function demonstrates that nodes in an expression tree are themselves trees containing zero or more child expression trees. The “root” tree that represents the lambda refers to the expression that is the body of the lambda with its Body property. Every expression tree node includes a NodeType property of enumerated type ExpressionType that describes what kind of expression it is. Numerous types of expressions exist: BinaryExpression, ConditionalExpression, LambdaExpression, MethodCallExpression, ParameterExpression, and ConstantExpression are examples. Each type derives from Expression.

Note that, although the expression tree library now contains objects to represent most of the statements of C# and Visual Basic, neither language supports the conversion of statement lambdas to expression trees. Only expression lambdas can be converted to expression trees.
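The restriction applies only to the compiler's lambda conversion. A statement-like tree can still be assembled by hand with the factory methods on the Expression class (available since .NET 4.0). The sketch below is one illustrative way to do so; the names StatementTrees and BuildIncrementThenDouble are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class StatementTrees
{
    // The following line would not compile, because a statement lambda
    // cannot be converted to an expression tree:
    // Expression<Func<int, int>> invalid = x => { return x + 1; };

    // Hand-built equivalent of: x => { int temp = x + 1; return temp * 2; }
    public static Func<int, int> BuildIncrementThenDouble()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        ParameterExpression temp = Expression.Variable(typeof(int), "temp");

        // A block evaluates its expressions in order; its value is the
        // value of the last expression.
        BlockExpression body = Expression.Block(
            new[] { temp },
            Expression.Assign(
                temp, Expression.Add(x, Expression.Constant(1))),
            Expression.Multiply(temp, Expression.Constant(2)));

        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }

    public static void Main()
    {
        Console.WriteLine(BuildIncrementThenDouble()(3)); // 8
    }
}
```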

Summary

This chapter began with a discussion of delegates and their use as references to methods or callbacks. This powerful concept enables you to pass a set of instructions to call in a different location, rather than immediately, when coding the instructions.

Lambda expressions are a syntax that supersedes (but does not eliminate) the C# 2.0 anonymous method syntax. These constructs allow programmers to assign a set of instructions to a variable directly, without defining an explicit method that contains the instructions. This provides significant flexibility for programming instructions dynamically within the method, a powerful concept that greatly simplifies the programming of collections through the LINQ API.

The chapter ended with a discussion of expression trees and of how a lambda converted to one becomes an object graph that represents the semantic analysis of the lambda expression, rather than the delegate implementation itself. This important feature supports such libraries as the Entity Framework and LINQ to SQL; that is, libraries that interpret the expression tree and use it within contexts other than CIL.

The term lambda expression encompasses both statement lambdas and expression lambdas; each is a kind of lambda expression.

One thing that the chapter mentioned but did not elaborate on was multicast delegates. The next chapter investigates multicast delegates in detail and explains how they enable the publish–subscribe pattern with events.