Building Custom Collections - Essential C# 6.0 (2016)

Essential C# 6.0 (2016)

16. Building Custom Collections

Begin 2.0

Chapter 14 covered the standard query operators—that is, the extension methods on IEnumerable<T> that provide methods common to all collections. However, these operators do not make all collections equally suited for all tasks; there is still a need for different collection types. Some collections are better suited to searching by key, whereas others are better suited to accessing items by position. Some collections act like queues: The first element in is the first out. Others are more like stacks: The first element in is the last out. Others are not ordered at all.

The .NET Framework provides a plethora of collection types suited for many of the scenarios in which collections are needed. This chapter provides an introduction to some of these collection types and the interfaces they implement. It also describes how to create custom-built collections that support standard functionality, such as indexing. In addition, it explores the use of the yield return statement to create classes and methods that implement IEnumerable<T>. This C# 2.0 feature greatly simplifies implementation of collections that can be enumerated with theforeach statement.

Image

Many nongeneric collection classes and interfaces are available in the .NET Framework, but in general these exist today only for backward compatibility with code written before generics came into use. The generic collection types are both faster, because they avoid boxing costs, and more type-safe than the nongeneric collections. Thus, new code should almost always use the generic collection types exclusively. Throughout this book, we assume that you are primarily using generic collection types.

More Collection Interfaces

We’ve already seen how collections implement IEnumerable<T>, the primary interface that enables iteration over the elements of a collection. Many additional interfaces exist that are implemented by more complex collections. Figure 16.1 shows the hierarchy of interfaces implemented by collection classes.

Image

FIGURE 16.1: Generic Collection Interface Hierarchy

These interfaces provide a standard way to perform common tasks such as iterating, indexing, and counting elements in a collection. This section examines these interfaces (at least all of the generic ones), starting at the bottom of Figure 16.1 and moving upward.

IList<T> versus IDictionary<TKey, TValue>

An English-language dictionary can be thought of as a collection of definitions. A specific definition can be rapidly accessed by looking up its associated “key”—that is, the word being defined. A dictionary collection class is similarly a collection of values, in which each value can be rapidly accessed by using its associated unique key. Note, however, that a language dictionary typically stores the definitions sorted alphabetically by key; a dictionary class might choose to do so but typically does not. Dictionary collections are best thought of as an unordered list of keys and associated values unless specifically documented as being ordered. Similarly, one does not normally think of looking up “the sixth definition in the dictionary”; dictionary classes usually provide indexing only by key, not by position.

A list, by contrast, stores values in a specific order, and accesses them by their position. In a sense, lists are just the special case of dictionaries where the “key” is always an integer and the “key set” is always a contiguous set of non-negative integers starting with zero. Nevertheless, that is a strong enough difference that it is worth having an entirely different type to represent it.

Thus, when selecting a collection class to solve some data storage or retrieval problem, the first two interfaces to look for are IList<T> and IDictionary<TKey, TValue>. These interfaces indicate whether the collection type is focused on retrieval of a value when given its positional index or retrieval of a value when given its associated key.

Both of these interfaces require that a class that implements them provide an indexer. In the case of IList<T>, the operand of the indexer corresponds to the position of the element being retrieved: The indexer takes an integer and gives you access to the nth element in the list. In the case of the IDictionary<TKey, TValue> interface, the operand of the indexer corresponds to the key associated with a value, and gives you access to that value.

ICollection<T>

Both IList<T> and IDictionary<TKey, TValue> implement ICollection<T>. A collection that does not implement either IList<T> or IDictionary<TKey, TValue> will more than likely implement ICollection<T> (although not necessarily, because collections could implement the lesser requirement of IEnumerable or IEnumerable<T>). ICollection<T> is derived from IEnumerable<T> and includes two members: Count and CopyTo().

• The Count property returns the total number of elements in the collection. Initially, it might appear that this would be sufficient to iterate through each element in the collection using a for loop, but, in fact, the collection would also need to support retrieval by index, which theICollection<T> interface does not include (although IList<T> does include it).

• The CopyTo() method provides the ability to convert the collection into an array. This method includes an index parameter so that you can specify where to insert elements in the target array. To use the method, you must initialize the array target with sufficient capacity, starting at the index, to contain all the elements in ICollection<T>.

Primary Collection Classes

Five key categories of collection classes exist, and they differ from one another in terms of how data is inserted, stored, and retrieved. Each generic class is located in the System.Collections.Generic namespace, and their nongeneric equivalents are found in theSystem.Collections namespace.

List Collections: List<T>

The List<T> class has properties similar to an array. The key difference is that these classes automatically expand as the number of elements increases. (In contrast, an array size is constant.) Furthermore, lists can shrink via explicit calls to TrimToSize() or Capacity (see Figure 16.2).

Image

FIGURE 16.2: List<> Class Diagrams

These classes are categorized as list collections whose distinguishing functionality is that each element can be individually accessed by index, just like an array. Therefore, you can set and access elements in the list collection classes using the index operator, where the index parameter value corresponds to the position of an element in the collection. Listing 16.1 shows an example, and Output 16.1 shows the results.

LISTING 16.1: Using List<T>


using System;
using System.Collections.Generic;

class Program
{
static void Main()
{
List<string> list = new List<string>();

// Lists automatically expand as elements
// are added.
list.Add("Sneezy");
list.Add("Happy");
list.Add("Dopey");
list.Add("Doc");
list.Add("Sleepy");
list.Add("Bashful");
list.Add("Grumpy");

list.Sort();

Console.WriteLine(
$"In alphabetical order { list[0] } is the "
+ $"first dwarf while { list[6] } is the last.");

list.Remove("Grumpy");
}
}


OUTPUT 16.1

In alphabetical order Bashful is the first dwarf while Sneezy is the last.

C# is zero-index based; therefore, index 0 in Listing 16.1 corresponds to the first element and index 6 indicates the seventh element. Retrieving elements by index does not involve a search. Rather, it entails a quick and simple “jump” operation to a location in memory.

A List<T> is an ordered collection; the Add() method appends the given item to the end of the list. Before the call to Sort() in Listing 16.1, "Sneezy" was first and "Grumpy" was last; after the call, the list is sorted into alphabetical order, rather than the order in which items were added. Some collections automatically sort elements as they are added, but List<T> is not one of them; an explicit call to Sort() is required for the elements to be sorted.

To remove an element, you use the Remove() or RemoveAt() method, to either remove a given element or remove whatever element is at a particular index, respectively.


Advanced Topic: Customizing Collection Sorting

You might have wondered how the List<T>.Sort() method in Listing 16.1 knew how to sort the elements of the list into alphabetical order. The string type implements the IComparable<string> interface, which has one method, CompareTo(). It returns an integer indicating whether the element passed is greater than, less than, or equal to the current element. If the element type implements the generic IComparable<T> interface (or the nongeneric IComparable interface), the sorting algorithm will, by default, use it to determine the sorted order.

But what if either the element type does not implement IComparable<T> or the default logic for comparing two things does not meet your needs? To specify a nondefault sort order, you can call the overload of List<T>.Sort(), which takes an IComparer<T> as an argument.

The difference between IComparable<T> and IComparer<T> is subtle but important. The first interface means, “I know how to compare myself to another instance of my type.” The latter means, “I know how to compare two things of a given type.”

The IComparer<T> interface is typically used when there are many different possible ways of sorting a data type and none is obviously the best. For example, you might have a collection of Contact objects that you sometimes want to sort by name, by location, by birthday, by geographic region, or by any number of other possibilities. Rather than choosing sorting strategy and making the Contact class implement IComparable<Contact>, it might be wiser to create several different classes that implement IComparer<Contact>. Listing 16.2shows a sample implementation of a LastName, FirstName comparison.

LISTING 16.2: Implementing IComparer<T>


class Contact
{
public string FirstName { get; private set; }
public string LastName { get; private set; }
public Contact(string firstName, string lastName)
{
this.FirstName = firstName;
this.LastName = lastName;
}
}


using System;
using System.Collections.Generic;

class NameComparison : IComparer<Contact>
{
public int Compare(Contact x, Contact y)
{
if (Object.ReferenceEquals(x, y))
return 0;
if (x == null)
return 1;
if (y == null)
return -1;
int result = StringCompare(x.LastName, y.LastName);
if (result == 0)
result = StringCompare(x.FirstName, y.FirstName);
return result;
}

private static int StringCompare(string x, string y)
{
if (Object.ReferenceEquals(x, y))
return 0;
if (x == null)
return 1;
if (y == null)
return -1;
return x.CompareTo(y);
}
}


To sort a List<Contact> by last name and then first name, you can call contactList.Sort(new NameComparer()).


Total Ordering

You are required to produce a total order when implementing IComparable<T> or IComparer<T>. Your implementation of CompareTo must provide a fully consistent ordering for any possible pair of items. This ordering is required to have a number of basic characteristics. For example, every element is required to be considered equal to itself. If an element X is considered to be equal to element Y, and element Y is considered to be equal to element Z, all three elements X, Y, and Z must be considered equal to one another. If an element X is considered to be greater than Y, Y must be considered to be less than X. And there must be no “transitivity paradoxes”—that is, you cannot have X greater than Y, Y greater than Z, and Z greater than X. If you fail to provide a total ordering, the action of the sort algorithm is undefined; it may produce a crazy ordering, it may crash, it may go into an infinite loop, and so on.

Notice, for example, how the comparer in Listing 16.2 ensures a total order, even if the arguments are null references. It would not be legal to say, “If either element is null, then return zero,” for example, because then two non-null things could be equal to null but not equal to each other.


Guidelines

DO ensure that custom comparison logic produces a consistent “total order.”


Searching a List<T>

To search List<T> for a particular element, you use the Contains(), IndexOf(), LastIndexOf(), and BinarySearch() methods. The first three methods search through the array, starting at the first element (or the last element for LastIndexOf()), and examine each element until the desired one is found. The execution time for these algorithms is proportional to the number of elements searched before a hit occurs. (Be aware that the collection classes do not require that all the elements within the collection are unique. If two or more elements in the collection are the same, IndexOf() returns the first index and LastIndexOf() returns the last index.)

BinarySearch() uses a much faster binary search algorithm but requires that the elements be sorted. A useful feature of the BinarySearch() method is that if the element is not found, a negative integer is returned. The bitwise complement (~) of this value is the index of the next element larger than the element being sought, or the total element count if there is no greater value. This provides a convenient means to insert new values into the list at the specific location so as to maintain sorting. Listing 16.3 provides an example.

LISTING 16.3: Using the Bitwise Complement of the BinarySearch() Result


using System;
using System.Collections.Generic;

class Program
{
static void Main()
{
List<string> list = new List<string>();
int search;

list.Add("public");
list.Add("protected");
list.Add("private");

list.Sort();

search = list.BinarySearch("protected internal");
if (search < 0)
{
list.Insert(~search, "protected internal");
}

foreach (string accessModifier in list)
{
Console.WriteLine(accessModifier);
}
}
}


Beware that if the list is not first sorted, an element will not necessarily be found with this code, even if it is in the list. The results of Listing 16.3 appear in Output 16.2.

OUTPUT 16.2

private
protected
protected internal
public


Advanced Topic: Finding Multiple Items with FindAll()

Sometimes you must find multiple items within a list and your search criteria are more complex than merely looking for specific values. To support this, System.Collections.Generic.List<T> includes a FindAll() method. FindAll() takes a parameter of typePredicate<T>, which is a reference to a method called a delegate. Listing 16.4 demonstrates how to use the FindAll() method.

LISTING 16.4: Demonstrating FindAll() and Its Predicate Parameter


using System;
using System.Collections.Generic;

class Program
{
static void Main()
{
List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(2);

List<int> results = list.FindAll(Even);

foreach(int number in results)
{
Console.WriteLine(number);
}
}

public static bool Even(int value) =>
(value % 2) == 0;
}


In Listing 16.4’s call to FindAll(), you pass a delegate instance, Even(). This method returns true when the integer argument value is even. FindAll() takes the delegate instance and calls into Even() for each item within the list (this listing uses C# 2.0’s delegate type inferencing). Each time the return value is true, it adds it to a new List<T> instance and then returns this instance once it has checked each item within list. A complete discussion of delegates occurs in Chapter 12.


Dictionary Collections: Dictionary<TKey, TValue>

Another category of collection classes is the dictionary classes—specifically, Dictionary<TKey, TValue> (see Figure 16.3). Unlike the list collections, dictionary classes store name/value pairs. The name functions as a unique key that can be used to look up the corresponding element in a manner similar to that of using a primary key to access a record in a database. This adds some complexity to the access of dictionary elements, but because lookups by key are efficient operations, this is a useful collection. Note that the key may be any data type, not just a string or a numeric value.

Image

FIGURE 16.3: Dictionary Class Diagrams

One option for inserting elements into a dictionary is to use the Add() method, passing both the key and the value, as shown in Listing 16.5.

LISTING 16.5: Adding Items to a Dictionary<TKey, TValue>


using System;
using System.Collections.Generic;

class Program
{
static void Main()
{
// C# 6.0 (use {"Error", ConsoleColor.Red} pre-C# 6.0)
var colorMap = new Dictionary<string, ConsoleColor>
{
["Error"] = ConsoleColor.Red,
["Warning"] = ConsoleColor.Yellow,
["Information"] = ConsoleColor.Green
};

colorMap.Add("Verbose", ConsoleColor.White);
// ...
}
}


After initializing the dictionary with a C# 6.0 dictionary initializer (see the section “Collection Initializers” in Chapter 14), Listing 16.5 inserts the string a ConsoleColor of white for the key of "Verbose." If an element with the same key has already been added, an exception is thrown.

An alternative for adding elements is to use the indexer, as shown in Listing 16.6.

LISTING 16.6: Inserting Items in a Dictionary<TKey, TValue> Using the Index Operator


using System;
using System.Collections.Generic;

class Program
{
static void Main()
{
// C# 6.0 (use {"Error", ConsoleColor.Red} pre-C# 6.0)
var colorMap = new Dictionary<string, ConsoleColor>
{
["Error"] = ConsoleColor.Red,
["Warning"] = ConsoleColor.Yellow,
["Information"] = ConsoleColor.Green
};

colorMap["Verbose"] = ConsoleColor.White;
colorMap["Error"] = ConsoleColor.Cyan;

// ...
}
}


The first thing to observe in Listing 16.6 is that the index operator does not require an integer. Instead, the index operand type is specified by the first type argument (string), and the type of the value that is set or retrieved by the indexer is specified by the second type argument (ConsoleColor).

The second thing to notice in Listing 16.6 is that the same key ("Error") is used twice. In the first assignment, no dictionary value corresponds to the given key. When this happens, the dictionary collection classes insert a new value with the supplied key. In the second assignment, an element with the specified key already exists. Instead of inserting an additional element, the prior ConsoleColor value for the "Error" key is replaced ConsoleColor.Cyan.

Attempting to read a value from a dictionary with a nonexistent key throws a KeyNotFoundException. The ContainsKey() method allows you to check whether a particular key is used before accessing its value, thereby avoiding the exception.

The Dictionary<TKey, TValue> is implemented as a “hash table”; this data structure provides extremely fast access when searching by key, regardless of the number of values stored in the dictionary. By contrast, checking whether there is a particular value in the dictionary collections is a time-consuming operation with linear performance characteristics, much like searching an unsorted list. To do this you use the ContainsValue() method, which searches sequentially through each element in the collection.

You remove a dictionary element using the Remove() method, passing the key, not the element value.

Because both the key and the value are required to add a value to the dictionary, the loop variable of a foreach loop that enumerates elements of a dictionary must be KeyValuePair<TKey, TValue>. Listing 16.7 shows a snippet of code demonstrating the use of a foreach loop to enumerate the keys and values in a dictionary. The output appears in Output 16.3.

LISTING 16.7: Iterating over Dictionary<TKey, TValue> with foreach


using System;
using System.Collections.Generic;

class Program
{
static void Main()
{
// C# 6.0 (use {"Error", ConsoleColor.Red} pre-C# 6.0)
Dictionary<string, ConsoleColor> colorMap =
new Dictionary<string, ConsoleColor>
{
["Error"] = ConsoleColor.Red,
["Warning"] = ConsoleColor.Yellow,
["Information"] = ConsoleColor.Green,
["Verbose"] = ConsoleColor.White
};

Print(colorMap);
}

private static void Print(
IEnumerable<KeyValuePair<string, ConsoleColor>> items)
{
foreach (KeyValuePair<string, ConsoleColor> item in items)
{
Console.ForegroundColor = item.Value;
Console.WriteLine(item.Key);
}
}
}


OUTPUT 16.3

Error
Warning
Information
Verbose

Note that the order of the items shown here is the order in which the items were added to the dictionary, just as if they had been added to a list. Implementations of dictionaries will often enumerate the keys and values in the order in which they were added to the dictionary, but this feature is neither required nor documented, so you should not rely on it.


Guidelines

DO NOT make any unwarranted assumptions about the order in which elements of a collection will be enumerated. If the collection is not documented as enumerating its elements in a particular order, it is not guaranteed to produce elements in any particular order.


If you want to deal only with keys or only with elements within a dictionary class, they are available via the Keys and Values properties, respectively. The data type returned from these properties is of type ICollection<T>. The data returned by these properties is a reference to the data within the original dictionary collection, rather than a copy; changes within the dictionary are automatically reflected in the collection returned by the Keys and Values properties.


Advanced Topic: Customizing Dictionary Equality

To determine whether a given key matches any existing key in the dictionary, the dictionary must be able to compare two keys for equality. This is analogous to the way that lists must be able to compare two items to determine their order. (For an example, see the Advanced Topic, “Customizing Collection Sorting,” earlier in this chapter.) By default, two instances of a value type are compared by checking whether they contain exactly the same data, and two instances of a reference type are compared to see whether both reference the same object. However, it is occasionally necessary to be able to compare two instances as equal even if they are not exactly the same value or exactly the same reference.

For example, suppose you wish to create a Dictionary<Contact, string> using the Contact type from Listing 16.2. However, you want any two Contact objects to compare as equal if they have the same first and last names, regardless of whether the two objects are reference equal. Much as you can provide an implementation of IComparer<T> to sort a list, so you can similarly provide an implementation of IEqualityComparer<T> to determine if two keys are to be considered equal. This interface requires two methods: one that returns whether two items are equal, and one that returns a “hash code” that the dictionary can use to facilitate fast indexing. Listing 16.8 shows an example.

LISTING 16.8: Implementing IEqualityComparer<T>


using System;
using System.Collections.Generic;

class ContactEquality : IEqualityComparer<Contact>
{
public bool Equals(Contact x, Contact y)
{
if (Object.ReferenceEquals(x, y))
return true;
if (x == null || y == null)
return false;
return x.LastName == y.LastName &&
x.FirstName == y.FirstName;
}

public int GetHashCode(Contact x)
{
if (Object.ReferenceEquals(x, null))
return 0;
int h1 = x.FirstName == null ? 0 : x.FirstName.GetHashCode();
int h2 = x.LastName == null ? 0 : x.LastName.GetHashCode();
return h1 * 23 + h2;
}
}


To create a dictionary that uses this equality comparer, you can use the constructor new Dictionary<Contact, string>(new ContactEquality).



Beginner Topic: Requirements of Equality Comparisons

As discussed in Chapter 9, Well-Formed Types, there are several important rules for the equality and hash code algorithms. Conformance to these rules is critical in the context of collections. Just as correctly sorting a list requires a custom ordering comparison to provide a total order, so too does a hash table require certain guarantees to be met by a custom equality comparison. The most important requirement is that if Equals() returns true for two objects, GetHashCode() must return the same value for those two objects. Note that the converse is not true: Two unequal items may have the same hash code. (Indeed, there must be two unequal items that have the same hash code because there are only 232 possible hash codes, but more than that many unequal objects!)

The second most important requirement is that two calls to GetHashCode() on the same item must produce the same result for at least as long as the item is in the hash table. Note, however, that two objects that “look equal” are not required to give the same hash code in two separate runs of a program. For example, it is perfectly legal for a given contact to be assigned one hash code today, and two weeks later when you run the program a second time for “the same” contact to be given a different hash code. Do not persist hash codes into a database and expect them to remain stable across different runs of a program.

Ideally, the result of GetHashCode() should appear to be “random.” That is, small changes to the input should cause large changes to the output, and the result should be distributed roughly evenly across all possible integer values. It is difficult, however, to devise a hash algorithm that is extremely fast and produces extremely well-distributed output; try to find a good middle ground.

Finally, GetHashCode() and Equals() must not throw exceptions. Notice how the code in Listing 16.8 is careful to never dereference a null reference, for example.

To summarize, here are the key principles:

• Equal objects must have equal hash codes.

• The hash code of an object should not change for the life of the instance (at least while it is in the hash table).

• The hashing algorithm should quickly produce a well-distributed hash.

• The hashing algorithm should avoid throwing exceptions in all possible object states.


Sorted Collections: SortedDictionary<TKey, TValue> and SortedList<T>

The sorted collection classes (see Figure 16.4) store their elements sorted by key for SortedDictionary<TKey, TValue> and by value for SortedList<T>. If we change the code in Listing 16.7 to use a SortedDictionary<string, string> instead of aDictionary<string, string>, the output of the program is as appears in Output 16.4.

OUTPUT 16.4

Error
Information
Verbose
Warning

Image

FIGURE 16.4: SortedList<> and SortedDictionary<> Class Diagrams

Note that the elements are sorted into order by key, not by value.

Because sorted collections must do extra work to maintain the sorted order of their elements, insertion and removal are typically slightly slower than insertion and removal of values in an unordered dictionary.

Because sorted collections must store their items in a particular order, it is possible to access values both by key and by index. To access a key or value by its index in the sorted list, use the Keys and Values properties. They return IList<TKey> and IList<TValue> instances, respectively; the resultant collection can be indexed like any other list.

Stack Collections: Stack<T>

Chapter 11 discussed the stack collection classes (see Figure 16.5). The stack collection classes are designed as “last in, first out” (LIFO) collections. The two key methods are Push() and Pop().

• Push() inserts elements into the collection. The elements do not have to be unique.

• Pop() removes elements in the reverse order in which they were added.

Image

FIGURE 16.5: Stack<T> Class Diagram

To access the elements on the stack without modifying the stack, you use the Peek() and Contains() methods. The Peek() method returns the next element that Pop() will retrieve.

As with most collection classes, you use the Contains() method to determine whether an element exists anywhere in the stack. As with all collections, it is also possible to use a foreach loop to iterate over the elements in a stack. This allows you to access values from anywhere in the stack. Note, however, that accessing a value via the foreach loop does not remove it from the stack—only Pop() provides this functionality.

Queue Collections: Queue<T>

Queue collection classes, shown in Figure 16.6, are identical to stack collection classes, except that they follow the ordering pattern of “first in, first out” (FIFO). In place of the Pop() and Push() methods are the Enqueue() and Dequeue() methods. The queue collection behaves like a pipe: You place objects into the queue at one end using the Enqueue() method and remove them from the other end using the Dequeue() method. As with stack collection classes, the objects do not have to be unique, and queue collection classes automatically increase in size as required. As a queue shrinks, it does not necessarily reclaim the storage space previously used, because that would make inserting a new element potentially more expensive. If you happen to know that a queue will remain the same size for a long time, however, you can hint to it that you would like to reclaim storage space by using the TrimToSize() method.

Image

FIGURE 16.6: Queue<T> Class Diagram

Linked Lists: LinkedList<T>

System.Collections.Generic also supports a linked list collection that enables both forward and reverse traversal. Figure 16.7 shows the class diagram. (There is no corresponding nongeneric type.)

Image

FIGURE 16.7: LinkedList<T> and LinkedListNode<T> Class Diagrams

Providing an Indexer

Arrays, dictionaries, and lists all provide an indexer as a convenient way to get or set a member of a collection based on a key or index. As we’ve seen, to use the indexer you simply put the index (or indices) in square brackets after the collection name. It is possible to define your own indexer; Listing 16.9 shows an example using Pair<T>.

LISTING 16.9: Defining an Indexer


interface IPair<T>
{
T First { get; }
T Second { get; }

T this[PairItem index] { get; }
}


public enum PairItem
{
First,
Second
}

public struct Pair<T> : IPair<T>
{
public Pair(T first, T second)
{
First = first;
Second = second;
}
public T First { get; } // C# 6.0 Getter-only Autoproperty
public T Second { get; } // C# 6.0 Getter-only Autoproperty

public T this[PairItem index]
{
get
{
switch (index)
{
case PairItem.First:
return First;
case PairItem.Second:
return Second;
default :
throw new NotImplementedException(
string.Format(
"The enum {0} has not been implemented",
index.ToString()));
}
}
}
}


An indexer is declared much as a property is declared, except that instead of the name of the property, you use the keyword this followed by a parameter list in square brackets. The body is also like a property, with get and set blocks. As Listing 16.9 shows, the parameter does not have to be an int. In fact, the index can take multiple parameters and can even be overloaded. This example uses an enum to reduce the likelihood that callers will supply an index for a nonexistent item.

The CIL code that the C# compiler creates from an index operator is a special property called Item that takes an argument. Properties that accept arguments cannot be created explicitly in C#, so the Item property is unique in this aspect. Any additional member with the identifier Item, even if it has an entirely different signature, will conflict with the compiler-created member, so it will not be allowed.


Advanced Topic: Assigning the Indexer Property Name Using IndexerName

As indicated earlier, the CIL property name for an indexer defaults to Item. Using the IndexerNameAttibute, you can specify a different name, however. Listing 16.10, for example, changes the name to "Entry".

LISTING 16.10: Changing the Indexer’s Default Name


[System.Runtime.CompilerServices.IndexerName("Entry")]
public T this[params PairItem[] branches]
{
// ...
}


This makes no difference to C# callers of the index, but it specifies the name for languages that do not support indexers directly.

This attribute is merely an instruction to the compiler to use a different name for the indexer; the attribute is not actually emitted into metadata by the compiler, so it is not available via reflection.



Advanced Topic: Defining an Index Operator with Variable Parameters

An index operator can also take a variable parameter list. For example, Listing 16.11 defines an index operator for BinaryTree<T>, discussed in Chapter 11 (and again in the next section).

LISTING 16.11: Defining an Index Operator with Variable Parameters


using System;

public class BinaryTree<T>
{

// ...

public BinaryTree<T> this[params PairItem[] branches]
{
get
{
BinaryTree<T> currentNode = this;

// Allow either an empty array or null
// to refer to the root node.
int totalLevels = branches?.Length ?? 0;
int currentLevel = 0;

while (currentLevel < totalLevels)
{
System.Diagnostics.Debug.Assert(branches != null,
$"{ nameof(branches) } != null");
currentNode = currentNode.SubItems[
branches[currentLevel]];
if (currentNode == null)
{
// The binary tree at this location is null.
throw new IndexOutOfRangeException();
}
currentLevel++;
}
return currentNode;
}
}
}


Each item within branches is a PairItem and indicates which branch to navigate down in the binary tree. For example,

tree[PairItem.Second, PairItem.First].Value

will retrieve the value located at the second item in the first branch followed by the first branch within that branch.


Returning Null or an Empty Collection

When returning an array or collection, you must indicate that there are zero items by returning either null or a collection instance with no items. The better choice in general is to return a collection instance with no items. In so doing, you avoid forcing the caller to check for null before iterating over the items in the collection. For example, given a zero-size IEnumerable<T> collection, the caller can immediately and safely use a foreach loop over the collection without concern that the generated call to GetEnumerator() will throw aNullReferenceException. Consider using the Enumerable.Empty<T>() method to easily generate an empty collection of a given type.

One of the few times to deviate from this guideline is when null is intentionally indicating something different from zero items. For example, a collection of user names for a website might be null to indicate that an up-to-date collection could not be obtained for some reason; that is semantically different from an empty collection.


Guidelines

DO NOT represent an empty collection with a null reference.

CONSIDER using the Enumerable.Empty<T>() method instead.


Iterators

Chapter 14 went into detail on the internals of the foreach loop. This section discusses how to use iterators to create your own implementation of the IEnumerator<T>, IEnumerable<T>, and corresponding nongeneric interfaces for custom collections. Iterators provide clean syntax for specifying how to iterate over data in collection classes, especially using the foreach loop. The iterator allows end users of a collection to navigate its internal structure without knowledge of that structure.


Advanced Topic: Origin of Iterators

In 1972, Barbara Liskov and a team of scientists at MIT began researching programming methodologies, focusing on user-defined data abstractions. To prove much of their work, they created a language called CLU that had a concept called “clusters” (CLU being the first three letters of this term). Clusters were predecessors to the primary data abstraction that programmers use today: objects. During their research, the team realized that although they were able to use the CLU language to abstract some data representation away from end users of their types, they consistently found themselves having to reveal the inner structure of their data to allow others to intelligently consume it. The result of their consternation was the creation of a language construct called an iterator. (The CLU language offered many insights into what would eventually be popularized as “object-oriented programming.”)


If classes want to support iteration using the foreach loop construct, they must implement the enumerator pattern. As Chapter 14 describes, in C# the foreach loop construct is expanded by the compiler into the while loop construct based on the IEnumerator<T> interface that is retrieved from the IEnumerable<T> interface.

The problem with the enumeration pattern is that it can be cumbersome to implement manually, because it must maintain all the state necessary to describe the current position in the collection. This internal state may be simple for a list collection type class; the index of the current position suffices. In contrast, for data structures that require recursive traversal, such as binary trees, the state can be quite complicated. To mitigate the challenges associated with implementing this pattern, C# 2.0 included a construct that makes it easier for a class to dictate how the foreach loop iterates over its contents.

Defining an Iterator

Iterators are a means to implement methods of a class, and they are syntactic shortcuts for the more complex enumerator pattern. When the C# compiler encounters an iterator, it expands its contents into CIL code that implements the enumerator pattern. As such, there are no runtime dependencies for implementing iterators. Because the C# compiler handles implementation through CIL code generation, there is no real runtime performance benefit to using iterators. However, there is a substantial programmer productivity gain in choosing iterators over manual implementation of the enumerator pattern. To understand improvement, we first consider how an iterator is defined in code.

Iterator Syntax

An iterator provides shorthand implementation of iterator interfaces, the combination of the IEnumerable<T> and IEnumerator<T> interfaces. Listing 16.12 declares an iterator for the generic BinaryTree<T> type by creating a GetEnumerator() method. Next, you will add support for the iterator interfaces.

LISTING 16.12: Iterator Interfaces Pattern


using System;
using System.Collections.Generic;

public class BinaryTree<T>:
IEnumerable<T>
{
public BinaryTree ( T value)
{
Value = value;
}

#region IEnumerable<T>
public IEnumerator<T> GetEnumerator()
{
//...
}
#endregion IEnumerable<T>

public T Value { get; } // C# 6.0 Getter-only Autoproperty
public Pair<BinaryTree<T>> SubItems { get; set; }
}

public struct Pair<T>
{
public Pair(T first, T second) : this()
{
First = first;
Second = second;
}
public T First { get; } // C# 6.0 Getter-only Autoproperty
public T Second { get; } // C# 6.0 Getter-only Autoproperty
}


As Listing 16.12 shows, we need to provide an implementation for the GetEnumerator() method.

Yielding Values from an Iterator

Iterators are like functions, but instead of returning a single value, they yield a sequence of values, one at a time. In the case of BinaryTree<T>, the iterator yields a sequence of values of the type argument provided for T. If the nongeneric version of IEnumerator is used, the yielded values will instead be of type object.

To correctly implement the iterator pattern, you need to maintain some internal state to keep track of where you are while enumerating the collection. In the BinaryTree<T> case, you track which elements within the tree have already been enumerated and which are still to come. Iterators are transformed by the compiler into a “state machine” that keeps track of the current position and knows how to “move itself” to the next position.

The yield return statement yields a value each time an iterator encounters it; control immediately returns to the caller that requested the item. When the caller requests the next item, the code begins to execute immediately following the previously executed yield returnstatement. In Listing 16.13, you return the C# built-in data type keywords sequentially.

LISTING 16.13: Yielding Some C# Keywords Sequentially


using System;
using System.Collections.Generic;

public class CSharpBuiltInTypes: IEnumerable<string>
{
public IEnumerator<string> GetEnumerator()
{
yield return "object";
yield return "byte";
yield return "uint";
yield return "ulong";
yield return "float";
yield return "char";
yield return "bool";
yield return "ushort";
yield return "decimal";
yield return "int";
yield return "sbyte";
yield return "short";
yield return "long";
yield return "void";
yield return "double";
yield return "string";
}

// The IEnumerable.GetEnumerator method is also required
// because IEnumerable<T> derives from IEnumerable.
System.Collections.IEnumerator
System.Collections.IEnumerable.GetEnumerator()
{
// Invoke IEnumerator<string> GetEnumerator() above.
return GetEnumerator();
}
}

public class Program
{
static void Main()
{
var keywords = new CSharpBuiltInTypes();
foreach (string keyword in keywords)
{
Console.WriteLine(keyword);
}
}
}


The results of Listing 16.13 appear in Output 16.5.

OUTPUT 16.5

object
byte
uint
ulong
float
char
bool
ushort
decimal
int
sbyte
short
long
void
double
string

The output from this listing is a listing of the C# built-in types.

Iterators and State

When GetEnumerator() is first called in a foreach statement (such as foreach (string keyword in keywords) in Listing 16.13), an iterator object is created and its state is initialized to a special “start” state that represents the fact that no code has executed in the iterator and, therefore, no values have been yielded yet. The iterator maintains its state as long as the foreach statement at the call site continues to execute. Every time the loop requests the next value, control enters the iterator and continues where it left off the previous time around the loop; the state information stored in the iterator object is used to determine where control must resume. When the foreach statement at the call site terminates, the iterator’s state is no longer saved.

It is always safe to call GetEnumerator() again; “fresh” enumerator objects will be created when necessary.

Figure 16.8 shows a high-level sequence diagram of what takes place. Remember that the MoveNext() method appears on the IEnumerator<T> interface.

Image

FIGURE 16.8: Sequence Diagram with yield return

In Listing 16.13, the foreach statement at the call site initiates a call to GetEnumerator() on the CSharpBuiltInTypes instance called keywords. Given the iterator instance (referenced by iterator), foreach begins each iteration with a call to MoveNext(). Within the iterator, you yield a value back to the foreach statement at the call site. After the yield return statement, the GetEnumerator() method seemingly pauses until the next MoveNext() request. Back at the loop body, the foreach statement displays the yielded value on the screen. It then loops back around and calls MoveNext() on the iterator again. Notice that the second time, control picks up at the second yield return statement. Once again, the foreach displays on the screen what CSharpBuiltInTypes yielded and starts the loop again. This process continues until there are no more yield return statements within the iterator. At that point, the foreach loop at the call site terminates because MoveNext() returns false.

More Iterator Examples

Before you modify BinaryTree<T>, you must modify Pair<T> to support the IEnumerable<T> interface using an iterator. Listing 16.14 is an example that yields each element in Pair<T>.

LISTING 16.14: Using yield to Implement BinaryTree<T>


public struct Pair<T>: IPair<T>,
IEnumerable<T>
{
public Pair(T first, T second) : this()
{
First = first;
Second = second;
}
public T First { get; } // C# 6.0 Getter-only Autoproperty
public T Second { get; } // C# 6.0 Getter-only Autoproperty

#region IEnumerable<T>
public IEnumerator<T> GetEnumerator()
{
yield return First;
yield return Second;
}
#endregion IEnumerable<T>

#region IEnumerable Members
System.Collections.IEnumerator
System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
}


In Listing 16.14, the iteration over the Pair<T> data type loops twice: first through yield return First, and then through yield return Second. Each time the yield return statement within GetEnumerator() is encountered, the state is saved and execution appears to “jump” out of the GetEnumerator() method context and into the loop body. When the second iteration starts, GetEnumerator() begins to execute again with the yield return Second statement.

System.Collections.Generic.IEnumerable<T> inherits from System.Collections.IEnumerable. Therefore, when implementing IEnumerable<T>, it is also necessary to implement IEnumerable. In Listing 16.14, you do so explicitly, and the implementation simply involves a call to IEnumerable<T>’s GetEnumerator() implementation. This call from IEnumerable.GetEnumerator() to IEnumerable<T>.GetEnumerator() will always work because of the type compatibility (via inheritance) between IEnumerable<T>and IEnumerable. Since the signatures for both GetEnumerator()s are identical (the return type does not distinguish a signature), one or both implementations must be explicit. Given the additional type safety offered by IEnumerable<T>’s version, you implementIEnumerable’s implementation explicitly.

Listing 16.15 uses the Pair<T>.GetEnumerator() method and displays "Inigo" and "Montoya" on two consecutive lines.

LISTING 16.15: Using Pair<T>.GetEnumerator() via foreach


var fullname = new Pair<string>("Inigo", "Montoya");
foreach (string name in fullname)
{
Console.WriteLine(name);
}


Notice that the call to GetEnumerator() is implicit within the foreach loop.

Placing a yield return within a Loop

It is not necessary to hardcode each yield return statement, as you did in both CSharpPrimitiveTypes and Pair<T>. Using the yield return statement, you can return values from inside a loop construct. Listing 16.16 uses a foreach loop. Each time the foreach withinGetEnumerator() executes, it returns the next value.

LISTING 16.16: Placing yield return Statements within a Loop


public class BinaryTree<T>: IEnumerable<T>
{
// ...

#region IEnumerable<T>
public IEnumerator<T> GetEnumerator()
{
// Return the item at this node.
yield return Value;

// Iterate through each of the elements in the pair.
foreach (BinaryTree<T> tree in SubItems)
{
if (tree != null)
{
// Since each element in the pair is a tree,
// traverse the tree and yield each
// element.
foreach (T item in tree)
{
yield return item;
}
}
}
}
#endregion IEnumerable<T>

#region IEnumerable Members
System.Collections.IEnumerator
System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
}


In Listing 16.16, the first iteration returns the root element within the binary tree. During the second iteration, you traverse the pair of subelements. If the subelement pair contains a non-null value, you traverse into that child node and yield its elements. Note that foreach (T item in tree) is a recursive call to a child node.

As observed with CSharpBuiltInTypes and Pair<T>, you can now iterate over BinaryTree<T> using a foreach loop. Listing 16.17 demonstrates this process, and Output 16.6 shows the results.

LISTING 16.17: Using foreach with BinaryTree<string>


// JFK
var jfkFamilyTree = new BinaryTree<string>(
"John Fitzgerald Kennedy");

jfkFamilyTree.SubItems = new Pair<BinaryTree<string>>(
new BinaryTree<string>("Joseph Patrick Kennedy"),
new BinaryTree<string>("Rose Elizabeth Fitzgerald"));

// Grandparents (Father's side)
jfkFamilyTree.SubItems.First.SubItems =
new Pair<BinaryTree<string>>(
new BinaryTree<string>("Patrick Joseph Kennedy"),
new BinaryTree<string>("Mary Augusta Hickey"));

// Grandparents (Mother's side)
jfkFamilyTree.SubItems.Second.SubItems =
new Pair<BinaryTree<string>>(
new BinaryTree<string>("John Francis Fitzgerald"),
new BinaryTree<string>("Mary Josephine Hannon"));

foreach (string name in jfkFamilyTree)
{
Console.WriteLine(name);
}


OUTPUT 16.6

John Fitzgerald Kennedy
Joseph Patrick Kennedy
Patrick Joseph Kennedy
Mary Augusta Hickey
Rose Elizabeth Fitzgerald
John Francis Fitzgerald
Mary Josephine Hannon


Advanced Topic: The Dangers of Recursive Iterators

The code in Listing 16.16 creates new “nested” iterators as it traverses the binary tree. As a consequence, when the value is yielded by a node, the value is yielded by the node’s iterator, and then yielded by its parent’s iterator, and then yielded by its parent’s iterator, and so on, until it is finally yielded to the original loop by the root’s iterator. A value that is n levels deep must actually pass its value up a chain of n iterators. If the binary tree is relatively shallow, this is not typically a problem; however, an imbalanced binary tree can be extremely deep, and therefore expensive to iterate recursively.


Guidelines

CONSIDER using nonrecursive algorithms when iterating over potentially deep data structures.




Beginner Topic: struct versus class

An interesting side effect of defining Pair<T> as a struct rather than a class is that SubItems.First and SubItems.Second cannot be assigned directly, even if the setter were public. If you modify the setter to be public, the following will produce a compile error indicating that SubItems cannot be modified, “because it is not a variable”:

jfkFamilyTree.SubItems.First =
new BinaryTree<string>("Joseph Patrick Kennedy");

The issue is that SubItems is a property of type Pair<T>, a struct. Therefore, when the property returns the value, a copy of SubItems is made, and assigning First on a copy that is promptly lost at the end of the statement would be misleading. Fortunately, the C# compiler prevents this error.

To overcome the issue, don’t assign First (see the approach in Listing 16.17), use class rather than struct for Pair<T>, don’t create a SubItems property and instead use a field, or provide properties in BinaryTree<T> that give direct access to SubItems members.


Canceling Further Iteration: yield break

Sometimes you might want to cancel further iteration. You can do so by including an if statement so that no further statements within the code are executed. However, you can also use yield break to cause MoveNext() to return false and control to return immediately to the caller and end the loop. Listing 16.18 shows an example of such a method.

LISTING 16.18: Escaping Iteration via yield break


public System.Collections.Generic.IEnumerable<T>
GetNotNullEnumerator()
{
if((First == null) || (Second == null))
{
yield break;
}
yield return Second;
yield return First;
}


This method cancels the iteration if either of the elements in the Pair<T> class is null.

A yield break statement is similar to placing a return statement at the top of a function when it is determined that there is no work to do. It is a way to exit from further iterations without surrounding all remaining code with an if block. As such, it allows multiple exits. Use it with caution, because a casual reading of the code may overlook the early exit.


Advanced Topic: How Iterators Work

When the C# compiler encounters an iterator, it expands the code into the appropriate CIL for the corresponding enumerator design pattern. In the generated code, the C# compiler first creates a nested private class to implement the IEnumerator<T> interface, along with itsCurrent property and a MoveNext() method. The Current property returns a type corresponding to the return type of the iterator. Listing 16.14 of Pair<T> contains an iterator that returns a T type. The C# compiler examines the code contained within the iterator and creates the necessary code within the MoveNext method and the Current property to mimic its behavior. For the Pair<T> iterator, the C# compiler generates roughly equivalent code (see Listing 16.19).

LISTING 16.19: C# Equivalent of Compiler-Generated C# Code for Iterators


using System;
using System.Collections.Generic;

public class Pair<T> : IPair<T>, IEnumerable<T>
{
// ...

// The iterator is expanded into the following
// code by the compiler
public virtual IEnumerator<T> GetEnumerator()
{
__ListEnumerator result = new __ListEnumerator(0);
result._Pair = this;
return result;
}
public virtual System.Collections.IEnumerator
System.Collections.IEnumerable.GetEnumerator()
{
return new GetEnumerator();
}

private sealed class __ListEnumerator<T> : IEnumerator<T>
{
public __ListEnumerator(int itemCount)
{
_ItemCount = itemCount;
}

Pair<T> _Pair;
T _Current;
int _ItemCount;

public object Current
{
get
{
return _Current;
}
}

public bool MoveNext()
{
switch (_ItemCount)
{
case 0:
_Current = _Pair.First;
_ItemCount++;
return true;
case 1:
_Current = _Pair.Second;
_ItemCount++;
return true;
default:
return false;
}
}
}
}


Because the compiler takes the yield return statement and generates classes that correspond to what you probably would have written manually, iterators in C# exhibit the same performance characteristics as classes that implement the enumerator design pattern manually. Although there is no performance improvement, the gains in programmer productivity are significant.



Advanced Topic: Contextual Keywords

Many C# keywords are “reserved” and cannot be used as identifiers unless preceded with an @ sign. The yield keyword is a contextual keyword, not a reserved keyword; it is legal (though confusing) to declare a local variable called yield. In fact, all the keywords added to C# after version 1.0 have been contextual keywords; this helps prevent accidental breakages when upgrading existing programs to use new versions of the language.

Had the C# designers chosen to use yield value; as the syntax for an iterator to yield instead of yield return value;, a possible ambiguity would have been introduced: yield(1+2); now might be yielding a value, or it might be passing the value as an argument to a method called yield.

Since it was previously never legal to have the identifier yield appear immediately before return or break, the C# compiler knows that such a usage of yield must be as a keyword, not an identifier.


Creating Multiple Iterators in a Single Class

Previous iterator examples implemented IEnumerable<T>.GetEnumerator()—the method that foreach seeks implicitly. Sometimes you might want different iteration sequences, such as iterating in reverse, filtering the results, or iterating over an object projection other than the default. You can declare additional iterators in the class by encapsulating them within properties or methods that return IEnumerable<T> or IEnumerable. If you want to iterate over the elements of Pair<T> in reverse, for example, you could provide aGetReverseEnumerator() method, as shown in Listing 16.20.

LISTING 16.20: Using yield return in a Method That Returns IEnumerable<T>


public struct Pair<T>: IEnumerable<T>
{
...

public IEnumerable<T> GetReverseEnumerator()
{
yield return Second;
yield return First;
}
...
}

public void Main()
{
var game = new Pair<string>("Redskins", "Eagles");
foreach (string name in game.GetReverseEnumerator())
{
Console.WriteLine(name);
}
}


Note that you return IEnumerable<T>, not IEnumerator<T>. This is different from IEnumerable<T>.GetEnumerator(), which returns IEnumerator<T>. The code in Main() demonstrates how to call GetReverseEnumerator() using a foreach loop.

yield Statement Requirements

You can use the yield return statement only in members that return an IEnumerator<T> or IEnumerable<T> type, or their nongeneric equivalents. Members whose bodies include a yield return statement may not have a simple return. If the member uses the yield return statement, the C# compiler generates the necessary code to maintain the state of the iterator. In contrast, if the member uses the return statement instead of yield return, the programmer is responsible for maintaining his own state machine and returning an instance of one of the iterator interfaces. Further, just as all code paths in a method with a return type must contain a return statement accompanied by a value (assuming they don’t throw an exception), so all code paths in an iterator must contain a yield return statement if they are to return any data.

The following additional restrictions on the yield statement result in compiler errors if they are violated:

• The yield statement may appear only inside a method, a user-defined operator, or the get accessor of an indexer or property. The member must not take any ref or out parameter.

• The yield statement may not appear anywhere inside an anonymous method or lambda expression (see Chapter 12).

• The yield statement may not appear inside the catch and finally clauses of the try statement. Furthermore, a yield statement may appear in a try block only if there is no catch block.

Summary

The generic collection classes and interfaces made available in C# 2.0 are universally superior to their nongeneric counterparts; by avoiding boxing penalties and enforcing type safety at compile time, they execute more rapidly and are safer. Unless you must maintain compatibility with legacy C# 1.0 code, you should consider the entire namespace of System.Collections to be obsolete. In other words, don’t go back and necessarily remove all code that already uses this namespace. Instead, use System.Collections.Generics for any new code and, over time, consider migrating existing code to use the corresponding generic collections that contain both the interfaces and the classes for working with collections of objects.

The introduction of the System.Collections.Generic namespace is not the only change that C# 2.0 brought to collections. Another significant addition is the iterator. Iterators involve a new contextual keyword, yield, that C# uses to generate underlying CIL code that implements the iterator pattern used by the foreach loop.

In the next chapter we explore reflection, a topic briefly touched on earlier, albeit with little to no explanation. Reflection allows one to examine the structure of a type within CIL code at runtime.

End 2.0