Well-Formed Types - Essential C# 6.0 (2016)

9. Well-Formed Types

The previous chapters covered most of the constructs for defining classes and structs. However, several details remain to round out the type definition with fit-and-finish-type functionality. This chapter explains how to put the final touches on a type declaration.

Overriding object Members

Chapter 6 discussed how all classes and structs derive from object. In addition, it reviewed each method available on object and discussed how some of them are virtual. This section discusses the details concerning overriding the virtual methods.

Overriding ToString()

By default, calling ToString() on any object will return the fully qualified name of the class. Calling ToString() on a System.IO.FileStream object will return the string System.IO.FileStream, for example. For some classes, however, ToString() can be more meaningful. On string, for example, ToString() returns the string value itself. Similarly, returning a Contact’s name would make more sense. Listing 9.1 overrides ToString() to return a string representation of Coordinate.

LISTING 9.1: Overriding ToString()


public struct Coordinate
{
    public Coordinate(Longitude longitude, Latitude latitude)
    {
        Longitude = longitude;
        Latitude = latitude;
    }

    public Longitude Longitude { get; }
    public Latitude Latitude { get; }

    public override string ToString()
    {
        return $"{ Longitude } { Latitude }";
    }

    // ...
}


Write methods such as Console.WriteLine() and System.Diagnostics.Trace.Write() call an object’s ToString() method, so overriding the method often outputs more meaningful information than the default implementation. Given this point, you should consider overriding the ToString() method whenever relevant diagnostic information can be provided from the output, specifically when the target audience is developers, since the default object.ToString() output is a type name and is not end-user friendly. ToString() is useful for debugging from within a developer IDE or writing to a log file. For this reason, you should keep the strings relatively short (one screen length) so that they are not cut off. However, the lack of localization and other advanced formatting features makes this approach less suitable for general end-user text display.
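As a minimal sketch of this behavior, the following program passes two objects to Console.WriteLine(): one whose type overrides ToString() and one that relies on the default. Point and PlainPoint are hypothetical types introduced only for this illustration.

```csharp
using System;

// Hypothetical type that overrides ToString() with diagnostic text.
public struct Point
{
    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }

    public override string ToString()
    {
        return $"({X}, {Y})";
    }
}

// Hypothetical type that keeps the default object.ToString().
public class PlainPoint
{
}

public static class Program
{
    public static void Main()
    {
        // Console.WriteLine(object) calls ToString() on its argument.
        Console.WriteLine(new Point(3, 4));    // (3, 4)
        Console.WriteLine(new PlainPoint());   // PlainPoint (the type name)
    }
}
```

The second line of output is only the type name, which is why the default is rarely helpful in a log file.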


Guidelines

DO override ToString() whenever useful developer-oriented diagnostic strings can be returned.

DO try to keep the string returned from ToString() short.

DO NOT return an empty string or null from ToString().

AVOID throwing exceptions or making observable side effects (changing the object state) from ToString().

DO provide an overloaded ToString(string format) or implement IFormattable if the return value is culture-sensitive or requires formatting (for example, DateTime).

CONSIDER returning a unique string from ToString() so as to identify the object instance.
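The IFormattable guideline above can be sketched as follows. This is an illustration under stated assumptions, not the book's Latitude implementation: the struct is reduced to a single DecimalDegrees property, and the choice to pass the format string straight through to double.ToString() and append a degree sign is an assumption made for brevity.

```csharp
using System;
using System.Globalization;

// Simplified, hypothetical Latitude used only to illustrate IFormattable.
public struct Latitude : IFormattable
{
    public Latitude(double decimalDegrees)
    {
        DecimalDegrees = decimalDegrees;
    }

    public double DecimalDegrees { get; }

    // Developer-oriented default: general format, current culture.
    public override string ToString()
    {
        return ToString("G", CultureInfo.CurrentCulture);
    }

    // IFormattable: the caller picks the numeric format and the culture.
    public string ToString(string format, IFormatProvider formatProvider)
    {
        if (string.IsNullOrEmpty(format))
        {
            format = "G";
        }
        return DecimalDegrees.ToString(format, formatProvider) + "°";
    }
}

public static class Program
{
    public static void Main()
    {
        Latitude latitude = new Latitude(43.5);

        // The invariant culture uses "." as the decimal separator.
        Console.WriteLine(
            latitude.ToString("F2", CultureInfo.InvariantCulture)); // 43.50°

        // A provider with "," as the decimal separator changes the text.
        NumberFormatInfo commaSeparated =
            new NumberFormatInfo { NumberDecimalSeparator = "," };
        Console.WriteLine(latitude.ToString("F2", commaSeparated)); // 43,50°

        // Composite formatting also routes through IFormattable.
        Console.WriteLine(string.Format(
            CultureInfo.InvariantCulture, "{0:F1}", latitude));      // 43.5°
    }
}
```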


Overriding GetHashCode()

Overriding GetHashCode() is more complex than overriding ToString(). Even so, you should override GetHashCode() when you are overriding Equals(), and there is a compiler warning to indicate this step is recommended if you don’t. Overriding GetHashCode() is also a good practice when you are using the type as a key into a hash table collection (System.Collections.Hashtable and System.Collections.Generic.Dictionary<TKey, TValue>, for example).

The purpose of the hash code is to efficiently balance a hash table by generating a number that corresponds to the value of an object. Here are some implementation principles for a good GetHashCode() implementation:

Required: Equal objects must have equal hash codes (if a.Equals(b), then a.GetHashCode() == b.GetHashCode()).

Required: GetHashCode()’s returns over the life of a particular object should be constant (the same value), even if the object’s data changes. In many cases, you should cache the method return to enforce this constraint.

Required: GetHashCode() should not throw any exceptions; GetHashCode() must always successfully return a value.

Performance: Hash codes should be unique whenever possible. However, since hash codes return only an int, there has to be an overlap in hash codes for objects that have potentially more values than an int can hold, which is virtually all types. (An obvious example is long, since there are more possible long values than an int could uniquely identify.)

Performance: The possible hash code values should be distributed evenly over the range of an int. For example, creating a hash that doesn’t consider the fact that distribution of a string in Latin-based languages primarily centers on the initial 128 ASCII characters would result in a very uneven distribution of string values and would not be a strong GetHashCode() algorithm.

Performance: GetHashCode() should be optimized for performance. GetHashCode() is generally used in Equals() implementations to short-circuit a full equals comparison if the hash codes are different. As a result, it is frequently called when the type is used as a key type in dictionary collections.

Performance: Small differences between two objects should result in large differences between hash code values—ideally, a 1-bit difference in the object should result in approximately 16 bits of the hash code changing, on average. This helps ensure that the hash table remains balanced no matter how it is “bucketing” the hash values.

Security: It should be difficult for an attacker to craft an object that has a particular hash code. The attack is to flood a hash table with large amounts of data that all hash to the same value. The hash table implementation can become inefficient, resulting in a possible denial-of-service attack.

These guidelines and rules are, of course, contradictory: It is very difficult to come up with a hash algorithm that is fast and meets all of these guidelines. As with any design problem, you’ll need to use a combination of good judgment and realistic performance measurements to come up with a good solution.
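To see why the hash code must stay constant over an object's lifetime, consider the following sketch. MutableKey is a hypothetical class, written deliberately to break that rule: its hash code is derived from a property that can change after the object has been stored as a dictionary key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code changes when Value changes.
public sealed class MutableKey
{
    public int Value { get; set; }

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Value == Value;
    }

    // Violates the "constant over the object's life" requirement.
    public override int GetHashCode()
    {
        return Value;
    }
}

public static class Program
{
    public static void Main()
    {
        Dictionary<MutableKey, string> table =
            new Dictionary<MutableKey, string>();
        MutableKey key = new MutableKey { Value = 1 };
        table[key] = "first";

        Console.WriteLine(table.ContainsKey(key)); // True

        // Changing the key after insertion changes its hash code.
        key.Value = 2;

        // The entry is still stored, but it can no longer be found.
        Console.WriteLine(table.ContainsKey(key)); // False
        Console.WriteLine(table.Count);            // 1
    }
}
```

Making the identifying data read-only, as Coordinate does, is the simplest way to avoid this problem.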

Consider the GetHashCode() implementation for the Coordinate type shown in Listing 9.2.

LISTING 9.2: Implementing GetHashCode()


public struct Coordinate
{
    public Coordinate(Longitude longitude, Latitude latitude)
    {
        Longitude = longitude;
        Latitude = latitude;
    }

    public Longitude Longitude { get; }
    public Latitude Latitude { get; }

    public override int GetHashCode()
    {
        int hashCode = Longitude.GetHashCode();
        // As long as the hash codes are not equal
        if(Longitude.GetHashCode() != Latitude.GetHashCode())
        {
            hashCode ^= Latitude.GetHashCode();  // eXclusive OR
        }
        return hashCode;
    }

    // ...
}


Generally, the key is to use the XOR operator over the hash codes from the relevant types, and to make sure the XOR operands are not likely to be close or equal; otherwise, the result will be all zeroes. (In those cases where the operands are close or equal, consider using bit shifts and adds instead.) The alternative operators, AND and OR, have similar restrictions, but those restrictions come into play more frequently. Applying AND multiple times tends toward all 0 bits, and applying OR tends toward all 1 bits.
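The following sketch contrasts a plain XOR with one possible shift-and-add combination. HashCombining and both of its methods are hypothetical helpers, and the particular shift (5 bits) is an arbitrary choice for illustration rather than a recommended constant.

```csharp
using System;

// Hypothetical helpers for combining two hash codes.
public static class HashCombining
{
    public static int CombineWithXor(int first, int second)
    {
        return first ^ second;
    }

    public static int CombineWithShiftAdd(int first, int second)
    {
        // unchecked: overflow simply wraps, which is fine for hashing.
        unchecked
        {
            return ((first << 5) + first) ^ second;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Equal operands cancel out under XOR.
        Console.WriteLine(HashCombining.CombineWithXor(42, 42));      // 0

        // XOR is also symmetric, so swapped fields collide.
        Console.WriteLine(
            HashCombining.CombineWithXor(1, 2) ==
            HashCombining.CombineWithXor(2, 1));                      // True

        // Shifting one operand first avoids both problems here.
        Console.WriteLine(HashCombining.CombineWithShiftAdd(42, 42)); // 1344
        Console.WriteLine(
            HashCombining.CombineWithShiftAdd(1, 2) ==
            HashCombining.CombineWithShiftAdd(2, 1));                 // False
    }
}
```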

For finer-grained control, split larger-than-int types using the shift operator. For example, GetHashCode() for a long called value is implemented as follows:

int GetHashCode() { return ((int)value ^ (int)(value >> 32)); }
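The same split can be applied to a custom type that wraps a long. The sketch below assumes a hypothetical LongId struct; the unchecked block makes the truncating casts explicit regardless of compiler settings.

```csharp
using System;

// Hypothetical wrapper around a 64-bit value.
public struct LongId
{
    private readonly long value;

    public LongId(long value)
    {
        this.value = value;
    }

    public override bool Equals(object obj)
    {
        return obj is LongId && ((LongId)obj).value == value;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // XOR the low 32 bits with the high 32 bits.
            return (int)value ^ (int)(value >> 32);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // High half is 1, low half is 2, so the hash code is 1 ^ 2.
        LongId id = new LongId(0x0000000100000002L);
        Console.WriteLine(id.GetHashCode()); // 3
    }
}
```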

Also, if the base class is not object, base.GetHashCode() should be included in the XOR assignment.

Finally, Coordinate does not cache the value of the hash code. Since each field in the hash code calculation is readonly, the value can’t change. However, implementations should cache the hash code if calculated values could change or if a cached value could offer a significant performance advantage.

Overriding Equals()

Overriding Equals() without overriding GetHashCode() results in a warning such as that shown in Output 9.1.

OUTPUT 9.1

warning CS0659: '<Class Name>' overrides Object.Equals(object o) but
does not override Object.GetHashCode()

Generally, developers expect overriding Equals() to be trivial, but it includes a surprising number of subtleties that require careful thought and testing.

Object Identity versus Equal Object Values

Two references are identical if both refer to the same instance. object (and, by inheritance, all derived types) includes a static method called ReferenceEquals() that explicitly checks for this object identity (see Figure 9.1).

FIGURE 9.1: Identity

However, reference equality is not the only type of equality. Two object instances can also be called equal if the values of some or all of their members are equal. Consider the comparison of two ProductSerialNumbers shown in Listing 9.3.

LISTING 9.3: Overriding the Equality Operator


public sealed class ProductSerialNumber
{
    // ...
}


class Program
{
    static void Main()
    {
        ProductSerialNumber serialNumber1 =
            new ProductSerialNumber("PV", 1000, 09187234);
        ProductSerialNumber serialNumber2 = serialNumber1;
        ProductSerialNumber serialNumber3 =
            new ProductSerialNumber("PV", 1000, 09187234);

        // These serial numbers ARE the same object identity.
        if(!ProductSerialNumber.ReferenceEquals(serialNumber1,
            serialNumber2))
        {
            throw new Exception(
                "serialNumber1 does NOT " +
                "reference equal serialNumber2");
        }
        // And, therefore, they are equal.
        else if(!serialNumber1.Equals(serialNumber2))
        {
            throw new Exception(
                "serialNumber1 does NOT equal serialNumber2");
        }
        else
        {
            Console.WriteLine(
                "serialNumber1 reference equals serialNumber2");
            Console.WriteLine(
                "serialNumber1 equals serialNumber2");
        }


        // These serial numbers are NOT the same object identity.
        if (ProductSerialNumber.ReferenceEquals(serialNumber1,
            serialNumber3))
        {
            throw new Exception(
                "serialNumber1 DOES reference " +
                "equal serialNumber3");
        }
        // But they are equal (assuming Equals is overridden
        // and the != operator is overloaded).
        else if(!serialNumber1.Equals(serialNumber3) ||
            serialNumber1 != serialNumber3)
        {
            throw new Exception(
                "serialNumber1 does NOT equal serialNumber3");
        }

        Console.WriteLine( "serialNumber1 equals serialNumber3" );
    }
}


The results of Listing 9.3 appear in Output 9.2.

OUTPUT 9.2

serialNumber1 reference equals serialNumber2
serialNumber1 equals serialNumber2
serialNumber1 equals serialNumber3

As the last assertion demonstrates with ReferenceEquals(), serialNumber1 and serialNumber3 are not the same reference. However, the code constructs them with the same values and both are logically associated with the same physical product. If one instance was created from data in the database and another was created from manually entered data, you would expect that the instances would be equal and, therefore, that the product would not be duplicated (reentered) in the database. Two identical references are obviously equal; however, two different objects could be equal but not reference equal. Such objects will not have identical object identities, but they may have key data that identifies them as being equal objects.

Only reference types can be reference equal, thereby supporting the concept of identity. Calling ReferenceEquals() on value types will always return false because value types are boxed when they are converted to object for the call. Even when the same variable is passed in both (value type) parameters to ReferenceEquals(), the result will still be false because the values are boxed independently. Listing 9.4 demonstrates this behavior: because each argument is put into a “different box” in this example, the two arguments are never reference equal.


Note

Calling ReferenceEquals() on value types will always return false.


LISTING 9.4: Value Types Never Reference Equal Themselves


public struct Coordinate
{
    public Coordinate(Longitude longitude, Latitude latitude)
    {
        Longitude = longitude;
        Latitude = latitude;
    }

    public Longitude Longitude { get; }
    public Latitude Latitude { get; }
    // ...
}


class Program
{
    public static void Main()
    {
        //...

        Coordinate coordinate1 =
            new Coordinate( new Longitude(48, 52),
                            new Latitude(-2, -20));

        // Value types will never be reference equal.
        if ( Coordinate.ReferenceEquals(coordinate1,
            coordinate1) )
        {
            throw new Exception(
                "coordinate1 reference equals coordinate1");
        }

        Console.WriteLine(
            "coordinate1 does NOT reference equal itself" );
    }
}


In contrast to the definition of Coordinate as a reference type in Chapter 8, the definition going forward is that of a value type (struct) because the combination of Longitude and Latitude data is logically thought of as a value and its size is less than 16 bytes. (In Chapter 8, Coordinate aggregated Angle rather than Longitude and Latitude.) A contributing factor to declaring Coordinate as a value type is that it is a (complex) numeric value that has particular operations on it. In contrast, a reference type such as Employee is not a value that you manipulate numerically, but rather refers to an object in real life.

Implementing Equals()

To determine whether two objects are equal (that is, if they have the same identifying data), you use an object’s Equals() method. The implementation of this virtual method on object uses ReferenceEquals() to evaluate equality. Since this implementation is often inadequate, it is sometimes necessary to override Equals() with a more appropriate implementation.


Note

The implementation of object.Equals(), the default implementation on all objects before overriding, relies on ReferenceEquals() alone.


For objects to equal one another, the expectation is that the identifying data within them will be equal. For ProductSerialNumbers, for example, the ProductSeries, Model, and Id must be the same; however, for an Employee object, perhaps comparing EmployeeIds would be sufficient for equality. To correct the object.Equals() implementation, it is necessary to override it. Value types, for example, override the Equals() implementation to instead use the fields that the type includes.

The steps for overriding Equals() are as follows:

1. Check for null.

2. Check for reference equality if the type is a reference type.

3. Check for equivalent types.

4. Invoke a typed helper method that can treat the operand as the compared type rather than an object (see the Equals(Coordinate obj) method in Listing 9.5).

5. Possibly check for equivalent hash codes to short-circuit an extensive, field-by-field comparison. (Two objects that are equal cannot have different hash codes.)

6. Check base.Equals() if the base class overrides Equals().

7. Compare each identifying field for equality.

8. Override GetHashCode().

9. Override the == and != operators (see the next section).

Listing 9.5 shows a sample Equals() implementation.

LISTING 9.5: Overriding Equals()


public struct Longitude
{
    // ...
}


public struct Latitude
{
    // ...
}


public struct Coordinate
{
    public Coordinate(Longitude longitude, Latitude latitude)
    {
        Longitude = longitude;
        Latitude = latitude;
    }

    public Longitude Longitude { get; }
    public Latitude Latitude { get; }

    public override bool Equals(object obj)
    {
        // STEP 1: Check for null.
        if (obj == null)
        {
            return false;
        }
        // STEP 3: Equivalent data types.
        // Can be avoided if type is sealed.
        if (this.GetType() != obj.GetType())
        {
            return false;
        }
        // STEP 4: Invoke the typed helper method.
        return Equals((Coordinate)obj);
    }
    public bool Equals(Coordinate obj)
    {
        // STEP 1: Check for null if this is a reference type.
        // if (obj == null)
        // {
        //     return false;
        // }

        // STEP 2: Check for ReferenceEquals if this
        // is a reference type.
        // if ( ReferenceEquals(this, obj))
        // {
        //     return true;
        // }

        // STEP 5: Possibly check for equivalent hash codes.
        // if (this.GetHashCode() != obj.GetHashCode())
        // {
        //     return false;
        // }

        // STEP 6: Check base.Equals if base overrides Equals().
        // System.Diagnostics.Debug.Assert(
        //     base.GetType() != typeof(object) );
        // if ( !base.Equals(obj) )
        // {
        //     return false;
        // }

        // STEP 7: Compare identifying fields for equality
        // using an overload of Equals on Longitude.
        return ( (Longitude.Equals(obj.Longitude)) &&
            (Latitude.Equals(obj.Latitude)) );
    }

    // STEP 8: Override GetHashCode.
    public override int GetHashCode()
    {
        int hashCode = Longitude.GetHashCode();
        hashCode ^= Latitude.GetHashCode(); // Xor (eXclusive OR)
        return hashCode;
    }
}


In this implementation, the first two checks are relatively obvious. However, it is interesting to point out that step 3 can be avoided if the type is sealed.

Steps 5–7 occur in an overload of Equals() that takes the Coordinate data type specifically. This way, a comparison of two Coordinates will avoid Equals(object obj) and its GetType() check altogether.

Since GetHashCode() is not cached and is no more efficient than the field comparison in step 7, the GetHashCode() comparison is commented out. Similarly, base.Equals() is not used since the base class is not overriding Equals(). (The assertion checks that base is not of type object, but does not verify whether the base class overrides Equals(), which is required to appropriately call base.Equals().) Regardless, because GetHashCode() does not necessarily return a unique value (it simply identifies when operands are different), on its own it does not conclusively identify equal objects.

Like GetHashCode(), Equals() should never throw any exceptions. It is valid to compare any object with any other object, and doing so should never result in an exception.


Guidelines

DO implement GetHashCode(), Equals(), the == operator, and the != operator together—not one of these without the other three.

DO use the same algorithm when implementing Equals(), ==, and !=.

AVOID throwing exceptions from implementations of GetHashCode(), Equals(), ==, and !=.

AVOID overloading equality operators on mutable reference types or if the implementation would be significantly slower.

DO implement all the equality-related methods when implementing IComparable.


Operator Overloading

The preceding section looked at overriding Equals() and provided the guideline that the class should also implement == and !=. Implementing any operator is called operator overloading. This section describes how to perform such overloading, not only for == and !=, but also for other supported operators.

For example, string provides a + operator that concatenates two strings. This is perhaps not surprising, because string is a predefined type, so it could possibly have special compiler support. However, C# provides for adding + operator support to a class or struct. In fact, all operators are supported except x.y, f(x), new, typeof, default, checked, unchecked, delegate, is, as, =, and =>. One particularly noteworthy operator that cannot be implemented is the assignment operator; there is no way to change the behavior of the = operator.

Before going through the exercise of implementing an operator overload, consider the fact that such operators are not discoverable through IntelliSense. Unless the intent is for a type to act like a primitive type (a numeric type, for example), you should avoid overloading an operator.

Comparison Operators (==, !=, <, >, <=, >=)

Once Equals() is overridden, there is a possible inconsistency. That is, two objects could return true for Equals() but false for the == operator because == performs a reference equality check by default. To correct this flaw, it is important to overload the equals (==) and not equals (!=) operators as well.

For the most part, the implementation for these operators can delegate the logic to Equals(), or vice versa. However, for reference types, some initial null checks are required first (see Listing 9.6).

LISTING 9.6: Implementing the == and != Operators


public sealed class ProductSerialNumber
{
    // ...

    public static bool operator ==(
        ProductSerialNumber leftHandSide,
        ProductSerialNumber rightHandSide)
    {

        // Check if leftHandSide is null.
        // (operator== would be recursive)
        if(ReferenceEquals(leftHandSide, null))
        {
            // Return true if rightHandSide is also null
            // and false otherwise.
            return ReferenceEquals(rightHandSide, null);
        }

        return (leftHandSide.Equals(rightHandSide));
    }

    public static bool operator !=(
        ProductSerialNumber leftHandSide,
        ProductSerialNumber rightHandSide)
    {
        return !(leftHandSide == rightHandSide);
    }
}


Note that in this example, we use ProductSerialNumber rather than Coordinate to demonstrate the logic for a reference type, which has the added complexity of a null value.

Be sure to avoid performing the null checks with the equality operator (leftHandSide == null). Doing so would recursively call back into the method, resulting in a loop that continues until the stack overflows. To avoid this problem, you can call ReferenceEquals() to check for null.


Note

AVOID using the equality comparison operator (==) from within the implementation of the == operator overload.
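Putting the equality pieces together for a reference type, one possible complete ProductSerialNumber might look like the following sketch. The property names ProductSeries, Model, and Id come from the earlier discussion; their types (string, int, long), the constructor shape, and the multiplier used in GetHashCode() are assumptions made for this illustration.

```csharp
using System;

public sealed class ProductSerialNumber
{
    public ProductSerialNumber(string productSeries, int model, long id)
    {
        ProductSeries = productSeries;
        Model = model;
        Id = id;
    }

    public string ProductSeries { get; }
    public int Model { get; }
    public long Id { get; }

    public override bool Equals(object obj)
    {
        // The class is sealed, so no GetType() comparison is needed.
        return Equals(obj as ProductSerialNumber);
    }

    public bool Equals(ProductSerialNumber other)
    {
        // ReferenceEquals() avoids calling the overloaded == operator.
        if (ReferenceEquals(other, null)) { return false; }
        if (ReferenceEquals(this, other)) { return true; }
        return ProductSeries == other.ProductSeries
            && Model == other.Model
            && Id == other.Id;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hashCode =
                ProductSeries == null ? 0 : ProductSeries.GetHashCode();
            hashCode = (hashCode * 397) ^ Model;
            hashCode = (hashCode * 397) ^ Id.GetHashCode();
            return hashCode;
        }
    }

    public static bool operator ==(
        ProductSerialNumber leftHandSide,
        ProductSerialNumber rightHandSide)
    {
        if (ReferenceEquals(leftHandSide, null))
        {
            return ReferenceEquals(rightHandSide, null);
        }
        return leftHandSide.Equals(rightHandSide);
    }

    public static bool operator !=(
        ProductSerialNumber leftHandSide,
        ProductSerialNumber rightHandSide)
    {
        return !(leftHandSide == rightHandSide);
    }
}

public static class Program
{
    public static void Main()
    {
        var first = new ProductSerialNumber("PV", 1000, 9187234);
        var second = new ProductSerialNumber("PV", 1000, 9187234);

        Console.WriteLine(object.ReferenceEquals(first, second)); // False
        Console.WriteLine(first.Equals(second));                  // True
        Console.WriteLine(first == second);                       // True
        Console.WriteLine(first != second);                       // False
        Console.WriteLine(first == null);                         // False
    }
}
```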


Binary Operators (+, -, *, /, %, &, |, ^, <<, >>)

You can add an Arc to a Coordinate. However, the code so far provides no support for the addition operator. Instead, you need to define such a method, as Listing 9.7 demonstrates.

LISTING 9.7: Adding an Operator


struct Arc
{
    public Arc(
        Longitude longitudeDifference,
        Latitude latitudeDifference)
    {
        LongitudeDifference = longitudeDifference;
        LatitudeDifference = latitudeDifference;
    }

    public Longitude LongitudeDifference { get; }
    public Latitude LatitudeDifference { get; }
}


struct Coordinate
{
    // ...
    public static Coordinate operator +(
        Coordinate source, Arc arc)
    {
        Coordinate result = new Coordinate(
            new Longitude(
                source.Longitude + arc.LongitudeDifference),
            new Latitude(
                source.Latitude + arc.LatitudeDifference));
        return result;
    }
}


The +, -, *, /, %, &, |, ^, <<, and >> operators are implemented as binary static methods where at least one parameter is of the containing type. The method name is the operator prefixed by the word operator as a keyword. As shown in Listing 9.8, given the definition of the - and + binary operators, you can add and subtract an Arc to and from the coordinate. Note that Longitude and Latitude will also require implementations of the + operator because they are called by source.Longitude + arc.LongitudeDifference and source.Latitude + arc.LatitudeDifference.

LISTING 9.8: Calling the – and + Binary Operators


public class Program
{
    public static void Main()
    {
        Coordinate coordinate1,coordinate2;
        coordinate1 = new Coordinate(
            new Longitude(48, 52), new Latitude(-2, -20));
        Arc arc = new Arc(new Longitude(3), new Latitude(1));

        coordinate2 = coordinate1 + arc;
        Console.WriteLine(coordinate2);

        coordinate2 = coordinate2 - arc;
        Console.WriteLine(coordinate2);

        coordinate2 += arc;
        Console.WriteLine(coordinate2);
    }
}


The results of Listing 9.8 appear in Output 9.3.

OUTPUT 9.3

51° 52' 0 E -1° -20' 0 N
48° 52' 0 E -2° -20' 0 N
51° 52' 0 E -1° -20' 0 N

For Coordinate, you implement the – and + operators to return coordinate locations after adding/subtracting Arc. This allows you to string multiple operators and operands together, as in result = ((coordinate1 + arc1) + arc2) + arc3. Moreover, by supporting the same operators (+/-) on Arc (see Listing 9.9), you could eliminate the parentheses. This approach works because the result of the first operation (arc1 + arc2) is another Arc, which you can then add to the next operand of type Arc or Coordinate.

In contrast, consider what would happen if you provided a – operator that had two Coordinates as parameters and returned a double corresponding to the distance between the two coordinates. Adding a double to a Coordinate is undefined, so you could not string together operators and operands. Caution is in order when defining operators that return a different type, because doing so is counterintuitive.
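The chaining described above can be tried with a reduced version of the types. In this sketch, Coordinate and Arc store plain double values instead of Longitude and Latitude, a simplification assumed here so that the example is self-contained. It also shows that += works as soon as the binary + operator exists.

```csharp
using System;

// Simplified Arc: differences are stored as doubles (an assumption).
public struct Arc
{
    public Arc(double longitudeDifference, double latitudeDifference)
    {
        LongitudeDifference = longitudeDifference;
        LatitudeDifference = latitudeDifference;
    }

    public double LongitudeDifference { get; }
    public double LatitudeDifference { get; }

    // Arc + Arc yields another Arc, so results can be chained.
    public static Arc operator +(Arc left, Arc right)
    {
        return new Arc(
            left.LongitudeDifference + right.LongitudeDifference,
            left.LatitudeDifference + right.LatitudeDifference);
    }

    public static Arc operator -(Arc arc)
    {
        return new Arc(-arc.LongitudeDifference, -arc.LatitudeDifference);
    }
}

// Simplified Coordinate: positions are stored as doubles (an assumption).
public struct Coordinate
{
    public Coordinate(double longitude, double latitude)
    {
        Longitude = longitude;
        Latitude = latitude;
    }

    public double Longitude { get; }
    public double Latitude { get; }

    public static Coordinate operator +(Coordinate source, Arc arc)
    {
        return new Coordinate(
            source.Longitude + arc.LongitudeDifference,
            source.Latitude + arc.LatitudeDifference);
    }

    public static Coordinate operator -(Coordinate source, Arc arc)
    {
        return source + (-arc);
    }
}

public static class Program
{
    public static void Main()
    {
        Coordinate start = new Coordinate(48, -2);
        Arc arc1 = new Arc(3, 1);
        Arc arc2 = new Arc(1, 1);

        Coordinate viaCoordinates = (start + arc1) + arc2;
        Coordinate viaArcs = start + (arc1 + arc2);
        Console.WriteLine(
            viaCoordinates.Longitude == viaArcs.Longitude &&
            viaCoordinates.Latitude == viaArcs.Latitude);  // True

        // The compound form comes for free with the binary operator.
        start += arc1;
        Console.WriteLine(start.Longitude);                // 51
    }
}
```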

Combining Assignment with Binary Operators (+=, -=, *=, /=, %=, &=, ...)

As previously mentioned, there is no support for overloading the assignment operator. However, assignment operators in combination with binary operators (+=, -=, *=, /=, %=, &=, |=, ^=, <<=, and >>=) are effectively overloaded when overloading the binary operator. Given the definition of a binary operator without the assignment, C# automatically allows for assignment in combination with the operator. Using the definition of Coordinate in Listing 9.7, therefore, you can have code such as

coordinate += arc;

which is equivalent to the following:

coordinate = coordinate + arc;

Conditional Logical Operators (&&, ||)

Like assignment operators, conditional logical operators cannot be overloaded explicitly. However, because the logical operators & and | can be overloaded, and the conditional operators are evaluated in terms of the logical operators, it is effectively possible to overload the conditional operators. x && y is processed as x & y, except that y is evaluated (and & applied) only if x does not evaluate to false. Similarly, x || y is processed as x | y, except that y is evaluated only if x does not evaluate to true. To enable support for evaluating a type to true or false (in an if statement, for example), it is necessary to overload the true/false unary operators.

Unary Operators (+, -, !, ~, ++, --, true, false)

Overloading unary operators is very similar to overloading binary operators, except that they take only one parameter, also of the containing type. Listing 9.9 overloads the + and – operators for Longitude and Latitude and then uses these operators when overloading the same operators in Arc.

LISTING 9.9: Overloading the – and + Unary Operators


public struct Latitude
{
    // ...
    public static Latitude operator -(Latitude latitude)
    {
        return new Latitude(-latitude.DecimalDegrees);
    }
    public static Latitude operator +(Latitude latitude)
    {
        return latitude;
    }
}


public struct Longitude
{
    // ...
    public static Longitude operator -(Longitude longitude)
    {
        return new Longitude(-longitude.DecimalDegrees);
    }
    public static Longitude operator +(Longitude longitude)
    {
        return longitude;
    }
}


public struct Arc
{
    // ...
    public static Arc operator -(Arc arc)
    {
        // Uses unary – operator defined on
        // Longitude and Latitude
        return new Arc(-arc.LongitudeDifference,
            -arc.LatitudeDifference);
    }
    public static Arc operator +(Arc arc)
    {
        return arc;
    }
}


Just as with numeric types, the + operator in this listing doesn’t have any effect and is provided for symmetry.

Overloading true and false is subject to the additional requirement that both must be overloaded—not just one of the two. The signatures are the same as with other operator overloads; however, the return must be a bool, as demonstrated in Listing 9.10.

LISTING 9.10: Overloading the true and false Operators


public static bool operator false(IsValid item)
{
    // ...
}
public static bool operator true(IsValid item)
{
    // ...
}


You can use types with overloaded true and false operators in if, do, while, and for controlling expressions.
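As a hedged sketch of how the true, false, &, and | overloads work together, the following program defines a hypothetical Validity struct (it is not the IsValid type from Listing 9.10) and counts how many operands are evaluated. The Check() helper and the evaluations counter exist only to make the short-circuiting visible.

```csharp
using System;

// Hypothetical type that can be used directly as a condition.
public struct Validity
{
    private readonly bool isValid;

    public Validity(bool isValid)
    {
        this.isValid = isValid;
    }

    public static bool operator true(Validity item)
    {
        return item.isValid;
    }

    public static bool operator false(Validity item)
    {
        return !item.isValid;
    }

    // && and || require & and | to take and return Validity.
    public static Validity operator &(Validity left, Validity right)
    {
        return new Validity(left.isValid && right.isValid);
    }

    public static Validity operator |(Validity left, Validity right)
    {
        return new Validity(left.isValid || right.isValid);
    }
}

public static class Program
{
    private static int evaluations;

    private static Validity Check(bool result)
    {
        evaluations++;
        return new Validity(result);
    }

    public static void Main()
    {
        // The left operand is false, so the right one is never evaluated.
        if (Check(false) && Check(true))
        {
            Console.WriteLine("both valid");
        }
        else
        {
            Console.WriteLine("not both valid");   // printed
        }
        Console.WriteLine(evaluations);            // 1

        evaluations = 0;
        // The left operand is true, so the right one is never evaluated.
        if (Check(true) || Check(false))
        {
            Console.WriteLine("at least one valid"); // printed
        }
        Console.WriteLine(evaluations);            // 1
    }
}
```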

Conversion Operators

Currently, there is no support in Longitude, Latitude, and Coordinate for casting to an alternative type. For example, there is no way to cast a double into a Longitude or Latitude instance. Similarly, there is no support for assigning a Coordinate using a string. Fortunately, C# provides for the definition of methods specifically intended to handle the converting of one type to another. Furthermore, the method declaration allows for specifying whether the conversion is implicit or explicit.


Advanced Topic: Cast Operator (())

Implementing the explicit and implicit conversion operators is not technically overloading the cast operator (()). However, this action is effectively what takes place, so defining a cast operator is common terminology for implementing explicit or implicit conversion.


Defining a conversion operator is similar in style to defining any other operator, except that the “operator” is the resultant type of the conversion. Additionally, the operator keyword follows a keyword that indicates whether the conversion is implicit or explicit (see Listing 9.11).

LISTING 9.11: Providing an Implicit Conversion between Latitude and double


public struct Latitude
{
    // ...

    public Latitude(double decimalDegrees)
    {
        DecimalDegrees = Normalize(decimalDegrees);
    }

    public double DecimalDegrees { get; }

    // ...

    public static implicit operator double(Latitude latitude)
    {
        return latitude.DecimalDegrees;
    }
    public static implicit operator Latitude(double degrees)
    {
        return new Latitude(degrees);
    }

    // ...
}


With these conversion operators, you now can convert doubles implicitly to and from Latitude objects. Assuming similar conversions exist for Longitude, you can simplify the creation of a Coordinate object by specifying the decimal degrees for each portion of the coordinate (for example, coordinate = new Coordinate(43, 172);).


Note

When implementing a conversion operator, either the return or the parameter must be of the enclosing type—in support of encapsulation. C# does not allow you to specify conversions outside the scope of the converted type.


Guidelines for Conversion Operators

The difference between defining an implicit and an explicit conversion operator centers on preventing an unintentional implicit conversion that results in undesirable behavior. You should be aware of two possible consequences of using the explicit conversion operator. First, conversion operators that throw exceptions should always be explicit. For example, it is highly likely that a string will not conform to the format that a conversion from string to Coordinate requires. Given the chance of a failed conversion, you should define the particular conversion operator as explicit, thereby requiring that you be intentional about the conversion and ensure that the format is correct or, alternatively, that you provide code to handle the possible exception. Frequently, the pattern for conversion is that one direction (string to Coordinate) is explicit and the reverse (Coordinate to string) is implicit.

A second consideration is the fact that some conversions will be lossy. Converting from a float (4.2) to an int is entirely valid, assuming an awareness of the fact that the decimal portion of the float will be lost. Any conversions that will lose data and will not successfully convert back to the original type should be defined as explicit. If an explicit cast is unexpectedly lossy or invalid, consider throwing a System.InvalidCastException.


Guidelines

DO NOT provide an implicit conversion operator if the conversion is lossy.

DO NOT throw exceptions from implicit conversions.
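The following sketch applies these guidelines to a reduced Latitude. The conversion to double is implicit because it neither loses data nor throws. The conversions from string and to int are explicit because the first can throw and the second is lossy. Both explicit conversions, and the decision to parse with the invariant culture, are assumptions added for illustration; they are not part of Listing 9.11.

```csharp
using System;
using System.Globalization;

// Simplified Latitude used only to illustrate the guidelines.
public struct Latitude
{
    public Latitude(double decimalDegrees)
    {
        DecimalDegrees = decimalDegrees;
    }

    public double DecimalDegrees { get; }

    // Lossless and never throws: implicit.
    public static implicit operator double(Latitude latitude)
    {
        return latitude.DecimalDegrees;
    }

    // May throw FormatException for malformed text: explicit.
    public static explicit operator Latitude(string text)
    {
        return new Latitude(
            double.Parse(text, CultureInfo.InvariantCulture));
    }

    // Drops the fractional part: explicit.
    public static explicit operator int(Latitude latitude)
    {
        return (int)latitude.DecimalDegrees;
    }
}

public static class Program
{
    public static void Main()
    {
        Latitude latitude = (Latitude)"43.75"; // cast required
        double degrees = latitude;             // no cast required
        int wholeDegrees = (int)latitude;      // cast required

        Console.WriteLine(degrees == 43.75);   // True
        Console.WriteLine(wholeDegrees);       // 43

        try
        {
            latitude = (Latitude)"north";
        }
        catch (FormatException)
        {
            // The cast in the calling code signals that failure is possible.
            Console.WriteLine("Not a valid latitude.");
        }
    }
}
```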


Referencing Other Assemblies

Instead of placing all code into one monolithic binary file, C# and the underlying CLI platform allow you to spread code across multiple assemblies. This approach enables you to reuse assemblies across multiple executables.


Beginner Topic: Class Libraries

The HelloWorld.exe program is one of the most trivial programs you can write. Real-world programs are more complex, and as complexity increases, it helps to organize the complexity by breaking programs into multiple parts. To do this, developers move portions of a program into separate compiled units called class libraries or, simply, libraries. Programs then reference and rely on class libraries to provide parts of their functionality. The power of this concept is that two programs can rely on the same class library, thereby sharing the functionality of that class library across both programs and reducing the total amount of code needed.

In other words, it is possible to write features once, place them into a class library, and allow multiple programs to include those features by referencing the same class library. Later in the development cycle, when developers fix a bug or add functionality to the class library, all the programs will have access to the increased functionality, just because they continue to reference the now improved class library.


To reuse the code within a different assembly, it is necessary to reference the assembly when running the C# compiler. Generally, the referenced assembly is a class library, and creating a class library requires a different assembly target from the default console executable targets you have created thus far.

Changing the Assembly Target

The compiler allows you to create four different assembly types via the /target option (it also allows /target:appcontainerexe and /target:winmdobj, which are for creating special Windows applications not covered in this book):

Console executable: This is the default type of assembly, and all compilation thus far has been to a console executable. (Leaving off the /target option or specifying /target:exe creates a console executable.)

Class library: Classes that are shared across multiple executables are generally defined in a class library (/target:library).

Windows executable: Windows executables are designed to run in the Microsoft Windows family of operating systems and outside the command console (/target:winexe).

Module: To facilitate use of multiple languages within the same assembly, code can be compiled to a module and multiple modules can be combined to form an assembly (/target:module).

Assemblies to be shared across multiple applications are generally compiled as class libraries. Consider, for example, a library dedicated to functionality around longitude and latitude coordinates. To compile the Coordinate, Longitude, and Latitude classes into their own library, you would use the command line shown in Output 9.4.

OUTPUT 9.4

>csc /target:library /out:Coordinates.dll Coordinate.cs IAngle.cs
Latitude.cs Longitude.cs Arc.cs
Microsoft (R) Visual C# Compiler version 1.0.0.50128
Copyright (C) Microsoft Corporation. All rights reserved.

Assuming you use .NET and the C# compiler is in the path, this command builds an assembly library called Coordinates.dll.

Referencing an Assembly

To access code within a different assembly, the C# compiler allows the developer to reference the assembly on the command line. The option in such a case is /reference (/r is the abbreviation), followed by the list of references. The Program class listing in Listing 9.8 uses the Coordinate class, for example, and if you place it into a separate executable, you reference Coordinates.dll using the .NET command line shown in Output 9.5.

OUTPUT 9.5

csc.exe /R:Coordinates.dll Program.cs

The Mono command line appears in Output 9.6.

OUTPUT 9.6

mcs.exe /R:Coordinates.dll Program.cs


Advanced Topic: Referencing Assemblies on Mac and Linux

At the time of this book’s writing, Microsoft had expressed the expectation that the Mac and Linux versions of the command-line C# compiler would behave just like the Windows version as far as referencing assemblies goes. Nevertheless, some open issues remain regarding how references to mscorlib.dll (the class library that contains the desktop CLR’s fundamental base classes, such as object and string) would work. Users of the C# command-line compiler may have to specify the complete path to mscorlib.dll on these platforms; consult the platform-specific documentation for details.



Advanced Topic: Portable Class Libraries

An increasingly common scenario for C# developers is to create programs that run as traditional desktop applications, on mobile devices, gaming platforms, and so on. A good technique to achieve this goal is to put the core application classes common to all versions of the application into a portable class library (PCL); a PCL can be used on many .NET platforms. Of course, a PCL must reference only assemblies that are themselves capable of running on multiple platforms.

The easiest way to create a PCL is to select the portable class library project type in Visual Studio 2012 or later. Creating a PCL using the command-line compiler is a bit more difficult but still possible. To do so, specify the /noconfig and /nostdlib options; this ensures that the default framework class libraries will not be used. Then use the /reference option to add references to the special portable metadata libraries in the Reference Assemblies\Microsoft\Framework\.NETPortable\v4.5 subdirectory of your Program Files directory.



Advanced Topic: Encapsulation of Types

Just as classes serve as an encapsulation boundary for behavior and data, so assemblies provide for similar boundaries among groups of types. Developers can break a system into assemblies and then share those assemblies with multiple applications or integrate them with assemblies provided by third parties.

public or internal Access Modifiers on Type Declarations

By default, a class without any access modifier is defined as internal.² The result is that the class is inaccessible from outside the assembly. Even though another assembly references the assembly containing the class, all internal classes within the referenced assemblies will be inaccessible.

2. Excluding nested types, which are private by default.

Just as private and protected provide levels of encapsulation to members within a class, so C# supports the use of access modifiers at the class level for control over the encapsulation of the classes within an assembly. The access modifiers available are public and internal. To expose a class outside the assembly, the class must be marked as public. Therefore, before compiling the Coordinates.dll assembly, it is necessary to mark the type declarations as public (see Listing 9.12).

LISTING 9.12: Making Types Available outside an Assembly


public struct Coordinate
{
// ...
}


public struct Latitude
{
// ...
}


public struct Longitude
{
// ...
}


public struct Arc
{
// ...
}


Similarly, declarations such as class and enum can be either public or internal.³

3. You can decorate nested classes with any access modifier available to other class members (private, for example). However, outside the class scope, the only access modifiers that are available are public and internal.


The internal access modifier is not limited to type declarations; that is, it is also available on type members. As a consequence, you can designate a type as public but mark specific methods within the type as internal so that the members are available only from within the assembly. It is not possible for the members to have a greater accessibility than the type. If the class is declared as internal, public members on the type will be accessible only from within the assembly.
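As a minimal sketch of mixing member accessibility on a public type, consider the following. TemperatureSensor and its members are hypothetical names invented for illustration; they are not part of the Coordinates library.

```csharp
using System;

// Hypothetical type for illustration only.
public class TemperatureSensor
{
    // Visible to any assembly that references this one.
    public double ReadCelsius()
    {
        return ConvertRaw(ReadRaw());
    }

    // Visible only to code compiled into the same assembly.
    internal int ReadRaw()
    {
        return 2150;
    }

    // Visible only inside TemperatureSensor.
    private static double ConvertRaw(int raw)
    {
        return raw / 100.0;
    }
}

public class Program
{
    public static void Main()
    {
        TemperatureSensor sensor = new TemperatureSensor();

        // Both calls compile here because Program is in the same assembly.
        // From a different assembly, only ReadCelsius() would be accessible.
        Console.WriteLine(sensor.ReadCelsius());
        Console.WriteLine(sensor.ReadRaw());
    }
}
```

If TemperatureSensor itself were declared internal, ReadCelsius() would remain public in name but would be reachable only from within the assembly.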

The protected internal Type Modifier

Another type member access modifier is protected internal. Members with an accessibility modifier of protected internal will be accessible from all locations within the containing assembly and from classes that derive from the type, even if the derived class is not in the same assembly. The default state is private, so when you add an access modifier (other than public), the member becomes slightly more visible. Adding two modifiers compounds this effect.


Note

Members with an accessibility modifier of protected internal will be accessible from all locations within the containing assembly and from classes that derive from the type, even if the derived class is not in the same assembly.
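The following sketch illustrates the two access paths that protected internal allows. Document and Report are hypothetical types, and because a single file compiles into one assembly, the cross-assembly cases are described in comments rather than demonstrated.

```csharp
using System;

public class Document
{
    // Accessible anywhere in this assembly, and from derived
    // classes even when they live in another assembly.
    protected internal string Title { get; set; } = "Untitled";
}

public class Report : Document
{
    public string Describe()
    {
        // Access through inheritance; this would also compile
        // if Report were declared in a different assembly.
        return $"Report: {Title}";
    }
}

public class Program
{
    public static void Main()
    {
        Document document = new Document();

        // Allowed only because Program shares an assembly with Document.
        // An unrelated class in another assembly could not do this.
        document.Title = "Budget";

        Console.WriteLine(document.Title);          // Budget
        Console.WriteLine(new Report().Describe()); // Report: Untitled
    }
}
```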



Beginner Topic: Type Member Accessibility Modifiers

The full list of access modifiers appears in Table 9.1.

TABLE 9.1: Accessibility Modifiers


Defining Namespaces

As mentioned in Chapter 2, all data types are identified by the combination of their namespace and their name. However, in the CLR, there is no such thing as a “namespace.” The type’s name actually is the fully qualified type name, including the namespace. For the classes you defined earlier, there was no explicit namespace declaration. Classes such as these are automatically declared as members of the default global namespace. It is likely that such classes will experience a name collision, which occurs when you attempt to define two classes with the same name. Once you begin referencing other assemblies from third parties, the likelihood of a name collision increases even further.

More importantly, there are thousands of types in the .NET Framework and multiple orders of magnitude more outside the framework. Finding the right type for a particular problem, therefore, could potentially be a significant battle.

The resolution to both of these problems is to organize all the types, grouping them into logical related categories called namespaces. For example, classes outside the System namespace are generally placed into a namespace corresponding with the company, product name, or both. Classes from Addison-Wesley, for example, are placed into an Awl or AddisonWesley namespace, and classes from Microsoft (not System classes) are located in the Microsoft namespace. The second level of a namespace should be a stable product name that will not vary between versions. Stability, in fact, is key at all levels. Changing a namespace name is a version-incompatible change that should be avoided. For this reason, avoid using volatile names (organization hierarchy, fleeting brands, and so on) within a namespace name.

Namespaces should be PascalCase, but if your brand uses nontraditional casing, it is acceptable to use the brand casing. (Consistency is key, so if that will be problematic—with Pascal or brand-based casing—favor the use of whichever convention will produce the greater consistency.) You should use the namespace keyword to create a namespace and to assign a class to it, as shown in Listing 9.13.

LISTING 9.13: Defining a Namespace


// Define the namespace AddisonWesley
namespace AddisonWesley
{
    class Program
    {
        // ...
    }
}
// End of AddisonWesley namespace declaration


All content between the namespace declaration’s curly braces will then belong within the specified namespace. In Listing 9.13, for example, Program is placed into the namespace AddisonWesley, making its full name AddisonWesley.Program.


Note

In the CLR there is no such thing as a “namespace.” The type’s name actually is the fully qualified type name.


Like classes, namespaces support nesting. This provides for a hierarchical organization of classes. All the System classes relating to network APIs are in the namespace System.Net, for example, and those relating to the Web are in System.Web.

There are two ways to nest namespaces. The first approach is to nest them within one another (similar to classes), as demonstrated in Listing 9.14.

LISTING 9.14: Nesting Namespaces within One Another


// Define the namespace AddisonWesley
namespace AddisonWesley
{
    // Define the namespace AddisonWesley.Michaelis
    namespace Michaelis
    {
        // Define the namespace
        // AddisonWesley.Michaelis.EssentialCSharp
        namespace EssentialCSharp
        {
            // Declare the class
            // AddisonWesley.Michaelis.EssentialCSharp.Program
            class Program
            {
                // ...
            }
        }
    }
}
// End of AddisonWesley namespace declaration


Such a nesting will assign the Program class to the AddisonWesley.Michaelis.EssentialCSharp namespace.

The second approach is to use the full namespace in a single namespace declaration in which a period separates each identifier, as shown in Listing 9.15.

LISTING 9.15: Nesting Namespaces Using a Period to Separate Each Identifier


// Define the namespace AddisonWesley.Michaelis.EssentialCSharp
namespace AddisonWesley.Michaelis.EssentialCSharp
{
    class Program
    {
        // ...
    }
}
// End of AddisonWesley.Michaelis.EssentialCSharp namespace declaration


Regardless of whether a namespace declaration follows the pattern shown in Listing 9.14, that in Listing 9.15, or a combination of the two, the resultant CIL code will be identical. The same namespace may occur multiple times, in multiple files, and even across assemblies. For example, with the convention of one-to-one correlation between files and classes, you can define each class in its own file and surround it with the same namespace declaration.
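As a small illustration of this point, the sketch below declares the same namespace twice, once in nested form and once in dotted form, and then prints the resulting type names. The class names Nested and Dotted are placeholders chosen for this example.

```csharp
using System;

// First declaration: nested form.
namespace AddisonWesley
{
    namespace Michaelis
    {
        class Nested { }
    }
}

// Second declaration: dotted form, reopening the same namespace.
namespace AddisonWesley.Michaelis
{
    class Dotted { }

    class Program
    {
        static void Main()
        {
            // Prints AddisonWesley.Michaelis.Nested
            Console.WriteLine(typeof(Nested).FullName);

            // Prints AddisonWesley.Michaelis.Dotted
            Console.WriteLine(typeof(Dotted).FullName);

            // Prints True: both types report the same namespace string.
            Console.WriteLine(
                typeof(Nested).Namespace == typeof(Dotted).Namespace);
        }
    }
}
```

Nested can be used from the second declaration without a using directive because both declarations contribute to the same namespace.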

Given that namespaces are key for organizing types, it is frequently helpful to use the namespace for organizing all the class files. For this reason, it is helpful to create a folder for each namespace, placing a class such as AddisonWesley.Fezzik.Services.Registration into a folder hierarchy corresponding to the name.

When using Visual Studio projects, if the project name is AddisonWesley.Fezzik, you should create one subfolder called Services into which RegistrationService.cs is placed. You would then create another subfolder—Data, for example—into which you place classes relating to entities within the program—RealestateProperty, Buyer, and Seller, for example.


Guidelines

DO prefix namespace names with a company name to prevent namespaces from different companies having the same name.

DO use a stable, version-independent product name at the second level of a namespace name.

DO NOT define types without placing them into a namespace.

CONSIDER creating a folder structure that matches the namespace hierarchy.


Namespace Alias Qualifier

Namespaces on their own deal with the vast majority of naming conflicts that might arise. Nevertheless, sometimes (albeit rarely) conflict can arise because of an overlap in the namespace and class names. To account for this possibility, the C# 2.0 compiler includes an option for providing an alias with the /reference option. For example, if the assemblies CoordinatesPlus.dll and Coordinates.dll have an overlapping type of Arc, you can reference both assemblies on the command line by assigning to one or both references a namespace alias qualifier that further distinguishes one class from the other (see Output 9.7).

OUTPUT 9.7

csc.exe /R:CoordPlus=CoordinatesPlus.dll /R:Coordinates.dll Program.cs

However, adding the alias during compilation is not sufficient on its own. To refer to classes in the aliased assembly, it is necessary to provide an extern directive that declares that the namespace alias qualifier is provided externally to the source code (see Listing 9.16).

LISTING 9.16: Using the extern Alias Directive


// extern must precede all other namespace elements
extern alias CoordPlus;

using System;
using CoordPlus::AddisonWesley.Michaelis.EssentialCSharp;
// Equivalent also allowed
// using CoordPlus.AddisonWesley.Michaelis.EssentialCSharp;

using global::AddisonWesley.Michaelis.EssentialCSharp;
// Equivalent NOT allowed
// using global.AddisonWesley.Michaelis.EssentialCSharp;

public class Program
{
    // ...
}


Once the extern alias for CoordPlus appears, you can reference the namespace using CoordPlus, followed by either two colons or a period.

To ensure that the lookup for the type occurs in the global namespace, C# 2.0 allows items to have the global:: qualifier (but not global., because it could imaginably conflict with a real namespace of global).
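One situation in which global:: becomes necessary can be sketched as follows. The AddisonWesley.System namespace and its Placeholder class are hypothetical, introduced only to create the conflict.

```csharp
namespace AddisonWesley.System
{
    // Hypothetical namespace that shadows the framework's System
    // namespace for any code nested under AddisonWesley.
    class Placeholder { }
}

namespace AddisonWesley.Michaelis.EssentialCSharp
{
    class Program
    {
        static void Main()
        {
            // System.Console.WriteLine("Hello");
            // would not compile here: System binds to
            // AddisonWesley.System, which has no Console type.

            // global:: starts the lookup at the root namespace instead.
            global::System.Console.WriteLine("Hello");
        }
    }
}
```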

XML Comments

Chapter 1 introduced comments. However, you can use XML comments for more than just notes to other developers reviewing the source code. XML-based comments follow a practice popularized with Java. Although the C# compiler ignores all comments as far as the resultant executable goes, the developer can use command-line options to instruct the compiler⁴ to extract the XML comments into a separate XML file. By taking advantage of the XML file generation, the developer can generate documentation of the API from the XML comments. In addition, C# editors can parse the XML comments in the code and display them to developers as distinct regions (for example, as a different color from the rest of the code), or parse the XML comment data elements and display them to the developer.

4. The C# standard does not specify whether the C# compiler or a separate utility should take care of extracting the XML data. However, all mainstream C# compilers include the necessary functionality via a compile switch instead of within an additional utility.

Figure 9.2 demonstrates how an IDE can take advantage of XML comments to assist developers with a tip about the code they are trying to write. Such coding tips offer significant assistance in large programs, especially when multiple developers share code. For this to work, however, the developer must take the time to enter the XML comments within the code and then direct the compiler to create the XML file. The next section explains how to accomplish this.

FIGURE 9.2: XML Comments As Tips in Visual Studio IDE

Associating XML Comments with Programming Constructs

Consider the listing of the DataStorage class, as shown in Listing 9.17.

LISTING 9.17: Commenting Code with XML Comments

Listing 9.17 uses both XML-delimited comments that span multiple lines and single-line XML comments in which each line requires a separate three-forward-slash delimiter (///).

Given that XML comments are designed to document the API, they are intended for use only in association with C# declarations, such as the class or method shown in Listing 9.17. Any attempt to place an XML comment inline with the code, unassociated with a declaration, will result in a warning by the compiler. The compiler makes the association simply because the XML comment appears immediately before the declaration.

Although C# allows any XML tag to appear in comments, the C# standard explicitly defines a set of tags to be used. <seealso cref="System.IO.StreamWriter"/> is an example of using the seealso tag. This tag creates a link between the text and the System.IO.StreamWriter class.
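The source for Listing 9.17 is not reproduced above. As a rough sketch only, the following shows what such a commented class might look like, working backward from the XML output in Listing 9.18. The Employee stand-in, the static modifiers, the method bodies, and the file-naming scheme are all assumptions made for illustration.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the Employee type; its real shape is not shown.
class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

/// <summary>
/// DataStorage is used to persist and retrieve
/// employee data from the files.
/// </summary>
class DataStorage
{
    /// <summary>
    /// Save an employee object to a file
    /// named with the Employee name.
    /// </summary>
    /// <remarks>
    /// This method uses
    /// <seealso cref="System.IO.FileStream"/>
    /// in addition to
    /// <seealso cref="System.IO.StreamWriter"/>
    /// </remarks>
    /// <param name="employee">
    /// The employee to persist to a file</param>
    /// <date>January 1, 2000</date>
    public static void Store(Employee employee)
    {
        // Assumed body: one line per name in "<First><Last>.dat".
        string fileName = employee.FirstName + employee.LastName + ".dat";
        using (StreamWriter writer = new StreamWriter(
            new FileStream(fileName, FileMode.Create)))
        {
            writer.WriteLine(employee.FirstName);
            writer.WriteLine(employee.LastName);
        }
    }

    /** <summary>
     * Loads up an employee object
     * </summary>
     * <remarks>
     * This method uses
     * <seealso cref="System.IO.FileStream"/>
     * in addition to
     * <seealso cref="System.IO.StreamReader"/>
     * </remarks>
     * <param name="firstName">
     * The first name of the employee</param>
     * <param name="lastName">
     * The last name of the employee</param>
     * <returns>
     * The employee object corresponding to the names
     * </returns>
     * <date>January 1, 2000</date>
     */
    public static Employee Load(string firstName, string lastName)
    {
        // Assumed body: reads back the file written by Store().
        string fileName = firstName + lastName + ".dat";
        using (StreamReader reader = new StreamReader(
            new FileStream(fileName, FileMode.Open)))
        {
            return new Employee
            {
                FirstName = reader.ReadLine(),
                LastName = reader.ReadLine()
            };
        }
    }
}

class Program
{
    static void Main()
    {
        DataStorage.Store(
            new Employee { FirstName = "Inigo", LastName = "Montoya" });
        Employee employee = DataStorage.Load("Inigo", "Montoya");
        Console.WriteLine($"{employee.FirstName} {employee.LastName}");
    }
}
```

Store() is documented with single-line (///) comments and Load() with a delimited (/** ... */) comment, matching the two styles the text describes. The date element is a custom tag, which the compiler copies into the output unchanged.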

Generating an XML Documentation File

The compiler will check that the XML comments are well formed, and will issue a warning if they are not. To generate the XML file, you use the /doc option when compiling, as shown in Output 9.8.

OUTPUT 9.8

>csc /doc:Comments.xml DataStorage.cs

The /doc option creates an XML file based on the name specified after the colon. Using the DataStorage class listed earlier and the compiler options listed here, the resultant Comments.xml file appears as shown in Listing 9.18.

LISTING 9.18: Comments.xml


<?xml version="1.0"?>
<doc>
    <assembly>
        <name>DataStorage</name>
    </assembly>
    <members>
        <member name="T:DataStorage">
            <summary>
            DataStorage is used to persist and retrieve
            employee data from the files.
            </summary>
        </member>
        <member name="M:DataStorage.Store(Employee)">
            <summary>
            Save an employee object to a file
            named with the Employee name.
            </summary>
            <remarks>
            This method uses
            <seealso cref="T:System.IO.FileStream"/>
            in addition to
            <seealso cref="T:System.IO.StreamWriter"/>
            </remarks>
            <param name="employee">
            The employee to persist to a file</param>
            <date>January 1, 2000</date>
        </member>
        <member name="M:DataStorage.Load(System.String,System.String)">
            <summary>
            Loads up an employee object
            </summary>
            <remarks>
            This method uses
            <seealso cref="T:System.IO.FileStream"/>
            in addition to
            <seealso cref="T:System.IO.StreamReader"/>
            </remarks>
            <param name="firstName">
            The first name of the employee</param>
            <param name="lastName">
            The last name of the employee</param>
            <returns>
            The employee object corresponding to the names
            </returns>
            <date>January 1, 2000</date>
        </member>
    </members>
</doc>


The resultant file includes only the amount of metadata that is necessary to associate an element back to its corresponding C# declaration. This is important to note, because in general, it is necessary to use the XML output in combination with the generated assembly to produce any meaningful documentation. Fortunately, tools such as the free GhostDoc⁵ and the open source project NDoc⁶ can generate documentation.

5. See http://submain.com/ to learn more about GhostDoc.

6. See http://ndoc.sourceforge.net to learn more about NDoc.


Guidelines

DO provide XML comments on public APIs when they provide more context than the API signature alone. This includes member descriptions, parameter descriptions, and examples of calling the API.


Garbage Collection

Garbage collection is obviously a core function of the runtime. Its purpose is to restore memory consumed by objects that are no longer referenced. The emphasis in this statement is on memory and references: The garbage collector is only responsible for restoring memory; it does not handle other resources such as database connections, handles (files, windows, and so on), network ports, and hardware devices such as serial ports. Also, the garbage collector determines what to clean up based on whether any references remain. Implicitly, this means that the garbage collector works with reference objects and restores memory on the heap only. Additionally, it means that maintaining a reference to an object will delay the garbage collector from reusing the memory consumed by the object.


Advanced Topic: Garbage Collection in .NET

Many details about the garbage collector pertain to the specific CLI implementation and, therefore, could vary. This section discusses the .NET implementation, because it is the most prevalent.

In .NET, the garbage collector uses a mark-and-compact algorithm. At the beginning of an iteration, it identifies all root references to objects. Root references are any references from static variables, CPU registers, and local variables or parameter instances (and f-reachable objects as described later in this section). Given this list, the garbage collector is able to walk through the tree identified by each root reference and determine recursively all the objects to which the root references point. In this manner, the garbage collector creates a graph of all reachable objects.

Instead of enumerating all the inaccessible objects, the garbage collector performs garbage collection by compacting all reachable objects next to one another, thereby overwriting any memory consumed by objects that are inaccessible (and therefore qualify as garbage).

Locating and moving all reachable objects requires that the system maintain a consistent state while the garbage collector runs. To achieve this, all managed threads within the process halt during garbage collection. This behavior can result in brief pauses in an application, which are generally insignificant unless a particularly large garbage collection cycle is necessary. To reduce the likelihood of a garbage collection cycle occurring at an inopportune time, the System.GC object includes a Collect() method, which can be called immediately before the performance-critical code. This will not prevent the garbage collector from running, but it does reduce the probability that it will run, assuming no intense memory utilization occurs during the performance-critical code.

One perhaps surprising aspect of .NET garbage collection behavior is that not all garbage is necessarily cleaned up during an iteration. Studies of object lifetimes reveal that recently created objects are more likely to need garbage collection than long-standing objects. Capitalizing on this behavior, the .NET garbage collector is generational, attempting to clean up short-lived objects more frequently than objects that have already survived a previous garbage collection iteration. Specifically, objects are organized into three generations. Each time an object survives a garbage collection cycle, it is moved to the next generation, until it ends up in generation 2 (counting starts from zero). The garbage collector then runs more frequently for objects in generation 0 than it does for objects in generation 2.

Over time, in spite of the trepidation that .NET stirred during its early beta releases when compared with unmanaged code, .NET’s garbage collection has proved extremely efficient. More importantly, the gains realized in development productivity have far outweighed the costs in development for the few cases where managed code is dropped to optimize particular algorithms.
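The generational behavior described above can be observed with a short sketch such as the following, which relies on the System.GC members GetGeneration(), Collect(), MaxGeneration, and KeepAlive(). The exact numbers printed depend on the runtime and its configuration, so treat the output as illustrative rather than guaranteed.

```csharp
using System;

class Program
{
    static void Main()
    {
        object data = new object();

        // On the desktop .NET runtime this is typically 2.
        Console.WriteLine($"Max generation: {GC.MaxGeneration}");

        // A newly allocated small object normally starts in generation 0.
        Console.WriteLine($"Before collection: {GC.GetGeneration(data)}");

        // Each collection that the object survives usually promotes it
        // by one generation, up to GC.MaxGeneration.
        GC.Collect();
        Console.WriteLine(
            $"After one collection: {GC.GetGeneration(data)}");

        GC.Collect();
        Console.WriteLine(
            $"After two collections: {GC.GetGeneration(data)}");

        // Keep a live reference so the object is not collected early.
        GC.KeepAlive(data);
    }
}
```

Calling GC.Collect() here is for demonstration only; as the text notes, forcing a collection is appropriate mainly just before a performance-critical stretch of code.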


Weak References

All references discussed so far are strong references because they maintain an object’s accessibility and prevent the garbage collector from cleaning up the memory consumed by the object. The framework also supports the concept of weak references, however. Weak references will not prevent garbage collection on an object, but they do maintain a reference so that if the garbage collector does not clean up the object, it can be reused.

Weak references are designed for objects that are expensive to create, yet too expensive to keep around. Consider, for example, a large list of objects loaded from a database and displayed to the user. The loading of this list is potentially expensive, and once the user closes the list, it should be available for garbage collection. However, if the user requests the list multiple times, a second expensive load call will always be required. With weak references, it becomes possible to use code to check whether the list has been cleaned up, and if it has not, to re-reference the same list. In this way, weak references serve as a memory cache for objects. Objects within the cache are retrieved quickly, but if the garbage collector has recovered the memory of these objects, they will need to be re-created.

Once an object (or collection of objects) is recognized as worthy of potential weak reference consideration, it needs to be assigned to System.WeakReference (see Listing 9.19).

LISTING 9.19: Using a Weak Reference


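// ...

// Start with an empty weak reference so that
// Data.Target can be read safely on the first call.
private WeakReference Data = new WeakReference(null);

public FileStream GetData()
{
    // Copy the target into a strong reference
    // before testing it for null.
    FileStream data = (FileStream)Data.Target;
    if (data != null)
    {
        return data;
    }
    else
    {
        // Load data
        // ...

        // Create a weak reference
        // to data for use later.
        Data.Target = data;
    }
    return data;
}

// ...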


Given the assignment of the WeakReference (Data), you can check for garbage collection by seeing whether the weak reference's Target property is null. The key when doing so is to first assign Target to a strong reference (FileStream data = (FileStream)Data.Target) to avoid the possibility that between checking for null and accessing the data, the garbage collector will run and clean up the object. The strong reference prevents the garbage collector from cleaning up the object, so it must be assigned first (instead of checking Target for null directly and then reading Target a second time).
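The same caching idea can also be sketched with the generic System.WeakReference<T> class (available in .NET 4.5 and later), whose TryGetTarget() method performs the check and the strong-reference assignment in one step. This is an alternative to the nongeneric class used in Listing 9.19, not the listing's own approach; ReportCache and LoadData() are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

class ReportCache
{
    // Starts out empty; TryGetTarget() returns false until a target is set.
    private readonly WeakReference<List<string>> _data =
        new WeakReference<List<string>>(null);

    public List<string> GetData()
    {
        List<string> data;

        // On success, data is a strong reference, so the list
        // cannot be collected between the check and its use.
        if (!_data.TryGetTarget(out data))
        {
            data = LoadData();
            _data.SetTarget(data);
        }
        return data;
    }

    // Hypothetical stand-in for an expensive load operation.
    private static List<string> LoadData()
    {
        Console.WriteLine("Loading...");
        return new List<string> { "Inigo", "Fezzik" };
    }
}

class Program
{
    static void Main()
    {
        ReportCache cache = new ReportCache();

        List<string> first = cache.GetData();
        List<string> second = cache.GetData();

        // While first is still strongly referenced, the second call
        // finds the cached list instead of reloading it.
        Console.WriteLine(object.ReferenceEquals(first, second));

        GC.KeepAlive(first);
    }
}
```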

Resource Cleanup

Garbage collection is a key responsibility of the runtime. Nevertheless, it is important to recognize that the garbage collection process centers on the code’s memory utilization. It is not about the cleaning up of file handles, database connections, ports, or other limited resources.

Finalizers

Finalizers allow developers to write code that will clean up a class’s resources. Unlike constructors that are called explicitly using the new operator, finalizers cannot be called explicitly from within the code. There is no new equivalent such as a delete operator. Rather, the garbage collector is responsible for calling a finalizer on an object instance. Therefore, developers cannot determine at compile time exactly when the finalizer will execute. All they know is that the finalizer will run sometime between when an object was last used and when the application shuts down normally. (Finalizers might not execute if the process is terminated abnormally. For instance, events such as the computer being turned off or a forced termination of the process will prevent the finalizer from running.)


Note

You cannot determine at compile time exactly when the finalizer will execute.


The finalizer declaration is identical to the destructor syntax of C#’s predecessor—namely, C++. As shown in Listing 9.20, the finalizer declaration is prefixed with a tilde before the name of the class.

LISTING 9.20: Defining a Finalizer


using System.IO;

class TemporaryFileStream
{
    public TemporaryFileStream(string fileName)
    {
        File = new FileInfo(fileName);
        Stream = new FileStream(
            File.FullName, FileMode.OpenOrCreate,
            FileAccess.ReadWrite);
    }

    public TemporaryFileStream()
        : this(Path.GetTempFileName()) { }

    // Finalizer
    ~TemporaryFileStream()
    {
        Close();
    }

    public FileStream Stream { get; }
    public FileInfo File { get; }

    public void Close()
    {
        Stream?.Close();
        File?.Delete();
    }
}


Finalizers do not allow any parameters to be passed, so they cannot be overloaded. Furthermore, finalizers cannot be called explicitly—that is, only the garbage collector can invoke a finalizer. Access modifiers on finalizers are therefore meaningless, and as such, they are not supported. Finalizers in base classes will be invoked automatically as part of an object finalization call.


Note

Finalizers cannot be called explicitly; only the garbage collector can invoke a finalizer.


Because the garbage collector handles all memory management, finalizers are not responsible for de-allocating memory. Rather, they are responsible for freeing up resources such as database connections and file handles—resources that require an explicit activity that the garbage collector doesn’t know about.

Finalizers execute on their own thread, making their execution even less deterministic. This indeterminate nature makes an unhandled exception within a finalizer (outside of the debugger) difficult to diagnose because the circumstances that led to the exception are not clear. From the user’s perspective, the unhandled exception will be thrown relatively randomly and with little regard for any action the user was performing. For this reason, you should take care to avoid exceptions within finalizers. Instead, you should use defensive programming techniques such as checking for nulls (refer to Listing 9.20).

Deterministic Finalization with the using Statement

The problem with finalizers on their own is that they don’t support deterministic finalization (the ability to know when a finalizer will run). Rather, finalizers serve the important role of being a backup mechanism for cleaning up resources if a developer using a class neglects to call the requisite cleanup code explicitly.

For example, consider the TemporaryFileStream, which includes not only a finalizer but also a Close() method. This class uses a file resource that could potentially consume a significant amount of disk space. The developer using TemporaryFileStream can explicitly call Close() to restore the disk space.

Providing a method for deterministic finalization is important because it eliminates a dependency on the indeterminate timing behavior of the finalizer. Even if the developer fails to call Close() explicitly, the finalizer will take care of the call. In such a case, the finalizer will run later than if it was called explicitly—but it will be called eventually.

Because of the importance of deterministic finalization, the base class library includes a specific interface for the pattern and C# integrates the pattern into the language. The IDisposable interface defines the details of the pattern with a single method called Dispose(), which developers call on a resource class to “dispose” of the consumed resources. Listing 9.21 demonstrates the IDisposable interface and some code for calling it.

LISTING 9.21: Resource Cleanup with IDisposable


using System;
using System.IO;

class Program
{
    // ...
    static void Search()
    {
        TemporaryFileStream fileStream =
            new TemporaryFileStream();

        // Use temporary file stream;
        // ...

        fileStream.Dispose();

        // ...
    }
}


class TemporaryFileStream : IDisposable
{
    public TemporaryFileStream(string fileName)
    {
        File = new FileInfo(fileName);
        Stream = new FileStream(
            File.FullName, FileMode.OpenOrCreate,
            FileAccess.ReadWrite);
    }

    public TemporaryFileStream()
        : this(Path.GetTempFileName()) { }

    ~TemporaryFileStream()
    {
        Dispose(false);
    }

    public FileStream Stream { get; }
    public FileInfo File { get; }

    public void Close()
    {
        Dispose();
    }

    #region IDisposable Members
    public void Dispose()
    {
        Dispose(true);

        // Turn off calling the finalizer
        System.GC.SuppressFinalize(this);
    }
    #endregion

    public void Dispose(bool disposing)
    {
        // Do not dispose of an owned managed object (one with a
        // finalizer) when called from the finalizer, because
        // finalization queue processing will call (or has
        // already called) that object's own finalizer.
        if (disposing)
        {
            Stream?.Close();
        }
        File?.Delete();
    }
}


From Program.Search(), there is an explicit call to Dispose() after using the TemporaryFileStream. Dispose() is the method responsible for cleaning up the resources (in this case, a file) that are not related to memory and, therefore, are not subject to implicit cleanup by the garbage collector. Nevertheless, the execution here contains a hole that would prevent execution of Dispose(): an exception might occur between the time when TemporaryFileStream is instantiated and the time when Dispose() is called. If this happens, Dispose() will not be invoked and the resource cleanup will have to rely on the finalizer. To avoid this problem, callers need to implement a try/finally block. Instead of requiring programmers to code such a block explicitly, C# provides a using statement expressly for the purpose (Listing 9.22).

LISTING 9.22: Invoking the using Statement


class Program
{
    // ...

    static void Search()
    {
        using (TemporaryFileStream fileStream1 =
            new TemporaryFileStream(),
            fileStream2 = new TemporaryFileStream())
        {
            // Use temporary file stream;
        }
    }
}


The resultant CIL code is identical to the code that would be created if the programmer specified an explicit try/finally block, where fileStream.Dispose() is called in the finally block. The using statement, however, provides a syntax shortcut for the try/finally block.
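To make that equivalence concrete, the following sketch writes the same cleanup twice, once with a using statement and once with a hand-coded try/finally block, and then shows that Dispose() still runs when the body throws. The Resource class and its Log list are hypothetical stand-ins for TemporaryFileStream, introduced only so that the example is self-contained and touches no files; the try/finally form approximates the expansion for a reference type rather than reproducing the compiler's exact output.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for TemporaryFileStream; it only records calls.
class Resource : IDisposable
{
    public static List<string> Log { get; } = new List<string>();
    private string Name { get; }

    public Resource(string name)
    {
        Name = name;
        Log.Add("Create " + name);
    }

    public void Dispose()
    {
        Log.Add("Dispose " + Name);
    }
}

class Program
{
    // Form 1: the using statement.
    public static void WithUsing()
    {
        using (Resource resource = new Resource("A"))
        {
            Resource.Log.Add("Use A");
        }
    }

    // Form 2: approximately what the using statement expands to.
    public static void WithTryFinally()
    {
        Resource resource = new Resource("B");
        try
        {
            Resource.Log.Add("Use B");
        }
        finally
        {
            // The null check mirrors the guard in the expansion.
            if (resource != null)
            {
                resource.Dispose();
            }
        }
    }

    // Dispose() runs even though the body throws.
    public static void ThrowInsideUsing()
    {
        try
        {
            using (Resource resource = new Resource("C"))
            {
                throw new InvalidOperationException("Simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            Resource.Log.Add("Caught");
        }
    }

    static void Main()
    {
        WithUsing();
        WithTryFinally();
        ThrowInsideUsing();
        foreach (string entry in Resource.Log)
        {
            Console.WriteLine(entry);
        }
    }
}
```

In the third method, the log records "Dispose C" before "Caught", because the finally block generated by the using statement executes before control reaches the catch block.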

Within a using statement, you can instantiate more than one variable by separating each variable from the others with a comma. The key considerations are that all variables must be of the same type and that they implement IDisposable. To enforce the use of the same type, the data type is specified only once rather than before each variable declaration.
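A using statement that declares several variables behaves like nested using statements, so the variables are disposed in the reverse order of their declaration. The short sketch below illustrates this with a hypothetical Tracked class (not part of the chapter's listings) that simply records the order in which Dispose() is called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that records the order of Dispose() calls.
class Tracked : IDisposable
{
    public static List<string> Disposed { get; } = new List<string>();
    private string Name { get; }

    public Tracked(string name)
    {
        Name = name;
    }

    public void Dispose()
    {
        Disposed.Add(Name);
    }
}

class Program
{
    public static void UseTwo()
    {
        // The type appears once; each variable is separated by a comma.
        using (Tracked first = new Tracked("first"),
            second = new Tracked("second"))
        {
            // Use both resources here.
        }
    }

    static void Main()
    {
        UseTwo();
        // Prints "second, first"
        Console.WriteLine(string.Join(", ", Tracked.Disposed));
    }
}
```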

Garbage Collection, Finalization, and IDisposable

There are several additional noteworthy items to point out in Listing 9.21. First, the IDisposable.Dispose() method contains an important call to System.GC.SuppressFinalize(). Its purpose is to remove the TemporaryFileStream class instance from the finalization (f-reachable) queue. This is possible because all cleanup was done in the Dispose() method rather than waiting for the finalizer to execute.

Without the call to SuppressFinalize(), the instance of the object will be included in the f-reachable queue, a list of all the objects that are mostly ready for garbage collection except that they also have finalization implementations. The runtime cannot garbage-collect objects with finalizers until after their finalization methods have been called. However, garbage collection itself does not call the finalization method. Rather, references to finalization objects are added to the f-reachable queue and are processed by an additional thread at a time deemed appropriate based on the execution context. In an ironic twist, this approach delays garbage collection for the managed resources, when it is most likely that these very resources should be cleaned up earlier. The reason for the delay is that the f-reachable queue is a list of “references”; as such, the objects are not considered garbage until after their finalization methods are called and the object references are removed from the f-reachable queue.


Note

Objects with finalizers that are not explicitly disposed will end up with an extended object lifetime. Even after all explicit references have gone out of scope, the f-reachable queue will have references, keeping the object alive until the f-reachable queue processing is complete.


It is for this reason that Dispose() invokes System.GC.SuppressFinalize. Invoking this method informs the runtime not to add this object to the finalization queue, but instead to allow the garbage collector to de-allocate the object when it no longer has any references (including any f-reachable references).

Second, Dispose() calls Dispose(bool disposing) with an argument of true. The result is that the Dispose() method on Stream is invoked (cleaning up its resources and suppressing its finalization). Next, the temporary file itself is deleted immediately upon calling Dispose(). This important call eliminates the need to wait for the finalization queue to be processed before cleaning up potentially expensive resources.

Third, rather than calling Close(), the finalizer now calls Dispose(bool disposing) with an argument of false. The result is that Stream is not closed (disposed) even though the file is deleted. The condition around closing Stream ensures that if Dispose(bool disposing) is called from the finalizer, the Stream instance itself will also be queued up for finalization processing (or possibly it would have already run depending on the order). Therefore, when executing the finalizer, objects owned by the managed resource should not be cleaned up, as this action will be the responsibility of the finalization queue.

Fourth, you should use caution when creating both a Close()-type method and a Dispose() method. It is not clear by looking at only the API that Close() calls Dispose(), so developers will be left wondering whether they need to explicitly call both Close() and Dispose().


Language Contrast: C++—Deterministic Destruction

Although finalizers are similar to destructors in C++, the fact that their execution cannot be determined at compile time makes them distinctly different. The garbage collector calls a C# finalizer sometime after the object was last used, but before the program shuts down; C++ destructors are automatically called when the object (not a pointer) goes out of scope.

Although running the garbage collector can be a relatively expensive process, the fact that garbage collection is intelligent enough to delay running until process utilization is somewhat reduced offers an advantage over deterministic destructors, which will run at compile-time–defined locations, even when a processor is in high demand.



Guidelines

DO implement a finalizer method only on objects with resources that are scarce or expensive, even though finalization delays garbage collection.

DO implement IDisposable to support deterministic finalization on classes with finalizers.

DO implement a finalizer method on classes that implement IDisposable in case Dispose() is not invoked explicitly.

DO refactor a finalization method to call the same code as IDisposable, perhaps simply calling the Dispose() method.

DO NOT throw exceptions from finalizer methods.

DO call System.GC.SuppressFinalize() from Dispose() to avoid repeating resource cleanup and delaying garbage collection on an object.

DO ensure that Dispose() is idempotent (it should be possible to call Dispose() multiple times).

DO keep Dispose() simple, focusing on resource cleanup required by finalization.

AVOID calling Dispose() on owned objects that have a finalizer. Instead, rely on the finalization queue to clean up the instance.

AVOID referencing other objects that are not being finalized during finalization.

DO invoke a base class’s Dispose() method when overriding Dispose().

CONSIDER ensuring that an object becomes unusable after Dispose() is called. After an object has been disposed, methods other than Dispose() (which could potentially be called multiple times) should throw an ObjectDisposedException.

DO implement IDisposable on types that own disposable fields (or properties) and dispose of said instances.
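
Several of these guidelines can be seen together in a small sketch. The ScratchBuffer class below is hypothetical (it is not one of the chapter's listings) and owns only a managed MemoryStream, so it needs no finalizer; it illustrates an idempotent Dispose(), disposal of an owned disposable, and throwing ObjectDisposedException from other members once the object has been disposed.

```csharp
using System;
using System.IO;

// Hypothetical type that owns a disposable MemoryStream.
class ScratchBuffer : IDisposable
{
    private MemoryStream Stream { get; } = new MemoryStream();
    private bool IsDisposed { get; set; }

    public long Length
    {
        get
        {
            ThrowIfDisposed();
            return Stream.Length;
        }
    }

    public void Write(byte value)
    {
        ThrowIfDisposed();
        Stream.WriteByte(value);
    }

    public void Dispose()
    {
        // Idempotent: second and later calls do nothing.
        if (IsDisposed)
        {
            return;
        }

        // Dispose of the owned disposable instance.
        Stream.Dispose();
        IsDisposed = true;
    }

    // Members other than Dispose() fail once the object is disposed.
    private void ThrowIfDisposed()
    {
        if (IsDisposed)
        {
            throw new ObjectDisposedException(nameof(ScratchBuffer));
        }
    }
}

class Program
{
    static void Main()
    {
        ScratchBuffer buffer = new ScratchBuffer();
        buffer.Write(42);
        Console.WriteLine(buffer.Length);

        buffer.Dispose();
        buffer.Dispose(); // Safe: Dispose() is idempotent.

        try
        {
            buffer.Write(43);
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("Write() rejected after Dispose()");
        }
    }
}
```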



Advanced Topic: Exception Propagating from Constructors

Even when an exception propagates out of a constructor, the object is still instantiated, although no new instance is returned by the new operator. If the type defines a finalizer, the method will run when the object becomes eligible for garbage collection (providing additional motivation to ensure the finalize method can run on partially constructed objects). Also note that if a constructor prematurely shares its this reference, it will still be accessible even if the constructor throws an exception. Do not allow this scenario to occur.



Advanced Topic: Resurrecting Objects

By the time an object’s finalization method is called, all references to the object have disappeared and the only step before garbage collection is running the finalization code. Even so, it is possible to add a reference inadvertently for a finalization object back into the root reference’s graph. In such a case, the re-referenced object will no longer be inaccessible; in turn, it will not be ready for garbage collection. However, if the finalization method for the object has already run, it will not necessarily run again unless it is explicitly marked for finalization (using the GC.ReRegisterForFinalize() method).

Obviously, resurrecting objects in this manner is peculiar behavior, and you should generally avoid it. Finalization code should be simple and should focus on cleaning up only the resources that it references.


Lazy Initialization

In the preceding section, we discussed how to deterministically dispose of an object with a using statement and how the finalization queue will dispose of resources in the event that no deterministic approach is used.

A related pattern is called lazy initialization or lazy loading. Using lazy initialization, you can create (or obtain) objects when you need them rather than beforehand; creating them beforehand can be especially problematic when those objects are never used. Consider the FileStream property of Listing 9.23.

LISTING 9.23: Lazy Loading a Property


using System.IO;

class DataCache
{
    // ...

    public TemporaryFileStream FileStream =>
        InternalFileStream ?? (InternalFileStream =
            new TemporaryFileStream());

    private TemporaryFileStream InternalFileStream
        { get; set; } = null;
    // ...
}


In the FileStream expression-bodied property, we check whether InternalFileStream is not null before returning its value directly. If InternalFileStream is null, we first instantiate the TemporaryFileStream object and assign it to InternalFileStream before returning the new instance. Thus, the TemporaryFileStream required in the FileStream property is created only when the getter on the property is called. If the getter is never invoked, the TemporaryFileStream object would not be instantiated and we would save whatever execution time such an instantiation would cost. Obviously, if the cost of instantiation is negligible or the instantiation is inevitable (and postponing the inevitable is less desirable), simply assigning it during declaration or in the constructor makes sense.

Begin 4.0


Advanced Topic: Lazy Loading with Generics and Lambda Expressions

Starting with .NET Framework 4.0, a new class was added to the base class library to assist with lazy initialization: System.Lazy<T>. Listing 9.24 demonstrates how to use it.

LISTING 9.24: Lazy Loading a Property with System.Lazy<T>


using System;
using System.IO;

class DataCache
{
    // ...

    public TemporaryFileStream FileStream =>
        InternalFileStream.Value;
    private Lazy<TemporaryFileStream> InternalFileStream { get; }
        = new Lazy<TemporaryFileStream>(
            () => new TemporaryFileStream() );

    // ...
}


The System.Lazy<T> class takes a type parameter (T) that identifies which type the Value property on System.Lazy<T> will return. Instead of assigning a fully constructed TemporaryFileStream to the InternalFileStream property, an instance of Lazy<TemporaryFileStream> is assigned (a lightweight call), delaying the instantiation of the TemporaryFileStream itself until the Value property (and therefore the FileStream property) is accessed.

If in addition to type parameters (generics) you use delegates, you can even provide a function for how to initialize an object when the Value property is accessed. Listing 9.24 demonstrates passing the delegate (a lambda expression in this case) into the constructor for System.Lazy<T>.

Note that the lambda expression itself, () => new TemporaryFileStream(), does not execute until Value is called. Rather, the lambda expression provides a means of passing the instructions for what will happen; it does not actually execute those instructions until explicitly requested to do so.

One obvious question is when you should use System.Lazy<T> rather than the approach outlined in Listing 9.23. The difference is negligible: In fact, Listing 9.23 may actually be simpler. That is, it is simpler until there are multiple threads involved, such that a race condition might occur regarding the instantiation. In Listing 9.23, more than one check for null might potentially occur before instantiation, resulting in multiple instances being created. In contrast, System.Lazy<T> provides a thread-safe mechanism ensuring that one and only one object will be created.
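
The thread-safety point can be checked with a small experiment. The sketch below is an illustration under stated assumptions rather than part of the chapter's listings: ExpensiveResource is a hypothetical stand-in for TemporaryFileStream whose constructor counts its invocations and sleeps briefly to widen the window for a race, and the DataCache here is a simplified variant of Listing 9.24. Several threads read the property at once, yet the factory lambda passed to System.Lazy<T> runs only once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for TemporaryFileStream that counts constructions.
class ExpensiveResource
{
    private static int _InstanceCount;

    public static int InstanceCount =>
        Volatile.Read(ref _InstanceCount);

    public ExpensiveResource()
    {
        Interlocked.Increment(ref _InstanceCount);
        // Simulate slow construction to widen the window for a race.
        Thread.Sleep(50);
    }
}

class DataCache
{
    public ExpensiveResource Resource => InternalResource.Value;

    // The default Lazy<T> mode allows only one thread to run the factory.
    private Lazy<ExpensiveResource> InternalResource { get; }
        = new Lazy<ExpensiveResource>(
            () => new ExpensiveResource());
}

class Program
{
    // Returns how many instances were created by eight parallel reads.
    public static int CountInstancesCreated()
    {
        int before = ExpensiveResource.InstanceCount;
        DataCache cache = new DataCache();
        ExpensiveResource[] results = new ExpensiveResource[8];

        Parallel.For(0, results.Length, i =>
        {
            results[i] = cache.Resource;
        });

        return ExpensiveResource.InstanceCount - before;
    }

    static void Main()
    {
        // Prints 1
        Console.WriteLine(CountInstancesCreated());
    }
}
```

Replacing the Lazy<T> property with the null-coalescing approach of Listing 9.23 in the same experiment could produce a count greater than one, although whether it does so on any given run depends on thread timing.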


End 4.0

Summary

This chapter provided a whirlwind tour of many topics related to building solid class libraries. All the topics pertain to internal development as well, but they are much more critical to building robust classes. Ultimately, the focus here was on forming more robust and programmable APIs. In the category of robustness, we can include namespaces and garbage collection. Both of these topics fit in the programmability category as well, along with overriding object’s virtual members, operator overloading, and XML comments for documentation.

Exception handling uses inheritance heavily by defining an exception hierarchy and enforcing custom exceptions to fit within this hierarchy. Furthermore, the C# compiler uses inheritance to verify catch block order. In the next chapter, you will see why inheritance is such a core part of exception handling.