Platform Interoperability and Unsafe Code - Essential C# 6.0 (2016)

20. Platform Interoperability and Unsafe Code

C# has great capabilities, particularly when paired with the .NET libraries. Sometimes, however, you do need to escape out of all the safety that C# provides and step back into the world of memory addresses and pointers. C# supports this action in four ways. The first option is to go through Platform Invoke (P/Invoke) and calls into APIs exposed by unmanaged DLLs. The second way is through unsafe code, which enables access to memory pointers and addresses. The third approach—although specific to Windows 8 or newer—is through the Windows Runtime (WinRT) API, which is exposing more and more of the operating system functions and making them directly available in C# 5.0 or higher. The last way, which is not covered in this text, is through COM interoperability.

The majority of the chapter discusses interoperability with unmanaged code, and the use of unsafe code. This discussion culminates with a small program that determines whether the computer is a virtual computer. The code requires that you do the following:

1. Call into an operating system DLL and request allocation of a portion of memory for executing instructions.

2. Write some assembler instructions into the allocated area.

3. Inject an address location into the assembler instructions.

4. Execute the assembler code.

Aside from the P/Invoke and unsafe constructs covered here, the full listing demonstrates the full power of C# and the fact that the capabilities of unmanaged code are still accessible from C# and managed code. We end this chapter by briefly discussing WinRT so developers are aware of some of its distinguishing characteristics before using it.

Platform Invoke

Whether a developer is trying to call a library of existing unmanaged code, accessing unmanaged code in the operating system not exposed in any managed API, or trying to achieve maximum performance for a particular algorithm by avoiding the runtime overhead of type checking and garbage collection, at some point there must be a call into unmanaged code. The CLI provides this capability through P/Invoke. With P/Invoke, you can make API calls into exported functions of unmanaged DLLs.

All of the APIs invoked in this section are Windows APIs. Although the same APIs are not available on other platforms, developers can still use P/Invoke for APIs native to their platform, or for calls into their own DLLs. The guidelines and syntax are the same.

Declaring External Functions

Once the target function is identified, the next step of P/Invoke is to declare the function with managed code. Just as with all regular methods that belong to a class, you need to declare the targeted API within the context of a class, but by using the extern modifier. Listing 20.1 demonstrates how to do this.

LISTING 20.1: Declaring an External Method


using System;
using System.Runtime.InteropServices;

class VirtualMemoryManager
{
    [DllImport("kernel32.dll", EntryPoint="GetCurrentProcess")]
    internal static extern IntPtr GetCurrentProcessHandle();
}


In this case, the class is VirtualMemoryManager, because it will contain functions associated with managing memory. (This particular function is available directly off the System.Diagnostics.Process class, so there is no need to declare it in real code.) Note that the method returns an IntPtr; this type is explained in the next section.

The extern methods never include any body and are (almost) always static. Instead of a method body, the DllImport attribute, which accompanies the method declaration, points to the implementation. At a minimum, the attribute needs the name of the DLL that defines the function. The runtime determines the function name from the method name, although you can override this default using the EntryPoint named parameter to provide the function name. (The .NET platform will automatically attempt calls to the Unicode [...W] or ASCII [...A] API version.)

In this case, the external function, GetCurrentProcess(), retrieves a pseudohandle for the current process that you will use in the call for virtual memory allocation. Here’s the unmanaged declaration:

HANDLE GetCurrentProcess();

Parameter Data Types

Assuming the developer has identified the targeted DLL and exported function, the most difficult step is identifying or creating the managed data types that correspond to the unmanaged types in the external function.1 Listing 20.2 shows a more difficult API.

1. One particularly helpful resource for declaring Win32 APIs is www.pinvoke.net. It provides a great starting point for many APIs, helping you avoid some of the subtle problems that can arise when coding an external API call from scratch.

LISTING 20.2: The VirtualAllocEx() API


LPVOID VirtualAllocEx(
    HANDLE hProcess,         // The handle to a process. The
                             // function allocates memory within
                             // the virtual address space of this
                             // process.
    LPVOID lpAddress,        // The pointer that specifies a
                             // desired starting address for the
                             // region of pages that you want to
                             // allocate. If lpAddress is NULL,
                             // the function determines where to
                             // allocate the region.
    SIZE_T dwSize,           // The size of the region of memory to
                             // allocate, in bytes. If lpAddress
                             // is NULL, the function rounds dwSize
                             // up to the next page boundary.
    DWORD flAllocationType,  // The type of memory allocation.
    DWORD flProtect);        // The type of memory protection.


VirtualAllocEx() allocates virtual memory that the operating system specifically designates for execution or data. To call it, you need corresponding definitions in managed code for each data type; although common in Win32 programming, HANDLE, LPVOID, SIZE_T, and DWORD are undefined in the CLI managed code. The declaration in C# for VirtualAllocEx(), therefore, is shown in Listing 20.3.

LISTING 20.3: Declaring the VirtualAllocEx() API in C#


using System;
using System.Runtime.InteropServices;

class VirtualMemoryManager
{
    [DllImport("kernel32.dll")]
    internal static extern IntPtr GetCurrentProcess();

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern IntPtr VirtualAllocEx(
        IntPtr hProcess,
        IntPtr lpAddress,
        IntPtr dwSize,
        AllocationType flAllocationType,
        uint flProtect);
}


One distinct characteristic of managed code is that primitive data types such as int do not change their size based on the processor. Whether the processor is 16, 32, or 64 bits, int is always 32 bits. In unmanaged code, however, memory pointers will vary depending on the processor. Therefore, instead of mapping types such as HANDLE and LPVOID simply to ints, you need to map to System.IntPtr, whose size will vary depending on the processor memory layout. This example also uses an AllocationType enum, which we discuss in the section “Simplifying API Calls with Wrappers” later in this chapter.

An interesting point to note about Listing 20.3 is that IntPtr is not just useful for pointers; that is, it is useful for other things such as quantities. IntPtr does not mean just “pointer stored in an integer”; it also means “integer that is the size of a pointer.” An IntPtr need not contain a pointer, but simply needs to contain something the size of a pointer. Lots of things are the size of a pointer but are nevertheless not pointers.
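The size difference between int and IntPtr can be observed directly. The following is a minimal sketch, not part of the chapter's listings; the class name IntPtrSizeDemo and the value 4096 are chosen only for illustration. It prints the fixed size of int next to the platform-dependent size of IntPtr, and stores a plain quantity (not an address) in an IntPtr, as dwSize does in Listing 20.3.

```csharp
using System;

static class IntPtrSizeDemo
{
    // Stores an ordinary quantity in an IntPtr and reads it back.
    // The value is a size in bytes, not a memory address.
    public static long RoundTrip(int quantity)
    {
        IntPtr pointerSized = (IntPtr)quantity;
        return pointerSized.ToInt64();
    }

    static void Main()
    {
        // int is 4 bytes regardless of the processor.
        Console.WriteLine("sizeof(int) = {0}", sizeof(int));

        // IntPtr is 4 bytes in a 32-bit process and 8 bytes in a 64-bit process.
        Console.WriteLine("IntPtr.Size = {0}", IntPtr.Size);
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);

        Console.WriteLine("Quantity stored in an IntPtr: {0}", RoundTrip(4096));
    }
}
```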

Using ref Rather Than Pointers

Frequently, unmanaged code uses pointers for pass-by-reference parameters. In these cases, P/Invoke doesn’t require that you map the data type to a pointer in managed code. Instead, you map the corresponding parameters to ref (or out, depending on whether the parameter is in/out or just out). In Listing 20.4, lpflOldProtect, whose data type is PDWORD, is an example that returns the “pointer to a variable that receives the previous access protection of the first page in the specified region of pages.”2

2. MSDN documentation.

LISTING 20.4: Using ref and out Rather Than Pointers


class VirtualMemoryManager
{
    // ...
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualProtectEx(
        IntPtr hProcess, IntPtr lpAddress,
        IntPtr dwSize, uint flNewProtect,
        ref uint lpflOldProtect);
}


Despite the fact that lpflOldProtect is documented as [out] (even though the signature doesn’t enforce it), the description goes on to mention that the parameter must point to a valid variable and not NULL. This inconsistency is confusing, but commonly encountered. The guideline is to use ref rather than out for P/Invoke type parameters since the callee can always ignore the data passed with ref, but the converse will not necessarily succeed.

The other parameters are virtually the same as VirtualAllocEx(), except that the lpAddress is the address returned from VirtualAllocEx(). In addition, flNewProtect specifies the exact type of memory protection: page execute, page read-only, and so on.

Using StructLayoutAttribute for Sequential Layout

Some APIs involve types that have no corresponding managed type. Calling these APIs requires redeclaring the type in managed code. You declare the unmanaged COLORREF struct, for example, in managed code (see Listing 20.5).

LISTING 20.5: Declaring Types from Unmanaged Structs


[StructLayout(LayoutKind.Sequential)]
struct ColorRef
{
    public byte Red;
    public byte Green;
    public byte Blue;
    // Turn off warning about not accessing Unused.
    #pragma warning disable 414
    private byte Unused;
    #pragma warning restore 414

    public ColorRef(byte red, byte green, byte blue)
    {
        Blue = blue;
        Green = green;
        Red = red;
        Unused = 0;
    }
}


Various Microsoft Windows color APIs use COLORREF to represent RGB colors (that is, levels of red, green, and blue).

The key in this declaration is StructLayoutAttribute. By default, managed code can optimize the memory layouts of types, so layouts may not be sequential from one field to the next. To force sequential layouts so that a type maps directly and can be copied bit for bit (blitted) from managed to unmanaged code, and vice versa, you add the StructLayoutAttribute with the LayoutKind.Sequential enum value. (This is also useful when writing data to and from filestreams where a sequential layout may be expected.)

Since the unmanaged (C++) definition for struct does not map to the C# definition, there is not a direct mapping of unmanaged struct to managed struct. Instead, developers should follow the usual C# guidelines about whether the type should behave like a value or a reference type, and whether the size is small (approximately less than 16 bytes).
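One way to check that a sequential struct lines up with its unmanaged counterpart is to ask the marshaler for the struct's size and field offsets. The sketch below assumes a simplified copy of the struct from Listing 20.5 (named ColorRefSketch here, with all fields public so that no warning pragma is needed); the helper class LayoutDemo is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Simplified stand-in for the ColorRef struct in Listing 20.5.
[StructLayout(LayoutKind.Sequential)]
struct ColorRefSketch
{
    public byte Red;
    public byte Green;
    public byte Blue;
    public byte Unused;
}

static class LayoutDemo
{
    // Size, in bytes, that the marshaler uses for the unmanaged representation.
    public static int UnmanagedSize()
    {
        return Marshal.SizeOf(typeof(ColorRefSketch));
    }

    // Byte offset of a named field within the unmanaged representation.
    public static int OffsetOf(string fieldName)
    {
        return (int)Marshal.OffsetOf(typeof(ColorRefSketch), fieldName);
    }

    static void Main()
    {
        Console.WriteLine("Size: {0}", UnmanagedSize());
        foreach (string field in new[] { "Red", "Green", "Blue", "Unused" })
        {
            Console.WriteLine("{0} at offset {1}", field, OffsetOf(field));
        }
    }
}
```

With four byte fields laid out sequentially, the reported size is 4 bytes and the offsets run from 0 through 3 in declaration order, which is the layout a 32-bit COLORREF value expects.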

Error Handling

One inconvenient aspect of Win32 API programming is the fact that the APIs frequently report errors in inconsistent ways. For example, some APIs return a value (0, 1, false, and so on) to indicate an error, whereas others set an out parameter in some way. Furthermore, the details of what went wrong require additional calls to the GetLastError() API and then an additional call to FormatMessage() to retrieve an error message corresponding to the error. In summary, Win32 error reporting in unmanaged code seldom occurs via exceptions.

Fortunately, the P/Invoke designers provided a mechanism for error handling. To enable it, set the SetLastError named parameter of the DllImport attribute to true. You can then instantiate a System.ComponentModel.Win32Exception immediately following the P/Invoke call, and it is automatically initialized with the Win32 error data (see Listing 20.6).

LISTING 20.6: Win32 Error Handling


class VirtualMemoryManager
{
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern IntPtr VirtualAllocEx(
        IntPtr hProcess,
        IntPtr lpAddress,
        IntPtr dwSize,
        AllocationType flAllocationType,
        uint flProtect);

    // ...
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualProtectEx(
        IntPtr hProcess, IntPtr lpAddress,
        IntPtr dwSize, uint flNewProtect,
        ref uint lpflOldProtect);

    [Flags]
    private enum AllocationType : uint
    {
        // ...
    }

    [Flags]
    private enum ProtectionOptions
    {
        // ...
    }

    [Flags]
    private enum MemoryFreeType
    {
        // ...
    }

    public static IntPtr AllocExecutionBlock(
        int size, IntPtr hProcess)
    {
        IntPtr codeBytesPtr;
        codeBytesPtr = VirtualAllocEx(
            hProcess, IntPtr.Zero,
            (IntPtr)size,
            AllocationType.Reserve | AllocationType.Commit,
            (uint)ProtectionOptions.PageExecuteReadWrite);

        // A null pointer signals failure; Win32Exception picks up
        // the error code saved by SetLastError = true.
        if (codeBytesPtr == IntPtr.Zero)
        {
            throw new System.ComponentModel.Win32Exception();
        }

        uint lpflOldProtect = 0;
        if (!VirtualProtectEx(
            hProcess, codeBytesPtr,
            (IntPtr)size,
            (uint)ProtectionOptions.PageExecuteReadWrite,
            ref lpflOldProtect))
        {
            throw new System.ComponentModel.Win32Exception();
        }
        return codeBytesPtr;
    }

    public static IntPtr AllocExecutionBlock(int size)
    {
        return AllocExecutionBlock(
            size, GetCurrentProcessHandle());
    }
}


This code enables developers to provide the custom error checking that each API uses while still reporting the error in a standard manner.
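The same pattern can be tried in isolation with an API that fails predictably. The following is a sketch under stated assumptions rather than part of the chapter's sample: it uses the Win32 DeleteFile() function purely as a convenient call that fails for a missing file, the class name Win32ErrorDemo and the file path are hypothetical, and the code does real work only on Windows.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class Win32ErrorDemo
{
    // SetLastError = true asks the runtime to capture the Win32 error code
    // immediately after the call returns.
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern bool DeleteFile(string lpFileName);

    // Managed wrapper: converts the API's false return value into an exception.
    public static void Delete(string path)
    {
        if (!DeleteFile(path))
        {
            // The parameterless constructor initializes the exception
            // from the error code captured by the runtime.
            throw new Win32Exception();
        }
    }

    static void Main()
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
        {
            Console.WriteLine("This sketch calls kernel32.dll and runs only on Windows.");
            return;
        }

        try
        {
            // Hypothetical path that is assumed not to exist.
            Delete(@"C:\hypothetical-directory\missing-file.tmp");
        }
        catch (Win32Exception exception)
        {
            Console.WriteLine("Win32 error {0}: {1}",
                exception.NativeErrorCode, exception.Message);
        }
    }
}
```

The caller of Delete() sees an ordinary exception carrying both the numeric code and the formatted message, so no separate calls to GetLastError() or FormatMessage() are needed.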

Listing 20.1 and Listing 20.3 declared the P/Invoke methods as internal or private. Except for the simplest of APIs, wrapping methods in public wrappers that reduce the complexity of the P/Invoke API calls is a good guideline that increases API usability and moves toward an object-oriented type structure. The AllocExecutionBlock() declaration in Listing 20.6 provides a good example of this approach.


Guidelines

DO create public managed wrappers around unmanaged methods that use the conventions of managed code, such as structured exception handling.


Begin 2.0

Using SafeHandle

Frequently, P/Invoke involves a resource, such as a window handle, that code needs to clean up after using. Instead of requiring developers to remember that this step is necessary and manually code it each time, it is helpful to provide a class that implements IDisposable and a finalizer. In Listing 20.7, for example, the address returned after VirtualAllocEx() and VirtualProtectEx() requires a follow-up call to VirtualFreeEx(). To provide built-in support for this process, you define a VirtualMemoryPtr class that derives from System.Runtime.InteropServices.SafeHandle.

LISTING 20.7: Managed Resources Using SafeHandle


public class VirtualMemoryPtr :
    System.Runtime.InteropServices.SafeHandle
{
    public VirtualMemoryPtr(int memorySize) :
        base(IntPtr.Zero, true)
    {
        ProcessHandle =
            VirtualMemoryManager.GetCurrentProcessHandle();
        MemorySize = (IntPtr)memorySize;
        AllocatedPointer =
            VirtualMemoryManager.AllocExecutionBlock(
                memorySize, ProcessHandle);
        Disposed = false;
    }

    public readonly IntPtr AllocatedPointer;
    readonly IntPtr ProcessHandle;
    readonly IntPtr MemorySize;
    bool Disposed;

    public static implicit operator IntPtr(
        VirtualMemoryPtr virtualMemoryPointer)
    {
        return virtualMemoryPointer.AllocatedPointer;
    }

    // SafeHandle abstract member
    public override bool IsInvalid
    {
        get
        {
            return Disposed;
        }
    }

    // SafeHandle abstract member
    protected override bool ReleaseHandle()
    {
        if (!Disposed)
        {
            Disposed = true;
            GC.SuppressFinalize(this);
            VirtualMemoryManager.VirtualFreeEx(ProcessHandle,
                AllocatedPointer, MemorySize);
        }
        return true;
    }
}


System.Runtime.InteropServices.SafeHandle includes the abstract members IsInvalid and ReleaseHandle(). You place your cleanup code in the latter; the former indicates whether this code has executed yet.

With VirtualMemoryPtr, you can allocate memory simply by instantiating the type and specifying the needed memory allocation.

End 2.0

Calling External Functions

Once you declare the P/Invoke functions, you invoke them just as you would any other class member. The key, however, is that the imported DLL must be in the path, including the executable directory, so that it can be successfully loaded. Listing 20.6 and Listing 20.7 provide demonstrations of this approach. However, they rely on some constants.

Since flAllocationType and flProtect are flags, it is a good practice to provide constants or enums for each. Instead of expecting the caller to define these constants or enums, encapsulation suggests you provide them as part of the API declaration, as shown in Listing 20.8.

LISTING 20.8: Encapsulating the APIs Together


class VirtualMemoryManager
{
    // ...

    /// <summary>
    /// The type of memory allocation. This parameter must
    /// contain one of the following values.
    /// </summary>
    [Flags]
    private enum AllocationType : uint
    {
        /// <summary>
        /// Allocates physical storage in memory or in the
        /// paging file on disk for the specified reserved
        /// memory pages. The function initializes the memory
        /// to zero.
        /// </summary>
        Commit = 0x1000,
        /// <summary>
        /// Reserves a range of the process's virtual address
        /// space without allocating any actual physical
        /// storage in memory or in the paging file on disk.
        /// </summary>
        Reserve = 0x2000,
        /// <summary>
        /// Indicates that data in the memory range specified by
        /// lpAddress and dwSize is no longer of interest. The
        /// pages should not be read from or written to the
        /// paging file. However, the memory block will be used
        /// again later, so it should not be decommitted. This
        /// value cannot be used with any other value.
        /// </summary>
        Reset = 0x80000,
        /// <summary>
        /// Allocates physical memory with read-write access.
        /// This value is solely for use with Address Windowing
        /// Extensions (AWE) memory.
        /// </summary>
        Physical = 0x400000,
        /// <summary>
        /// Allocates memory at the highest possible address.
        /// </summary>
        TopDown = 0x100000,
    }

    /// <summary>
    /// The memory protection for the region of pages to be
    /// allocated.
    /// </summary>
    [Flags]
    private enum ProtectionOptions : uint
    {
        /// <summary>
        /// Enables execute access to the committed region of
        /// pages. An attempt to read or write to the committed
        /// region results in an access violation.
        /// </summary>
        Execute = 0x10,
        /// <summary>
        /// Enables execute and read access to the committed
        /// region of pages. An attempt to write to the
        /// committed region results in an access violation.
        /// </summary>
        PageExecuteRead = 0x20,
        /// <summary>
        /// Enables execute, read, and write access to the
        /// committed region of pages.
        /// </summary>
        PageExecuteReadWrite = 0x40,
        // ...
    }

    /// <summary>
    /// The type of free operation
    /// </summary>
    [Flags]
    private enum MemoryFreeType : uint
    {
        /// <summary>
        /// Decommits the specified region of committed pages.
        /// After the operation, the pages are in the reserved
        /// state.
        /// </summary>
        Decommit = 0x4000,
        /// <summary>
        /// Releases the specified region of pages. After this
        /// operation, the pages are in the free state.
        /// </summary>
        Release = 0x8000
    }

    // ...
}


The advantage of enums is that they group together the various values. Furthermore, they can limit the scope to nothing else besides these values.

Simplifying API Calls with Wrappers

Whether it is error handling, structs, or constant values, one goal of effective API developers is to provide a simplified managed API that wraps the underlying Win32 API. For example, Listing 20.9 overloads VirtualFreeEx() with public versions that simplify the call.

LISTING 20.9: Wrapping the Underlying API


class VirtualMemoryManager
{
    // ...

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualFreeEx(
        IntPtr hProcess, IntPtr lpAddress,
        IntPtr dwSize, IntPtr dwFreeType);

    public static bool VirtualFreeEx(
        IntPtr hProcess, IntPtr lpAddress,
        IntPtr dwSize)
    {
        bool result = VirtualFreeEx(
            hProcess, lpAddress, dwSize,
            (IntPtr)MemoryFreeType.Decommit);
        if (!result)
        {
            throw new System.ComponentModel.Win32Exception();
        }
        return result;
    }

    public static bool VirtualFreeEx(
        IntPtr lpAddress, IntPtr dwSize)
    {
        return VirtualFreeEx(
            GetCurrentProcessHandle(), lpAddress, dwSize);
    }

    [DllImport("kernel32", SetLastError = true)]
    static extern IntPtr VirtualAllocEx(
        IntPtr hProcess,
        IntPtr lpAddress,
        IntPtr dwSize,
        AllocationType flAllocationType,
        uint flProtect);

    // ...
}


Function Pointers Map to Delegates

One last key point related to P/Invoke is that function pointers in unmanaged code map to delegates in managed code. To set up a Microsoft Windows timer, for example, you would provide a function pointer that the timer could call back on, once it had expired. Specifically, you would pass a delegate instance that matches the signature of the callback.
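The mapping between delegates and function pointers can be seen without calling any operating system API. The sketch below is an illustration, not the timer scenario described above: the delegate type BinaryOperation and the class FunctionPointerDemo are hypothetical names. It asks the marshaler for the native function pointer behind a delegate, converts that pointer back into a delegate, and invokes it.

```csharp
using System;
using System.Runtime.InteropServices;

static class FunctionPointerDemo
{
    // Hypothetical callback signature, chosen only for illustration.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int BinaryOperation(int left, int right);

    static int Add(int left, int right)
    {
        return left + right;
    }

    public static int RoundTrip(int left, int right)
    {
        BinaryOperation callback = Add;

        // The address an unmanaged API would receive as a function pointer.
        IntPtr functionPointer =
            Marshal.GetFunctionPointerForDelegate(callback);

        // Wrap a native function pointer in a delegate again.
        BinaryOperation fromPointer =
            (BinaryOperation)Marshal.GetDelegateForFunctionPointer(
                functionPointer, typeof(BinaryOperation));

        int result = fromPointer(left, right);

        // Keep the original delegate alive for as long as the
        // function pointer might be used.
        GC.KeepAlive(callback);
        return result;
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip(2, 3));
    }
}
```

When a delegate is passed to unmanaged code that stores the pointer for a later callback, the managed side has to keep a reference to the delegate; otherwise the garbage collector may reclaim it while the unmanaged code still holds its address.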

Guidelines

Given the idiosyncrasies of P/Invoke, there are several guidelines to aid in the process of writing such code.


Guidelines

DO NOT unnecessarily replicate existing managed classes that already perform the function of the unmanaged API.

DO declare extern methods as private or internal.

DO provide public wrapper methods that use managed conventions such as structured exception handling, use of enums for special values, and so on.

DO simplify the wrapper methods by choosing default values for unnecessary parameters.

DO set the SetLastError named parameter of DllImportAttribute to true so that wrappers around APIs that use SetLastError error codes can throw Win32Exception.

DO extend SafeHandle or implement IDisposable and create a finalizer to ensure that unmanaged resources can be cleaned up effectively.

DO use delegate types that match the signature of the desired method when an unmanaged API requires a function pointer.

DO use ref parameters rather than pointer types when possible.


Pointers and Addresses

On occasion, developers may want to access and work with memory, and with pointers to memory locations, directly. This is necessary, for example, for certain operating system interactions as well as with certain types of time-critical algorithms. To support this capability, C# requires use of the unsafe code construct.

Unsafe Code

One of C#’s great features is the fact that it is strongly typed and supports type checking throughout the runtime execution. What makes this feature especially beneficial is that it is possible to circumvent this support and manipulate memory and addresses directly. You would do so when working with things such as memory-mapped devices, for example, or if you wanted to implement time-critical algorithms. The key is to designate a portion of the code as unsafe.

Unsafe code is an explicit code block and compilation option, as shown in Listing 20.10. The unsafe modifier has no effect on the generated CIL code itself, but rather is simply a directive to the compiler to permit pointer and address manipulation within the unsafe block. Furthermore, unsafe does not imply unmanaged.

LISTING 20.10: Designating a Method for Unsafe Code


class Program
{
    unsafe static int Main(string[] args)
    {
        // ...
    }
}


You can use unsafe as a modifier to the type or to specific members within the type.

In addition, C# allows unsafe as a statement that flags a code block to allow unsafe code (see Listing 20.11).

LISTING 20.11: Designating a Code Block for Unsafe Code


class Program
{
    static int Main(string[] args)
    {
        unsafe
        {
            // ...
        }
    }
}


Code within the unsafe block can include unsafe constructs such as pointers.


Note

It is necessary to explicitly indicate to the compiler that unsafe code is supported.


From the command line, notifying the compiler that unsafe code is supported requires using the /unsafe switch. For example, to compile the preceding code, you need to use the command shown in Output 20.1.

OUTPUT 20.1

csc.exe /unsafe Program.cs

With Visual Studio, you can activate this feature by checking the Allow Unsafe Code check box from the Build tab of the Project Properties window.

Using the /unsafe switch is necessary because unsafe code opens up the possibility of buffer overflows and similar outcomes that may potentially expose security holes. The /unsafe switch enables you to directly manipulate memory and execute instructions that are unmanaged. Requiring /unsafe, therefore, makes the choice of potential exposure explicit.

Pointer Declaration

Now that you have marked a code block as unsafe, it is time to look at how to write unsafe code. First, unsafe code allows the declaration of a pointer. Consider the following example:

byte* pData;

Assuming pData is not null, its value points to a location that contains one or more sequential bytes; the value of pData represents the memory address of the bytes. The type specified before the * is the referent type, or the type located where the value of the pointer refers. In this example, pData is the pointer and byte is the referent type, as shown in Figure 20.1.

FIGURE 20.1: Pointers Contain the Address of the Data

Because pointers are simply integers that happen to refer to a memory address, they are not subject to garbage collection. C# does not allow referent types other than unmanaged types, which are types that are not reference types, are not generics, and do not contain reference types. Therefore, the following declaration is not valid:

string* pMessage;

Likewise, this declaration is not valid:

ServiceStatus* pStatus;

where ServiceStatus is defined as shown in Listing 20.12. The problem, once again, is that ServiceStatus includes a string field.


Language Contrast: C/C++—Pointer Declaration

In C/C++, multiple pointers within the same declaration are declared as follows:

int *p1, *p2;

Notice the * on p2; this makes p2 an int* rather than an int. In contrast, C# always places the * with the data type:

int* p1, p2;

The result is two variables of type int*. The syntax matches that of declaring multiple arrays in a single statement:

int[] array1, array2;

Pointers are an entirely new category of type. Unlike structs, enums, and classes, pointers don’t ultimately derive from System.Object and are not even convertible to System.Object. Instead, they are convertible (explicitly) to System.IntPtr (which can be converted to System.Object).


LISTING 20.12: Invalid Referent Type Example


struct ServiceStatus
{
    int State;
    string Description; // Description is a reference type
}


In addition to custom structs that contain only unmanaged types, valid referent types include enums, predefined value types (sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, and bool), and pointer types (such as byte**). Lastly, valid syntax includes void* pointers, which represent pointers to an unknown type.

Assigning a Pointer

Once code defines a pointer, it needs to assign a value before accessing it. Just like reference types, pointers can hold the value null; this is their default value. The value stored by the pointer is the address of a location. Therefore, to assign the pointer, you must first retrieve the address of the data.

You could explicitly cast an integer or a long into a pointer, but this rarely occurs without a means of determining the address of a particular data value at execution time. Instead, you need to use the address operator (&) to retrieve the address of the value type:

byte* pData = &bytes[0]; // Compile error

The problem is that in a managed environment, data can move, thereby invalidating the address. The error message is “You can only take the address of [an] unfixed expression inside a fixed statement initializer.” In this case, the byte referenced appears within an array, and an array is a reference type (a movable type). Reference types appear on the heap and are subject to garbage collection or relocation. A similar problem occurs when referring to a value type field on a movable type:

int* a = &"message".Length;

Either way, to assign an address of some data requires the following.

• The data must be classified as a variable.

• The data must be an unmanaged type.

• The variable needs to be classified as fixed, not movable.

If the data is an unmanaged variable type but is not fixed, use the fixed statement to fix a movable variable.
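For contrast with the failing cases, the address operator works directly on data that already satisfies all three requirements, such as a local variable of an unmanaged type. The following sketch uses hypothetical names (AddressOfDemo, Point) and must be compiled with the /unsafe switch.

```csharp
using System;

static class AddressOfDemo
{
    // Hypothetical struct containing only unmanaged types.
    struct Point
    {
        public int X;
        public int Y;
    }

    public static unsafe int Increment(int start)
    {
        // A local variable of an unmanaged type lives on the stack,
        // so it is already fixed and its address can be taken directly.
        int value = start;
        int* pValue = &value;
        *pValue += 1;
        return value;
    }

    public static unsafe int SumCoordinates(int x, int y)
    {
        Point point = new Point { X = x, Y = y };
        Point* pPoint = &point;

        // The -> operator dereferences the pointer and accesses a member.
        return pPoint->X + pPoint->Y;
    }

    static void Main()
    {
        Console.WriteLine(Increment(41));
        Console.WriteLine(SumCoordinates(3, 4));
    }
}
```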

Fixing Data

To retrieve the address of a movable data item, it is necessary to fix, or pin, the data, as demonstrated in Listing 20.13.

LISTING 20.13: Fixed Statement


byte[] bytes = new byte[24];
fixed (byte* pData = &bytes[0]) // pData = bytes also allowed
{
    // ...
}


Within the code block of a fixed statement, the assigned data will not move. In this example, bytes will remain at the same address, at least until the end of the fixed statement.

The fixed statement requires the declaration of the pointer variable within its scope. This avoids accessing the variable outside the fixed statement, when the data is no longer fixed. However, it is your responsibility as a programmer to ensure that you do not assign the pointer to another variable that survives beyond the scope of the fixed statement (possibly in an API call, for example). Unsafe code is called “unsafe” for a reason; you are required to ensure that you use the pointers safely rather than relying on the runtime to enforce safety on your behalf. Similarly, using ref or out parameters will be problematic for data that will not survive beyond the method call.

Since a string is an invalid referent type, it would appear invalid to define pointers to strings. However, as in C++, internally a string is a pointer to the first character of an array of characters, and it is possible to declare pointers to characters using char*. Therefore, C# allows for declaring a pointer of type char* and assigning it to a string within a fixed statement. The fixed statement prevents the movement of the string during the life of the pointer. Similarly, it allows any movable type that supports an implicit conversion to a pointer of another type, given a fixed statement.
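As a read-only illustration of fixing a string, the following sketch walks the characters of a string through a char*. The class and method names are hypothetical, and the code requires the /unsafe switch.

```csharp
using System;

static class StringPointerDemo
{
    public static unsafe int CountUppercase(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        int count = 0;

        // The string cannot move while pText is in scope.
        fixed (char* pText = text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                // Indexing a pointer reads the element at that offset.
                if (char.IsUpper(pText[i]))
                {
                    count++;
                }
            }
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountUppercase("Essential C#"));
    }
}
```

The loop is bounded by text.Length rather than by a terminating character, because a .NET string may legitimately contain embedded null characters.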

You can replace the verbose assignment of &bytes[0] with the abbreviated bytes, as shown in Listing 20.14.

LISTING 20.14: Fixed Statement without Address or Array Indexer


byte[] bytes = new byte[24];
fixed (byte* pData = bytes)
{
    // ...
}


Depending on the frequency and time needed for their execution, fixed statements may have the potential to cause fragmentation in the heap because the garbage collector cannot compact fixed objects. To reduce this problem, the best practice is to pin blocks early in the execution and to pin fewer large blocks rather than many small blocks. Unfortunately, this preference has to be tempered with the practice of pinning as little as possible for as short a time as possible, so as to minimize the chance that a collection will happen during the time that the data is pinned. To some extent, .NET 2.0 reduces this problem, through its inclusion of some additional fragmentation-aware code.

It is possible that you might need to fix an object in place in one method body and have it remain fixed until another method is called; this is not possible with the fixed statement. If you are in this unfortunate situation, you can use methods on the GCHandle object to fix an object in place indefinitely. You should do so only if it is absolutely necessary, however; fixing an object for a long time makes it highly likely that the garbage collector will be unable to efficiently compact memory.
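A minimal sketch of pinning with GCHandle follows; the class name PinningDemo and the buffer contents are chosen only for illustration. It pins an array, writes through the pinned address, and releases the handle in a finally block so that the array does not stay pinned if an exception occurs. No unsafe block is needed, because the access goes through the Marshal class rather than through a pointer.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinningDemo
{
    public static byte[] WriteThroughPinnedAddress()
    {
        byte[] buffer = new byte[4];

        // Pin the array so that the garbage collector cannot move it.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // The address stays valid until Free() is called,
            // even across method calls.
            IntPtr address = handle.AddrOfPinnedObject();
            Marshal.WriteByte(address, 2, 42);
        }
        finally
        {
            // Forgetting this call leaves the object pinned indefinitely.
            handle.Free();
        }
        return buffer;
    }

    static void Main()
    {
        byte[] result = WriteThroughPinnedAddress();
        Console.WriteLine(string.Join(", ", result));
    }
}
```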

Allocating on the Stack

You should use the fixed statement on an array to prevent the garbage collector from moving the data. However, an alternative is to allocate the array on the call stack. Stack allocated data is not subject to garbage collection or to the finalizer patterns that accompany it. Like referent types, the requirement is that the stackalloc data is an array of unmanaged types. For example, instead of allocating an array of bytes on the heap, you can place it onto the call stack, as shown in Listing 20.15.

LISTING 20.15: Allocating Data on the Call Stack


byte* bytes = stackalloc byte[42];


Because the data type is an array of unmanaged types, the runtime can allocate a fixed buffer size for the array and then restore that buffer once the pointer goes out of scope. Specifically, it allocates sizeof(T) * E, where E is the array size and T is the referent type. Given the requirement of using stackalloc only on an array of unmanaged types, the runtime restores the buffer back to the system by simply unwinding the stack, thereby eliminating the complexities of iterating over the f-reachable queue (see, in Chapter 9, the section titled “Garbage Collection” and the discussion of finalization) and compacting reachable data. Therefore, there is no way to explicitly free stackalloc data.

The stack is a precious resource. Although it is small, running out of stack space will have a big effect—namely, the program will crash. For this reason, you should make every effort to avoid running out of stack space. If a program does run out of stack space, the best thing that can happen is for the program to shut down/crash immediately. Generally, programs have less than 1MB of stack space (possibly a lot less). Therefore, take great care to avoid allocating arbitrarily sized buffers on the stack.
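One way to follow this advice is to cap the size of any stack allocation. The sketch below is an illustration only: the StackBufferDemo class, the SumOfSquares() method, and the 256-byte limit are assumptions chosen for the example, not recommended values.

```csharp
using System;

static class StackBufferDemo
{
    // Hypothetical upper bound on stack usage, chosen for illustration.
    const int MaxStackBytes = 256;

    // Requires the /unsafe switch.
    public static unsafe int SumOfSquares(int count)
    {
        // Reject sizes that would exceed the self-imposed stack budget.
        if (count <= 0 || count > MaxStackBytes / sizeof(int))
        {
            throw new ArgumentOutOfRangeException("count");
        }

        // Allocates sizeof(int) * count bytes on the call stack.
        int* values = stackalloc int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = i * i;
        }

        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += values[i];
        }
        return sum; // The buffer is released when the method returns.
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(4)); // 0 + 1 + 4 + 9 = 14
    }
}
```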

Dereferencing a Pointer

Accessing the data stored in a variable of a type referred to by a pointer requires that you dereference the pointer, placing the indirection operator prior to the expression. For example, the statement byte data = *pData; dereferences the location of the byte referred to by pData and produces a variable of type byte. The variable provides read/write access to the single byte at that location.

Using this principle in unsafe code allows the unorthodox behavior of modifying the “immutable” string, as shown in Listing 20.16. In no way is this strategy recommended, even though it does expose the potential of low-level memory manipulation.

LISTING 20.16: Modifying an Immutable String


string text = "S5280ft";
Console.Write("{0} = ", text);
unsafe // Requires /unsafe switch.
{
    fixed (char* pText = text)
    {
        char* p = pText;
        *++p = 'm';
        *++p = 'i';
        *++p = 'l';
        *++p = 'e';
        *++p = ' ';
        *++p = ' ';
    }
}
Console.WriteLine(text);


The results of Listing 20.16 appear in Output 20.2.

OUTPUT 20.2

S5280ft = Smile

In this case, you take the original address and increment it by the size of the referent type (sizeof(char)), using the preincrement operator. Next, you dereference the address using the indirection operator and then assign a different character to that location. More generally, applying the + and - operators to a pointer changes the address by the operand multiplied by sizeof(T), where T is the referent type.

Similarly, the comparison operators (==, !=, <, >, <=, and >=) work to compare pointers, translating effectively to the comparison of address location values.

One restriction on the indirection operator is that you cannot dereference a void*. The void* data type represents a pointer to an unknown type. Since the data type is unknown, it can't be dereferenced to produce a variable. Instead, to access the data referenced by a void*, you must cast it to another pointer type and then dereference the resulting pointer.
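The pointer arithmetic, pointer comparison, and void* rules can be seen together in a short sketch. The PointerDemo class and its method name are hypothetical, introduced only for this illustration.

```csharp
using System;

static class PointerDemo
{
    // Requires the /unsafe switch.
    public static unsafe int ReadThroughVoidPointer()
    {
        int* numbers = stackalloc int[3];
        numbers[0] = 10;
        numbers[1] = 20;
        numbers[2] = 30;

        // Adding 1 advances the address by 1 * sizeof(int) bytes.
        int* second = numbers + 1;

        // Comparison operators compare the address values.
        bool ordered = numbers < second;

        // Any pointer converts implicitly to void*.
        void* untyped = second;

        // "*untyped" would not compile; cast to a typed pointer first.
        int value = *(int*)untyped;

        return ordered ? value : -1;
    }

    static void Main()
    {
        Console.WriteLine(ReadThroughVoidPointer()); // 20
    }
}
```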

You can achieve the same behavior as implemented in Listing 20.16 by using the index operator rather than the indirection operator (see Listing 20.17).

LISTING 20.17: Modifying an Immutable String with the Index Operator in Unsafe Code


string text;
text = "S5280ft";
Console.Write("{0} = ", text);

unsafe // Requires /unsafe switch.
{
    fixed (char* pText = text)
    {
        pText[1] = 'm';
        pText[2] = 'i';
        pText[3] = 'l';
        pText[4] = 'e';
        pText[5] = ' ';
        pText[6] = ' ';
    }
}
Console.WriteLine(text);


The results of Listing 20.17 appear in Output 20.3.

OUTPUT 20.3

S5280ft = Smile

Modifications such as those in Listing 20.16 and Listing 20.17 can lead to unexpected behavior. For example, if you reassigned text to "S5280ft" following the Console.WriteLine() statement and then redisplayed text, the output would still be Smile because the address of two equal string literals is optimized to one string literal referenced by both variables. In spite of the apparent assignment

text = "S5280ft";

after the unsafe code in Listing 20.16, the internals of the string assignment are an address assignment of the modified "S5280ft" location, so text is never set to the intended value.
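The sharing of equal string literals can be observed without any unsafe code. In the following sketch (the class and method names are hypothetical), two variables initialized from the same literal refer to a single string object, which is why a modification made through a pointer is visible through every use of that literal.

```csharp
using System;

static class LiteralSharingDemo
{
    public static bool LiteralsShareOneObject()
    {
        string first = "S5280ft";
        string second = "S5280ft";

        // Equal literals in the same program refer to the same instance,
        // so this compares identical references.
        return object.ReferenceEquals(first, second);
    }

    static void Main()
    {
        Console.WriteLine(LiteralsShareOneObject()); // True
    }
}
```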

Accessing the Member of a Referent Type

Dereferencing a pointer produces a variable of the pointer’s underlying type. You can then access the members of the underlying type using the member access “dot” operator in the usual way. However, the rules of operator precedence require that *x.y means *(x.y), which is probably not what you intended. If x is a pointer, the correct code is (*x).y, which is an unpleasant syntax. To make it easier to access members of a dereferenced pointer, C# provides a special member access operator: x->y is a shorthand for (*x).y, as shown in Listing 20.18.

LISTING 20.18: Directly Accessing a Referent Type’s Members


unsafe
{
    Angle angle = new Angle(30, 18, 0);
    Angle* pAngle = &angle;
    System.Console.WriteLine("{0}° {1}' {2}\"",
        pAngle->Hours, pAngle->Minutes, pAngle->Seconds);
}


The results of Listing 20.18 appear in Output 20.4.

OUTPUT 20.4

30° 18' 0"
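To make the equivalence of the two syntaxes concrete, the following sketch reads and writes the same fields with both (*x).y and x->y. The Point struct and the MemberAccessDemo class are hypothetical types used only for this illustration.

```csharp
using System;

// Hypothetical unmanaged struct used only for this example.
struct Point
{
    public int X;
    public int Y;
}

static class MemberAccessDemo
{
    // Requires the /unsafe switch.
    public static unsafe int SumCoordinates()
    {
        Point point = new Point { X = 3, Y = 4 };

        // A local struct lives on the stack, so no fixed statement is needed.
        Point* p = &point;

        p->X += 10;             // Same effect as (*p).X += 10;
        return (*p).X + p->Y;   // 13 + 4
    }

    static void Main()
    {
        Console.WriteLine(SumCoordinates()); // 17
    }
}
```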

Executing Unsafe Code via a Delegate

As promised at the beginning of this chapter, we finish up with a full working example of what is likely the most “unsafe” thing you can do in C#: obtain a pointer to a block of memory, fill it with the bytes of machine code, make a delegate that refers to the new code, and execute it. This particular bit of assembly code determines whether the machine that is executing the code is a virtual machine or a real machine. If the machine is virtual, it outputs “Inside Matrix!” Listing 20.19 shows how to do it.


Beginner Topic: What Is a Virtual Computer?

A virtual computer (or virtual machine), also called a guest computer, is virtualized or emulated through software running on the host operating system and interacting with the host computer’s hardware. For example, virtual computer software (such as VMware Workstation and Microsoft Hyper-V) can be installed on a computer running a recent version of Windows. Once the software is installed, users can configure a guest computer within the software, boot it, and install an operating system as though it were a real computer, not just one virtualized with software.


LISTING 20.19: Designating a Block for Unsafe Code


using System;
using System.Runtime.InteropServices;

// MethodInvoker is assumed to be a parameterless, void-returning delegate,
// and VirtualMemoryPtr is assumed to be the executable-memory wrapper
// defined earlier in this chapter.
class Program
{
    unsafe static int Main(string[] args)
    {
        // Assign redpill
        byte[] redpill = {
            0x0f, 0x01, 0x0d,       // asm SIDT instruction
            0x00, 0x00, 0x00, 0x00, // placeholder for an address
            0xc3};                  // asm return instruction

        unsafe
        {
            fixed (byte* matrix = new byte[6],
                redpillPtr = redpill)
            {
                // Move the address of matrix immediately
                // following the SIDT instruction of memory.
                // (The uint cast assumes a 32-bit x86 process.)
                *(uint*)&redpillPtr[3] = (uint)&matrix[0];

                using (VirtualMemoryPtr codeBytesPtr =
                    new VirtualMemoryPtr(redpill.Length))
                {
                    Marshal.Copy(
                        redpill, 0,
                        codeBytesPtr, redpill.Length);

                    MethodInvoker method =
                        (MethodInvoker)Marshal.GetDelegateForFunctionPointer(
                        codeBytesPtr, typeof(MethodInvoker));

                    method();
                }
                if (matrix[5] > 0xd0)
                {
                    Console.WriteLine("Inside Matrix!\n");
                    return 1;
                }
                else
                {
                    Console.WriteLine("Not in Matrix.\n");
                    return 0;
                }
            } // fixed
        } // unsafe
    }
}


The results of Listing 20.19 appear in Output 20.5.

OUTPUT 20.5

Inside Matrix!

Using the Windows Runtime Libraries from C#

Windows RT is the version of the Windows 8 operating system that supports only immersive “Metro-style” applications, not traditional “desktop” applications. The library of operating system APIs that support immersive applications is the Windows Runtime, or WinRT for short.

Although WinRT APIs are fundamentally unmanaged COM APIs, they are described using the same metadata format that the .NET runtime uses; thus WinRT supports development of immersive Windows applications written not only in unmanaged languages, but also in managed languages such as C#, without using the P/Invoke tricks described earlier in this chapter.

The WinRT APIs have been carefully designed to seem natural to C# users. However, there are a few small “impedance mismatches” that you should be aware of when writing C# programs that target WinRT.

WinRT Events with Custom Add and Remove Handlers

There are many different ways to implement the “observer” pattern. In C#, as we have already discussed, events are typically implemented as a field of multicast delegate type. That is, a field of delegate type is declared, and that delegate can refer to many different methods. When the event is fired, the delegate methods are invoked. To add an event handler to or remove an event handler from the event, you essentially create a new multicast delegate and replace the value of the field with the new delegate.

All of those mechanisms are implemented for you automatically when you use the += and -= operators on an event. C# also allows you to run custom code when the user of your class adds or removes an event handler, via the add and remove event accessor methods.

From the consumer’s perspective, WinRT events are no different. You can use += and -= as usual in a C# program when adding or removing event handlers from a WinRT object; the C# compiler will take care of ensuring that the appropriate WinRT mechanisms are used when the code is generated. However, WinRT uses a slightly different mechanism than traditional C# programs for custom event accessors, which in turn affects how you write custom event accessors for WinRT types in C#.

In a regular C# event, when you remove a delegate from an event, the delegate is passed as the hidden value argument of the remove accessor. Neither the add nor the remove accessor returns a value. WinRT events with custom accessors use a slightly different mechanism: When you add a delegate to an event, the add accessor returns a “token.” To remove that delegate from the event, you pass the token—not the delegate—to the remove accessor. Should you wish to write a custom accessor for a WinRT event, you must follow the WinRT event pattern.

Fortunately, the WinRT library provides a special helper class to keep track of the tokens and their corresponding delegates for you. The pattern looks like the code shown in Listing 20.20.

LISTING 20.20: The WinRT Event Pattern


using System;
using System.Runtime.InteropServices.WindowsRuntime;

class WinRTEvent
{
    EventRegistrationTokenTable<EventHandler> table = null;
    public event EventHandler MyEvent
    {
        add
        {
            // The add accessor returns the registration token.
            return EventRegistrationTokenTable<EventHandler>
                .GetOrCreateEventRegistrationTokenTable(ref table)
                .AddEventHandler(value);
        }
        remove
        {
            // The remove accessor receives the token and returns nothing.
            EventRegistrationTokenTable<EventHandler>
                .GetOrCreateEventRegistrationTokenTable(ref table)
                .RemoveEventHandler(value);
        }
    }
    void OnMyEvent()
    {
        EventHandler handler =
            EventRegistrationTokenTable<EventHandler>
                .GetOrCreateEventRegistrationTokenTable(ref table)
                .InvocationList;
        if (handler != null)
            handler(this, new EventArgs());
    }
}


As you can see, every time a handler is added to the event, removed from the event, or invoked, a table is created if one does not exist already. (There should be one table variable per event.) The table manages the relationship between the token returned from the adder and the multicast delegate in the table. Just replace the EventHandler type with the appropriate delegate type for your event, and add whatever code you want to the add and remove accessors.
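The token idea itself does not depend on WinRT. The following plain .NET sketch, which assumes a hypothetical TokenEventSource class and uses a long value as the token, shows the essential bookkeeping: adding a handler yields a token, and removal takes the token rather than the delegate. It is an illustration of the pattern only, not how EventRegistrationTokenTable<T> is implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class illustrating token-based registration.
sealed class TokenEventSource
{
    private readonly Dictionary<long, EventHandler> _handlers =
        new Dictionary<long, EventHandler>();
    private long _nextToken;

    // Registers a handler and returns the token that identifies it.
    public long Add(EventHandler handler)
    {
        long token = ++_nextToken;
        _handlers.Add(token, handler);
        return token;
    }

    // Unregisters by token; the caller does not need the delegate.
    public void Remove(long token)
    {
        _handlers.Remove(token);
    }

    // Invokes every registered handler and returns how many were called.
    public int Raise()
    {
        // Copy first so that handlers may unsubscribe while being invoked.
        List<EventHandler> snapshot = new List<EventHandler>(_handlers.Values);
        foreach (EventHandler handler in snapshot)
        {
            handler(this, EventArgs.Empty);
        }
        return snapshot.Count;
    }
}

static class TokenDemo
{
    static void Main()
    {
        TokenEventSource source = new TokenEventSource();
        long first = source.Add((sender, e) => Console.WriteLine("first"));
        source.Add((sender, e) => Console.WriteLine("second"));

        source.Remove(first);               // Removal uses the token.
        Console.WriteLine(source.Raise());  // Only "second" remains.
    }
}
```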

Automatically Shimmed Interfaces

Another difference between WinRT invocation and regular .NET invocation code is that certain frequently used interfaces have slightly different names and members in WinRT. The C# compiler and .NET runtime know about these differences, and automatically generate code behind the scenes that "shims" one interface to another so as to minimize the impact on the developer. The two most notable examples are IEnumerable<T>, which is called IIterable<T> in WinRT, and IDisposable, which is called IClosable in WinRT.

Because these interfaces are automatically shimmed, you can use a method that returns IClosable in any context that requires an IDisposable, such as a using statement. Similarly, sequences and collections behave the same regardless of whether they use the C# standard interface or the WinRT version.

Task-Based Asynchrony

The WinRT APIs do not use Task<T> to represent asynchronous work. (See Chapter 18 for a detailed explanation of how to use Task<T> and the C# 5 await operator.) Rather, they use the IAsyncAction interface for operations that produce no result and the IAsyncOperation<TResult> interface for operations that return a value. These types have many of the same features as Task and Task<T>; for example, they support a cancellation mechanism, and their WithProgress variants add a progress-reporting mechanism.

The C# 5 await operator works just as well with an operand of type IAsyncAction or IAsyncOperation<TResult> as it does with Task<T>. However, a C# 5 method decorated with the async keyword that contains an await operator still must return Task or Task<T>, or be void-returning; it is not legal for an async method to return one of the WinRT asynchronous interfaces. To convert an IAsyncAction to an equivalent Task, or an IAsyncOperation<TResult> to an equivalent Task<TResult>, just call the AsTask() extension method on it.

The vast majority of other issues related to WinRT are essentially API changes, and a detailed discussion of them is beyond the scope of this book. It is important to note, however, that in WinRT all high-latency synchronous methods previously available in .NET 4.5 and earlier have been dropped, leaving only the *Async asynchronous equivalents.

Summary

As demonstrated throughout this book, C# offers great power, flexibility, consistency, and a fantastic structure. This chapter highlighted the ability of C# programs to perform very low-level machine-code operations.

Before we end the book, Chapter 21 briefly describes the underlying execution platform and shifts the focus from the C# language to the broader platform in which C# programs execute.