Introducing C# - Essential C# 6.0 (2016)


1. Introducing C#

C# is now a well-established language that builds on features found in its predecessor C-style languages (C, C++, and Java), making it immediately familiar to many experienced programmers.¹ Part of a larger, more complex open source execution platform called the Common Language Infrastructure (CLI), C# is a programming language for building software components and applications.

1. The first C# design meeting took place in 1998.


This chapter introduces C# using the traditional HelloWorld program. The chapter focuses on C# syntax fundamentals, including defining an entry point into the C# program. This will familiarize you with the C# syntax style and structure, and it will enable you to produce the simplest of C# programs. Prior to the discussion of C# syntax fundamentals is a summary of managed execution context, which explains how a C# program executes at runtime. This chapter ends with a discussion of variable declaration, writing and retrieving data from the console, and the basics of commenting code in C#.

Hello, World

The best way to learn a new programming language is to write code. The first example is the classic HelloWorld program. In this program, you will display some text to the screen.

Listing 1.1 shows the complete HelloWorld program; in the following sections, you will compile the code.

LISTING 1.1: HelloWorld in C#²


class HelloWorld
{
    static void Main()
    {
        System.Console.WriteLine("Hello. My name is Inigo Montoya.");
    }
}


2. Refer to the movie The Princess Bride if you’re confused about the Inigo Montoya references.


Note

C# is a case-sensitive language: Incorrect case prevents the code from compiling successfully.


Those experienced in programming with Java, C, or C++ will immediately see similarities. Like Java, C# inherits its basic syntax from C and C++.³ Syntactic punctuation (such as semicolons and curly braces), features (such as case sensitivity), and keywords (such as class, public, and void) are familiar to programmers experienced in these languages. Beginners and programmers from other languages will quickly find these constructs intuitive.

3. When creating C#, the language creators reviewed the specifications for C/C++, literally crossing out the features they didn’t like and creating a list of the ones they did like. The group also included designers with strong backgrounds in other languages.

Compiling and Running the Application

The C# compiler allows any file extension for files containing C# source code, but .cs is typically used. After saving the source code to a file, developers must compile it. (Appendix A provides instructions for installing the compiler.) Because the mechanics of the command are not part of the C# standard, the compilation command varies depending on the C# compiler implementation.

If you place Listing 1.1 into a file called HelloWorld.cs, the compilation command in Output 1.1 will work with the Microsoft .NET compiler (assuming appropriate paths to the compiler are set up).⁴

4. Compilation is also possible using .NET Core, a cross-platform implementation of .NET available from http://dotnet.github.io/core. Although I would very much have liked to place instructions for other platforms here, doing so detracts from the topic of introducing C#. Instead, see Appendix A or http://itl.tc/GettingStartedWithDNX for details on .NET Core.

OUTPUT 1.1

>csc.exe HelloWorld.cs
Microsoft (R) Visual C# Compiler version 1.0.0.50618
Copyright (C) Microsoft Corporation. All rights reserved.

The exact output will vary depending on which version of the compiler you use.

Running the resultant program, HelloWorld.exe, displays the message shown in Output 1.2.

OUTPUT 1.2

>HelloWorld.exe
Hello. My name is Inigo Montoya.

The program created by the C# compiler, HelloWorld.exe, is an assembly. Instead of creating an entire program that can be executed independently, developers can create a library of code that can be referenced by another, larger program. Libraries (or class libraries) use the filename extension .dll, which stands for Dynamic Link Library (DLL). A library is also an assembly. In other words, the output from a successful C# compile is an assembly regardless of whether it is a program or a library.

Begin 2.0


Language Contrast: Java—Filename Must Match Class Name

In Java, the filename must follow the name of the class. In C#, this convention is frequently followed but is not required. In C#, it is possible to have two classes in one file, and starting with C# 2.0, it’s possible to have a single class span multiple files with a feature called a partial class.
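As a minimal sketch of both points, the following single file holds more than one class and declares one class in two partial pieces. The names Greeter and MultiClassDemo, and the filenames mentioned in the comments, are hypothetical; the two partial declarations appear together only for brevity and could equally live in separate files.

```csharp
// File: Greetings.cs (the filename is arbitrary; it need not match any class name)

// First part of the partial class; this part could live in Greeter.Hello.cs.
partial class Greeter
{
    public static string Hello()
    {
        return "Hello";
    }
}

// Second part of the same class; this part could live in Greeter.Goodbye.cs.
partial class Greeter
{
    public static string Goodbye()
    {
        return "Goodbye";
    }
}

// A second, unrelated class in the same file.
class MultiClassDemo
{
    static void Main()
    {
        System.Console.WriteLine(Greeter.Hello());
        System.Console.WriteLine(Greeter.Goodbye());
    }
}
```

The compiler merges the two Greeter declarations into one class, so callers see a single type with both methods.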


End 2.0

C# Syntax Fundamentals

Once you successfully compile and run the HelloWorld program, you are ready to start dissecting the code to learn its individual parts. First, consider the C# keywords along with the identifiers that the developer chooses.


Beginner Topic: Keywords

To enable the compiler to interpret the code, certain words within C# have special status and meaning. Known as keywords, they provide the concrete syntax that the compiler uses to interpret the expressions the programmer writes. In the HelloWorld program, class, static, and void are examples of keywords.

The compiler uses the keywords to identify the structure and organization of the code. Because the compiler interprets these words with elevated significance, C# requires that developers place keywords only in certain locations. When programmers violate these rules, the compiler will issue errors.


C# Keywords

Table 1.1 shows the C# keywords.

Image

Image

TABLE 1.1: C# Keywords

Begin 2.0

After C# 1.0, no new reserved keywords were introduced to C#. However, some constructs in later versions use contextual keywords, which are significant only in specific locations. Outside these designated locations, contextual keywords have no special significance.⁵ By this method, most C# 1.0 code is compatible with the later standards.⁶

5. For example, early in the design of C# 2.0, the language designers designated yield as a keyword, and Microsoft released alpha versions of the C# 2.0 compiler, with yield as a designated keyword, to thousands of developers. However, the language designers eventually determined that by using yield return rather than yield, they could ultimately avoid adding yield as a keyword because it would have no special significance outside its proximity to return.

6. There are some rare and unfortunate incompatibilities, such as the following:

• C# 2.0 requiring implementation of IDisposable with the using statement, rather than simply a Dispose() method

• Some rare generic expressions: F(G<A,B>(7)), which means F( (G<A), (B>7) ) in C# 1.0, will in C# 2.0 instead mean a call to the generic method G<A,B> with argument 7, with the result passed to F

End 2.0

Identifiers

Like other languages, C# includes identifiers to identify constructs that the programmer codes. In Listing 1.1, HelloWorld and Main are examples of identifiers. The identifiers assigned to a construct are used to refer back to the construct later, so it is important that the names the developer assigns are meaningful rather than arbitrary.

A keen ability to select succinct and indicative names is an important characteristic of a strong programmer because it means the resultant code will be easier to understand and reuse. Clarity coupled with consistency is important enough that the .NET Framework Guidelines (http://bit.ly/dotnetguidelines) advise against the use of abbreviations or contractions in identifier names and even recommend avoiding acronyms that are not widely accepted. If an acronym is sufficiently well established (HTML, for example), you should use it consistently: Avoid spelling out the accepted acronym sometimes but not others. Generally, adding the constraint that all acronyms be included in a glossary of terms places enough overhead on the use of acronyms such that they are not used flippantly. Ultimately, select clear, possibly even verbose names—especially when working on a team or when developing a library against which others will program.

There are two basic casing formats for an identifier. Pascal case (henceforth PascalCase), as the CLI creators refer to it because of its popularity in the Pascal programming language, capitalizes the first letter of each word in an identifier name; examples include ComponentModel, Configuration, and HttpFileCollection. As HttpFileCollection demonstrates with HTTP, when using acronyms that are more than two letters long, only the first letter is capitalized. The second format, camel case (henceforth camelCase), follows the same convention, except that the first letter is lowercase; examples include quotient, firstName, httpFileCollection, ioStream, and theDreadPirateRoberts.


Guidelines

DO favor clarity over brevity when naming identifiers.

DO NOT use abbreviations or contractions within identifier names.

DO NOT use any acronyms unless they are widely accepted, and even then, only when necessary.


Notice that although underscores are legal, generally there are no underscores, hyphens, or other nonalphanumeric characters in identifier names. Furthermore, C# doesn’t follow its predecessors in that Hungarian notation (prefixing a name with a data type abbreviation) is not used. This avoids the variable rename that is necessary when data types change or the inconsistency introduced due to failure to adjust the data type prefix when using Hungarian notation.

In some rare cases, some identifiers, such as Main, can have a special meaning in the C# language.


Guidelines

DO capitalize both characters in two-character acronyms, except for the first word of a camelCased identifier.

DO capitalize only the first character in acronyms with three or more characters, except for the first word of a camelCased identifier.

DO NOT capitalize any of the characters in acronyms at the beginning of a camelCased identifier.

DO NOT use Hungarian notation (that is, do not encode the type of a variable in its name).
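The casing guidelines above can be sketched in a few declarations. All of the names here (HtmlDocument, IOStream, NamingDemo, and the local variables) are hypothetical and chosen only to illustrate the rules.

```csharp
// Type names use PascalCase.
class HtmlDocument            // acronym of three or more characters: only the first is capitalized
{
    public static string Describe() { return "HtmlDocument"; }
}

class IOStream                // two-character acronym: both characters are capitalized
{
    public static string Describe() { return "IOStream"; }
}

class NamingDemo
{
    static void Main()
    {
        // Local variables use camelCase; an acronym that starts the name is all lowercase.
        string htmlDocumentName = HtmlDocument.Describe();
        string ioStreamName = IOStream.Describe();
        string firstName = "Inigo";   // no Hungarian prefix such as strFirstName

        System.Console.WriteLine(htmlDocumentName);
        System.Console.WriteLine(ioStreamName);
        System.Console.WriteLine(firstName);
    }
}
```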



Advanced Topic: Keywords

Although it is rare, keywords may be used as identifiers if they include “@” as a prefix. For example, you could name a local variable @return. Similarly (although it doesn’t conform to C# casing standards), it is possible to name a method @throw().
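A minimal sketch of the “@” prefix follows. The class name KeywordIdentifierDemo is hypothetical, and the code is meant only to show that such names compile, not to recommend them.

```csharp
class KeywordIdentifierDemo
{
    // Legal but unconventional: a method whose name is the keyword throw.
    public static string @throw()
    {
        // A local variable whose name is the keyword return.
        string @return = "thrown";
        return @return;
    }

    static void Main()
    {
        System.Console.WriteLine(@throw());
    }
}
```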

There are also four undocumented reserved keywords in the Microsoft implementation: __arglist, __makeref, __reftype, and __refvalue. These are required only in rare interop scenarios and you can ignore them for all practical purposes. Note that these four special keywords begin with two underscores. The designers of C# reserve the right to make any identifier that begins with two underscores into a keyword in a future version; for safety, avoid ever creating such an identifier yourself.


Type Definition

All executable code in C# appears within a type definition, and the most common type definition begins with the keyword class. A class definition is the section of code that generally begins with class identifier { ... }, as shown in Listing 1.2.

LISTING 1.2: Basic Class Declaration


class HelloWorld
{
    //...
}


The name used for the type (in this case, HelloWorld) can vary, but by convention, it should be PascalCased. For this particular example, therefore, other possible names are Greetings, HelloInigoMontoya, Hello, or simply Program. (Program is a good convention to follow when the class contains the Main() method, described next.)


Guidelines

DO name classes with nouns or noun phrases.

DO use PascalCasing for all class names.


Generally, programs contain multiple types, each containing multiple methods.

Main


Beginner Topic: What Is a Method?

Syntactically, a method in C# is a named block of code introduced by a method declaration (for example, static void Main()) and (usually) followed by zero or more statements within curly braces. Methods perform computations and/or actions. Similar to paragraphs in written languages, methods provide a means of structuring and organizing code so that it is more readable. More importantly, methods can be reused and called from multiple places, and so avoid the need to duplicate code. The method declaration introduces the method and defines the method name along with the data passed to and from the method. In Listing 1.3, Main() followed by { ... } is an example of a C# method.


The location where C# programs begin execution is the Main method, which begins with static void Main(). When you execute the program by typing HelloWorld.exe at the command console, the program starts up, resolves the location of Main, and begins executing the first statement within Listing 1.3.

LISTING 1.3: Breaking Apart HelloWorld

Image

Although the Main method declaration can vary to some degree, static and the method name, Main, are always required for a program.
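A minimal sketch of how HelloWorld breaks apart, using comments to label each part. The labels are this sketch's own and may differ from those shown in Listing 1.3.

```csharp
class HelloWorld                    // class definition: the keyword class plus an identifier
{
    static void Main()              // method declaration: the program's entry point
    {
        // A single statement, terminated by a semicolon.
        System.Console.WriteLine("Hello. My name is Inigo Montoya.");
    }
}
```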



Advanced Topic: Declaration of the Main Method

C# requires that the Main method return either void or int, and that it take either no parameters or a single array of strings. Listing 1.4 shows the full declaration of the Main method.

LISTING 1.4: The Main Method, with Parameters and a Return


static int Main(string[] args)
{
    //...
}


The args parameter is an array of strings corresponding to the command-line arguments. However, the first element of the array is not the program name but the first command-line parameter to appear after the executable name, unlike in C and C++. To retrieve the full command used to execute the program, use System.Environment.CommandLine.

The int returned from Main is the status code and it indicates the success of the program’s execution. A return of a nonzero value generally indicates an error.
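As a sketch of both the args parameter and the status code, the following program echoes its first argument. The class name ArgumentEcho and the usage message are hypothetical; the choice of 1 as the error code is an assumption, since any nonzero value conventionally signals failure.

```csharp
class ArgumentEcho
{
    static int Main(string[] args)
    {
        if (args.Length == 0)
        {
            System.Console.WriteLine("Usage: ArgumentEcho.exe <name>");
            return 1;   // nonzero: signal an error to the caller
        }

        // args[0] is the first argument after the executable name,
        // not the executable name itself.
        System.Console.WriteLine("First argument: " + args[0]);

        // The full command line, including the executable name.
        System.Console.WriteLine(System.Environment.CommandLine);

        return 0;       // zero: success
    }
}
```

Running ArgumentEcho.exe Inigo would display Inigo as the first argument, whereas running it with no arguments displays the usage message and returns 1.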



Language Contrast: C++/Java—main() Is All Lowercase

Unlike its C-style predecessors, C# uses an uppercase M for the Main method to be consistent with the PascalCased naming conventions of C#.


The designation of the Main method as static indicates that other methods may call it directly off the class definition. Without the static designation, the command console that started the program would need to perform additional work (known as instantiation) before calling the method. (Chapter 5 contains an entire section devoted to the topic of static members.)

Placing void prior to Main() indicates that this method does not return any data. (This is explained further in Chapter 2.)

One distinctive C/C++ style characteristic followed by C# is the use of curly braces for the body of a construct, such as the class or the method. For example, the Main method contains curly braces that surround its implementation; in this case, only one statement appears in the method.

Statements and Statement Delimiters

The Main method includes a single statement, System.Console.WriteLine(), which is used to write a line of text to the console. C# generally uses a semicolon to indicate the end of a statement, where a statement comprises one or more actions that the code will perform. Declaring a variable, controlling the program flow, and calling a method are typical uses of statements.


Language Contrast: Visual Basic—Line-Based Statements

Some languages are line based, meaning that without a special annotation, statements cannot span a line. Until Visual Basic 2010, Visual Basic was an example of a line-based language. It required an underscore at the end of a line to indicate that a statement spans multiple lines. Starting with Visual Basic 2010, many cases were introduced where the line continuation character was optional.



Advanced Topic: Statements without Semicolons

Many programming elements in C# end with a semicolon. One example that does not include the semicolon is a switch statement. Because curly braces are always included in a switch statement, C# does not require a semicolon following the statement. In fact, code blocks themselves are considered statements (they are also composed of statements) and they don’t require closure using a semicolon. Similarly, there are cases, such as the using directive, in which a semicolon occurs at the end but it is not a statement.
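The following sketch shows all three cases side by side: a using directive, a switch statement, and a bare code block. The class SemicolonDemo and its Describe method are hypothetical names used only for illustration.

```csharp
using System;   // a using directive ends with a semicolon but is not a statement

class SemicolonDemo
{
    public static string Describe(int count)
    {
        string description;

        // A switch statement is closed by its curly brace; no semicolon follows.
        switch (count)
        {
            case 0:
                description = "none";
                break;
            case 1:
                description = "one";
                break;
            default:
                description = "many";
                break;
        }

        // A code block is itself a statement and also needs no trailing semicolon.
        {
            description = description + "!";
        }

        return description;
    }

    static void Main()
    {
        Console.WriteLine(Describe(1));
    }
}
```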


Since creation of a new line does not separate statements, you can place multiple statements on the same line and the C# compiler will interpret the line as having multiple instructions. For example, Listing 1.5 contains two statements on a single line that, in combination, display Up and Down on two separate lines.

LISTING 1.5: Multiple Statements on One Line


System.Console.WriteLine("Up");System.Console.WriteLine("Down");


C# also allows the splitting of a statement across multiple lines. Again, the C# compiler looks for a semicolon to indicate the end of a statement (see Listing 1.6).

LISTING 1.6: Splitting a Single Statement across Multiple Lines


System.Console.WriteLine(
    "Hello. My name is Inigo Montoya.");


In Listing 1.6, the original WriteLine() statement from the HelloWorld program is split across multiple lines.


Beginner Topic: What Is Whitespace?

Whitespace is the combination of one or more consecutive formatting characters such as tab, space, and newline characters. Eliminating all whitespace between words is obviously significant, as is including whitespace within a quoted string.


Whitespace

The semicolon makes it possible for the C# compiler to ignore whitespace in code. Apart from a few exceptions, C# allows developers to insert whitespace throughout the code without altering its semantic meaning. In Listing 1.5 and Listing 1.6, it didn’t matter whether a newline was inserted within a statement or between statements, and doing so had no effect on the resultant executable created by the compiler.

Frequently, programmers use whitespace to indent code for greater readability. Consider the two variations on HelloWorld shown in Listing 1.7 and Listing 1.8.

LISTING 1.7: No Indentation Formatting


class HelloWorld
{
static void Main()
{
System.Console.WriteLine("Hello Inigo Montoya");
}
}


LISTING 1.8: Removing Whitespace


class HelloWorld{static void Main()
{System.Console.WriteLine("Hello Inigo Montoya");}}


Although these two examples look significantly different from the original program, the C# compiler sees them as identical.


Beginner Topic: Formatting Code with Whitespace

Indenting the code using whitespace is important for greater readability. As you begin writing code, you need to follow established coding standards and conventions to enhance code readability.

The convention used in this book is to place curly braces on their own line and to indent the code contained between the curly brace pair. If another curly brace pair appears within the first pair, all the code within the second set of braces is also indented.

This is not a uniform C# standard, but a stylistic preference.


Working with Variables

Now that you’ve been introduced to the most basic C# program, it’s time to declare a local variable. Once a variable is declared, you can assign it a value, replace that value with a new value, and use it in calculations, output, and so on. However, you cannot change the data type of the variable. In Listing 1.9, string max is a variable declaration.

LISTING 1.9: Declaring and Assigning a Variable

Image
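A minimal sketch of such a declaration and assignment follows. The class name MiracleMax and the message are chosen only for illustration, and the details may differ from Listing 1.9.

```csharp
class MiracleMax
{
    static void Main()
    {
        string max;                               // declare: data type, then identifier
        max = "Have fun storming the castle!";    // assign a value
        System.Console.WriteLine(max);            // use the value
    }
}
```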


Beginner Topic: Local Variables

A variable is a name that refers to a value that can change over time. Local indicates that the programmer declared the variable within a method.

To declare a variable is to define it, which you do by

1. Specifying the type of data which the variable will contain

2. Assigning it an identifier (name)


Data Types

Listing 1.9 declares a variable with the data type string. Other common data types used in this chapter are int and char.

• int is the C# designation of an integer type that is 32 bits in size.

• char is used for a character type. It is 16 bits, large enough for (nonsurrogate) Unicode characters.

The next chapter looks at these and other common data types in more detail.
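As a quick sketch of these types in use, the following program declares one variable of each and displays the sizes of int and char in bytes. The class name DataTypeDemo and the variable names are hypothetical.

```csharp
class DataTypeDemo
{
    static void Main()
    {
        int answer = 42;        // 32-bit integer
        char initial = 'I';     // 16-bit character
        string name = "Inigo";  // text

        System.Console.WriteLine(answer);
        System.Console.WriteLine(initial);
        System.Console.WriteLine(name);

        // sizeof reports a size in bytes: 4 for int (32 bits), 2 for char (16 bits).
        System.Console.WriteLine(sizeof(int));
        System.Console.WriteLine(sizeof(char));
    }
}
```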


Beginner Topic: What Is a Data Type?

The type of data that a variable declaration specifies is called a data type (or object type). A data type, or simply type, is a classification of things that share similar characteristics and behavior. For example, animal is a type. It classifies all things (monkeys, warthogs, and platypuses) that have animal characteristics (multicellular, capacity for locomotion, and so on). Similarly, in programming languages, a type is a definition for several items endowed with similar qualities.


Declaring a Variable

In Listing 1.9, string max is a variable declaration of a string type whose name is max. It is possible to declare multiple variables within the same statement by specifying the data type once and separating each identifier with a comma. Listing 1.10 demonstrates such a declaration.

LISTING 1.10: Declaring Two Variables within One Statement


string message1, message2;


Because a multivariable declaration statement allows developers to provide the data type only once within a declaration, all variables will be of the same type.

In C#, the name of the variable may begin with any letter or an underscore (_), followed by any number of letters, numbers, and/or underscores. By convention, however, local variable names are camelCased (the first letter in each word is capitalized, except for the first word) and do not include underscores.


Guidelines

DO use camelCasing for local variable names.


Assigning a Variable

After declaring a local variable, you must assign it a value before reading from it. One way to do this is to use the = operator, also known as the simple assignment operator. Operators are symbols used to identify the function the code is to perform. Listing 1.11 demonstrates how to use the assignment operator to designate the string values to which the variables miracleMax and valerie will point.

LISTING 1.11: Changing the Value of a Variable


class StormingTheCastle
{
    static void Main()
    {
        string valerie;
        string miracleMax = "Have fun storming the castle!";

        valerie = "Think it will work?";

        System.Console.WriteLine(miracleMax);
        System.Console.WriteLine(valerie);

        miracleMax = "It would take a miracle.";
        System.Console.WriteLine(miracleMax);
    }
}


From this listing, observe that it is possible to assign a variable as part of the variable declaration (as it was for miracleMax), or afterward in a separate statement (as with the variable valerie). The value assigned must always be on the right side.

Running the compiled StormingTheCastle.exe program produces the output shown in Output 1.3.

OUTPUT 1.3

>StormingTheCastle.exe
Have fun storming the castle!
Think it will work?
It would take a miracle.

C# requires that local variables be determined by the compiler to be “definitely assigned” before they are read. Additionally, an assignment returns a value. Therefore, C# allows two assignments within the same statement, as demonstrated in Listing 1.12.

LISTING 1.12: Assignment Returning a Value That Can Be Assigned Again


class StormingTheCastle
{
    static void Main()
    {
        // ...
        string requirements, miracleMax;
        requirements = miracleMax = "It would take a miracle.";
        // ...
    }
}
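The “definitely assigned” rule mentioned earlier can be sketched as follows. The class DefiniteAssignmentDemo and its Choose method are hypothetical, and the if/else construct used here is covered properly in a later chapter.

```csharp
class DefiniteAssignmentDemo
{
    public static string Choose(bool optimistic)
    {
        string answer;

        // Reading answer at this point would not compile (error CS0165,
        // use of unassigned local variable), because no value has been assigned yet:
        // System.Console.WriteLine(answer);

        if (optimistic)
        {
            answer = "Have fun storming the castle!";
        }
        else
        {
            answer = "It would take a miracle.";
        }

        // Every path through the if/else assigns answer, so reading it is allowed.
        return answer;
    }

    static void Main()
    {
        System.Console.WriteLine(Choose(true));
        System.Console.WriteLine(Choose(false));
    }
}
```

The compiler performs this check statically; it does not matter which branch would actually run, only that every possible path assigns the variable before it is read.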


Using a Variable

The result of the assignment, of course, is that you can then refer to the value using the variable identifier. Therefore, when you use the variable miracleMax within the System.Console.WriteLine(miracleMax) statement, the program displays Have fun storming the castle!, the value of miracleMax, on the console. Changing the value of miracleMax and executing the same System.Console.WriteLine(miracleMax) statement causes the new miracleMax value, “It would take a miracle.” to be displayed.


Advanced Topic: Strings Are Immutable

All data of type string, whether string literals or otherwise, is immutable (or unmodifiable). For example, it is not possible to change the string “Come As You Are” to “Come As You Age.” A change such as this requires that you reassign the variable to refer to a new location in memory, instead of modifying the data to which the variable originally referred.
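A short sketch of what this means in practice: string.Replace returns a new string rather than altering the existing one. The class name ImmutableStringDemo and the variable names are hypothetical.

```csharp
class ImmutableStringDemo
{
    static void Main()
    {
        string original = "Come As You Are";
        string song = original;   // both variables refer to the same string

        // Replace does not modify the existing string; it returns a new one,
        // and song is reassigned to refer to that new string.
        song = song.Replace("Are", "Age");

        System.Console.WriteLine(original);  // Come As You Are
        System.Console.WriteLine(song);      // Come As You Age
    }
}
```

Because original still refers to the untouched first string, the reassignment of song has no effect on it.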


Console Input and Output

This chapter already used System.Console.WriteLine repeatedly for writing out text to the command console. In addition to being able to write out data, a program needs to be able to accept data that a user may enter.

Getting Input from the Console

One way to retrieve text that is entered at the console is to use System.Console.ReadLine(). This method stops the program execution so that the user can enter characters. When the user presses the Enter key, creating a newline, the program continues. The output, also known as the return, from the System.Console.ReadLine() method is the string of text that was entered. Consider Listing 1.13 and the corresponding output shown in Output 1.4.

LISTING 1.13: Using System.Console.ReadLine()


class HeyYou
{
    static void Main()
    {
        string firstName;
        string lastName;

        System.Console.WriteLine("Hey you!");

        System.Console.Write("Enter your first name: ");
        firstName = System.Console.ReadLine();

        System.Console.Write("Enter your last name: ");
        lastName = System.Console.ReadLine();
    }
}


OUTPUT 1.4

>HeyYou.exe
Hey you!
Enter your first name: Inigo
Enter your last name: Montoya

After each prompt, this program uses the System.Console.ReadLine() method to retrieve the text the user entered and assign it to an appropriate variable. By the time the second System.Console.ReadLine() assignment completes, firstName refers to the value Inigo and lastName refers to the value Montoya.


Advanced Topic: System.Console.Read()

In addition to the System.Console.ReadLine() method, there is a System.Console.Read() method. However, the data type returned by the System.Console.Read() method is an integer corresponding to the character value read, or –1 if no more characters are available. To retrieve the actual character, it is necessary to first cast the integer to a character, as shown in Listing 1.14.

LISTING 1.14: Using System.Console.Read()


int readValue;
char character;
readValue = System.Console.Read();
character = (char) readValue;
System.Console.Write(character);


The System.Console.Read() method does not return the input until the user presses the Enter key; no processing of characters will begin, even if the user types multiple characters before pressing the Enter key.


Begin 2.0

In C# 2.0 and above, you can use System.Console.ReadKey(), which, in contrast to System.Console.Read(), returns the input after a single keystroke. It allows the developer to intercept the keystroke and perform actions such as key validation, restricting the characters to numerics.
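A minimal sketch of key validation with System.Console.ReadKey() follows. The class DigitPrompt and its IsDigit helper are hypothetical, and the program must be run from an interactive console because ReadKey() needs a real keyboard.

```csharp
class DigitPrompt
{
    // Returns true when the character is one of '0' through '9'.
    public static bool IsDigit(char character)
    {
        return character >= '0' && character <= '9';
    }

    static void Main()
    {
        System.Console.Write("Press a digit: ");

        // Passing true suppresses the automatic echo of the key,
        // so the program decides what, if anything, to display.
        System.ConsoleKeyInfo key = System.Console.ReadKey(true);

        if (IsDigit(key.KeyChar))
        {
            System.Console.WriteLine(key.KeyChar);
        }
        else
        {
            System.Console.WriteLine("(not a digit)");
        }
    }
}
```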

End 2.0

Writing Output to the Console

In Listing 1.13, you prompt the user for a first and last name using the method System.Console.Write() rather than System.Console.WriteLine(). Instead of placing a newline character after displaying the text, the System.Console.Write() method leaves the current position on the same line. In this way, any text the user enters will be on the same line as the prompt for input. The output from Listing 1.13 demonstrates the effect of System.Console.Write().

Begin 6.0

The next step is to write the values retrieved using System.Console.ReadLine() back to the console. In the case of Listing 1.15, the program writes out the user’s full name. However, instead of using System.Console.WriteLine() as before, this code will use a slight variation. Output 1.5 shows the corresponding output.

LISTING 1.15: Formatting Using String Interpolation


class HeyYou
{
    static void Main()
    {
        string firstName;
        string lastName;

        System.Console.WriteLine("Hey you!");

        System.Console.Write("Enter your first name: ");
        firstName = System.Console.ReadLine();

        System.Console.Write("Enter your last name: ");
        lastName = System.Console.ReadLine();

        System.Console.WriteLine(
            $"Your full name is { firstName } { lastName }.");
    }
}


OUTPUT 1.5

Hey you!
Enter your first name: Inigo
Enter your last name: Montoya
Your full name is Inigo Montoya.

Instead of writing out “Your full name is” followed by another Write statement for firstName, a third Write statement for the space, and finally a WriteLine statement for lastName, Listing 1.15 writes out the entire output using C# 6.0’s string interpolation. With string interpolation, the compiler interprets the interior of the curly brackets within the string as regions in which you can embed code (expressions) that the compiler will evaluate and convert to strings. Rather than executing lots of code snippets individually and combining the results as a string at the end, string interpolation allows you to do this in a single step. This makes the code easier to understand.
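Because the curly brackets hold expressions rather than only variable names, an interpolated string can also call methods or perform arithmetic. The following is a small sketch of that idea; the class InterpolationDemo and its Describe method are hypothetical.

```csharp
class InterpolationDemo
{
    public static string Describe(string firstName, string lastName)
    {
        // Each pair of curly brackets holds an expression that is evaluated
        // and converted to a string: a method call, a variable, and a sum.
        return $"{ lastName.ToUpper() }, { firstName } ({ firstName.Length + lastName.Length } letters)";
    }

    static void Main()
    {
        System.Console.WriteLine(Describe("Inigo", "Montoya"));
    }
}
```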

End 6.0

Prior to C# 6.0, C# used a different approach, that of composite formatting. With composite formatting, the code first supplies a format string to define the output format—see Listing 1.16.

LISTING 1.16: Formatting Using System.Console.WriteLine()’s Composite Formatting


class HeyYou
{
    static void Main()
    {
        string firstName;
        string lastName;

        System.Console.WriteLine("Hey you!");

        System.Console.Write("Enter your first name: ");
        firstName = System.Console.ReadLine();

        System.Console.Write("Enter your last name: ");
        lastName = System.Console.ReadLine();

        System.Console.WriteLine(
            "Your full name is {0} {1}.", firstName, lastName);
    }
}


In this example, the format string is "Your full name is {0} {1}.". It identifies two indexed placeholders for data insertion in the string. Each placeholder corresponds to the order of the arguments that appear after the format string.

Note that the index value begins at zero. Each inserted argument (known as a format item) appears after the format string in the order corresponding to the index value. In this example, since firstName is the first argument to follow immediately after the format string, it corresponds to index value 0. Similarly, lastName corresponds to index value 1.

Note that the placeholders within the format string need not appear in order. For example, Listing 1.17 switches the order of the indexed placeholders and adds a comma, which changes the way the name is displayed (see Output 1.6).

LISTING 1.17: Swapping the Indexed Placeholders and Corresponding Variables


System.Console.WriteLine("Your full name is {1}, {0}",
    firstName, lastName);


OUTPUT 1.6

Hey you!
Enter your first name: Inigo
Enter your last name: Montoya
Your full name is Montoya, Inigo

In addition to not having the placeholders appear consecutively within the format string, it is possible to use the same placeholder multiple times within a format string. Furthermore, it is possible to omit a placeholder. It is not possible, however, to have placeholders that do not have a corresponding argument.
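These three rules can be sketched with string.Format, which follows the same composite formatting rules as System.Console.WriteLine() but returns the result instead of displaying it. The class CompositeFormatDemo and its method names are hypothetical.

```csharp
class CompositeFormatDemo
{
    public static string Repeated(string name)
    {
        // The same placeholder may appear more than once.
        return string.Format("{0}, {0}!", name);
    }

    public static string Omitted(string firstName, string lastName)
    {
        // Placeholder {0} is omitted; firstName is supplied but never displayed.
        return string.Format("Mr. {1}", firstName, lastName);
    }

    public static bool MissingArgumentThrows()
    {
        try
        {
            // {1} has no corresponding argument, so formatting fails at runtime.
            string.Format("{0} {1}", "Inigo");
            return false;
        }
        catch (System.FormatException)
        {
            return true;
        }
    }

    static void Main()
    {
        System.Console.WriteLine(Repeated("Inigo"));
        System.Console.WriteLine(Omitted("Inigo", "Montoya"));
        System.Console.WriteLine(MissingArgumentThrows());
    }
}
```

Note that the missing-argument case compiles without error; the mismatch surfaces only when the format string is processed at runtime, which is one reason string interpolation is less error prone.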

Since C# 6.0-style string interpolation is almost always easier to understand than the alternative composite string approach, throughout the remainder of the book we will use string interpolation by default.

Comments

In this section, we modify the program in Listing 1.15 by adding comments. In no way does this change the execution of the program; rather, providing comments within the code can simply make the code more understandable in areas where it isn’t inherently clear. Listing 1.18 shows the new code, and Output 1.7 shows the corresponding output.

LISTING 1.18: Commenting Your Code

Image

OUTPUT 1.7

Hey you!
Enter your first name: Inigo
Enter your last name: Montoya
Your full name is Inigo Montoya.

In spite of the inserted comments, compiling and executing the new program produces the same output as before.

Programmers use comments to describe and explain the code they are writing, especially where the syntax itself is difficult to understand, or perhaps a particular algorithm implementation is surprising. Since comments are pertinent only to the programmer reviewing the code, the compiler ignores comments and generates an assembly that is devoid of any trace that comments were part of the original source code.

Table 1.2 shows four different C# comment types. The program in Listing 1.18 includes two of these.

Begin 2.0


TABLE 1.2: C# Comment Types

End 2.0

A more comprehensive discussion of the XML comments appears in Chapter 9, where we further discuss the various XML tags.
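For orientation, here is a small sketch that uses each of the four comment forms once, assuming the four types in Table 1.2 are delimited comments (/* */), single-line comments (//), XML delimited comments (/** */), and XML single-line comments (///). The class and method names are hypothetical.

```csharp
using System;

/** <summary>
 *  XML delimited comment describing the class.
 *  </summary> */
class CommentTypesSample
{
    /// <summary>
    /// XML single-line comment describing the method.
    /// </summary>
    public static string BuildGreeting(string firstName)
    {
        /* Delimited comment: it can span
           more than one line. */
        string greeting = "Hello, " /* or sit inside a line */ + firstName;

        return greeting; // Single-line comment: runs to the end of the line
    }

    static void Main()
    {
        Console.WriteLine(BuildGreeting("Inigo"));
    }
}
```

None of the comments affects the compiled program; removing all of them would yield the same behavior.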

There was a period in programming history when a prolific set of comments implied a disciplined and experienced programmer. This is no longer the case. Instead, code that is readable without comments is more valuable than that which requires comments to clarify what it does. If developers find it necessary to enter comments to clarify what a particular code block is doing, they should favor rewriting the code more clearly over commenting it. Writing comments that simply repeat what the code clearly shows serves only to clutter the code, decrease its readability, and increase the likelihood of the comments going out of date because the code changes without the comments getting updated.


Guidelines

DO NOT use comments unless they describe something that is not obvious to someone other than the developer who wrote the code.

DO favor writing clearer code over entering comments to clarify a complicated algorithm.
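As a small illustration of the second guideline, the sketch below shows the same check written twice. The scenario (a minimum voting age of 18) and all names are hypothetical, chosen only to contrast the two styles.

```csharp
using System;

class GuidelineSample
{
    // Before: terse names force a comment to explain the intent.
    public static bool Check(int a)
    {
        // true when the person is old enough to vote
        return a >= 18;
    }

    // After: the names carry the meaning, so no comment is needed.
    const int MinimumVotingAge = 18;

    public static bool IsOldEnoughToVote(int age)
    {
        return age >= MinimumVotingAge;
    }

    static void Main()
    {
        Console.WriteLine(Check(21));
        Console.WriteLine(IsOldEnoughToVote(21));
    }
}
```

The two methods behave identically; only the second remains understandable if the comment is deleted or goes out of date.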



Beginner Topic: Extensible Markup Language

The Extensible Markup Language (XML) is a simple and flexible text format frequently used within Web applications and for exchanging data between applications. XML is extensible because included within an XML document is information that describes the data, known as metadata. Here is a sample XML file.

<?xml version="1.0" encoding="utf-8" ?>
<body>
  <book title="Essential C# 6.0">
    <chapters>
      <chapter title="Introducing C#"/>
      <chapter title="Operators and Control Flow"/>
      ...
    </chapters>
  </book>
</body>

The file starts with a header indicating the version and character encoding of the XML file. After that appears one root element, <body>, which contains a single “book” element. Elements begin with a word in angle brackets, such as <body>. To end an element, place the same word in angle brackets and add a forward slash to prefix the word, as in </body>. In addition to elements, XML supports attributes. title="Essential C# 6.0" is an example of an XML attribute. Note that the metadata (book title, chapter, and so on) describing the data (“Essential C# 6.0”, “Operators and Control Flow”) is included in the XML file. This can result in rather bloated files, but it offers the advantage that the data includes a description to aid in interpreting it.
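To see how a program can use this embedded metadata, consider the following sketch, which reads the sample document with the LINQ to XML types in System.Xml.Linq. It is an illustration only: the class and method names are hypothetical, and the elided chapters ("...") and the header line are left out so that the string is well-formed on its own.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class XmlMetadataSample
{
    const string Xml =
        "<body>" +
        "<book title=\"Essential C# 6.0\">" +
        "<chapters>" +
        "<chapter title=\"Introducing C#\"/>" +
        "<chapter title=\"Operators and Control Flow\"/>" +
        "</chapters>" +
        "</book>" +
        "</body>";

    // Element and attribute names (the metadata) locate the data.
    public static string BookTitle()
    {
        XElement book = XDocument.Parse(Xml).Root.Element("book");
        return (string)book.Attribute("title");
    }

    public static string[] ChapterTitles()
    {
        return XDocument.Parse(Xml)
            .Root
            .Descendants("chapter")
            .Select(chapter => (string)chapter.Attribute("title"))
            .ToArray();
    }

    static void Main()
    {
        Console.WriteLine(BookTitle());
        foreach (string title in ChapterTitles())
        {
            Console.WriteLine(title);
        }
    }
}
```

Because the document describes itself, the code can ask for "the title attribute of the book element" rather than relying on byte offsets or column positions.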


Application Programming Interface

All the methods (or more generically, the members) found on a data type such as System.Console define that type’s application programming interface (API). The API defines how a software program interacts with a component. The term applies to more than a single data type: the combined APIs of a set of data types are said to form an API for the collective set of components. In .NET, for example, all the types (and the members within those types) in an assembly are said to form the assembly’s API. Likewise, given a combination of assemblies, like those found in the .NET Framework, the collective group of assemblies forms a larger API. Often, this larger group of APIs is referred to as the framework, hence the term .NET Framework in reference to the APIs exposed by all the assemblies included with .NET. Generically, the API comprises the set of interfaces and protocols (or instructions) for programming against a set of components. In fact, with .NET, the protocols themselves are the rules for how .NET assemblies execute.
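One way to make the idea of a type's API concrete is to list its public members using reflection, a topic this chapter does not otherwise cover. The sketch below is an illustration with hypothetical class and method names; it prints the names of the public static methods on System.Console.

```csharp
using System;
using System.Linq;
using System.Reflection;

class ApiSurfaceSample
{
    // Returns the distinct names of the public static methods on a type.
    public static string[] PublicStaticMethodNames(Type type)
    {
        return type
            .GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Select(method => method.Name)
            .Distinct()
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToArray();
    }

    static void Main()
    {
        foreach (string name in PublicStaticMethodNames(typeof(Console)))
        {
            Console.WriteLine(name);
        }
    }
}
```

The list includes familiar names such as WriteLine and ReadLine, along with property accessors (names beginning with get_ or set_), which are also part of the type's API.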

Managed Execution and the Common Language Infrastructure

The processor cannot directly interpret an assembly. Assemblies consist mainly of a second language known as Common Intermediate Language (CIL), or IL for short.7 The C# compiler transforms the C# source file into this intermediate language. An additional step, usually performed at execution time, is required to change the CIL code into machine code that the processor can understand. This involves an important element in the execution of a C# program: the Virtual Execution System (VES). The VES, also casually referred to as the runtime, compiles CIL code as needed (a process known as just-in-time compilation or jitting). The code that executes under the context of an agent such as the runtime is termed managed code, and the process of executing under control of the runtime is called managed execution. The code is “managed” because the runtime controls significant portions of the program’s behavior by managing aspects such as memory allocation, security, and just-in-time compilation. Code that does not require the runtime to execute is called native code (or unmanaged code).

7. A third term for CIL is Microsoft IL (MSIL). This book uses the term CIL because it is the term adopted by the CLI standard. IL is prevalent in conversation among people writing C# code because they assume that IL refers to CIL rather than other types of intermediate languages.

The specification for a VES is included in a broader specification known as the Common Language Infrastructure (CLI) specification.8 An international standard, the CLI includes specifications for the following:

8. Miller, J., and S. Ragsdale. 2004. The Common Language Infrastructure Annotated Standard. Boston: Addison-Wesley.

• The VES or runtime

• The CIL

• A type system that supports language interoperability, known as the Common Type System (CTS)

• Guidance on how to write libraries that are accessible from CLI-compatible languages (available in the Common Language Specification [CLS])

• Metadata that enables many of the services identified by the CLI (including specifications for the layout or file format of assemblies)

• A common programming framework, the Base Class Library (BCL), which developers in all languages can utilize


Note

The term runtime can refer to either execution time or the Virtual Execution System. To help clarify the intended meaning, this book uses the term execution time to indicate when the program is executing, and it uses the term runtime when discussing the agent responsible for managing the execution of a C# program while it executes.


Running within the context of a CLI implementation enables support for a number of services and features that programmers do not need to code for directly, including the following:

• Language interoperability: interoperability between different source languages. This is possible because the language compilers translate each source language to the same intermediate language (CIL).

• Type safety: checks for conversion between types, ensuring that only conversions between compatible types will occur. This helps prevent the occurrence of buffer overruns, a leading cause of security vulnerabilities.

• Code access security: certification that the assembly developer’s code has permission to execute on the computer.

• Garbage collection: memory management that automatically de-allocates memory previously allocated by the runtime.

• Platform portability: support for potentially running the same assembly on a variety of operating systems. One obvious restriction is that no platform-dependent libraries are used; therefore, as with Java, there are potentially some platform-dependent idiosyncrasies that need to be worked out.

• BCL: provides a large foundation of code that developers can depend on (in all CLI implementations) so that they do not have to develop the code themselves.
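To make one of these services concrete, the following sketch shows the runtime rejecting an incompatible conversion and an out-of-bounds array write, the kind of checks that the type safety entry above describes. The class and method names are hypothetical, and exception handling itself is covered later in the book.

```csharp
using System;

class TypeSafetySample
{
    // A string cannot be converted to an int by casting; the runtime
    // detects the incompatible conversion and throws.
    public static bool InvalidCastIsRejected()
    {
        object text = "Inigo";
        try
        {
            int number = (int)text;
            Console.WriteLine(number);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Writing one element past the end of an array throws instead of
    // silently overwriting adjacent memory.
    public static bool OverrunIsRejected()
    {
        int[] buffer = new int[4];
        int index = buffer.Length; // one past the last valid index
        try
        {
            buffer[index] = 42;
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(InvalidCastIsRejected());
        Console.WriteLine(OverrunIsRejected());
    }
}
```

In unmanaged code, the equivalent operations could reinterpret or overwrite memory without any error being reported; here the runtime stops both.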


Note

This section gives a brief synopsis of the CLI to familiarize you with the context in which a C# program executes. It also provides a summary of some of the terms that appear throughout this book. Chapter 21 is devoted to the topic of the CLI and its relevance to C# developers. Although the chapter appears last in the book, it does not depend on any earlier chapters, so if you want to become more familiar with the CLI, you can jump to it at any time.


Begin 3.0

C# and .NET Versioning

Microsoft assigns inconsistent version numbers to the .NET Framework and the corresponding version of the C# language, simply because different teams had different versioning mechanisms. This means that if you compile with the C# 6.0 compiler, it will, by default, compile against the “.NET Framework version 4.6,” for example. Table 1.3 is a brief overview of the C# and .NET releases.


TABLE 1.3: C# and .NET Versions
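The runtime carries yet another version number, separate from both the C# version and the framework version. As a minimal sketch (the class and method names are hypothetical), a program can ask which runtime version it is executing under through Environment.Version:

```csharp
using System;

class RuntimeVersionSample
{
    // Environment.Version reports the version of the runtime
    // under which the program is currently executing.
    public static Version RuntimeVersion()
    {
        return Environment.Version;
    }

    static void Main()
    {
        Console.WriteLine($"Runtime version: {RuntimeVersion()}");
    }
}
```

The value printed depends on the machine and the CLI implementation in use, so it will not necessarily match either the C# version or the framework version.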

Most of the code within this text will work with platforms other than Microsoft’s, as long as the compiler version corresponds to the version of code required. Although providing full details on each C# platform would be helpful for some readers, it can also detract from the focus of learning C#, so the main body of this text is restricted to information on Microsoft’s platform, .NET. This choice was made simply because Microsoft has the predominant (by far) implementation. Furthermore, translation to another platform is fairly trivial.

End 3.0

Perhaps the most important framework feature added alongside C# 6.0 was support for cross-platform compilation. In other words, not only would the .NET Framework run on Windows, but Microsoft also provided an implementation (called “CoreFX”) for .NET Core that would run on Linux and OS X. Although .NET Core does not provide a feature set equivalent to the full .NET Framework, it includes enough functionality that entire (ASP.NET) websites can be hosted on operating systems other than Windows and its Internet Information Services (IIS) web server. This means that with the same code base it is possible to compile and execute applications that run across platforms. .NET Core includes everything from the .NET Compiler Platform (“Roslyn”), which itself executes on Linux and OS X, to the .NET Core runtime, along with tools like the .NET Version Manager (DNVM) and the .NET Execution Environment (DNX).
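Sharing one code base across operating systems works best when the code avoids platform-specific assumptions. As a small, hedged illustration (the class, method, and file names are hypothetical), the following sketch builds a file path with System.IO.Path instead of hard-coding a Windows-style separator:

```csharp
using System;
using System.IO;

class PortablePathSample
{
    // Path.Combine inserts the directory separator of the
    // operating system on which the program is executing.
    public static string SettingsPath(string baseDirectory)
    {
        return Path.Combine(baseDirectory, "config", "settings.txt");
    }

    static void Main()
    {
        Console.WriteLine($"Directory separator: {Path.DirectorySeparatorChar}");
        Console.WriteLine(SettingsPath("app"));
    }
}
```

The same compiled code produces a backslash-separated path on Windows and a forward-slash-separated path on Linux and OS X.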

Common Intermediate Language and ILDASM

As mentioned in the preceding section, the C# compiler converts C# code to CIL code and not to machine code. The processor can directly understand machine code, but CIL code needs to be converted before the processor can execute it. Given an assembly (either a DLL or an executable), it is possible to view the CIL code using a CIL disassembler utility to deconstruct the assembly into its CIL representation. (The CIL disassembler is commonly referred to by its Microsoft-specific filename, ILDASM, which stands for IL Disassembler.) This program will disassemble a program or its class libraries, displaying the CIL generated by the C# compiler.

The exact command used for the CIL disassembler depends on which implementation of the CLI is used. You can execute the .NET CIL disassembler from the command line as shown in Output 1.8.

OUTPUT 1.8

>ildasm /text HelloWorld.exe

The /text portion is used so that the output appears on the command console rather than in a new window. The stream of output that results from executing this command is a dump of the CIL code included in the HelloWorld.exe program. Note that CIL code is significantly easier to understand than machine code. For many developers, this may raise a concern because it makes it easier for programs to be decompiled and their algorithms understood, even when the source code is never explicitly redistributed.

As with any program, CLI based or not, the only foolproof way of preventing disassembly is to disallow access to the compiled program altogether (for example, only hosting a program on a website instead of distributing it out to a user’s machine). However, if decreased accessibility to the source code is all that is required, there are several obfuscators available. These obfuscators open up the IL code and transform it so that it does the same thing but in a way that is much more difficult to understand. This prevents the casual developer from accessing the code and creates assemblies that are much more difficult and tedious to decompile into comprehensible code. Unless a program requires a high degree of algorithm security, these obfuscators are generally sufficient.


Advanced Topic: CIL Output for HelloWorld.exe

Listing 1.19 shows the CIL code created by ILDASM.

LISTING 1.19: Sample CIL Output


// Microsoft (R) .NET Framework IL Disassembler. Version 4.6.81.0
// Copyright (c) Microsoft Corporation. All rights reserved.



// Metadata version: v4.0.30319
.assembly extern mscorlib
{
.publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
.ver 4:0:0:0
}
.assembly HelloWorld
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilationRelaxationsAttribute::.ctor(int32) = ( 01 00 08 00 00 00 00 00 )
.custom instance void [mscorlib]System.Runtime.CompilerServices.RuntimeCompatibilityAttribute::.ctor() = ( 01 00 01 00 54 02 16 57 72 61 70 4E 6F 6E 45 78 // ....T..WrapNonEx
63 65 70 74 69 6F 6E 54 68 72 6F 77 73 01 ) // ceptionThrows.

// --- The following custom attribute is added automatically, do not uncomment -------
// .custom instance void [mscorlib]System.Diagnostics.DebuggableAttribute::.ctor(valuetype [mscorlib]System.Diagnostics.DebuggableAttribute/DebuggingModes) = ( 01 00 07 01 00 00 00 00 )

.hash algorithm 0x00008004
.ver 0:0:0:0
}
.module HelloWorld.exe
// MVID: {1FB5153C-639E-401D-8C94-22A66C18DC7A}
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003 // WINDOWS_CUI
.corflags 0x00000001 // ILONLY
// Image base: 0x01190000


// =============== CLASS MEMBERS DECLARATION ===================

.class public auto ansi beforefieldinit AddisonWesley.Michaelis.EssentialCSharp.Chapter01.Listing01_01.HelloWorld
extends [mscorlib]System.Object
{
.method public hidebysig static void Main() cil managed
{
.entrypoint
// Code size 13 (0xd)
.maxstack 8
IL_0000: nop
IL_0001: ldstr "Hello. My name is Inigo Montoya."
IL_0006: call void [mscorlib]System.Console::WriteLine(string)
IL_000b: nop
IL_000c: ret
} // end of method HelloWorld::Main

.method public hidebysig specialname rtspecialname
instance void .ctor() cil managed
{
// Code size 8 (0x8)
.maxstack 8
IL_0000: ldarg.0
IL_0001: call instance void [mscorlib]System.Object::.ctor()
IL_0006: nop
IL_0007: ret
} // end of method HelloWorld::.ctor

} // end of class AddisonWesley.Michaelis.EssentialCSharp.Chapter01.Listing01_01.HelloWorld


// =============================================================

// *********** DISASSEMBLY COMPLETE ***********************


The beginning of the listing is the manifest information. It includes not only the full name of the disassembled module (HelloWorld.exe), but also all the modules and assemblies it depends on, along with their version information.

Perhaps the most interesting thing that you can glean from such a listing is how relatively easy it is to follow what the program is doing compared to trying to read and understand machine code (assembler). In the listing, an explicit reference to System.Console.WriteLine() appears. There is a lot of peripheral information to the CIL code listing, but if a developer wanted to understand the inner workings of a C# module (or any CLI-based program) without having access to the original source code, it would be relatively easy unless an obfuscator is used. In fact, several free tools are available (such as Red Gate’s Reflector, ILSpy, JustDecompile, dotPeek, and CodeReflect) that can decompile from CIL to C# automatically.
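Some of the manifest information described above is also available to a running program through reflection. The following sketch, with hypothetical class and method names, prints the name and version of its own assembly along with the assemblies it references; it illustrates the idea and is not a substitute for ILDASM.

```csharp
using System;
using System.Linq;
using System.Reflection;

class ManifestSample
{
    // Returns the simple names of the assemblies that the
    // assembly containing this class depends on.
    public static string[] ReferencedAssemblyNames()
    {
        Assembly assembly = typeof(ManifestSample).Assembly;
        return assembly
            .GetReferencedAssemblies()
            .Select(reference => reference.Name)
            .ToArray();
    }

    static void Main()
    {
        AssemblyName self = typeof(ManifestSample).Assembly.GetName();
        Console.WriteLine($"Assembly: {self.Name}, version {self.Version}");

        foreach (string name in ReferencedAssemblyNames())
        {
            Console.WriteLine($"References: {name}");
        }
    }
}
```

When compiled against the .NET Framework, the list of references would be expected to include mscorlib, matching the .assembly extern mscorlib entry near the top of Listing 1.19.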


Summary

This chapter served as a rudimentary introduction to C#. It provided a means of familiarizing you with basic C# syntax. Because of C#’s similarity to C++-style languages, much of this might not have been new material to you. However, C# and managed code do have some distinct characteristics, such as compilation down to CIL. Although it is not unique, another key characteristic of C# is its full support for object-oriented programming. Even tasks such as reading and writing data to the console are object oriented. Object orientation is foundational to C#, as you will see throughout this book.

The next chapter examines the fundamental data types that are part of the C# language, and discusses how you can use these data types with operands to form expressions.