Data Types - Essential C# 6.0 (2016)


2. Data Types

From Chapter 1’s HelloWorld program, you got a feel for the C# language, its structure, basic syntax characteristics, and how to write the simplest of programs. This chapter continues to discuss the C# basics by investigating the fundamental C# types.


Until now, you have worked with only a few built-in data types, with little explanation. In C# thousands of types exist, and you can combine types to create new types. A few types in C#, however, are relatively simple and are considered the building blocks of all other types. These types are the predefined types. The C# language’s predefined types include eight integer types, two binary floating-point types for scientific calculations and one decimal float for financial calculations, one Boolean type, and a character type. This chapter investigates these types, looks more closely at the string type, and introduces arrays.

Fundamental Numeric Types

The basic numeric types in C# have keywords associated with them. These types include integer types, floating-point types, and a special floating-point type called decimal to store large numbers with no representation error.

Integer Types

There are eight C# integer types. This variety allows you to select a data type large enough to hold its intended range of values without wasting resources. Table 2.1 lists each integer type.


TABLE 2.1: Integer Types

Included in Table 2.1 (and in Tables 2.2 and 2.3) is a column for the full name of each type; we discuss the literal suffix later in the chapter. All the fundamental types in C# have both a short name and a full name. The full name corresponds to the type as it is named in the Base Class Library (BCL). This name, which is the same across all languages, uniquely identifies the type within an assembly. Because of the fundamental nature of these types, C# also supplies keywords as short names or abbreviations to the full names of fundamental types. From the compiler’s perspective, both names refer to the same type, producing exactly the same code. In fact, an examination of the resultant CIL code would provide no indication of which name was used.

Although C# supports using both the full BCL name and the keyword, as developers we are left with the choice of which to use when. Rather than switching back and forth, it is better to use one or the other consistently. For this reason, C# developers generally go with using the C# keyword form—choosing, for example, int rather than System.Int32 and string rather than System.String (or a possible shortcut of String).
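As a minimal sketch of the point that a keyword and its BCL name identify the same type, the following compares the two with typeof. The class and method names (KeywordVersusBclName, IntMatches, StringMatches) are hypothetical and chosen only for illustration.

```csharp
using System;

static class KeywordVersusBclName
{
    // True when the C# keyword and the BCL name denote the same type.
    public static bool IntMatches() { return typeof(int) == typeof(System.Int32); }

    public static bool StringMatches() { return typeof(string) == typeof(System.String); }

    static void Main()
    {
        int viaKeyword = 42;
        System.Int32 viaBclName = viaKeyword; // same type, so no conversion occurs

        Console.WriteLine(IntMatches());                   // True
        Console.WriteLine(StringMatches());                // True
        Console.WriteLine(viaBclName.GetType().FullName);  // System.Int32
    }
}
```

Either spelling compiles to the same code, so the choice between them is purely a matter of style.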


Guidelines

DO use the C# keyword rather than the BCL name when specifying a data type (for example, string rather than String).

DO favor consistency rather than variety within your code.


The choice for consistency frequently may be at odds with other guidelines. For example, given the guideline to use the C# keyword in place of the BCL name, there may be occasions when you find yourself maintaining a file (or library of files) with the opposite style. In these cases it would be better to stay consistent with the previous style than to inject a new style and inconsistencies in the conventions. Even so, if the “style” was actually a bad coding practice that was likely to introduce bugs and obstruct successful maintenance, by all means correct the issue throughout.


Language Contrast: C++—short Data Type

In C/C++, the short data type is an abbreviation for short int. In C#, short on its own is the actual data type.


Floating-Point Types (float, double)

Floating-point numbers have varying degrees of precision, and binary floating-point types can represent numbers exactly only if they are a fraction with a power of 2 as the denominator. If you were to set the value of a floating-point variable to be 0.1, it could very easily be represented as 0.0999999999999999 or 0.10000000000000001 or some other number very close to 0.1. Similarly, setting a variable to a large number such as Avogadro’s number, 6.02 × 10^23, could lead to a representation error of approximately 10^8, which after all is a tiny fraction of that number. The accuracy of a floating-point number is in proportion to the magnitude of the number it represents. A floating-point number is precise to a certain number of significant digits, not by a fixed value such as ±0.01.
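The following sketch illustrates both effects with double: a sum of tenths that does not compare equal to 0.3, and an increment that is lost at a large magnitude. The class and method names are hypothetical, and the results shown assume a runtime that evaluates double arithmetic at 64-bit precision, as typical current runtimes do.

```csharp
using System;

static class BinaryFloatingPoint
{
    // 0.1, 0.2, and 0.3 have no exact binary representation,
    // so the rounded sum differs slightly from the rounded literal 0.3.
    public static bool TenthsAddUp()
    {
        double sum = 0.1 + 0.2;
        return sum == 0.3;
    }

    // Near 1e16, adjacent double values are 2 apart, so adding 1 is rounded away.
    public static bool AddingOneIsLost()
    {
        double big = 1e16;
        return big + 1 == big;
    }

    static void Main()
    {
        Console.WriteLine(TenthsAddUp());      // False
        Console.WriteLine(AddingOneIsLost());  // True
    }
}
```

The second result shows why precision is best thought of in significant digits: the absolute error grows with the magnitude of the value.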

C# supports the two binary floating-point number types listed in Table 2.2.


TABLE 2.2: Floating-Point Types

Binary numbers appear as base 10 (denary) numbers for human readability. The number of bits (binary digits) converts to 15 decimal digits, with a remainder that contributes to a sixteenth decimal digit as expressed in Table 2.2. Specifically, numbers between 1.7 × 10^307 and less than 1 × 10^308 have only 15 significant digits. However, numbers ranging from 1 × 10^308 to 1.7 × 10^308 will have 16 significant digits. A similar range of significant digits occurs with the decimal type as well.

Decimal Type

C# also provides a decimal floating-point type with 128-bit precision (see Table 2.3). This type is suitable for financial calculations.


TABLE 2.3: decimal Type

Unlike binary floating-point numbers, the decimal type maintains exact accuracy for all denary numbers within its range. With the decimal type, therefore, a value of 0.1 is exactly 0.1. However, while the decimal type has greater precision than the floating-point types, it has a smaller range. Thus, conversions from floating-point types to the decimal type may result in overflow errors. Also, calculations with decimal are slightly (generally imperceptibly) slower.
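A short sketch of both properties: decimal tenths add up exactly, and a double that lies outside the decimal range overflows on conversion. The names DecimalVersusDouble, DecimalTenthsAddUp, and ConversionOverflows are hypothetical.

```csharp
using System;

static class DecimalVersusDouble
{
    // With decimal, 0.1 is exactly 0.1, so the sum compares equal.
    public static bool DecimalTenthsAddUp() { return 0.1M + 0.2M == 0.3M; }

    // Returns true when converting the double to decimal throws OverflowException.
    public static bool ConversionOverflows(double value)
    {
        try
        {
            decimal converted = (decimal)value;
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(DecimalTenthsAddUp());       // True
        Console.WriteLine(ConversionOverflows(1e28));  // False: within decimal's range
        Console.WriteLine(ConversionOverflows(1e30));  // True: beyond decimal's range
    }
}
```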


Advanced Topic: Floating-Point Types Dissected

Denary numbers within the range and precision limits of the decimal type are represented exactly. In contrast, the binary floating-point representation of many denary numbers introduces a rounding error. Just as ⅓ cannot be represented exactly in any finite number of decimal digits, so a fraction such as 1/10 cannot be represented exactly in any finite number of binary digits. In both cases, we end up with a rounding error of some kind.

A decimal is represented by ±N * 10^k where the following is true:

• N, the mantissa, is a positive 96-bit integer.

• k, the exponent, is given by -28 <= k <= 0.

In contrast, a binary float is any number ±N * 2^k where the following is true:

• N is a positive 24-bit (for float) or 53-bit (for double) integer.

• k is an integer ranging from -149 to +104 for float and from -1074 to +970 for double.
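The decimal layout described above can be inspected with decimal.GetBits(), which returns the 96-bit integer in three int elements and the scale and sign in a fourth. The sketch below is only an illustration; the class and helper names (DecimalParts, Exponent, LowMantissa) are hypothetical.

```csharp
using System;

static class DecimalParts
{
    // Returns k in value = ±N * 10^k. Bits 16-23 of the fourth element
    // hold the number of digits after the decimal point, so k is its negation.
    public static int Exponent(decimal value)
    {
        int[] bits = decimal.GetBits(value);
        int scale = (bits[3] >> 16) & 0xFF;
        return -scale;
    }

    // Returns the low 32 bits of the 96-bit integer N.
    public static int LowMantissa(decimal value)
    {
        return decimal.GetBits(value)[0];
    }

    static void Main()
    {
        Console.WriteLine(LowMantissa(0.1M));   // 1
        Console.WriteLine(Exponent(0.1M));      // -1, so 0.1 is stored as 1 * 10^-1
        Console.WriteLine(LowMantissa(1.50M));  // 150
        Console.WriteLine(Exponent(1.50M));     // -2, so 1.50 is stored as 150 * 10^-2
    }
}
```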


Literal Values

A literal value is a representation of a constant value within source code. For example, if you want to have System.Console.WriteLine() print out the integer value 42 and the double value 1.618034, you could use the code shown in Listing 2.1.

LISTING 2.1: Specifying Literal Values


System.Console.WriteLine(42);
System.Console.WriteLine(1.618034);


Output 2.1 shows the results of Listing 2.1.

OUTPUT 2.1

42
1.618034


Beginner Topic: Use Caution When Hardcoding Values

The practice of placing a value directly into source code is called hardcoding, because changing the values requires recompiling the code. Developers must carefully consider the choice between hardcoding values within their code and retrieving them from an external source, such as a configuration file, so that the values are modifiable without recompiling.


By default, when you specify a literal number with a decimal point, the compiler interprets it as a double type. Conversely, a literal value with no decimal point generally defaults to an int, assuming the value is not too large to be stored in an integer. If the value is too large, the compiler will interpret it as a long. Furthermore, the C# compiler allows assignment to a numeric type other than an int, assuming the literal value is appropriate for the target data type. short s = 42 and byte b = 77 are allowed, for example. However, this is appropriate only for constant values; b = s is not allowed without additional syntax, as discussed in the section Conversions between Data Types later in this chapter.

As previously discussed in the section Fundamental Numeric Types, there are many different numeric types in C#. In Listing 2.2, a literal value is placed within C# code. Since numbers with a decimal point will default to the double data type, the output, shown in Output 2.2, is 1.61803398874989 (the last digit, 5, is missing), corresponding to the expected accuracy of a double.

LISTING 2.2: Specifying a Literal double


System.Console.WriteLine(1.618033988749895);


OUTPUT 2.2

1.61803398874989

To view the intended number with its full accuracy, you must declare explicitly the literal value as a decimal type by appending an M (or m) (see Listing 2.3 and Output 2.3).

LISTING 2.3: Specifying a Literal decimal


System.Console.WriteLine(1.618033988749895M);


OUTPUT 2.3

1.618033988749895

Now the output of Listing 2.3 is as expected: 1.618033988749895. Note that d is for double. To remember that m should be used to identify a decimal, remember that “m is for monetary calculations.”

You can also add a suffix to a value to explicitly declare a literal as float or double by using the F and D suffixes, respectively. For integer data types, the suffixes are U, L, LU, and UL. The type of an integer literal can be determined as follows:

• Numeric literals with no suffix resolve to the first data type that can store the value in this order: int, uint, long, and ulong.

• Numeric literals with the suffix U resolve to the first data type that can store the value in the order uint and then ulong.

• Numeric literals with the suffix L resolve to the first data type that can store the value in the order long and then ulong.

• If the numeric literal has the suffix UL or LU, it is of type ulong.

Note that suffixes for literals are case insensitive. However, uppercase is generally preferred to avoid any ambiguity between the lowercase letter l and the digit 1.
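The rules above can be checked by letting generic type inference report the type the compiler chose for each literal. This is a sketch only; LiteralTypes and TypeNameOf are hypothetical names, and the names printed are the BCL type names.

```csharp
using System;

static class LiteralTypes
{
    // Reports the compile-time type the compiler inferred for the argument.
    public static string TypeNameOf<T>(T value) { return typeof(T).Name; }

    static void Main()
    {
        Console.WriteLine(TypeNameOf(42));          // Int32
        Console.WriteLine(TypeNameOf(2147483648));  // UInt32: too large for int
        Console.WriteLine(TypeNameOf(4294967296));  // Int64: too large for uint
        Console.WriteLine(TypeNameOf(42U));         // UInt32
        Console.WriteLine(TypeNameOf(42L));         // Int64
        Console.WriteLine(TypeNameOf(42UL));        // UInt64
        Console.WriteLine(TypeNameOf(1.5));         // Double
        Console.WriteLine(TypeNameOf(1.5F));        // Single
        Console.WriteLine(TypeNameOf(1.5M));        // Decimal

        // In-range constants convert implicitly; a variable needs a cast.
        short s = 42;
        byte b = 77;
        b = (byte)s;
        Console.WriteLine(b);                       // 42
    }
}
```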

In some situations, you may wish to use exponential notation instead of writing out several zeroes before or after the decimal point. To use exponential notation, supply the e or E infix, follow the infix character with a positive or negative integer number, and complete the literal with the appropriate data type suffix. For example, you could print out Avogadro’s number as a float, as shown in Listing 2.4 and Output 2.4.

LISTING 2.4: Exponential Notation


System.Console.WriteLine(6.023E23F);


OUTPUT 2.4

6.023E+23


Guidelines

DO use uppercase literal suffixes (for example, 1.618033988749895M).



Beginner Topic: Hexadecimal Notation

Usually you work with numbers that are represented with a base of 10, meaning there are 10 symbols (0–9) for each digit in the number. If a number is displayed with hexadecimal notation, it is displayed with a base of 16 numbers, meaning 16 symbols are used: 0–9, A–F (lowercase can also be used). Therefore, 0x000A corresponds to the decimal value 10 and 0x002A corresponds to the decimal value 42, being 2 × 16 + 10. The actual number is the same. Switching from hexadecimal to decimal, or vice versa, does not change the number itself, just the representation of the number.

Each hex digit is four bits, so a byte can represent two hex digits.


In all discussions of literal numeric values so far, we have covered only base 10 type values. C# also supports the ability to specify hexadecimal values. To specify a hexadecimal value, prefix the value with 0x and then use any hexadecimal digit, as shown in Listing 2.5.

LISTING 2.5: Hexadecimal Literal Value


// Display the value 42 using a hexadecimal literal.
System.Console.WriteLine(0x002A);


Output 2.5 shows the results of Listing 2.5.

OUTPUT 2.5

42

Note that this code still displays 42, not 0x002A.


Advanced Topic: Formatting Numbers As Hexadecimal

To display a numeric value in its hexadecimal format, it is necessary to use the x or X numeric formatting specifier. The casing determines whether the hexadecimal letters appear in lowercase or uppercase. Listing 2.6 shows an example of how to do this.

LISTING 2.6: Example of a Hexadecimal Format Specifier


// Displays "0x2A"
System.Console.WriteLine($"0x{42:X}");


Output 2.6 shows the results.

OUTPUT 2.6

0x2A

Note that the numeric literal (42) can be in decimal or hexadecimal form. The result will be the same. Also, to achieve the hexadecimal formatting, we rely on the formatting specifier—separated from the string interpolation expression with a colon.
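As a further sketch, the casing of the specifier and an optional digit count both affect the output. The helper names below (HexFormatting, Lower, UpperPadded) are hypothetical; the digit count after X is a standard precision specifier that pads with leading zeros.

```csharp
using System;

static class HexFormatting
{
    // Lowercase x produces lowercase hexadecimal letters.
    public static string Lower(int value) { return $"{value:x}"; }

    // X4 produces uppercase letters padded to at least four digits.
    public static string UpperPadded(int value) { return $"0x{value:X4}"; }

    static void Main()
    {
        Console.WriteLine(0x2A == 42);       // True: same number, different notation
        Console.WriteLine(Lower(255));       // ff
        Console.WriteLine(UpperPadded(42));  // 0x002A
    }
}
```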



Advanced Topic: Round-Trip Formatting

By default, System.Console.WriteLine(1.618033988749895); displays 1.61803398874989, with the last digit missing. To more accurately identify the string representation of the double value it is possible to convert it using a format string and the round-trip format specifier, R (or r). string.Format("{0:R}", 1.618033988749895), for example, will return the result 1.6180339887498949.

The round-trip format specifier returns a string that, if converted back into a numeric value, will always result in the original value. Listing 2.7 shows the numbers are not equal without use of the round-trip format.

LISTING 2.7: Formatting Using the R Format Specifier


// ...
const double number = 1.618033988749895;
double result;
string text;

text = $"{number}";
result = double.Parse(text);
System.Console.WriteLine($"{result == number}: result == number");

text = string.Format("{0:R}", number);
result = double.Parse(text);
System.Console.WriteLine($"{result == number}: result == number");

// ...


Output 2.7 shows the resultant output.

OUTPUT 2.7

False: result == number
True: result == number

When assigning text the first time, there is no round-trip format specifier; as a result, the value returned by double.Parse(text) is not the same as the original number value. In contrast, when the round-trip format specifier is used, double.Parse(text) returns the original value.

For those readers who are unfamiliar with the == syntax from C-based languages, result == number returns true if result is equal to number, while result != number does the opposite. Both assignment and equality operators are discussed in the next chapter.


More Fundamental Types

The fundamental types discussed so far are numeric types. C# includes some additional types as well: bool, char, and string.

Boolean Type (bool)

Another C# primitive is a Boolean or conditional type, bool, which represents true or false in conditional statements and expressions. Allowable values are the keywords true and false. The BCL name for bool is System.Boolean. For example, in order to compare two strings in a case-insensitive manner, you call the string.Compare() method and pass a bool literal true (see Listing 2.8).

LISTING 2.8: A Case-Insensitive Comparison of Two Strings


string option;
...
int comparison = string.Compare(option, "/Help", true);


In this case, you make a case-insensitive comparison of the contents of the variable option with the literal text /Help and assign the result to comparison.

Although theoretically a single bit could hold the value of a Boolean, the size of bool is 1 byte.
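A minimal sketch of a bool in use, assuming a hypothetical IsHelpOption helper that wraps the comparison from Listing 2.8, together with a check of the size of bool:

```csharp
using System;

static class BooleanBasics
{
    // string.Compare returns 0 when the two strings are considered equal.
    public static bool IsHelpOption(string option)
    {
        bool ignoreCase = true;
        return string.Compare(option, "/Help", ignoreCase) == 0;
    }

    static void Main()
    {
        Console.WriteLine(IsHelpOption("/help"));  // True
        Console.WriteLine(IsHelpOption("/Halt"));  // False
        Console.WriteLine(sizeof(bool));           // 1
    }
}
```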

Character Type (char)

A char type represents 16-bit characters whose set of possible values are drawn from the Unicode character set’s UTF-16 encoding. A char is the same size as a 16-bit unsigned integer (ushort), which represents values between 0 and 65,535. However, char is a unique type in C# and code should treat it as such.

The BCL name for char is System.Char.
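The relationship between char and its 16-bit numeric value can be sketched as follows. CharBasics and CodeOf are hypothetical names used only for this illustration.

```csharp
using System;

static class CharBasics
{
    // A char converts implicitly to int, yielding its UTF-16 code unit value.
    public static int CodeOf(char c) { return c; }

    static void Main()
    {
        Console.WriteLine(CodeOf('A'));                     // 65
        Console.WriteLine(CodeOf(char.MaxValue));           // 65535
        Console.WriteLine(sizeof(char));                    // 2 (bytes)
        Console.WriteLine(typeof(char) == typeof(ushort));  // False: distinct types
    }
}
```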


Beginner Topic: The Unicode Standard

Unicode is an international standard for representing characters found in the majority of human languages. It provides computer systems with functionality for building localized applications, applications that display the appropriate language and culture characteristics for different cultures.



Advanced Topic: 16 Bits Is Too Small for All Unicode Characters

Unfortunately, not all Unicode characters can be represented by just one 16-bit char. The original Unicode designers believed that 16 bits would be enough, but as more languages were supported, it was realized that this assumption was incorrect. As a result, some (rarely used) Unicode characters are composed of “surrogate pairs” of two char values.
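To make the surrogate-pair point concrete, the sketch below uses the code point U+1F600, which is an illustrative choice rather than an example from the text. The class name SurrogatePairs is hypothetical.

```csharp
using System;

static class SurrogatePairs
{
    // U+1F600 lies outside the 16-bit range, so it occupies two char values.
    public const string Grinning = "\U0001F600";

    static void Main()
    {
        Console.WriteLine(Grinning.Length);                              // 2
        Console.WriteLine(char.IsHighSurrogate(Grinning[0]));            // True
        Console.WriteLine(char.IsLowSurrogate(Grinning[1]));             // True
        Console.WriteLine(char.ConvertToUtf32(Grinning, 0) == 0x1F600);  // True
    }
}
```

Note that Length counts char values, not characters as a reader would perceive them.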


To construct a literal char, place the character within single quotes, as in 'A'. Allowable characters comprise the full range of keyboard characters, including letters, numbers, and special symbols.

Some characters cannot be placed directly into the source code and instead require special handling. These characters are prefixed with a backslash (\) followed by a special character code. In combination, the backslash and special character code constitute an escape sequence. For example, \n represents a newline, and \t represents a tab. Since a backslash indicates the beginning of an escape sequence, it can no longer identify a simple backslash; instead, you need to use \\ to represent a single backslash character.

Listing 2.9 writes out one single quote because the character represented by \' corresponds to a single quote.

LISTING 2.9: Displaying a Single Quote Using an Escape Sequence


class SingleQuote
{
    static void Main()
    {
        System.Console.WriteLine('\'');
    }
}


In addition to showing the escape sequences, Table 2.4 includes the Unicode representation of characters.


TABLE 2.4: Escape Characters

You can represent any character using Unicode encoding. To do so, prefix the Unicode value with \u. You represent Unicode characters in hexadecimal notation. The letter A, for example, is the hexadecimal value 0x41. Listing 2.10 uses Unicode characters to display a smiley face (:)), and Output 2.8 shows the results.

LISTING 2.10: Using Unicode Encoding to Display a Smiley Face


System.Console.Write('\u003A');
System.Console.WriteLine('\u0029');


OUTPUT 2.8

:)
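As a small sketch, a few escape sequences can be compared against the values they stand for. The class and method names here are hypothetical.

```csharp
using System;

static class EscapeSequences
{
    // '\u0041' is the Unicode escape for the letter A (hexadecimal 0x41).
    public static bool UnicodeEscapeMatches() { return '\u0041' == 'A'; }

    // "\\" is an escape sequence for a single backslash character.
    public static int BackslashLength() { return "\\".Length; }

    static void Main()
    {
        Console.WriteLine(UnicodeEscapeMatches());  // True
        Console.WriteLine(BackslashLength());       // 1
        Console.WriteLine((int)'\n');               // 10 (line feed)
        Console.WriteLine((int)'\t');               // 9 (tab)
    }
}
```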

Strings

A finite sequence of zero or more characters is called a string. The string type in C# is string, whose BCL name is System.String. The string type includes some special characteristics that may be unexpected to developers familiar with other programming languages. In addition to the string literal format discussed in Chapter 1, strings include a “verbatim string” prefix character of @, string interpolation with the $ prefix character, and the fact that strings are immutable.

Literals

You can enter a literal string into code by placing the text in double quotes ("), as you saw in the HelloWorld program. Strings are composed of characters, and because of this, character escape sequences can be embedded within a string.

In Listing 2.11, for example, two lines of text are displayed. However, instead of using System.Console.WriteLine(), the code listing shows System.Console.Write() with the newline character, \n. Output 2.9 shows the results.

LISTING 2.11: Using the \n Character to Insert a Newline


class DuelOfWits
{
    static void Main()
    {
        System.Console.Write(
            "\"Truly, you have a dizzying intellect.\"");
        System.Console.Write("\n\"Wait 'til I get going!\"\n");
    }
}


OUTPUT 2.9

"Truly, you have a dizzying intellect."
"Wait 'til I get going!"

The escape sequence for double quotes differentiates the printed double quotes from the double quotes that define the beginning and end of the string.

In C#, you can use the @ symbol in front of a string to signify that a backslash should not be interpreted as the beginning of an escape sequence. In the resultant verbatim string literal, the backslash is treated as an ordinary character. Whitespace is also taken verbatim when using the @ string syntax. The triangle in Listing 2.12, for example, appears in the console exactly as typed, including the backslashes, newlines, and indentation. Output 2.10 shows the results.

LISTING 2.12: Displaying a Triangle Using a Verbatim String Literal


class Triangle
{
    static void Main()
    {
        System.Console.Write(@"begin
                /\
               /  \
              /    \
             /      \
            /________\
end");
    }
}


OUTPUT 2.10

begin
                /\
               /  \
              /    \
             /      \
            /________\
end

Without the @ character, this code would not even compile. In fact, even if you changed the shape to a square, eliminating the backslashes, the code still would not compile because a newline cannot be placed directly within a string that is not prefaced with the @ symbol.

The only escape sequence the verbatim string does support is "", which signifies double quotes and does not terminate the string.
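A brief sketch comparing a verbatim literal with its escaped equivalent, and showing the doubled double-quote. The path and sentence are made-up sample values, and the class and method names are hypothetical.

```csharp
using System;

static class VerbatimStrings
{
    // The verbatim and regular literals below denote the same string.
    public static bool SamePath() { return @"C:\temp\new" == "C:\\temp\\new"; }

    // Inside a verbatim string, "" stands for one double-quote character.
    public static string Quoted() { return @"She said ""hello""."; }

    static void Main()
    {
        Console.WriteLine(SamePath());  // True
        Console.WriteLine(Quoted());    // She said "hello".
    }
}
```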


Language Contrast: C++—String Concatenation at Compile Time

Unlike C++, C# does not automatically concatenate literal strings. You cannot, for example, specify a string literal as follows:

"Major Strasser has been shot."
"Round up the usual suspects."

Rather, concatenation requires the use of the addition operator. (If the compiler can calculate the result at compile time, however, the resultant CIL code will be a single string.)


If the same literal string appears within an assembly multiple times, the compiler will define the string only once within the assembly and all variables will refer to the same string. That way, if the same string literal containing thousands of characters was placed multiple times into the code, the resultant assembly would reflect the size of only one of them.
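The sharing of identical literals can be observed with object.ReferenceEquals(), which tests whether two references point to the same object. The sketch below assumes the usual behavior of the C# compiler and the runtime for literals within one assembly; the class and method names are hypothetical.

```csharp
using System;

static class LiteralSharing
{
    // Two occurrences of the same literal refer to one string object.
    public static bool LiteralsShareOneInstance()
    {
        string first = "Round up the usual suspects.";
        string second = "Round up the usual suspects.";
        return object.ReferenceEquals(first, second);
    }

    // A concatenation of literals is computed at compile time,
    // so the result is itself treated as a literal.
    public static bool FoldedConcatenationIsShared()
    {
        string joined = "Round up " + "the usual suspects.";
        return object.ReferenceEquals(joined, "Round up the usual suspects.");
    }

    static void Main()
    {
        Console.WriteLine(LiteralsShareOneInstance());     // True
        Console.WriteLine(FoldedConcatenationIsShared());  // True
    }
}
```

Reference equality is used here only to expose the sharing; ordinary code should compare strings by value.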

Begin 6.0

String Interpolation

As discussed in Chapter 1, strings can support embedded expressions when using the string interpolation format. The string interpolation syntax prefixes a string literal with a dollar symbol and then embeds the expressions within curly brackets. The following is an example:

System.Console.WriteLine($"Your full name is {firstName} {lastName}.");

where firstName and lastName are simple expressions that refer to variables.

Note that verbatim string literals can be combined with string interpolation by specifying the “$” prior to the “@” symbol, as in this example:

System.Console.WriteLine($@"Your full name is:
{ firstName } { lastName }");

Since this is a verbatim string literal, the text is output on two lines. You can, however, make a similar line break in the code without incurring a line break in the output by placing the line feeds inside the curly braces as follows:

System.Console.WriteLine($@"Your full name is: {
firstName } { lastName }");


Advanced Topic: Understanding the Internals of String Interpolation

String interpolation is a shorthand for invoking the string.Format() method. For example, a statement such as

System.Console.WriteLine($"Your full name is {firstName} {lastName}.")

will be transformed to the C# equivalent of

object[] args = new object[] { firstName, lastName };
Console.WriteLine(string.Format("Your full name is {0} {1}.", args));

This leaves in place support for localization in the same way it works with composite format strings and doesn’t introduce any post-compile injection of code via strings.
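The equivalence can be sketched by producing the same text both ways. The names InterpolationEquivalence, Interpolated, and Formatted are hypothetical, as are the sample names passed in. A compiler may choose a different lowering for simple cases, but the resulting string is the same.

```csharp
using System;

static class InterpolationEquivalence
{
    // Built with string interpolation.
    public static string Interpolated(string firstName, string lastName)
    {
        return $"Your full name is {firstName} {lastName}.";
    }

    // Built with an explicit composite format string.
    public static string Formatted(string firstName, string lastName)
    {
        return string.Format("Your full name is {0} {1}.", firstName, lastName);
    }

    static void Main()
    {
        string a = Interpolated("Inigo", "Montoya");
        string b = Formatted("Inigo", "Montoya");
        Console.WriteLine(a);       // Your full name is Inigo Montoya.
        Console.WriteLine(a == b);  // True
    }
}
```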


End 6.0

String Methods

The string type, like the System.Console type, includes several methods. There are methods, for example, for formatting, concatenating, and comparing strings.

The Format() method in Table 2.5 behaves exactly like the Console.Write() and Console.WriteLine() methods, except that instead of displaying the result in the console window, string.Format() returns the result to the caller. Of course, with string interpolation the need for string.Format() is significantly reduced (except for localization support). Under the covers, however, string interpolation compiles down to CIL that leverages string.Format().


TABLE 2.5: string Static Methods

All of the methods in Table 2.5 are static. This means that, to call the method, it is necessary to prefix the method name (for example, Concat) with the type that contains the method (for example, string). As illustrated below, however, some of the methods in the string class are instance methods. Instead of prefixing the method with the type, instance methods use the variable name (or some other reference to an instance). Table 2.6 shows a few of these methods, along with an example.


TABLE 2.6: string Methods
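A minimal sketch contrasting the two calling styles, using a few commonly used string members. The wrapper names (StringMethodsDemo, Join, Tidy) and the sample values are hypothetical.

```csharp
using System;

static class StringMethodsDemo
{
    // Static method: called on the type itself.
    public static string Join(string a, string b) { return string.Concat(a, b); }

    // Instance methods: called on a string value; each returns a new string.
    public static string Tidy(string s) { return s.Trim().ToLower(); }

    static void Main()
    {
        Console.WriteLine(Join("Hello, ", "World"));   // Hello, World
        Console.WriteLine(Tidy("  /HELP  "));          // /help
        Console.WriteLine("/Help".StartsWith("/"));    // True
        Console.WriteLine("a-b-c".Replace("-", "+"));  // a+b+c
    }
}
```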

Begin 6.0


Advanced Topic: The using and using static Directives

The invocation of static methods as we have used them so far always involves a prefix of the namespace followed by the type name. When calling System.Console.WriteLine for example, even though the method invoked is WriteLine() and there is no other method with that name within the context, it is still necessary to prefix the method name with the namespace (System) followed by the type name (Console). On occasion, you may want a shortcut to avoid such explicitness; to do so, you can leverage the C# 6.0 using static directive as shown in Listing 2.13.

LISTING 2.13: The using static Directive


// The using static directive allows you to drop
// both the namespace and the type name
using static System.Console;

class HeyYou
{
    static void Main()
    {
        string firstName;
        string lastName;

        WriteLine("Hey you!");

        Write("Enter your first name: ");
        firstName = ReadLine();

        Write("Enter your last name: ");
        lastName = ReadLine();

        WriteLine(
            $"Your full name is {firstName} {lastName}.");
    }
}


The using static directive needs to appear at the top of the file or at the top of a namespace declaration. Each time we use the System.Console class, it is no longer necessary to also use the “System.Console” prefix. Instead, we can simply write the method name. An important point to note about the using static directive is that it works only for static methods and properties, not for instance members.

A similar directive, the using directive, allows for eliminating the namespace prefix (for example, “System.”). Unlike the using static directive, the using directive applies universally within the file (or namespace) in which it resides (not just to static members). With the using directive, you can (optionally) eliminate all references to the namespace, whether during instantiation, during static method invocation, or even with the nameof operator found in C# 6.0.


End 6.0

String Formatting

Whether you use string.Format() or the C# 6.0 string interpolation feature to construct complex formatting strings, a rich and complex set of formatting patterns is available to display numbers, dates, times, timespans, and so on. For example, if price is a variable of type decimal, then string.Format("{0,20:C2}", price) or the equivalent interpolation $"{price,20:C2}" both convert the decimal value to a string using the default currency formatting rules, rounded to two figures after the decimal place, and right-justified in a 20-character-wide string. Space does not permit a detailed discussion of all the possible formatting strings; consult the MSDN documentation for string.Format() for a complete listing of formatting strings.

If you want an actual left or right curly brace inside an interpolated string or formatted string, you can double the brace to indicate that it is not introducing a pattern. For example, the interpolated string $"{{ {price:C2} }}" might produce the string "{ $1,234.56 }".
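The alignment and brace-escaping rules can be sketched as follows. To keep the output independent of the machine's regional settings, this example passes CultureInfo.InvariantCulture and uses the N2 (number) specifier instead of C2; those choices, and the names FormatPatterns, RightAligned, and InBraces, are assumptions made for the illustration.

```csharp
using System;
using System.Globalization;

static class FormatPatterns
{
    // "{0,10:N2}": right-justify in 10 characters, with digit grouping
    // and two digits after the decimal point.
    public static string RightAligned(decimal price)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0,10:N2}", price);
    }

    // Doubled braces produce literal braces around the formatted value.
    public static string InBraces(int value) { return $"{{ {value} }}"; }

    static void Main()
    {
        Console.WriteLine("[" + RightAligned(1234.5M) + "]");  // [  1,234.50]
        Console.WriteLine(InBraces(5));                        // { 5 }
    }
}
```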

New Line

When writing out a new line, the exact characters for the new line will depend on the operating system on which you are executing. On Microsoft Windows platforms, the newline is the combination of both the carriage return (\r) and line feed (\n) characters, while a single line feed is used on UNIX. One way to overcome the discrepancy between platforms is simply to use System.Console.WriteLine() to output a blank line. Another approach, which is almost essential for a new line on multiple platforms when you are not outputting to the console, is to use System.Environment.NewLine. In other words, System.Console.WriteLine("Hello World") and System.Console.Write($"Hello World{System.Environment.NewLine}") are equivalent.
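A small sketch of building a line with System.Environment.NewLine when the text is not written straight to the console. NewLineDemo and WithNewLine are hypothetical names.

```csharp
using System;

static class NewLineDemo
{
    // Appends the platform's newline sequence to the text.
    public static string WithNewLine(string text)
    {
        return text + Environment.NewLine;
    }

    static void Main()
    {
        // By default this has the same effect as Console.WriteLine("Hello World").
        Console.Write(WithNewLine("Hello World"));

        // The sequence is "\r\n" on Windows and "\n" on UNIX-like systems.
        string newline = Environment.NewLine;
        Console.WriteLine(newline == "\r\n" || newline == "\n");
    }
}
```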


Advanced Topic: C# Properties

The Length member referred to in the following section is not actually a method, as indicated by the fact that there are no parentheses following its call. Length is a property of string, and C# syntax allows access to a property as though it were a member variable (known in C# as a field). In other words, a property has the behavior of special methods called setters and getters, but the syntax for accessing that behavior is that of a field.

Examining the underlying CIL implementation of a property reveals that it compiles into two methods: set_<PropertyName> and get_<PropertyName>. Neither of these, however, is directly accessible from C# code, except through the C# property constructs. See Chapter 5 for more details on properties.


String Length

To determine the length of a string, you use a string member called Length. This particular member is called a read-only property. As such, it cannot be set, nor does calling it require any parameters. Listing 2.14 demonstrates how to use the Length property, and Output 2.11 shows the results.

LISTING 2.14: Using string’s Length Member


class PalindromeLength
{
    static void Main()
    {
        string palindrome;

        System.Console.Write("Enter a palindrome: ");
        palindrome = System.Console.ReadLine();

        System.Console.WriteLine(
            $"The palindrome \"{palindrome}\" is"
            + $" {palindrome.Length} characters.");
    }
}


OUTPUT 2.11

Enter a palindrome: Never odd or even
The palindrome "Never odd or even" is 17 characters.

The length for a string cannot be set directly; it is calculated from the number of characters in the string. Furthermore, the length of a string cannot change because a string is immutable.

Strings Are Immutable

A key characteristic of the string type is that it is immutable. A string variable can be assigned an entirely new value but there is no facility for modifying the contents of a string. It is not possible, therefore, to convert a string to all uppercase letters. It is trivial to create a new string that is composed of an uppercase version of the old string, but the old string is not modified in the process. Consider Listing 2.15 as an example.

LISTING 2.15: Error; string Is Immutable


class Uppercase
{
    static void Main()
    {
        string text;

        System.Console.Write("Enter text: ");
        text = System.Console.ReadLine();

        // UNEXPECTED: Does not convert text to uppercase
        text.ToUpper();

        System.Console.WriteLine(text);
    }
}


Output 2.12 shows the results of Listing 2.15.

OUTPUT 2.12

Enter text: This is a test of the emergency broadcast system.
This is a test of the emergency broadcast system.

At a glance, it would appear that text.ToUpper() should convert the characters within text to uppercase. However, strings are immutable and, therefore, text.ToUpper() will make no such modification. Instead, text.ToUpper() returns a new string that needs to be saved into a variable or passed to System.Console.WriteLine() directly. The corrected code is shown in Listing 2.16, and its output is shown in Output 2.13.

LISTING 2.16: Working with Strings


class Uppercase
{
    static void Main()
    {
        string text, uppercase;

        System.Console.Write("Enter text: ");
        text = System.Console.ReadLine();

        // Return a new string in uppercase
        uppercase = text.ToUpper();

        System.Console.WriteLine(uppercase);
    }
}


OUTPUT 2.13

Enter text: This is a test of the emergency broadcast system.
THIS IS A TEST OF THE EMERGENCY BROADCAST SYSTEM.

If the immutability of a string is ignored, mistakes similar to those shown in Listing 2.15 can occur with other string methods as well.

To actually change the value of text, assign the value from ToUpper() back into text, as in the following code:

text = text.ToUpper();

System.Text.StringBuilder

If considerable string modification is needed, such as when constructing a long string in multiple steps, you should use the data type System.Text.StringBuilder rather than string. The StringBuilder type includes methods such as Append(), AppendFormat(), Insert(), Remove(), and Replace(), some of which are also available with string. The key difference, however, is that with StringBuilder these methods will modify the data in the StringBuilder itself, and will not simply return a new string.
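A minimal sketch of building a string in several steps with StringBuilder. The class name, the Build helper, and the text being assembled are hypothetical. Each call changes the builder in place, and ToString() produces the final string.

```csharp
using System;
using System.Text;

static class StringBuilderDemo
{
    public static string Build()
    {
        StringBuilder builder = new StringBuilder();
        builder.Append("Hello");
        builder.Append(", world");          // Hello, world
        builder.Insert(0, ">> ");           // >> Hello, world
        builder.Replace("world", "C#");     // >> Hello, C#
        builder.AppendFormat(" ({0})", 6);  // >> Hello, C# (6)
        return builder.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Build());  // >> Hello, C# (6)
    }
}
```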

null and void

Two additional keywords relating to types are null and void. The null value, identified with the null keyword, indicates that the variable does not refer to any valid object. void is used to indicate the absence of a type or the absence of any value altogether.

null

null can also be used as a type of string “literal.” null indicates that a variable is set to nothing. Reference types, pointer types, and nullable value types can be assigned the value null. The only reference type covered so far in this book is string; Chapter 5 covers the topic of creating classes (which are reference types) in detail. For now, suffice it to say that a variable of reference type contains a reference to a location in memory that is different from the value of the variable. Code that sets a variable to null explicitly assigns the reference to refer to no valid value. In fact, it is even possible to check whether a reference refers to nothing. Listing 2.17 demonstrates assigning null to a string variable.

LISTING 2.17: Assigning null to a String


static void Main()
{
string faxNumber;
// ...

// Clear the value of faxNumber.
faxNumber = null;

// ...
}


Assigning the value null to a reference type is not equivalent to not assigning it at all. In other words, a variable that has been assigned null has still been set, whereas a variable with no assignment has not been set and, therefore, will often cause a compile error if used prior to assignment.

Assigning the value null to a string variable is distinctly different from assigning an empty string, "". Use of null indicates that the variable has no value, whereas "" indicates that there is a value—an empty string. This type of distinction can be quite useful. For example, the programming logic could interpret a faxNumber of null to mean that the fax number is unknown, while a faxNumber value of "" could indicate that there is no fax number.
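
The distinction between null and "" can be sketched as follows. The helper Describe and the class FaxNumberExample are hypothetical names, and the mapping of null to "unknown" and "" to "none" simply follows the convention suggested in the preceding paragraph.

```csharp
class FaxNumberExample
{
    // Hypothetical helper: interprets a fax number using the
    // convention that null means unknown and "" means none.
    public static string Describe(string faxNumber)
    {
        if (faxNumber == null)
        {
            return "unknown";
        }
        else if (faxNumber == "")
        {
            return "none";
        }
        else
        {
            return faxNumber;
        }
    }

    static void Main()
    {
        System.Console.WriteLine(Describe(null));        // unknown
        System.Console.WriteLine(Describe(""));          // none
        System.Console.WriteLine(Describe("555-0100"));  // 555-0100
    }
}
```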

The void “Type”

Sometimes the C# syntax requires a data type to be specified but no data is actually passed. For example, if no return from a method is needed, C# allows you to specify void as the data type instead. The declaration of Main within the HelloWorld program is an example. The use of void as the return type indicates that the method is not returning any data and tells the compiler not to expect a value. void is not a data type per se, but rather an indication that there is no data being returned.


Language Contrast: C++

In both C++ and C#, void has two meanings: as a marker that a method does not return any data, and to represent a pointer to a storage location of unknown type. In C++ programs it is quite common to see pointer types like void**. C# can also represent pointers to storage locations of unknown type using the same syntax, but this usage is comparatively rare in C# and typically encountered only when writing programs that interoperate with unmanaged code libraries.



Language Contrast: Visual Basic—Returning void Is Like Defining a Subroutine

The Visual Basic equivalent of returning a void in C# is to define a subroutine (Sub/End Sub) rather than a function that returns a value.


Begin 3.0


Advanced Topic: Implicitly Typed Local Variables

C# 3.0 added a contextual keyword, var, for declaring an implicitly typed local variable. As long as the code initializes a variable at declaration time with an expression of unambiguous type, C# 3.0 and later allow for the variable data type to be implied rather than stated, as shown in Listing 2.18.

LISTING 2.18: Working with Strings


class Uppercase
{
static void Main()
{
System.Console.Write("Enter text: ");
var text = System.Console.ReadLine();

// Return a new string in uppercase
var uppercase = text.ToUpper();

System.Console.WriteLine(uppercase);
}
}


This listing is different from Listing 2.16 in two ways. First, rather than using the explicit data type string for the declaration, Listing 2.18 uses var. The resultant CIL code is identical to using string explicitly. However, var indicates to the compiler that it should determine the data type from the value (System.Console.ReadLine()) that is assigned within the declaration.

Second, the variables text and uppercase are initialized by their declarations. To not do so would result in an error at compile time. As mentioned earlier, the compiler determines the data type of the initializing expression and declares the variable accordingly, just as it would if the programmer had specified the type explicitly.

Although using var rather than the explicit data type is allowed, consider avoiding such use when the data type is known—for example, use string for the declaration of text and uppercase. Not only does this make the code more understandable, but it also verifies that the data type returned by the right-hand side expression is the type expected. When using a var declared variable, the right-hand side data type should be obvious; if it isn’t, consider avoiding the use of the var declaration.

Support for var was added to the language in C# 3.0 to permit use of anonymous types. Anonymous types are data types that are declared “on the fly” within a method, rather than through explicit class definitions, as shown in Listing 2.19. (See Chapter 14 for more details on anonymous types.)

LISTING 2.19: Implicit Local Variables with Anonymous Types


class Program
{
static void Main()
{
var patent1 =
new { Title = "Bifocals",
YearOfPublication = "1784" };
var patent2 =
new { Title = "Phonograph",
YearOfPublication = "1877" };

System.Console.WriteLine(
$"{ patent1.Title } ({ patent1.YearOfPublication })");
System.Console.WriteLine(
$"{ patent2.Title } ({ patent2.YearOfPublication })");
}
}


The corresponding output is shown in Output 2.14.

OUTPUT 2.14

Bifocals (1784)
Phonograph (1877)

Listing 2.19 demonstrates the anonymous type assignment to an implicitly typed (var) local variable. This type of operation provides critical functionality in tandem with C# 3.0 support for joining (associating) data types or reducing the size of a particular type down to fewer data elements.


End 3.0

Categories of Types

All types fall into one of two categories: value types and reference types. The differences between the types in each category stem from how they are copied: Value type data is always copied by value, while reference type data is always copied by reference.

Value Types

With the exception of string, all the predefined types in the book so far have been value types. Variables of value types contain the value directly. In other words, the variable refers to the same location in memory where the value is stored. Because of this, when a different variable is assigned the same value, a copy of the original variable’s value is made to the location of the new variable. A second variable of the same value type cannot refer to the same location in memory as the first variable. Consequently, changing the value of the first variable will not affect the value in the second. Figure 2.1 demonstrates this. In the figure, number1 refers to a particular location in memory that contains the value 42. After assigning number1 to number2, both variables will contain the value 42. However, modifying either variable’s value will not affect the other.

Image

FIGURE 2.1: Value Types Contain the Data Directly

Similarly, passing a value type to a method such as Console.WriteLine() will also result in a memory copy, and any changes to the parameter inside the method will not affect the original value within the calling function. Since value types require a memory copy, they generally should be defined to consume a small amount of memory; value types should almost always be less than 16 bytes in size.
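
The copy behavior shown in Figure 2.1 can be sketched in code. The class and method names here are assumptions chosen for illustration; the variable names number1 and number2 match the figure.

```csharp
class ValueCopyExample
{
    // Returns number1 after its copy, number2, has been changed.
    public static int OriginalAfterCopyIsModified()
    {
        int number1 = 42;
        int number2 = number1;   // number2 receives its own copy of 42

        number2 = number2 + 1;   // changes only number2
        System.Console.WriteLine(number2);   // writes 43

        return number1;          // still 42
    }

    static void Main()
    {
        System.Console.WriteLine(OriginalAfterCopyIsModified());  // 42
    }
}
```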

Reference Types

By contrast, the value of a reference type is a reference to a storage location that contains data. Reference types store the reference where the data is located instead of storing the data directly, as value types do. Therefore, to access the data, the runtime will read the memory location out of the variable and then “jump” to the location in memory that contains the data. The memory area of the data a reference type points to is called the heap (see Figure 2.2).

Image

FIGURE 2.2: Reference Types Point to the Heap

A reference type does not require the same memory copy of the data that a value type does, which makes copying reference types far more efficient than copying large value types. When assigning the value of one reference type variable to another reference type variable, only the reference is copied, not the data referred to. In practice, a reference is always the same size as the “native size” of the processor: A 32-bit processor will copy a 32-bit reference and a 64-bit processor will copy a 64-bit reference, and so on. Obviously, copying the small reference to a large block of data is faster than copying the entire block, as a value type would.

Since reference types copy a reference to data, two different variables can refer to the same data. If two variables refer to the same object, changing a field of the object through one variable causes the effect to be seen when accessing the field via another variable. This happens both for assignment and for method calls. Therefore, a method can affect the data of a reference type, and that change can be observed when control returns to the caller. For this reason, a key factor when choosing between defining a reference type or a value type is whether the object is logically like an immutable value of fixed size (and therefore possibly a value type), or logically a mutable thing that can be referred to (and therefore likely to be a reference type).
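
A minimal sketch of two variables referring to the same object follows. It uses System.Text.StringBuilder, a reference type introduced earlier in the chapter; the class and method names are illustrative assumptions.

```csharp
class ReferenceCopyExample
{
    // Modifies an object through one variable and reads it
    // back through another variable.
    public static string ModifyThroughSecondVariable()
    {
        System.Text.StringBuilder first =
            new System.Text.StringBuilder("abc");

        // Only the reference is copied; both variables now
        // refer to the same StringBuilder object.
        System.Text.StringBuilder second = first;

        second.Append("def");

        // The change made through second is visible through first.
        return first.ToString();
    }

    static void Main()
    {
        System.Console.WriteLine(ModifyThroughSecondVariable());  // abcdef
    }
}
```

Had first and second been variables of a value type, the assignment would have produced two independent copies and the change would not have been shared.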

Besides string and any custom classes such as Program, all types discussed so far are value types. However, most types are reference types. Although it is possible to define custom value types, it is relatively rare to do so in comparison to the number of custom reference types.

Begin 2.0

Nullable Modifier

Value types cannot usually be assigned null because, by definition, they cannot contain references, including references to nothing. However, this presents a problem because we frequently wish to represent values that are “missing.” When specifying a count, for example, what do you enter if the count is unknown? One possible solution is to designate a “magic” value, such as -1 or int.MaxValue, but these are valid integers. Rather, it is desirable to assign null to the value type because it is not a valid integer.

To declare variables of value type that can store null, you use the nullable modifier, ?. This feature, which was introduced with C# 2.0, appears in Listing 2.20.

LISTING 2.20: Using the Nullable Modifier


static void Main()
{
int? count = null;
do
{
// ...
}
while(count == null);
}


Assigning null to value types is especially attractive in database programming. Frequently, value type columns in database tables allow null values. Retrieving such columns and assigning them to corresponding fields within C# code is problematic, unless the fields can contain null as well. Fortunately, the nullable modifier is designed to handle such a scenario specifically.
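
As a sketch of how code might read a nullable value, the following checks for null before using the Value property and uses the ?? operator to supply a fallback. The class NullableExample and the helpers Describe and OrZero are hypothetical names introduced only for this illustration.

```csharp
class NullableExample
{
    // Hypothetical helper: formats a count that may be missing.
    public static string Describe(int? count)
    {
        if (count == null)
        {
            return "unknown";
        }
        // Value is safe to read here because count is not null.
        return count.Value.ToString();
    }

    // Hypothetical helper: substitutes 0 when the count is missing.
    public static int OrZero(int? count)
    {
        return count ?? 0;
    }

    static void Main()
    {
        int? count = null;
        System.Console.WriteLine(Describe(count));   // unknown

        count = 3;
        System.Console.WriteLine(Describe(count));   // 3
        System.Console.WriteLine(OrZero(null));      // 0
    }
}
```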

End 2.0

Conversions between Data Types

Given the thousands of types predefined in the various CLI implementations and the unlimited number of types that code can define, it is important that types support conversion from one type to another where it makes sense. The most common operation that results in a conversion is casting.

Consider the conversion between two numerical types: converting from a variable of type long to a variable of type int. A long type can contain values as large as 9,223,372,036,854,775,807; however, the maximum value of an int is 2,147,483,647. As such, that conversion could result in a loss of data if, for example, the variable of type long contains a value greater than the maximum value of an int. Any conversion that could result in a loss of magnitude, or in an exception because the conversion failed, requires an explicit cast. Conversely, a conversion operation that will not lose magnitude and will not throw an exception regardless of the operand types is an implicit conversion.

Explicit Cast

In C#, you cast using the cast operator. By specifying the type you would like the variable converted to within parentheses, you acknowledge that if an explicit cast is occurring, there may be a loss of precision and data, or an exception may result. The code in Listing 2.21 converts a long to an int and explicitly tells the system to attempt the operation.

LISTING 2.21: Explicit Cast Example

Image

With the cast operator, the programmer essentially says to the compiler, “Trust me, I know what I am doing. I know that the value will fit into the target type.” Making such a choice will cause the compiler to allow the conversion. However, with an explicit conversion, there is still a chance that an error, in the form of an exception, might occur while executing if the data is not converted successfully. It is, therefore, the programmer’s responsibility to ensure the data is successfully converted, or else to provide the necessary error-handling code when the conversion fails.
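
The following is a minimal sketch of an explicit cast from long to int; it is an illustration and not necessarily identical to Listing 2.21. The names ExplicitCastExample and Narrow are assumptions, and the sample value is deliberately small enough to fit in an int.

```csharp
class ExplicitCastExample
{
    // Converts a long to an int using the cast operator.
    public static int Narrow(long value)
    {
        // Without (int), this statement would not compile, because
        // no implicit conversion from long to int exists.
        return (int) value;
    }

    static void Main()
    {
        long longNumber = 31416;
        int intNumber = Narrow(longNumber);
        System.Console.WriteLine(intNumber);   // 31416
    }
}
```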



Advanced Topic: Checked and Unchecked Conversions

C# provides special keywords for marking a code block to indicate what should happen if the target data type is too small to contain the assigned data. By default, if the target data type cannot contain the assigned data, the data will truncate during assignment. For an example, see Listing 2.22.

LISTING 2.22: Overflowing an Integer Value


class Program
{
static void Main()
{
// int.MaxValue equals 2147483647
int n = int.MaxValue;
n = n + 1 ;
System.Console.WriteLine(n);
}
}


Output 2.15 shows the results.

OUTPUT 2.15

-2147483648

Listing 2.22 writes the value -2147483648 to the console. However, placing the code within a checked block, or using the checked option when running the compiler, will cause the runtime to throw an exception of type System.OverflowException. The syntax for a checked block uses the checked keyword, as shown in Listing 2.23.

LISTING 2.23: A Checked Block Example


class Program
{
static void Main()
{
checked
{
// int.MaxValue equals 2147483647
int n = int.MaxValue;
n = n + 1 ;
System.Console.WriteLine(n);
}
}
}


Output 2.16 shows the results.

OUTPUT 2.16

Unhandled Exception: System.OverflowException: Arithmetic operation
resulted in an overflow at Program.Main() in ...Program.cs:line 12

The result is that an exception is thrown if, within the checked block, an overflow assignment occurs at runtime.

The C# compiler provides a command-line option for changing the default checked behavior from unchecked to checked. C# also supports an unchecked block that overflows the data instead of throwing an exception for assignments within the block (see Listing 2.24).

LISTING 2.24: An Unchecked Block Example


using System;

class Program
{
static void Main()
{
unchecked
{
// int.MaxValue equals 2147483647
int n = int.MaxValue;
n = n + 1 ;
System.Console.WriteLine(n);
}
}
}


Output 2.17 shows the results.

OUTPUT 2.17

-2147483648

Even if the checked option is on during compilation, the unchecked keyword in the preceding code will prevent the runtime from throwing an exception during execution.
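
In addition to the block form shown in the preceding listings, checked and unchecked can be applied to a single expression. The sketch below uses that form and catches the resulting System.OverflowException; the class and method names are hypothetical, and exception handling is used here ahead of its full treatment later in the book.

```csharp
class CheckedExpressionExample
{
    // Reports whether adding 1 to n overflows the int type.
    public static bool IncrementOverflows(int n)
    {
        try
        {
            // checked(...) applies overflow checking to one expression.
            n = checked(n + 1);
            return false;
        }
        catch (System.OverflowException)
        {
            return true;
        }
    }

    // Adds 1 to n and lets the value wrap around on overflow.
    public static int WrapAround(int n)
    {
        return unchecked(n + 1);
    }

    static void Main()
    {
        System.Console.WriteLine(IncrementOverflows(int.MaxValue));  // True
        System.Console.WriteLine(WrapAround(int.MaxValue));  // -2147483648
    }
}
```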


You cannot convert any type to any other type simply because you designate the conversion explicitly using the cast operator. The compiler will still check that the operation is valid. For example, you cannot convert a long to a bool. No such conversion is defined, and therefore, the compiler does not allow such a cast.


Language Contrast: Converting Numbers to Booleans

It may be surprising to learn that there is no valid cast from a numeric type to a Boolean type, since this is common in many other languages. The reason no such conversion exists in C# is to avoid any ambiguity, such as whether –1 corresponds to true or false. More importantly, as you will see in the next chapter, this constraint reduces the chance of using the assignment operator in place of the equality operator (avoiding if(x=42){...} when if(x==42){...} was intended, for example).


Implicit Conversion

In other instances, such as when going from an int type to a long type, there is no loss of precision and no fundamental change in the value of the type occurs. In these cases, the code needs to specify only the assignment operator; the conversion is implicit. In other words, the compiler is able to determine that such a conversion will work correctly. The code in Listing 2.25 converts from an int to a long by simply using the assignment operator.

LISTING 2.25: Not Using the Cast Operator for an Implicit Cast


int intNumber = 31416;
long longNumber = intNumber;


Even when no explicit cast operator is required (because an implicit conversion is allowed), it is still possible to include the cast operator (see Listing 2.26).

LISTING 2.26: Using the Cast Operator for an Implicit Cast


int intNumber = 31416;
long longNumber = (long) intNumber;


Type Conversion without Casting

No conversion is defined from a string to a numeric type, so methods such as Parse() are required. Each numeric data type includes a Parse() function that enables conversion from a string to the corresponding numeric type. Listing 2.27 demonstrates this call.

LISTING 2.27: Using float.Parse() to Convert a string to a Numeric Data Type


string text = "9.11E-31";
float kgElectronMass = float.Parse(text);


Another special type is available for converting one type to the next. This type is System.Convert, and an example of its use appears in Listing 2.28.

LISTING 2.28: Type Conversion Using System.Convert


string middleCText = "261.626";
double middleC = System.Convert.ToDouble(middleCText);
bool boolean = System.Convert.ToBoolean(middleC);


System.Convert supports only a small number of types and is not extensible. It allows conversion from any of the types bool, char, sbyte, short, int, long, ushort, uint, ulong, float, double, decimal, DateTime, and string to any other of those types.

Furthermore, all types support a ToString() method that can be used to provide a string representation of a type. Listing 2.29 demonstrates how to use this method. The resultant output is shown in Output 2.18.

LISTING 2.29: Using ToString() to Convert to a string


bool boolean = true;
string text = boolean.ToString();
// Display "True"
System.Console.WriteLine(text);


OUTPUT 2.18

True

For the majority of types, the ToString() method will return the name of the data type rather than a string representation of the data. The string representation is returned only if the type has an explicit implementation of ToString(). One last point to make is that it is possible to code custom conversion methods, and many such methods are available for classes in the runtime.
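
A short sketch of the two behaviors follows. System.Object is used as an example of a type whose ToString() returns the type name, while int provides its own implementation; the class and method names are assumptions made for this illustration.

```csharp
class ToStringExample
{
    // ToString() on a plain object returns the name of its type.
    public static string DefaultText()
    {
        object thing = new object();
        return thing.ToString();
    }

    // int supplies its own ToString(), so the value is returned.
    public static string NumberText()
    {
        int number = 42;
        return number.ToString();
    }

    static void Main()
    {
        System.Console.WriteLine(DefaultText());   // System.Object
        System.Console.WriteLine(NumberText());    // 42
    }
}
```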

Begin 2.0


Advanced Topic: TryParse()

Starting with C# 2.0 (.NET 2.0), all the numeric primitive types include a static TryParse() method. This method is very similar to the Parse() method, except that instead of throwing an exception if the conversion fails, the TryParse() method returns false, as demonstrated in Listing 2.30.

LISTING 2.30: Using TryParse() in Place of an Invalid Cast Exception


double number;
string input;

System.Console.Write("Enter a number: ");
input = System.Console.ReadLine();
if (double.TryParse(input, out number))
{
// Converted correctly, now use number
// ...
}
else
{
System.Console.WriteLine(
"The text entered was not a valid number.");
}


Output 2.19 shows the results of Listing 2.30.

OUTPUT 2.19

Enter a number: forty-two
The text entered was not a valid number.

The resultant value that the code parses from the input string is returned via an out parameter—in this case, number.

The key difference between Parse() and TryParse() is that TryParse() won’t throw an exception if it fails. Frequently, the conversion from a string to a numeric type depends on a user entering the text. In such scenarios, it is expected that the user will sometimes enter invalid data that will not parse successfully. Because exceptions should be avoided for expected situations, TryParse() is a better choice than Parse() in these cases.


End 2.0

Arrays

One particular aspect of variable declaration that Chapter 1 didn’t cover is array declaration. With array declaration, you can store multiple items of the same type using a single variable and still access them individually using the index when required. In C#, the array index starts at zero. Therefore, arrays in C# are zero based.


Beginner Topic: Arrays

Arrays provide a means of declaring a collection of data items that are of the same type using a single variable. Each item within the array is uniquely designated using an integer value called the index. The first item in a C# array is accessed using index 0. Programmers should be careful to specify an index value that is less than the array size. Since C# arrays are zero-based, the index for the last element in an array is one less than the total number of items in the array.

For beginners, it is helpful sometimes to think of the index as an offset. The first item is zero away from the start of the array. The second item is one away from the start of the array—and so on.


Arrays are a fundamental part of nearly every programming language, so they are required learning for virtually all developers. Although arrays are frequently used in C# programming, and necessary for the beginner to understand, most programs now use generic collection types rather than arrays when storing collections of data. Therefore, readers should skim over the following section, “Declaring an Array,” simply to become familiar with array instantiation and assignment. Table 2.7 provides the highlights of what to note. Generic collections will be covered in detail in Chapter 14.

Image

Image

TABLE 2.7: Array Highlights

In addition, the final section of the chapter, “Common Array Errors,” provides a review of some of the array idiosyncrasies.

Declaring an Array

In C#, you declare arrays using square brackets. First, you specify the element type of the array, followed by open and closed square brackets; then you enter the name of the variable. Listing 2.31 declares a variable called languages to be an array of strings.

LISTING 2.31: Declaring an Array


string[] languages;


Obviously, the first part of the declaration identifies the data type of the elements within the array. The square brackets that are part of the declaration identify the rank, or the number of dimensions, for the array; in this case, it is an array of rank one. These two pieces form the data type for the variable languages.


Language Contrast: C++ and Java—Array Declaration

The square brackets for an array in C# appear immediately following the data type instead of after the variable declaration. This keeps all the type information together instead of splitting it up both before and after the identifier, as occurs in C++ and Java.


Listing 2.31 defines an array with a rank of 1. Commas within the square brackets define additional dimensions. Listing 2.32, for example, defines a two-dimensional array of cells for a game of chess or tic-tac-toe.

LISTING 2.32: Declaring a Two-Dimensional Array


// | |
// ---+---+---
// | |
// ---+---+---
// | |
int[,] cells;


In Listing 2.32, the array has a rank of 2. The first dimension could correspond to cells going across and the second dimension represents cells going down. Additional dimensions are added, with additional commas, and the total rank is one more than the number of commas. Note that the number of items that occur for a particular dimension is not part of the variable declaration. This is specified when creating (instantiating) the array and allocating space for each element.

Instantiating and Assigning Arrays

Once an array is declared, you can immediately fill its values using a comma-delimited list of items enclosed within a pair of curly braces. Listing 2.33 declares an array of strings and then assigns the names of nine languages within curly braces.

LISTING 2.33: Array Declaration with Assignment


string[] languages = { "C#", "COBOL", "Java",
"C++", "Visual Basic", "Pascal",
"Fortran", "Lisp", "J#"};


The first item in the comma-delimited list becomes the first item in the array; the second item in the list becomes the second item in the array, and so on. The curly brackets are the notation for defining an array literal.

The assignment syntax shown in Listing 2.33 is available only if you declare and assign the value within one statement. To assign the value after declaration requires the use of the keyword new as shown in Listing 2.34.

LISTING 2.34: Array Assignment Following Declaration


string[] languages;
languages = new string[]{"C#", "COBOL", "Java",
"C++", "Visual Basic", "Pascal",
"Fortran", "Lisp", "J#" };


Starting in C# 3.0, specifying the data type of the array (string) following new is optional as long as the compiler is able to deduce the element type of the array from the types of the elements in the array initializer. The square brackets are still required.
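
A minimal sketch of this shortened syntax follows, assuming C# 3.0 or later. The class and method names are illustrative, and the array is cut down to three elements for brevity.

```csharp
class ImplicitArrayTypeExample
{
    public static string[] CreateLanguages()
    {
        string[] languages;

        // The element type (string) is inferred from the items in
        // the initializer; the square brackets are still required.
        languages = new[] { "C#", "COBOL", "Java" };

        return languages;
    }

    static void Main()
    {
        System.Console.WriteLine(CreateLanguages()[0]);   // C#
    }
}
```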

C# also allows use of the new keyword as part of the declaration statement, so it allows the assignment and the declaration shown in Listing 2.35.

LISTING 2.35: Array Assignment with new during Declaration


string[] languages = new string[]{
"C#", "COBOL", "Java",
"C++", "Visual Basic", "Pascal",
"Fortran", "Lisp", "J#"};


The use of the new keyword tells the runtime to allocate memory for the data type. It instructs the runtime to instantiate the data type—in this case, an array.

Whenever you use the new keyword as part of an array assignment, you may also specify the size of the array within the square brackets. Listing 2.36 demonstrates this syntax.

LISTING 2.36: Declaration and Assignment with the new Keyword


string[] languages = new string[9]{
"C#", "COBOL", "Java",
"C++", "Visual Basic", "Pascal",
"Fortran", "Lisp", "J#"};


The array size in the initialization statement and the number of elements contained within the curly braces must match. Furthermore, it is possible to assign an array but not specify the initial values of the array, as demonstrated in Listing 2.37.

LISTING 2.37: Assigning without Literal Values


string[] languages = new string[9];


Instantiating an array without specifying initial values still initializes each element. The runtime initializes elements to their default values, as follows:

• Reference types (such as string) are initialized to null.

• Numeric types are initialized to zero.

• bool is initialized to false.

• char is initialized to \0.

Nonprimitive value types are recursively initialized by initializing each of their fields to their default values. As a result, it is not necessary to individually assign each element of an array before using it.
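
The default values listed above can be observed with a short sketch such as the following. The class name, method name, and array names are assumptions chosen for illustration.

```csharp
class ArrayDefaultsExample
{
    // Returns true if newly created arrays hold the default
    // value for each of the element types checked.
    public static bool AllDefaults()
    {
        string[] names = new string[2];
        int[] counts = new int[2];
        bool[] flags = new bool[2];
        char[] letters = new char[2];

        return names[0] == null
            && counts[0] == 0
            && flags[0] == false
            && letters[0] == '\0';
    }

    static void Main()
    {
        System.Console.WriteLine(AllDefaults());   // True
    }
}
```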

Begin 2.0


Note

In C# 2.0, it is possible to use the default() operator to produce the default value of a data type. default() takes a data type as a parameter. default(int), for example, produces 0 and default(bool) produces false.


End 2.0

Because the array size is not included as part of the variable declaration, it is possible to specify the size at runtime. For example, Listing 2.38 creates an array based on the size specified in the Console.ReadLine() call.

LISTING 2.38: Defining the Array Size at Runtime


string[] groceryList;
System.Console.Write("How many items on the list? ");
int size = int.Parse(System.Console.ReadLine());
groceryList = new string[size];
// ...


C# initializes multidimensional arrays similarly. A comma separates the size of each rank. Listing 2.39 initializes a tic-tac-toe board with no moves.

LISTING 2.39: Declaring a Two-Dimensional Array


int[,] cells = new int[3,3];


Initializing a tic-tac-toe board with a specific position instead could be done as shown in Listing 2.40.

LISTING 2.40: Initializing a Two-Dimensional Array of Integers


int[,] cells = {
{1, 0, 2},
{1, 2, 0},
{1, 2, 1}
};


The initialization follows the pattern in which there is an array of three elements of type int[], and each element has the same size; in this example, the size is 3. Note that the sizes of each int[] element must all be identical. The declaration shown in Listing 2.41, therefore, is not valid.

LISTING 2.41: A Multidimensional Array with Inconsistent Size, Causing an Error


// ERROR: Each dimension must be consistently sized.
int[,] cells = {
{1, 0, 2, 0},
{1, 2, 0},
{1, 2},
{1}
};


Representing tic-tac-toe does not require an integer in each position. One alternative is a separate virtual board for each player, with each board containing a bool that indicates which positions the players selected. Listing 2.42 corresponds to a three-dimensional board.

LISTING 2.42: Initializing a Three-Dimensional Array


bool[,,] cells;
cells = new bool[2,3,3]
{
// Player 1 moves // X | |
{ {true, false, false}, // ---+---+---
{true, false, false}, // X | |
{true, false, true} }, // ---+---+---
// X | | X

// Player 2 moves // | | O
{ {false, false, true}, // ---+---+---
{false, true, false}, // | O |
{false, true, true} } // ---+---+---
// | O |
};


In this example, the board is initialized and the size of each rank is explicitly identified. In addition to identifying the size as part of the new expression, the literal values for the array are provided. The literal values of type bool[,,] are broken into two arrays of type bool[,], size 3 × 3. Each two-dimensional array is composed of three bool arrays, size 3.

As already mentioned, each dimension in a multidimensional array must be consistently sized. However, it is also possible to define a jagged array, which is an array of arrays. Jagged array syntax is slightly different from that of a multidimensional array; furthermore, jagged arrays do not need to be consistently sized. Therefore, it is possible to initialize a jagged array as shown in Listing 2.43.

LISTING 2.43: Initializing a Jagged Array


int[][] cells = {
new int[]{1, 0, 2, 0},
new int[]{1, 2, 0},
new int[]{1, 2},
new int[]{1}
};


A jagged array doesn’t use a comma to identify a new dimension. Rather, a jagged array defines an array of arrays. In Listing 2.43, [] is placed after the data type int[], thereby declaring an array of type int[].

Notice that a jagged array requires an array instance (or null) for each internal array. In this example, you use new to instantiate the internal element of the jagged arrays. Leaving out the instantiation would cause a compile error.
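
As a sketch of these points, the following creates the outer array first and then assigns inner arrays of different sizes, leaving one inner array as null. The class and method names are hypothetical.

```csharp
class JaggedArrayExample
{
    public static int[][] CreateCells()
    {
        // Three inner arrays; each starts out as null.
        int[][] cells = new int[3][];

        // Inner arrays may have different lengths.
        cells[0] = new int[] { 1, 0, 2, 0 };
        cells[1] = new int[] { 1, 2 };

        // cells[2] is intentionally left as null.
        return cells;
    }

    static void Main()
    {
        int[][] cells = CreateCells();
        System.Console.WriteLine(cells.Length);        // 3
        System.Console.WriteLine(cells[0].Length);     // 4
        System.Console.WriteLine(cells[2] == null);    // True
    }
}
```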

Using an Array

You access a specific item in an array using the square bracket notation, known as the array accessor. To retrieve the first item from an array, you specify zero as the index. In Listing 2.44, the value of the fifth item (using the index 4 because the first item is index 0) in the languages variable is stored in the variable language.

LISTING 2.44: Declaring and Accessing an Array


string[] languages = new string[9]{
"C#", "COBOL", "Java",
"C++", "Visual Basic", "Pascal",
"Fortran", "Lisp", "J#"};
// Retrieve fifth item in languages array (Visual Basic)
string language = languages[4];


The square bracket notation is also used to store data into an array. Listing 2.45 switches the order of "C++" and "Java".

LISTING 2.45: Swapping Data between Positions in an Array


string[] languages = new string[9]{
"C#", "COBOL", "Java",
"C++", "Visual Basic", "Pascal",
"Fortran", "Lisp", "J#"};
// Save "C++" to variable called language.
string language = languages[3];
// Assign "Java" to the C++ position.
languages[3] = languages[2];
// Assign language to location of "Java".
languages[2] = language;


For multidimensional arrays, an element is identified with an index for each dimension, as shown in Listing 2.46.

LISTING 2.46: Initializing a Two-Dimensional Array of Integers


int[,] cells = {
{1, 0, 2},
{0, 2, 0},
{1, 2, 1}
};
// Set the winning tic-tac-toe move to be player 1.
cells[1,0] = 1;


Jagged array element assignment is slightly different because it is consistent with the jagged array declaration. The first index specifies an array within the array of arrays; the second index specifies the item within the selected array element (see Listing 2.47).

LISTING 2.47: Declaring a Jagged Array


int[][] cells = {
new int[]{1, 0, 2},
new int[]{0, 2, 0},
new int[]{1, 2, 1}
};

cells[1][0] = 1;
// ...
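
To compare the two accessor forms side by side, here is a small sketch that reads the same position from a rectangular array and from a jagged array. The class and method names (CellAccess, ReadRectangular, ReadJagged) are assumptions made for this example only.

```csharp
using System;

class CellAccess
{
    // Reads one element from a rectangular two-dimensional array:
    // one index per dimension, separated by a comma.
    public static int ReadRectangular(int[,] cells, int row, int column)
    {
        return cells[row, column];
    }

    // Reads one element from a jagged array (an array of arrays).
    public static int ReadJagged(int[][] cells, int row, int column)
    {
        int[] selectedRow = cells[row];  // first index selects an internal array
        return selectedRow[column];      // second index selects an item within it
    }

    static void Main()
    {
        int[,] rectangular = { {1, 0, 2}, {0, 2, 0}, {1, 2, 1} };
        int[][] jagged = {
            new int[] {1, 0, 2},
            new int[] {0, 2, 0},
            new int[] {1, 2, 1}
        };

        Console.WriteLine(ReadRectangular(rectangular, 0, 2)); // Displays 2
        Console.WriteLine(ReadJagged(jagged, 0, 2));           // Displays 2
    }
}
```

ReadJagged spells out in two steps what cells[row][column] does in one expression.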


Length

You can obtain the length of an array, as shown in Listing 2.48.

LISTING 2.48: Retrieving the Length of an Array


Console.WriteLine(
$"There are { languages.Length } languages in the array.");


Arrays have a fixed length; they are bound such that the length cannot be changed without re-creating the array. Furthermore, overstepping the bounds (or length) of the array will cause the runtime to report an error. This can occur when you attempt to access (either retrieve or assign) the array with an index for which no element exists in the array. Such an error frequently occurs when you use the array length as an index into the array, as shown in Listing 2.49.

LISTING 2.49: Accessing Outside the Bounds of an Array, Throwing an Exception


string[] languages = new string[9];
...
// RUNTIME ERROR: index out of bounds – should
// be 8 for the last element
languages[4] = languages[9];



Note

The Length member returns the number of items in the array, not the highest index. The Length member for the languages variable is 9, but the highest index for the languages variable is 8, because that is how far it is from the start.


It is a good practice to use Length in place of the hardcoded array size. To use Length as an index, for example, it is necessary to subtract 1 to avoid an out-of-bounds error (see Listing 2.50).

LISTING 2.50: Using Length - 1 in the Array Index


string[] languages = new string[9];
...
languages[4] = languages[languages.Length - 1];


To avoid overstepping the bounds on an array, use a length check to verify that the array has a length greater than 0, and use Length - 1 in place of a hardcoded value when accessing the last item in the array (see Listing 2.50).
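
One way to combine the length check with Length - 1 is sketched below. The helper GetLastOrDefault and its fallback parameter are hypothetical, and the if statement and && operator it relies on are covered in Chapter 3.

```csharp
using System;

class LastItem
{
    // Hypothetical helper: returns the last element, or the supplied
    // fallback when the array is null or has no elements.
    public static string GetLastOrDefault(string[] items, string fallback)
    {
        if (items != null && items.Length > 0)
        {
            return items[items.Length - 1];
        }
        return fallback;
    }

    static void Main()
    {
        string[] languages = { "C#", "COBOL", "Java" };
        string[] empty = new string[0];

        Console.WriteLine(GetLastOrDefault(languages, "(none)")); // Displays Java
        Console.WriteLine(GetLastOrDefault(empty, "(none)"));     // Displays (none)
    }
}
```

Without the check, empty[empty.Length - 1] would evaluate to empty[-1] and cause a runtime error.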

Length returns the total number of elements in an array. Therefore, if you had a multidimensional array such as bool[,,] cells of size 2 × 3 × 3, Length would return the total number of elements, 18.

For a jagged array, Length returns the number of elements in the outer array. Because a jagged array is an array of arrays, Length evaluates only the outside, containing array and returns its element count, regardless of what is inside the internal arrays.
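
The difference can be seen in a short sketch. The helper CountJaggedElements is a hypothetical addition that totals the internal arrays by hand, since Length on a jagged array does not do so; it uses a foreach loop, which is covered in Chapter 3.

```csharp
using System;

class ArrayLengths
{
    // Hypothetical helper: sums the lengths of the internal arrays,
    // skipping any that are null.
    public static int CountJaggedElements(int[][] rows)
    {
        int total = 0;
        foreach (int[] row in rows)
        {
            if (row != null)
            {
                total += row.Length;
            }
        }
        return total;
    }

    static void Main()
    {
        bool[,,] cells = new bool[2, 3, 3];
        Console.WriteLine(cells.Length);                 // Displays 18

        int[][] jagged = {
            new int[] {1, 0, 2, 0},
            new int[] {1, 2},
            new int[] {1}
        };
        Console.WriteLine(jagged.Length);                // Displays 3
        Console.WriteLine(CountJaggedElements(jagged));  // Displays 7
    }
}
```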


Language Contrast: C++—Buffer Overflow Bugs

Unmanaged C++ does not always check whether you overstep the bounds on an array. Not only can this be difficult to debug, but making this mistake can also result in a potential security error called a buffer overrun. In contrast, the Common Language Runtime protects all C# (and Managed C++) code from overstepping array bounds, virtually eliminating the possibility of a buffer overrun issue in managed code.


More Array Methods

Arrays include additional methods for manipulating the elements within the array—for example, Sort(), BinarySearch(), Reverse(), and Clear() (see Listing 2.51).

LISTING 2.51: Additional Array Methods


class ProgrammingLanguages
{
static void Main()
{
string[] languages = new string[]{
"C#", "COBOL", "Java",
"C++", "Visual Basic", "Pascal",
"Fortran", "Lisp", "J#"};

System.Array.Sort(languages);

string searchString = "COBOL";
int index = System.Array.BinarySearch(
languages, searchString);
System.Console.WriteLine(
"The wave of the future, "
+ $"{ searchString }, is at index { index }.");

System.Console.WriteLine();
System.Console.WriteLine(
$"{ "First Element",-20 }\t{ "Last Element",-20 }");
System.Console.WriteLine(
$"{ "-------------",-20 }\t{ "------------",-20 }");
System.Console.WriteLine(
$"{ languages[0],-20 }\t{ languages[languages.Length-1],-20 }");
System.Array.Reverse(languages);
System.Console.WriteLine(
$"{ languages[0],-20 }\t{ languages[languages.Length-1],-20 }");
// Note this does not remove all items from the array.
// Rather it sets each item to the type's default value.
System.Array.Clear(languages, 0, languages.Length);
System.Console.WriteLine(
$"{ languages[0],-20 }\t{ languages[languages.Length-1],-20 }");
System.Console.WriteLine(
$"After clearing, the array size is: { languages.Length }");
}
}


The results of Listing 2.51 are shown in Output 2.20.

OUTPUT 2.20

The wave of the future, COBOL, is at index 2.

First Element           Last Element
-------------           ------------
C#                      Visual Basic
Visual Basic            C#

After clearing, the array size is: 9

Access to these methods is obtained through the System.Array class. For the most part, using these methods is self-explanatory, except for two noteworthy items:

• Before using the BinarySearch() method, it is important to sort the array. If values are not sorted in increasing order, the incorrect index may be returned. If the search element does not exist, the value returned is negative. (Using the complement operator, ~index, returns the first index, if any, that is larger than the searched value.)

• The Clear() method does not remove elements of the array and does not set the length to zero. The array size is fixed and cannot be modified. Therefore, the Clear() method sets each element in the array to its default value (false, 0, or null). This explains why Console.WriteLine() creates a blank line when writing out the array after Clear() is called.
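
Both points can be illustrated with a short sketch that uses an int array, so the results do not depend on how strings are compared. The helper FindOrInsertionPoint is a hypothetical name; it applies the complement operator to a negative BinarySearch() result, as described in the first bullet.

```csharp
using System;

class SearchDemo
{
    // Hypothetical helper: returns the index of value in a sorted array.
    // If value is missing, returns the index of the first larger element
    // (or the array length if no element is larger).
    public static int FindOrInsertionPoint(int[] sorted, int value)
    {
        int index = Array.BinarySearch(sorted, value);
        return index >= 0 ? index : ~index;
    }

    static void Main()
    {
        int[] values = { 40, 10, 30, 20 };
        Array.Sort(values);  // Now 10, 20, 30, 40

        Console.WriteLine(Array.BinarySearch(values, 30));    // Displays 2
        Console.WriteLine(Array.BinarySearch(values, 25));    // Displays -3
        Console.WriteLine(FindOrInsertionPoint(values, 25));  // Displays 2
        Console.WriteLine(FindOrInsertionPoint(values, 99));  // Displays 4

        // Clear() resets each element to its default; the length is unchanged.
        Array.Clear(values, 0, values.Length);
        Console.WriteLine(values[0]);      // Displays 0
        Console.WriteLine(values.Length);  // Displays 4
    }
}
```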


Language Contrast: Visual Basic—Redimensioning Arrays

Visual Basic includes a Redim statement for changing the number of items in an array. Although there is no equivalent C#-specific keyword, there is a method available in .NET 2.0 that will re-create the array and then copy all the elements over to the new array. This method is called System.Array.Resize.
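
A brief sketch of System.Array.Resize follows. The class ResizeDemo and the helper Grow are hypothetical names; the point of the example is that the variable passed by ref ends up referring to a new array, while any other reference still points at the original.

```csharp
using System;

class ResizeDemo
{
    // Hypothetical helper: returns a resized copy of items.
    // The caller's array is left untouched because only the local
    // parameter is redirected to the new array.
    public static string[] Grow(string[] items, int newSize)
    {
        Array.Resize(ref items, newSize);
        return items;
    }

    static void Main()
    {
        string[] languages = { "C#", "COBOL", "Java" };
        string[] original = languages;  // second reference to the same array

        Array.Resize(ref languages, 5);

        Console.WriteLine(languages.Length);      // Displays 5
        Console.WriteLine(original.Length);       // Displays 3
        Console.WriteLine(languages[2]);          // Displays Java
        Console.WriteLine(languages[4] == null);  // Displays True
    }
}
```

The added elements hold the default value for the element type, which is null for string.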


Array Instance Methods

Like strings, arrays have instance members that are accessed not from the data type, but directly from the variable. Length is an example of an instance member because access to Length is through the array variable, not the class. Other significant instance members are GetLength(), Rank, and Clone().

Retrieving the length of a particular dimension does not require the Length property. To retrieve the size of a particular rank, an array includes a GetLength() instance method. When calling this method, it is necessary to specify the rank whose length will be returned (see Listing 2.52).

LISTING 2.52: Retrieving a Particular Dimension’s Size


bool[,,] cells;
cells = new bool[2,3,3];
System.Console.WriteLine(cells.GetLength(0)); // Displays 2


The results of Listing 2.52 appear in Output 2.21.

OUTPUT 2.21

2

Listing 2.52 displays 2 because that is the number of elements in the first dimension.

It is also possible to retrieve the entire array’s rank by accessing the array’s Rank member. cells.Rank, for example, will return 3.

By default, assigning one array variable to another copies only the array reference, not the individual elements of the array. To make an entirely new copy of the array, use the array’s Clone() method. The Clone() method will return a copy of the array; changing any of the members of this new array will not affect the members of the original array.
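
The difference between copying a reference and cloning can be sketched as follows. The names CloneDemo, Copy, alias, and copy are chosen for illustration only.

```csharp
using System;

class CloneDemo
{
    // Hypothetical helper: returns a separate copy of the array.
    public static int[] Copy(int[] source)
    {
        // Clone() returns object, so a cast back to int[] is required.
        return (int[])source.Clone();
    }

    static void Main()
    {
        int[] original = { 1, 2, 3 };
        int[] alias = original;       // copies only the reference
        int[] copy = Copy(original);  // creates a new array

        alias[0] = 100;  // visible through original
        copy[1] = 200;   // not visible through original

        Console.WriteLine(original[0]);    // Displays 100
        Console.WriteLine(original[1]);    // Displays 2
        Console.WriteLine(copy[0]);        // Displays 1
        Console.WriteLine(original.Rank);  // Displays 1
    }
}
```

Note that Clone() makes a shallow copy: for an array of reference types, the new array holds references to the same objects as the original, even though the two arrays themselves are separate.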

Strings As Arrays

Variables of type string are accessible like an array of characters. For example, to retrieve the fourth character of a string called palindrome you can call palindrome[3]. Note, however, that because strings are immutable, it is not possible to assign particular characters within a string. C#, therefore, would not allow palindrome[3]='a', where palindrome is declared as a string. Listing 2.53 uses the array accessor to determine whether an argument on the command line is an option, where an option is identified by a dash as the first character.

LISTING 2.53: Looking for Command-Line Options


string[] args;
...
if(args[0][0] == '-')
{
//This parameter is an option
}


This snippet uses the if statement, which is covered in Chapter 3. In addition, it presents an interesting example because you use the array accessor to retrieve the first element in the array of strings, args. Following the first array accessor is a second one, which retrieves the first character of the string. The code, therefore, is equivalent to that shown in Listing 2.54.

LISTING 2.54: Looking for Command-Line Options (Simplified)


string[] args;
...
string arg = args[0];
if(arg[0] == '-')
{
//This parameter is an option
}
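
Both listings assume that args contains at least one non-empty string; otherwise, the array accessor would overstep the bounds. A more defensive sketch is shown below. The helper IsOption is a hypothetical name, and the && operator it uses is covered in Chapter 3.

```csharp
using System;

class OptionCheck
{
    // Hypothetical helper: true only when a first argument exists,
    // is not empty, and begins with a dash.
    public static bool IsOption(string[] args)
    {
        return args != null
            && args.Length > 0
            && !string.IsNullOrEmpty(args[0])
            && args[0][0] == '-';
    }

    static void Main()
    {
        Console.WriteLine(IsOption(new string[] { "-v" }));    // Displays True
        Console.WriteLine(IsOption(new string[] { "file" }));  // Displays False
        Console.WriteLine(IsOption(new string[0]));            // Displays False
    }
}
```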


Not only can string characters be accessed individually using the array accessor, but it is also possible to retrieve the entire string as an array of characters using the string’s ToCharArray() method. Using this approach, you could reverse the string with the System.Array.Reverse() method, as demonstrated in Listing 2.55, which determines whether a string is a palindrome.

LISTING 2.55: Reversing a String


class Palindrome
{
static void Main()
{
string reverse, palindrome;
char[] temp;

System.Console.Write("Enter a palindrome: ");
palindrome = System.Console.ReadLine();

// Remove spaces and convert to lowercase
reverse = palindrome.Replace(" ", "");
reverse = reverse.ToLower();

// Convert to an array
temp = reverse.ToCharArray();

// Reverse the array
System.Array.Reverse(temp);

// Convert the array back to a string and
// check if reverse string is the same.
if(reverse == new string(temp))
{
System.Console.WriteLine(
$"\"{palindrome}\" is a palindrome.");
}
else
{
System.Console.WriteLine(
$"\"{palindrome}\" is NOT a palindrome.");
}
}
}


The results of Listing 2.55 appear in Output 2.22.

OUTPUT 2.22

Enter a palindrome: NeverOddOrEven
"NeverOddOrEven" is a palindrome.

This example uses the new keyword; this time, it creates a new string from the reversed array of characters.
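
The same ToCharArray() and new string combination offers one way around the immutability restriction mentioned earlier: rather than assigning to palindrome[3], you can modify a character array and build a new string from it. The following is a minimal sketch; the class ReplaceCharacter and the helper WithCharacterAt are hypothetical names.

```csharp
using System;

class ReplaceCharacter
{
    // Hypothetical helper: returns a new string in which the character
    // at the given index is replaced. The original string is unchanged.
    public static string WithCharacterAt(string text, int index, char replacement)
    {
        char[] characters = text.ToCharArray();
        characters[index] = replacement;  // arrays, unlike strings, are mutable
        return new string(characters);
    }

    static void Main()
    {
        string palindrome = "kayak";
        string changed = WithCharacterAt(palindrome, 2, 'Y');

        Console.WriteLine(palindrome);  // Displays kayak
        Console.WriteLine(changed);     // Displays kaYak
    }
}
```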

Common Array Errors

This section introduced the three types of arrays: single-dimension, multidimensional, and jagged arrays. Several rules and idiosyncrasies govern array declaration and use. Table 2.8 points out some of the most common errors and helps solidify the rules. Try reviewing the code in the Common Mistake column first (without looking at the Error Description and Corrected Code columns) as a way of verifying your understanding of arrays and their syntax.

Image

TABLE 2.8: Common Array Coding Errors

Summary

Even for experienced programmers, C# introduces several new programming constructs. For example, as part of the section on data types, this chapter covered the type decimal, which can be used to perform financial calculations without floating-point anomalies. In addition, the chapter introduced the fact that the Boolean type, bool, does not convert implicitly to or from the integer type, thereby preventing the mistaken use of the assignment operator in a conditional expression. Other characteristics of C# that distinguish it from many of its predecessors are the @ verbatim string qualifier, which forces a string to ignore the escape character, and the immutable nature of the string data type.

C# permits both implicit conversions and explicit conversions (that is, conversions that require a cast operation) to convert expressions to a given data type. In Chapter 9, you will learn how to define customized conversion operators on your own types.

This chapter closed with coverage of C# syntax for arrays, along with the various means of manipulating arrays. For many developers, the syntax can seem rather daunting at first, so the section included a list of the common errors associated with coding arrays.

The next chapter looks at expressions and control flow statements. The if statement, which appeared a few times toward the end of this chapter, is discussed as well.