Using Strings and Regular Expressions - Programming in C# - Sams Teach Yourself C# 5.0 in 24 Hours (2013)

Part II: Programming in C#

Hour 9. Using Strings and Regular Expressions


What You’ll Learn in This Hour

Strings

Mutable strings using StringBuilder

Type formatting

Regular expressions


As computer programming has evolved from being primarily concerned with performing complex numeric computations to providing solutions for a broader range of business problems, programming languages have shifted to focus more on string data and the manipulation of such data. String data is simply a logical sequence of individual characters. The System.String class, which encapsulates the data manipulation, sorting, and searching methods you most commonly perform on strings, enables C# to provide rich support for string data and manipulation.


Tip: String or string?

In C#, string is an alias for System.String, so they are equivalent. Use whichever naming convention you prefer, although the common use is to use string when referring to the data type and String when accessing static members of the class.


In this hour, you learn to work with strings in C#, including how to manipulate and concatenate strings, extract substrings, and build new strings. After you understand the basics, you learn how to work with regular expressions to perform more complex pattern matching and manipulation.

Strings

A string in C# is an immutable sequence of Unicode characters that cannot be modified after creation. Strings are most commonly created by declaring a variable of type string and assigning to it a quoted string of characters, known as a string literal, as shown here:

string myString = "Now is the time.";


Note: String Interning

If you have two identical string literals in the same assembly, the runtime only creates one string object for all instances of that literal within the assembly. This process, called string interning, is used by the C# compiler to eliminate duplicate string literals, saving memory space at runtime and decreasing the time required to perform string comparisons.

String interning can sometimes have unexpected results when comparing string literals using the equality operator:

object obj = "String";
string string1 = "String";
string string2 = typeof(string).Name;

Console.WriteLine(string1 == string2); // true
Console.WriteLine(obj == string1); // true
Console.WriteLine(obj == string2); // false

The first comparison is testing for value equality, meaning it is testing to see if the two strings have the same content. The second and third comparisons use reference equality because you are comparing an object and a string. If you were to enter this code in a program, you would see two warnings about a “Possible Unintended Reference Comparison” that further tells you to “Cast the Left Hand Side to Type 'string'” to get a value comparison.

Because string interning applies only to literal string values, the value of string2 is not interned because it isn’t a literal. This means that obj and string2 actually refer to different objects in memory, so the reference equality fails.
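To see value equality and reference equality side by side, consider the following minimal sketch. It is not part of the original example: the InterningDemo class and the BuildAtRuntime method are illustrative names, and the string is assembled from characters at run time so that it cannot be the interned literal.

```csharp
using System;

public static class InterningDemo
{
    // Builds "String" at run time from characters, so the result is a
    // new object rather than the interned literal.
    public static string BuildAtRuntime()
    {
        return new string(new char[] { 'S', 't', 'r', 'i', 'n', 'g' });
    }

    public static void Main()
    {
        object obj = "String";
        string built = BuildAtRuntime();

        // Casting the left side to string selects the string == operator,
        // which compares content.
        Console.WriteLine((string)obj == built);               // True

        // Equals is virtual, so String.Equals runs and compares content.
        Console.WriteLine(obj.Equals(built));                  // True

        // ReferenceEquals always compares object identity.
        Console.WriteLine(Object.ReferenceEquals(obj, built)); // False
    }
}
```

Casting to string, as the compiler warning suggests, is usually the simplest way to get a value comparison.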


These string literals can include special escape sequences, which begin with the backslash character (\), to indicate nonprinting characters such as a tab or new line. If you want to include the backslash character as part of the string literal, it must also be escaped. Table 9.1 lists the defined C# character escape sequences.

Table 9.1. C# Character Escape Sequences

[Image: Table 9.1, C# character escape sequences]
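As a quick illustration of a few escape sequences, the following sketch prints strings containing a tab, a new line, a backslash, a double quote, and a Unicode escape. The EscapeDemo class and QuotedPath method are hypothetical names used only for this example.

```csharp
using System;

public static class EscapeDemo
{
    // Returns C:\Temp\"quoted".txt, written with escape sequences.
    public static string QuotedPath()
    {
        return "C:\\Temp\\\"quoted\".txt";
    }

    public static void Main()
    {
        // \t inserts a tab and \n starts a new line.
        Console.WriteLine("Name:\tValue\nNext line");

        // \\ produces one backslash and \" produces one double quote.
        Console.WriteLine(QuotedPath());

        // \u followed by four hexadecimal digits is a single Unicode character.
        Console.WriteLine("caf\u00E9".Length); // 4
    }
}
```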

Another option for creating string literals is to use verbatim string literals, which start with the @ symbol before the opening quote. The benefit of verbatim string literals is that the compiler treats the string exactly as it is written, even if it spans multiple lines or includes escape characters. In a verbatim string literal, only the double-quote character must be escaped, by including two double-quote characters, so that the compiler knows where the string ends.

When the compiler encounters a verbatim string literal, it translates that literal into the properly escaped string literal. Listing 9.1 shows four different strings. The first two declarations are equivalent, although the verbatim string literal is generally easier to read. The last two declarations are also equivalent, where multipleLines2 represents the translated string literal.

Listing 9.1. String Literals


string stringLiteral = "C:\\Program Files\\Microsoft Visual Studio 10\\VC#";
string verbatimLiteral = @"C:\Program Files\Microsoft Visual Studio 10\VC#";

string multipleLines = @"This is a ""line"" of text.
And this is the second line.";
string multipleLines2 =
"This is a \"line\" of text.\nAnd this is the second line.";



Note: The ToString Method

Strings can also be created by calling the ToString method. Because ToString is declared by System.Object, every object is guaranteed to have it, although the default implementation simply returns the name of the class. All the predefined data types override ToString to provide a meaningful string representation.


Empty Strings

An empty string is a string containing no characters between the quotes (""). It is different from an unassigned string variable, which is null.


Note: String.Empty or ""

There is no practical difference between "" and String.Empty, so which one you choose ultimately depends on personal preference, although String.Empty is generally easier to read.


The fastest and simplest way to determine if a string is empty is to test if the Length property is equal to 0. However, because strings are reference types, it is possible for a string variable to be null, which would result in a runtime error when you tried to access the Length property. Because testing to determine if a string is empty is such a common occurrence, C# provides the static method String.IsNullOrEmpty, shown in Listing 9.2.

Listing 9.2. The String.IsNullOrEmpty Method


public static bool IsNullOrEmpty(string value)
{
    if (value != null)
    {
        return (value.Length == 0);
    }

    return true;
}


It is also common to treat a string that contains only whitespace characters as an empty string. For this case, you can use the static String.IsNullOrWhiteSpace method, shown in Listing 9.3.

Listing 9.3. The String.IsNullOrWhiteSpace Method


public static bool IsNullOrWhiteSpace(string value)
{
    if (value != null)
    {
        for (int i = 0; i < value.Length; i++)
        {
            if (!char.IsWhiteSpace(value[i]))
            {
                return false;
            }
        }
    }

    return true;
}


Using either String.IsNullOrEmpty or String.IsNullOrWhiteSpace helps ensure correctness, readability, and consistency, so they should be used in all situations where you need to determine if a string is null, empty, or contains only whitespace characters.
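The difference between the two methods is easiest to see by calling both on the same inputs. The following is a minimal sketch; the EmptyCheckDemo class and Describe method are illustrative names, not part of the framework.

```csharp
using System;

public static class EmptyCheckDemo
{
    // Returns a short description of how the two methods classify a value.
    public static string Describe(string value)
    {
        return String.Format("IsNullOrEmpty={0}, IsNullOrWhiteSpace={1}",
            String.IsNullOrEmpty(value), String.IsNullOrWhiteSpace(value));
    }

    public static void Main()
    {
        Console.WriteLine(Describe(null));   // IsNullOrEmpty=True, IsNullOrWhiteSpace=True
        Console.WriteLine(Describe(""));     // IsNullOrEmpty=True, IsNullOrWhiteSpace=True
        Console.WriteLine(Describe("  \t")); // IsNullOrEmpty=False, IsNullOrWhiteSpace=True
        Console.WriteLine(Describe("text")); // IsNullOrEmpty=False, IsNullOrWhiteSpace=False
    }
}
```

Only the whitespace-only input is classified differently, which is why IsNullOrWhiteSpace is usually the better choice for validating user input.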

String Manipulation

The System.String class provides a rich set of methods and properties for interacting with and manipulating strings. In fact, System.String defines more than 40 different public members.

Even though strings are a first-class data type and string data is usually manipulated as a whole, a string is still composed of individual characters. You can use the Length property to determine the total number of characters in the string. Unlike strings in other languages, such as C and C++, strings in C# do not include a termination character. Because strings are composed of individual characters, it is possible to access specific characters by position as if the string were an array of characters.
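As a small sketch of accessing characters by position, the following code reads the Length property and indexes into a string the same way you would index into an array. The CharAccessDemo class and Count method are hypothetical names for this example.

```csharp
using System;

public static class CharAccessDemo
{
    // Counts occurrences of a character by indexing into the string.
    public static int Count(string text, char target)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == target)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string text = "The quick brown fox";

        Console.WriteLine(text.Length);           // 19
        Console.WriteLine(text[0]);               // T
        Console.WriteLine(text[text.Length - 1]); // x
        Console.WriteLine(Count(text, 'o'));      // 2
    }
}
```

The indexer is read-only, so an assignment such as text[0] = 't' does not compile; this is consistent with strings being immutable.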

Working with Substrings

A substring is a smaller string contained within the larger original value. Several methods provided by System.String enable you to find and extract substrings.

To extract a substring, the String class provides an overloaded Substring method, which enables you to specify the starting character position and, optionally, the length of the substring to extract. If you don’t provide the length, the resulting substring ends at the end of the original string.

The code in Listing 9.4 creates two substrings. The first substring will start at character position 10 and continue to the end of the original string, resulting in the string “brown fox”. The second substring results in the string “quick”.

Listing 9.4. Working with Substrings


string original = "The quick brown fox";
string substring = original.Substring(10);
string substring2 = original.Substring(4, 5);


Extracting substrings in this manner is a flexible approach, especially when combined with other methods enabling you to find the position of specific characters within a string.

The IndexOf and LastIndexOf methods report the index of the first and last occurrence, respectively, of the specified character or string. If you need to find the first or last occurrence of any character in a given set of characters, you can use one of the IndexOfAny or LastIndexOfAny overloads, respectively. If a match is found, the index (or more intuitively, the offset) position of the character or start of the matched string is returned; otherwise, the value -1 is returned. If the string you are searching for is empty, IndexOf returns 0.


Caution: Zero-Based Counting

When accessing a string by character position, as the IndexOf, LastIndexOf, IndexOfAny, and LastIndexOfAny methods do, C# starts counting at 0 not 1. This means that the first character of the string is at index position 0. A better way to think about these methods is that they return an offset from the beginning of the string.
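The following sketch shows the values these methods return for the string used in Listing 9.4. The IndexDemo class and FirstWord helper are illustrative names; FirstWord combines IndexOf with Substring to show how the two work together.

```csharp
using System;

public static class IndexDemo
{
    // Returns the text before the first space, or the whole string if
    // there is no space.
    public static string FirstWord(string text)
    {
        int space = text.IndexOf(' ');
        return space < 0 ? text : text.Substring(0, space);
    }

    public static void Main()
    {
        string text = "The quick brown fox";

        Console.WriteLine(text.IndexOf('q'));                               // 4
        Console.WriteLine(text.IndexOf("brown", StringComparison.Ordinal)); // 10
        Console.WriteLine(text.LastIndexOf('o'));                           // 17
        Console.WriteLine(text.IndexOfAny(new char[] { 'x', 'q' }));        // 4
        Console.WriteLine(text.IndexOf('z'));                               // -1
        Console.WriteLine(FirstWord(text));                                 // The
    }
}
```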



Try It Yourself: Working with Substrings

To implement the code shown in Listing 9.4 and see how to create substrings, follow these steps. Keep Visual Studio open at the end of this exercise because you will use this application later.

1. Create a new console application.

2. In the Main method of the Program.cs file, enter the statements shown in Listing 9.4, followed by statements to print the value of each string.

3. Run the application using Ctrl+F5 and observe that the output matches what is shown in Figure 9.1.

[Image: Figure 9.1]

Figure 9.1. Results of working with substrings.

4. Modify the two substring calls to use IndexOf and IndexOfAny, respectively, to produce the same output as shown in Figure 9.1.


String Comparison

To perform string comparisons to determine if one string is equal to or contains another string, you can use the Compare, CompareOrdinal, CompareTo, Contains, Equals, EndsWith, and StartsWith methods.

There are 10 different overloaded versions of the static Compare method, enabling you to control everything from case sensitivity and the culture rules used to perform the comparison to the starting positions of both strings being compared and the maximum number of characters in the strings to compare.


Caution: String Comparison Rules

By default, string comparisons using any of the Compare methods are performed in a case-sensitive, culture-aware manner. Comparisons using the equality (==) operator are always performed using ordinal comparison rules.


You can also use the static CompareOrdinal overloads (of which there are only two) if you want to compare strings based on the numeric ordinal values of each character, optionally specifying the starting positions of both strings and the maximum number of characters in the strings to compare.

The CompareTo method compares the current string with the specified one and returns an integer value indicating whether the current string precedes, follows, or appears in the same position in the sort order as the specified string.

The Contains method searches using ordinal comparison rules and enables you to determine if the specified string exists within the current string. If the specified string is found or is an empty string, the method returns true.


Note: Changing Case

Even though the string comparison methods enable ways to perform string comparisons that are not case sensitive, you can also convert strings to an all-uppercase or all-lowercase representation. This is useful for string comparisons but also for standardizing the representation of string data.


The StartsWith and EndsWith methods (there are a total of six) determine if the beginning or ending of the current string matches a specified string. Just as with the Compare method, you can optionally indicate what culture rules should be used and if the search should be case sensitive.
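The following sketch exercises several of these comparison methods with an explicit StringComparison value, which makes the comparison rules visible at the call site. The CompareDemo class and SameIgnoringCase helper are hypothetical names for this example.

```csharp
using System;

public static class CompareDemo
{
    // Case-insensitive equality using ordinal rules.
    public static bool SameIgnoringCase(string a, string b)
    {
        return String.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        string text = "The quick brown fox";

        Console.WriteLine(SameIgnoringCase("hello", "HELLO")); // True

        // A negative result means the first string sorts before the second.
        Console.WriteLine(String.Compare("apple", "banana", StringComparison.Ordinal) < 0); // True

        // Ordinal comparison uses character codes, so 'a' (97) follows 'B' (66).
        Console.WriteLine(String.CompareOrdinal("a", "B") > 0); // True

        Console.WriteLine(text.StartsWith("the", StringComparison.OrdinalIgnoreCase)); // True
        Console.WriteLine(text.EndsWith("FOX", StringComparison.Ordinal));             // False
        Console.WriteLine(text.Contains("quick"));                                     // True
    }
}
```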


Try It Yourself: String Comparison

To perform different string comparisons, follow these steps. If you closed Visual Studio, repeat the previous exercise first. Be sure to keep Visual Studio open at the end of this exercise because you will use this application later.

1. In the Main method of the Program.cs file, enter the following statements:

Console.WriteLine(original.StartsWith("quick"));
Console.WriteLine(substring2.StartsWith("quick"));
Console.WriteLine(substring.EndsWith("fox"));
Console.WriteLine(original.CompareTo(original));
Console.WriteLine(String.Compare(substring2, "Quick"));
Console.WriteLine(original.Contains(substring2));

2. Run the application using Ctrl+F5 and observe that the output matches what is shown in Figure 9.2.

[Image: Figure 9.2]

Figure 9.2. Results of performing string comparisons.


The standard way to normalize case is to use the ToUpperInvariant method, which creates an all-uppercase representation of the string using the casing rules of the invariant culture. To create an all-lowercase representation, it is preferred that you use the ToLowerInvariant method, which uses the casing rules of the invariant culture. In addition to the invariant methods, you can also use the ToUpper and ToLower methods, which use the casing rules of either the current culture or the specified culture, depending on which overload you use.
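As a minimal sketch of case normalization, the following code converts strings with the invariant methods and uses the result as a comparison key. The CaseDemo class and NormalizeKey helper are illustrative names only.

```csharp
using System;

public static class CaseDemo
{
    // Produces a culture-independent, all-uppercase form of the value.
    public static string NormalizeKey(string value)
    {
        return value.ToUpperInvariant();
    }

    public static void Main()
    {
        Console.WriteLine("Hello World".ToUpperInvariant()); // HELLO WORLD
        Console.WriteLine("Hello World".ToLowerInvariant()); // hello world

        // Two spellings that differ only by case produce the same key.
        Console.WriteLine(NormalizeKey("seattle") == NormalizeKey("Seattle")); // True
    }
}
```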

Modifying Parts of a String

Although performing string comparisons is common, sometimes you need to modify all or part of a string. Because strings are immutable, these methods actually return a new string rather than modifying the current one.

To remove whitespace and other characters from a string, you can use the Trim, TrimEnd, or TrimStart methods. TrimEnd and TrimStart remove whitespace from either the end or beginning of the current string, respectively, whereas Trim removes from both ends.

To expand, or pad, a string to be a specific length, you can use the PadLeft or PadRight methods. By default, these methods pad using spaces, but they both have an overload that enables you to specify the padding character to use.
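The trimming and padding methods can be sketched as follows. Square brackets are printed around some results so that the remaining whitespace is visible; the TrimPadDemo class and ZeroPad helper are hypothetical names for this example.

```csharp
using System;

public static class TrimPadDemo
{
    // Right-aligns a number in a field of the given width using zeros.
    public static string ZeroPad(int number, int width)
    {
        return number.ToString().PadLeft(width, '0');
    }

    public static void Main()
    {
        string padded = "  text  ";

        Console.WriteLine("[" + padded.Trim() + "]");      // [text]
        Console.WriteLine("[" + padded.TrimStart() + "]"); // [text  ]
        Console.WriteLine("[" + padded.TrimEnd() + "]");   // [  text]

        // Trim also accepts the characters to remove.
        Console.WriteLine("xxhixx".Trim('x'));             // hi

        Console.WriteLine("[" + "42".PadRight(5) + "]");   // [42   ]
        Console.WriteLine(ZeroPad(42, 5));                 // 00042
    }
}
```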

The String class also provides a set of overloaded methods enabling you to create new strings by removing or replacing characters from an existing string. The Remove method deletes characters from a string, starting at a specified character position and continuing either through the end of the string or for a specified number of positions. The Replace method simply replaces all occurrences of the specified character or string with another character or string by performing an ordinal search that is case sensitive but not culture sensitive.


Try It Yourself: Modifying Parts of a String

To use the Replace and Remove methods, follow these steps:

1. In the Main method of the Program.cs file, remove all the Console.WriteLine statements, leaving only the string variable declarations.

2. Enter Console.WriteLine statements that will print the string created by:

• Replacing all 'o' characters with 'x' characters in original

• Removing all characters starting at index position 4 in original

• Removing five characters starting at index position 4 in original

3. After each Console.WriteLine statement you entered, enter another Console.WriteLine statement that prints the current value of original.

4. Run the application using Ctrl+F5, and observe that the output matches what is shown in Figure 9.3.

[Image: Figure 9.3]

Figure 9.3. Results of string modification.


String Concatenation, Joining, and Splitting

You have already seen several ways to create new strings from string literals and from substrings, but you can also create new strings by combining existing strings in a process called string concatenation.

String concatenation typically occurs in one of two ways. The most common is to use the overloaded addition (+) operator to combine two or more strings. You can also use one of the overloads of the Concat method, which enable you to concatenate an arbitrary number of strings. Both approaches are shown in Listing 9.5.


Tip: String Concatenation Using Addition

The addition operator actually just calls the appropriate overload of the Concat method.


Listing 9.5. String Concatenation


string string1 = "this is " + "basic concatenation.";
// Concat adds no separators, so each string carries its own trailing space.
string string2 = String.Concat("this ", "is ", "more ", "advanced ", "concatenation.");


Closely related to concatenation is the idea of joining strings, which uses the Join method. Unlike the Concat method, Join concatenates a specified separator between each element of a given set of strings.

If joining a string combines a set of strings, the opposite is splitting a string into an undetermined number of substrings based on a delimiting character, accomplished by using the Split method, which accepts either a set of characters or a set of strings for the delimiting set.

Listing 9.6 shows an example of joining a string and then splitting it based on the same delimiting character. First, an array of 10 strings is created and then joined using # as the separator character. The resulting string is then split on the same separator character and each word is printed on a separate line.

Listing 9.6. Joining and Splitting Strings


string[] strings = new string[10];
for (int i = 0; i < 10; i++)
{
    strings[i] = String.Format("{0}", i * 2);
}

string joined = String.Join("#", strings);
Console.WriteLine(joined);

foreach (string word in joined.Split(new char[] { '#' }))
{
    Console.WriteLine(word);
}


Mutable Strings Using StringBuilder

Because strings are immutable, every time you perform any string manipulation, you create new temporary strings. To allow mutable strings to be created and manipulated without creating new strings, C# provides the StringBuilder class. Using string concatenation is preferred if a fixed number of strings will be concatenated. If an arbitrary number of strings will be concatenated, such as inside an iteration statement, a StringBuilder is preferred.

StringBuilder supports appending data to the end of the current string, inserting data at a specified position, replacing data, and removing characters from the current string. Appending data uses one of the overloads of either the Append or AppendFormat methods.

The Append method adds text or a string representation of an object to the end of the current string. The AppendFormat method supports adding text to the end of the current string by using composite formatting. Because AppendFormat uses composite formatting, you can pass it a format string, which you learn about in the next section.


Go To

We discuss composite formatting a bit later in this hour.


Listing 9.7 shows the same example of joining and splitting strings shown in Listing 9.6 but uses a StringBuilder rather than a string array.

Listing 9.7. Using StringBuilder


// Requires: using System.Text;
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < 10; i++)
{
    stringBuilder.AppendFormat("{0}#", i * 2);
}

// Remove the trailing '#' character.
stringBuilder.Remove(stringBuilder.Length - 1, 1);

// The builder already holds the joined text, so String.Join is not needed.
string joined = stringBuilder.ToString();
Console.WriteLine(joined);

foreach (string word in joined.Split(new char[] { '#' }))
{
    Console.WriteLine(word);
}


To insert data, one of the Insert overloads should be used. When you insert data, you must provide the position within the current StringBuilder string where the insertion will begin. To remove data, you use the Remove method and indicate the starting position where the removal begins and the number of characters to remove. The Replace method, or one of its overloads, can be used to replace characters or strings within the current StringBuilder string with another specified character or string. The Replace method also supports replacing characters within a substring of the current StringBuilder string, specified by a starting position and length.
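The following sketch applies Insert, Replace, and Remove to a single StringBuilder, with the content after each step shown in a comment. The BuilderEditDemo class and Edit method are illustrative names for this example.

```csharp
using System;
using System.Text;

public static class BuilderEditDemo
{
    // Applies a series of in-place edits and returns the final text.
    public static string Edit()
    {
        StringBuilder builder = new StringBuilder("The brown fox");

        builder.Insert(4, "quick ");      // The quick brown fox
        builder.Replace("fox", "dog");    // The quick brown dog
        builder.Remove(0, 4);             // quick brown dog

        // Replace 'o' only within the first 11 characters ("quick brown").
        builder.Replace('o', '0', 0, 11); // quick br0wn dog

        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Edit()); // quick br0wn dog
    }
}
```

Unlike the String methods of the same names, these calls change the builder itself and return the same instance, so no intermediate strings are created.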


Note: StringBuilder Capacity

Internally, the StringBuilder string is maintained in a buffer to accommodate concatenation. The StringBuilder needs to allocate additional memory only if the buffer does not have enough room to accommodate the new data.

The default size, or capacity, for this internal buffer is 16 characters. When the buffer reaches capacity, additional buffer space is allocated, with the amount determined by the current capacity. StringBuilder also has a maximum capacity, which is Int32.MaxValue, or 2^31 - 1 (2,147,483,647), characters.

The length of the current string can be set using the Length property. Setting Length to a value that is larger than the current capacity automatically increases the capacity. Similarly, setting Length to a value that is less than the current length shortens the current string.
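As a minimal sketch of how Length and Capacity interact, the following code shortens a builder by setting Length and then grows one past its requested capacity. The BuilderLengthDemo class and Truncate helper are hypothetical names; the initial capacity of 64 is an arbitrary value chosen for the example.

```csharp
using System;
using System.Text;

public static class BuilderLengthDemo
{
    // Shortens the text to at most maxLength characters by setting Length.
    public static string Truncate(string text, int maxLength)
    {
        StringBuilder builder = new StringBuilder(text);
        if (builder.Length > maxLength)
        {
            builder.Length = maxLength;
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Truncate("Hello, world", 5)); // Hello

        // Requesting an initial capacity avoids early reallocations.
        StringBuilder builder = new StringBuilder(64);
        Console.WriteLine(builder.Capacity);            // 64
        Console.WriteLine(builder.Length);              // 0

        // Growing Length past Capacity raises Capacity as well.
        builder.Length = 100;
        Console.WriteLine(builder.Capacity >= 100);     // True
    }
}
```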


Type Formatting

Formatting allows you to convert an instance of a class, structure, or enumeration value to a string representation. Every type that derives from System.Object automatically inherits a parameterless ToString method, which, by default, returns the name of the type. All the predefined value types have overridden ToString to return a general format for the type.

Because getting the name of the type from ToString isn’t generally useful, you can override the ToString method and provide a meaningful string representation of your type. Listing 9.8 shows the Contact class overriding the ToString method.

Listing 9.8. Overriding ToString


class Contact
{
    private string firstName;
    private string lastName;

    public override string ToString()
    {
        return firstName + " " + lastName;
    }
}



Caution: Overriding ToString

Before you start adding ToString overrides to all your classes, be aware that the Visual Studio debugging tools use ToString extensively to determine what values to display for an object when viewed through a debugger.


The value of an object often has multiple representations, and many types provide a ToString overload that accepts a format string as a parameter to determine how the string representation should appear. A format string contains one or more format specifiers that define that representation.

Standard Format Strings

A standard format string contains a single format specifier, which is a single character that defines a more complete format string, and an optional precision specifier that affects the number of digits displayed in the result. If supported, the precision specifier can be any integer value from 0 to 99. All the numeric types, date and time types, and enumeration types support a set of predefined standard format strings, including a "G" standard format specifier, which represents a general string representation of the value.

The standard format specifiers are shown in Table 9.2.

Table 9.2. Standard Format Specifiers

[Image: Table 9.2, standard format specifiers]
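A few of the standard format specifiers can be sketched as follows. The StandardFormatDemo class and Format helper are illustrative names; the helper passes CultureInfo.InvariantCulture so that the output shown in the comments does not depend on the regional settings of your machine.

```csharp
using System;
using System.Globalization;

public static class StandardFormatDemo
{
    // Formats a value with the invariant culture. IFormattable is the
    // interface the built-in numeric and date types implement.
    public static string Format(IFormattable value, string format)
    {
        return value.ToString(format, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(Format(1234.5678, "N2")); // 1,234.57
        Console.WriteLine(Format(1234.5678, "F1")); // 1234.6
        Console.WriteLine(Format(42, "D5"));        // 00042
        Console.WriteLine(Format(255, "X"));        // FF
        Console.WriteLine(Format(new DateTime(2013, 3, 22), "d")); // 03/22/2013
    }
}
```

In each case the digit after the letter is the precision specifier: two decimal places for "N2", one for "F1", and a minimum of five digits for "D5".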

Using the standard format strings to format a Days enumeration value is shown in Listing 9.9.

Listing 9.9. Standard Format Strings


Days days = Days.Monday;
string[] formats = { "G", "F", "D", "X" };
foreach (string format in formats)
{
    Console.WriteLine(days.ToString(format));
}


Just as you can override the ToString method, you can define standard format specifiers for your own classes as well by defining a ToString(string) method, which should support the following:

• A "G" format specifier that represents a common format. Your override of the parameterless ToString method should simply call ToString(string) and pass it the "G" standard format string.

• A format specifier that is equal to a null reference that should be considered equivalent to the "G" format specifier.

Listing 9.10 shows an updated Celsius struct from Hour 6, “Creating Enumerated Types and Structures,” that supports format specifiers to represent the value in degrees Fahrenheit and kelvins.

Listing 9.10. Overriding ToString to Support the Standard Format Strings


struct Celsius
{
    public float Degrees;

    public Celsius(float temperature)
    {
        this.Degrees = temperature;
    }

    // The parameterless override delegates to the general ("G") format.
    public override string ToString()
    {
        return this.ToString("G");
    }

    public string ToString(string format)
    {
        // Treat a null, empty, or whitespace-only format as "G".
        if (String.IsNullOrWhiteSpace(format))
        {
            format = "G";
        }

        format = format.Trim().ToUpperInvariant();

        switch (format)
        {
            case "G":
            case "C":
                return this.Degrees.ToString("N2") + " °C";

            case "F":
                return (this.Degrees * 9 / 5 + 32).ToString("N2") + " °F";

            case "K":
                // Kelvin values are written without a degree sign.
                return (this.Degrees + 273.15f).ToString("N2") + " K";

            default:
                throw new FormatException();
        }
    }
}


Custom Format Strings

Custom format strings consist of one or more custom format specifiers that define the string representation of a value. If a format string contains a single custom format specifier, it should be preceded by the percent (%) symbol so that it won’t be confused with a standard format specifier.

All the numeric types and the date and time types support custom format strings. Many of the standard date and time format strings are aliases for custom format strings. Using custom format strings also provides a great deal of flexibility by enabling you to define your own formats by combining multiple custom format specifiers.

The custom format specifiers are described in Table 9.3.

Table 9.3. Custom Format Specifiers

[Image: Table 9.3, custom format specifiers]

Listing 9.11 displays a DateTime instance using two different custom format strings.

Listing 9.11. Custom Format Strings


DateTime date = new DateTime(2013, 3, 22);

// Displays 3
Console.WriteLine(date.ToString("%M"));

// Displays Friday March 22, 2013
Console.WriteLine(date.ToString("dddd MMMM dd, yyyy"));
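Custom format strings work for numeric types as well as dates. The following is a sketch under the assumption that the invariant culture is used for formatting; the CustomFormatDemo class and Format helper are hypothetical names for this example.

```csharp
using System;
using System.Globalization;

public static class CustomFormatDemo
{
    // Formats a value with the invariant culture so the output is the
    // same on every machine.
    public static string Format(IFormattable value, string format)
    {
        return value.ToString(format, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // '#' is an optional digit, '0' a required digit, ',' a group separator.
        Console.WriteLine(Format(1234.5, "#,##0.00")); // 1,234.50

        // Zero placeholders pad the value with leading zeros.
        Console.WriteLine(Format(5, "000"));           // 005

        // '%' multiplies the value by 100 and appends a percent sign.
        Console.WriteLine(Format(0.256, "0.0%"));      // 25.6%

        // Date and time specifiers can be combined freely.
        Console.WriteLine(Format(new DateTime(2013, 3, 22, 14, 5, 0),
            "yyyy-MM-dd HH:mm"));                      // 2013-03-22 14:05
    }
}
```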


Composite Formatting

You have already seen composite formatting in some of the previous examples using Console.WriteLine and StringBuilder.AppendFormat. Methods that use composite formatting accept a composite format string and a list of objects as parameters. A composite format string defines a template consisting of fixed text and indexed placeholders, called format items, which correspond to the objects in the list. Composite formatting does not allow you to specify more format items than there are objects in the list, although you can include more objects in the list than there are format items.

The syntax for a format item is as follows:

{index[,alignment][:formatString]}

The matching curly braces and index are required.

The index corresponds to the position of the object it represents in the method’s parameter list. Indexes are zero-based. Multiple format items can use the same index, and format items can refer to any object in the list, in any order.

The optional alignment component indicates the preferred field width. A positive value produces a right-aligned field, whereas a negative value produces a left-aligned field. If the value is less than the length of the formatted string, the alignment component is ignored.

The formatString component uses either the standard or custom format strings you just learned. If the formatString is not specified, the general format specifier "G" is used instead.
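The three components of a format item can be seen together in the following sketch. The CompositeDemo class and Row method are illustrative names, and the invariant culture is passed to String.Format so that the output in the comments is predictable.

```csharp
using System;
using System.Globalization;

public static class CompositeDemo
{
    // Builds one row of a two-column report.
    public static string Row(string name, double price)
    {
        // {0,-8}    : name, left-aligned in a field of 8 characters.
        // {1,10:N2} : price, right-aligned in 10 characters, two decimals.
        return String.Format(CultureInfo.InvariantCulture,
            "[{0,-8}|{1,10:N2}]", name, price);
    }

    public static void Main()
    {
        Console.WriteLine(Row("Widget", 1234.5)); // [Widget  |  1,234.50]

        // The same index can be used more than once, in any order.
        Console.WriteLine(String.Format("{0} {1} {0}", "a", "b")); // a b a

        // An alignment smaller than the formatted value is ignored.
        Console.WriteLine(String.Format("[{0,2}]", 12345)); // [12345]
    }
}
```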

In the String.Format call in Listing 9.12, the first format item, {0:d}, is replaced by the string representation of DateTime.Today, and the second format item, {1}, is replaced by the string representation of temp.

Listing 9.12. Composite Formatting


Celsius temp = new Celsius(28);

// Using composite formatting with String.Format.
string result = String.Format("On {0:d}, the high temperature was {1}.",
    DateTime.Today, temp);
Console.WriteLine(result);

// Using composite formatting with Console.WriteLine.
Console.WriteLine("On {0:dddd MMM dd, yyyy}, the high temperature was {1}.",
    DateTime.Today, temp);


Regular Expressions

Often referred to as patterns, a regular expression describes a set of strings. A regular expression is applied to a string to find out if the string matches the provided pattern, to return a substring or a collection of substrings, or to return a new string that represents a modification of the original.


Tip: Regular Expression Compatibility

Regular expressions in the .NET Framework are designed to be compatible with Perl 5 regular expressions, incorporating the most popular features of other regular expression implementations, such as Perl and awk, and including features not yet seen in other implementations.


Regular expressions are a programming language in their own right and are designed and optimized for text manipulation by using both literal text characters and metacharacters. A literal character is one that should be matched in the target string. Metacharacters tell the regular expression parser, which is responsible for interpreting the regular expression and applying it to the target string, how to behave, so you can think of them as commands. These metacharacters give regular expressions their flexibility and processing power. The common metacharacters used in regular expressions are described in Table 9.4.

Table 9.4. Common Regular Expression Metacharacters

[Image: Table 9.4, common regular expression metacharacters]
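The following sketch tries out a handful of widely used metacharacters with Regex.IsMatch. It assumes these are among the metacharacters the table describes; the MetacharacterDemo class and IsColorWord method are hypothetical names for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetacharacterDemo
{
    // ^ and $ anchor the pattern to the start and end of the input;
    // u? makes the letter u optional.
    public static bool IsColorWord(string input)
    {
        return Regex.IsMatch(input, "^colou?r$");
    }

    public static void Main()
    {
        Console.WriteLine(IsColorWord("color"));   // True
        Console.WriteLine(IsColorWord("colour"));  // True
        Console.WriteLine(IsColorWord("colours")); // False

        // . matches any single character except a new line.
        Console.WriteLine(Regex.IsMatch("cat", "^c.t$"));            // True

        // [a-z]+ is one or more lowercase letters; \d+ is one or more digits.
        Console.WriteLine(Regex.IsMatch("abc123", @"^[a-z]+\d+$"));  // True

        // | separates alternatives inside the group.
        Console.WriteLine(Regex.IsMatch("bird", "^(cat|dog)$"));     // False
    }
}
```

Writing the patterns as verbatim string literals, as in the \d example, avoids having to double every backslash.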

The Regular Expression Classes in C#

Regular expressions are implemented in the .NET Framework by several classes in the System.Text.RegularExpressions namespace that provide support for parsing and applying regular expression patterns and working with capturing groups.

The Regex Class

The Regex class provides the implementation of the regular expression parser and the engine that applies that pattern to an input string. Using this class, you can quickly parse large amounts of text for specific patterns and easily extract and edit substrings.

The Regex class provides both instance and static members, allowing it to be used in two different ways. When you create specific instances of the Regex class, the expression patterns are not compiled and cached. However, by using the static methods, the expression pattern is compiled and cached. The regular expression engine caches the 15 most recently used static regular expressions by default. You might prefer to use the static methods rather than the equivalent instance methods if you extensively use a fixed set of regular expressions.
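The two styles of use can be sketched side by side. In the following example, the RegexUsageDemo class and its HasDigits and MaskDigits methods are illustrative names; the first method reuses a Regex instance, and the second calls a static method with the pattern supplied on each call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUsageDemo
{
    // An instance is constructed once and reused for many inputs.
    private static readonly Regex Digits = new Regex(@"\d+");

    public static bool HasDigits(string input)
    {
        return Digits.IsMatch(input);
    }

    // The static method receives the pattern on every call.
    public static string MaskDigits(string input)
    {
        return Regex.Replace(input, @"\d+", "#");
    }

    public static void Main()
    {
        Console.WriteLine(HasDigits("abc123"));     // True
        Console.WriteLine(HasDigits("abc"));        // False
        Console.WriteLine(MaskDigits("a1b22c333")); // a#b#c#
    }
}
```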

The Match and MatchCollection Classes

When a regular expression is applied to a string using the Match method of the Regex class, the first successful match found is represented by an instance of the Match class. The MatchCollection contains the set of Matches found by repeatedly applying the regular expression until the first unsuccessful match occurs.

The Group and Capture Classes

The Match.Groups property represents the collection of captured groups in a single match. Each group is represented by the Group class, which contains a collection of Capture objects returned by the Captures property. A Capture represents the results from a single subexpression match.
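To make the relationship between Match, Group, and Capture concrete, consider the following sketch. The GroupDemo class, the DateParts method, and both patterns are hypothetical; the second pattern uses (?: ), a non-capturing group, only so that the repeated inner group is the one numbered 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Extracts year, month, and day from the first yyyy-MM-dd date found.
    public static string[] DateParts(string input)
    {
        Match match = Regex.Match(input, @"(\d{4})-(\d{2})-(\d{2})");
        if (!match.Success)
        {
            return new string[0];
        }

        // Groups[0] is the whole match; groups 1 to 3 are the parenthesized parts.
        return new string[]
        {
            match.Groups[1].Value, match.Groups[2].Value, match.Groups[3].Value
        };
    }

    public static void Main()
    {
        Console.WriteLine(String.Join("/", DateParts("Released 2013-03-22."))); // 2013/03/22

        // A group inside a repeated construct records one Capture per repetition.
        Match list = Regex.Match("1,2,3", @"(?:(\d+),?)+");
        Console.WriteLine(list.Groups[1].Value);          // 3 (the last capture)
        Console.WriteLine(list.Groups[1].Captures.Count); // 3

        foreach (Capture capture in list.Groups[1].Captures)
        {
            Console.WriteLine(capture.Value);             // 1, then 2, then 3
        }
    }
}
```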

String Validation Using Regular Expressions

One of the most common uses of regular expressions is to validate a string by testing if it conforms to a particular pattern. To accomplish this, you can use one of the overloads for the IsMatch method. Listing 9.13 shows how to use a regular expression to validate United States ZIP+4 postal codes.

Listing 9.13. Validation Using Regular Expressions


// Requires: using System.Text.RegularExpressions;
string pattern = @"^\d{5}(-\d{4})?$";
Regex expression = new Regex(pattern);

Console.WriteLine(expression.IsMatch("90210")); // true
Console.WriteLine(expression.IsMatch("00364-3276")); // true
Console.WriteLine(expression.IsMatch("3361")); // false
Console.WriteLine(expression.IsMatch("0036-43275")); // false
Console.WriteLine(expression.IsMatch("90210-")); // false


Using Regular Expressions to Match Substrings

Regular expressions can also be used to search for substrings that match a particular regular expression pattern. This searching can be performed once, in which case the first occurrence is returned, or it can be performed repeatedly, in which case a collection of occurrences is returned.

Searching for substrings in this manner uses the Match method to find the first occurrence matching the pattern or the Matches method to return a sequence of successful nonoverlapping matches.
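The difference between Match and Matches can be sketched as follows. The input string, the pattern (a variation of the ZIP code pattern without the ^ and $ anchors), and the class name are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

class SubstringMatchSketch
{
    static void Main()
    {
        // Illustrative input and pattern (assumptions): find five-digit or
        // ZIP+4 codes embedded in a longer string. \b marks a word boundary.
        string input = "Ship to 90210 or 00364-3276, not 3361.";
        Regex expression = new Regex(@"\b\d{5}(-\d{4})?\b");

        // Match returns only the first occurrence.
        Match first = expression.Match(input);
        if (first.Success)
        {
            Console.WriteLine("{0} at index {1}", first.Value, first.Index);
            // 90210 at index 8
        }

        // Matches returns all nonoverlapping occurrences.
        MatchCollection all = expression.Matches(input);
        Console.WriteLine(all.Count);   // 2
        foreach (Match match in all)
        {
            Console.WriteLine(match.Value);
        }
        // 90210
        // 00364-3276
    }
}
```

Always check the Success property before using the result of Match, because an unsuccessful search still returns a Match object rather than null.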

Summary

Continuing to move further away from the foundational aspects of programming, in this hour you learned how C# enables you to work with string data. This included how to perform string comparisons, create mutable strings, and use regular expressions.

Q&A

Q. Are strings immutable?

A. Yes, strings in C# are immutable, and any operation that modifies the content of a string actually creates a new string with the changed value.

Q. What does the @ symbol mean when it precedes a string literal?

A. The @ symbol is the verbatim string symbol and causes the C# compiler to treat the string exactly as it is written, without processing escape sequences, even if it spans multiple lines or includes special characters such as backslashes.

Q. What are the common string manipulation functions supported by C#?

A. C# supports the following common string manipulations:

• Determining the string length

• Trimming and padding strings

• Creating substrings, concatenating strings, and splitting strings based on specific characters

• Removing and replacing characters

• Performing string containment and comparison operations

• Converting string case

Q. What is the benefit of the StringBuilder class?

A. The StringBuilder class enables you to create and manipulate a string that is mutable and is most often used for string concatenation inside a loop.

Q. What are regular expressions?

A. A regular expression is a pattern, written in a compact language optimized for text manipulation, that describes a set of strings.

Workshop

Quiz

1. What is string interning and why is it used?

2. Using a verbatim string literal, must an embedded double-quote character be escaped?

3. What is the recommended way to test for an empty string?

4. What is the difference between the IndexOf and IndexOfAny methods?

5. Do any of the string manipulation functions result in a new string being created?

6. What will the output of the following statement be and why?

int i = 10;
Console.WriteLine(i.ToString("P"));

7. What will the output of the following statement be and why?

DateTime today = new DateTime(2009, 8, 23);
Console.WriteLine(today.ToString("MMMM"));

8. What will the output of the following Console.WriteLine statements be?

Console.WriteLine("¦{0}¦", 10);
Console.WriteLine("¦{0, 3}¦", 10);
Console.WriteLine("¦{0:d4}¦", 10);

int a = 24;
int b = -24;

Console.WriteLine(a.ToString("##;(##)"));
Console.WriteLine(b.ToString("##;(##)"));

9. What is the benefit of using a StringBuilder for string concatenation inside a loop?

10. What does the following regular expression pattern mean?

[\w-]+@([\w-]+\.)+[\w-]+

Answers

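1. String interning is the process by which the runtime creates only one string object for all identical string literals in an assembly. It is used by the C# compiler to eliminate duplicate string literals, saving space at runtime and decreasing the time required to perform string comparisons.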

2. Yes. The double-quote character is the only character that must be escaped in a verbatim string literal, which you do by doubling it (""), so that the compiler can determine where the string ends.

3. The recommended way to test for an empty string is to use the static String.IsNullOrEmpty method.

4. The IndexOf method reports the index of the first occurrence found of a single character or string, whereas the IndexOfAny method reports the first occurrence found of any character in a set of characters.

5. Yes. Because strings are immutable, the string manipulation functions never modify the original string; any function that produces a changed value returns it as a new string.

6. Assuming the en-US culture, the output will be “1,000.00 %” because the "P" numeric format specifier multiplies the number by 100 and displays it with a percent symbol.

7. The "MMMM" custom date and time format specifier represents the full name of the month, so the output will be “August”.

8. The output will be as follows:

¦10¦
¦ 10¦
¦0010¦
24
(24)

9. Because the StringBuilder represents a mutable string, using it for string concatenation inside a loop prevents multiple temporary strings from being created during each iteration to perform the concatenation.

10. This is a simple regular expression for recognizing an email address. Broken down, it means “Match any word character or hyphen one or more times, followed by the @ character, followed by a group containing any word character or hyphen one or more times followed by a period (.) character, where that group is repeated one or more times, followed by any word character or hyphen one or more times.”

Exercises

1. Create a new console application and implement the Celsius struct shown in Listing 9.10. In the Main method, implement the code shown in Listing 9.12.

2. Create a new console application and, in the Main method, implement the code shown in Listing 9.13. Then implement a Validate method that is called from the Main method. The Validate method should use the static Regex.IsMatch method to validate a string parameter as a phone number. The regular expression pattern needed to match a phone number in the form 555-555-5555 should look like this:

^[2-9]\d{2}-\d{3}-\d{4}$