
Understanding Swift Programming: Swift 2 (2015)

PART 1: FUNDAMENTALS

7. Strings and Characters

Swift handles text with strings and characters. A string in Swift is an ordered sequence of characters, and has the type String.

A variable or constant that contains a single character can have the type Character, but can also have the type String.

Thus,

var s = "Hello"

assigns the string "Hello" to the variable s, and, through type inference, sets the type of s to String.

Alternatively, the type can be set explicitly:

var s: String = "Hello"

Note that double quotes must be used around literal character sequences. Unlike in Objective-C, single quotes will not work and will result in a compiler error:

var s = 'Hello' // Compiler Error

var ch = 'H' // Compiler Error

Assigning a single letter to a variable results in an inferred type of String, rather than Character:

var s = "H" // Type inferred is String

To assign a single letter to a variable and have it be a type of Character requires that the type be set explicitly:

var ch: Character = "H" // Type set explicitly to Character

If you have a character stored in a variable with a type of Character, and you need it to be a string, you can't just change the type of the variable, or assign it to a variable that you explicitly declare to be of type String. You need to create a string from the character with the String function:

var ch: Character = "H" // Type set explicitly to Character

var s = String(ch) // Will be inferred to be String

Similarly, if you have a variable of type String that contains a single character, and you need it to be of type Character, you must create it with the Character function:

var s: String = "H" // Type is String

var ch = Character(s) // Type inferred to be Character
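Note that the Character function stops the program at run time if the string does not contain exactly one character. A minimal sketch of a more cautious conversion follows. The helper name singleCharacter is hypothetical, and the code uses the count and first properties of current Swift releases rather than the Swift 2 syntax used elsewhere in this chapter.

```swift
// Hypothetical helper: returns the string's only character,
// or nil if the string is empty or has more than one character.
func singleCharacter(_ s: String) -> Character? {
    return s.count == 1 ? s.first : nil
}

let ok = singleCharacter("H")          // Optional Character "H"
let tooLong = singleCharacter("Hello") // nil
let empty = singleCharacter("")        // nil

// Going the other way, from Character to String, never fails.
let backToString = String(Character("H"))
print(backToString) // Prints: H
```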

Strings can be added together, or concatenated, with the "+" operator:

let a = "The sun is "

let b = "rising."

var c = a + b

print(c) // Prints: The sun is rising.

Strings in Swift can represent a large number of different characters, including those from logographic writing systems like Chinese:

var s = "的" // Chinese

var s = "絵" // Japanese

Swift can also represent the specialized characters known as "emoji":

var hamburger = "🍔"

To allow these and the more conventional characters to be represented, Swift makes use of the Unicode standard.

(To enter emoji characters in TextEdit on a Macintosh, go to Edit > Emoji & Symbols (or, on older Macs, Special Characters) and you will get a popup menu that you can navigate through with a large number of emoji characters. TextEdit works with Unicode; Microsoft Word 2011, the latest production version of Word available for the Macintosh as of mid 2015, does not. However, the 2016 Preview edition of Microsoft Word, available as of mid 2015, does support Unicode.)

The Unicode Standard

Unicode is a standard for representing characters that goes way beyond the traditional ASCII and allows the definition of some 110,000 characters. This includes characters for a large number of written languages, including the word-based (logographic) characters used in Chinese and Japanese, and the syllable-based characters used in Korean.

This makes things a little more complicated, because a character does not take a fixed number of bits, as it does in Objective-C. Instead, in the UTF-8 encoding, a character can take from 1 to 4 bytes. A character like “A” takes one byte; an emoji like 🍔 takes 4 bytes.

An emoji like 🍔 is simply a pictogram that is a single character. The idea of an emoji (and the term) comes from Japan, where it literally means "picture character".

An emoji is not the same as an "emoticon", such as the smiley face :-), which just uses existing character shapes to crudely draw something resembling a picture. The terms have no relationship with each other: emoji is a word borrowed from the Japanese language, while emoticon is an English word derived from the words "emotion" and "icon".

Swift allows any Unicode character to be used in a string. Unicode characters are represented by a unique number, known as a code point, that is intended to represent a particular abstract character, known as a grapheme. A grapheme in a writing system like English that is based (more or less) on phonemes is an orthographic character, like “Z”. A grapheme in a writing system like Chinese uses a logographic character, like "的". Each has its own Unicode code point.

Unicode is concerned with the representation of abstract characters (graphemes). It does not involve itself with the specific shape that represents a grapheme, known as a glyph. That is left to the particular system software implementation. Thus Unicode has a single grapheme that represents a "hamburger". The iOS operating system provides a glyph that is displayed as the visual form of the "hamburger" grapheme: 🍔. The picture of the hamburger will look slightly different on an Android system, for example, because that system provides its own glyph for "hamburger".

Unicode characters can also be defined with a numeric code that represents the code point. Thus:

var a = "\u{1F354}"

print(a) // Prints: 🍔

will display the emoji character representing a hamburger.

It is common to refer to Unicode characters in terms of their code points using "U+" followed by the hexadecimal number for the code point. Thus, the code point for the hamburger character would be referred to as U+1F354.

The characters that you will be concerned with are known as Unicode scalars. A Unicode scalar is any code point in the range U+0000 to U+D7FF or U+E000 to U+10FFFF. The range U+D800 to U+DFFF is excluded because it is reserved for surrogate code points. Surrogates are not characters in their own right: the UTF-16 encoding uses them in pairs to represent a code point above U+FFFF, which does not fit in a single 16-bit unit.
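As a rough sketch of these ranges, the following builds scalars from raw code point numbers. It assumes a current Swift toolchain, in which the Unicode.Scalar initializer that takes a UInt32 returns nil for values in the surrogate range. The variable names are illustrative only.

```swift
// A valid scalar: U+1F354 (hamburger).
let hamburgerScalar = Unicode.Scalar(UInt32(0x1F354))

// U+D800 lies in the surrogate range, so it is not a Unicode scalar.
let surrogate = Unicode.Scalar(UInt32(0xD800))

// Format the code point in the conventional U+ notation.
var notation = ""
if let scalar = hamburgerScalar {
    notation = "U+" + String(scalar.value, radix: 16, uppercase: true)
}
print(notation)          // Prints: U+1F354
print(surrogate == nil)  // Prints: true

// In UTF-16, a code point above U+FFFF is stored as a surrogate pair.
let hamburgerUnits = Array("\u{1F354}".utf16)
print(hamburgerUnits.count) // Prints: 2
```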

The Number of Characters in a Swift String

Characters in Unicode can be expressed in either the UTF-8 or UTF-16 encodings. In UTF-8, some characters, such as simple ASCII characters, require only one byte (8 bits) to represent. Others, such as the emoji characters, can require up to four bytes. UTF-16 usually represents characters with 16 bits, or two bytes, but in some cases can require up to four bytes.

Unlike with Objective-C, which uses UTF-16 for strings using the NSString class, there is no simple length property that indicates the length of a string. Instead, Swift considers the length of a string to be the number of characters contained in it. And each character is defined as a perceivably different character, according to the Unicode approach.

Thus, the English orthographic character “Z”, the Chinese logograph 的, and the pictogram for a hamburger are all considered to be a single character, even though each requires a different number of bytes to represent it.

The idea that a character is defined as what is perceived is taken quite seriously, as can be seen in examples in which a character can be displayed either by a single character or with a combination of characters. For example, the character é (e with an acute accent mark) can be displayed either by its own Unicode character:

var ch: Character = "\u{E9}" // Displays the character é

or by displaying two Unicode characters, an ordinary "e" followed by an acute accent character:

var ch: Character = "\u{65}\u{301}" // Displays ordinary e, then adds acute accent mark.

In both cases, Swift considers this to be a single character. A sequence of Unicode characters that displays a single perceivable character is known as an extended grapheme cluster.

From the Apple documentation:

Two String values (or two Character values) are considered equal if their extended grapheme clusters are canonically equivalent. Extended grapheme clusters are canonically equivalent if they have the same linguistic meaning and appearance, even if they are composed from different Unicode scalars behind the scenes.

Another character that is produced with two successive Unicode scalars is the U.S. flag 🇺🇸, which is formed from the two "regional indicator" symbols for the letters U and S.

Its Unicode representation is:

var usFlag: Character = "\u{1F1FA}\u{1F1F8}"

To count the number of characters in a string, you must use the characters.count property for the string variable. (This is Swift 2 only.) This only works with strings, not characters. Below, we first convert the Character to a string, then access characters.count. We can do this first with the sequence of "characters" consisting of “e” followed by the acute accent character:

var ch: Character = "\u{65}\u{301}" // e followed by acute accent

var s: String = String(ch)

print(s.characters.count) // Prints: 1

We will see the same thing for the U.S. flag:

var ch: Character = "\u{1F1FA}\u{1F1F8}" // US Flag ("Regional indicator")

var s: String = String(ch)

print(s.characters.count) // Prints: 1

In both cases characters.count returned a count of 1, because there is only one perceived character, even though it is actually represented by two underlying Unicode scalars.

And:

var s = "Hello Mr.🍔"

var numberOfCharacters = s.characters.count

print(numberOfCharacters) // Prints: 10

This will count 10 as the number of perceivably different characters, reflecting the nine ASCII characters (seven letters, one space, and one period) and the single emoji character.

We can determine the actual number of bytes used to represent this string by looking at the 8-bit codes one by one. The property utf8 contains a representation of the string in UTF-8 format, that is, byte by byte.

We can use a for-in loop to look at each byte and print out its code in decimal:

var s = "Hello Mr.🍔"

for code in s.utf8 {

    print("\(code) ", terminator: "")

}

This prints: 72 101 108 108 111 32 77 114 46 240 159 141 148

There are 13 numbers: one byte for each of the nine ASCII characters, and 4 bytes for the hamburger emoji.

If we look at the numbers, we see 72, which is standard ASCII for "H", and 46, the fifth from the end, which is standard ASCII for the "period" character. Thus it makes sense that the final 4 bytes in the sequence represent the emoji.
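One way to check this reading of the output is to decode the last four bytes on their own. This sketch assumes the String(decoding:as:) initializer found in current Swift releases, which is not part of Swift 2. The variable names are illustrative.

```swift
// The final four bytes from the output above.
let emojiBytes: [UInt8] = [240, 159, 141, 148]

// Decode the bytes as UTF-8.
let decoded = String(decoding: emojiBytes, as: UTF8.self)

print(decoded == "\u{1F354}") // Prints: true
print(decoded.utf8.count)     // Prints: 4
```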

Note that there is no length property for a string in Swift, as there is in NSString. (NSString is the standard Cocoa library class for strings that is used by Objective-C.) This is presumably deliberate, to avoid misleading programmers who should be using characters.count. There are properties like s.utf8.count and s.utf16.count (which may not agree) and which provide counts with different rules. (Swift 2.)
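To make the difference between these counts concrete, here is a small sketch that compares them for the examples in this section. It uses the count property that String has in current Swift releases, where Swift 2 would use characters.count. The helper counts is hypothetical.

```swift
// Hypothetical helper: the size of a string in each of its views.
func counts(_ s: String) -> (characters: Int, scalars: Int, utf16: Int, utf8: Int) {
    return (s.count, s.unicodeScalars.count, s.utf16.count, s.utf8.count)
}

let accented = counts("\u{65}\u{301}")       // e followed by acute accent
let flag = counts("\u{1F1FA}\u{1F1F8}")      // U.S. flag
let greeting = counts("Hello Mr.\u{1F354}")  // nine ASCII characters and a hamburger

print(accented) // 1 character, 2 scalars, 2 UTF-16 units, 3 UTF-8 bytes
print(flag)     // 1 character, 2 scalars, 4 UTF-16 units, 8 UTF-8 bytes
print(greeting) // 10 characters, 10 scalars, 11 UTF-16 units, 13 UTF-8 bytes
```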

Testing for an Empty String and Comparing Strings

TESTING FOR AN EMPTY STRING

The following will test to see if a string is empty:

let s = "" // Creates an empty string

if s.isEmpty { print("The string s is empty") }

let s = "This string has content"

if s.isEmpty {

print("s is empty")

} else {

print("s has content")

}

COMPARING TWO STRINGS

In Swift, a test for equality of two strings uses the "==" operator, and it compares the content of the strings, not just their pointers as in Objective-C. When characters are compared, they are compared based on how they are perceived by the user, not the actual data representing them.

var a: Character = "\u{E9}" // The character é

var b: Character = "\u{65}\u{301}" // e, then an acute accent character

if a == b {

print("a and b are the same")

}

else {

print("a and b are different")

}

// Prints: a and b are the same

In the example shown, the characters in a and b are considered the same because they will be perceived the same, even though the underlying data is different.
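The same holds for whole strings. In this brief sketch (the names are illustrative), two spellings of the same word compare equal with ==, even though their underlying scalars differ.

```swift
let precomposed = "caf\u{E9}"   // é as a single scalar
let decomposed = "cafe\u{301}"  // e followed by a combining acute accent

// == compares by canonical equivalence, so these are equal.
let sameText = (precomposed == decomposed)

// Comparing the scalar views element by element exposes the difference.
let sameScalars = precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars)

print(sameText)    // Prints: true
print(sameScalars) // Prints: false
```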

Methods for Manipulating Strings

A variety of methods are available for working with strings. Some of them work by executing the method directly on a Swift string of type String. Some of them will not work this way, but will work if you first use type casting to (temporarily) create a string with a type of NSString. These show just some of the methods available. Apparently all of the methods described in the NSString documentation can be called with the type casting approach.

For any of these methods to work, either UIKit or Foundation must be imported. This is especially important to remember when trying them out in Playground or the interactive REPL. Some string functions, particularly casting a Swift String type to an NSString type, do not appear to work in the REPL in at least some versions of Xcode.

Only a few string manipulation functions have been implemented in the Swift language itself as of Swift 1.2. However, bridging between Swift and the NSString class is relatively seamless, and it is thus relatively easy to use the NSString functions in ordinary Swift code. NSString represents strings as sequences of 16-bit UTF-16 codes, so a character that does not fit in 16 bits, such as a 4-byte emoji, occupies two of them. NSString methods that take integer indexes or lengths count these 16-bit codes, not characters, and can give unexpected results with such strings.

CREATING A STRING

The following creates an empty string:

var s = String()

Alternatively, the following will also create an empty string:

var s = ""

MANIPULATING THE CASE OF LETTERS

The methods lowercaseString and uppercaseString will take a string and make all of the characters lowercase or all uppercase, as follows:

let s = "Mixed Case"

var sUpperCase = s.uppercaseString

print(sUpperCase) // Prints: MIXED CASE

let s = "Mixed Case"

var sLowerCase = s.lowercaseString

print(sLowerCase) // Prints: mixed case

ACCESSING CHARACTERS IN A STRING

To access a single character at a particular index in a string:

let s = "A string with characters"

let index = s.startIndex.advancedBy(4)

var a = s[index] // Gets a character at index

print(a) // Prints: r

Because a character can take a varying number of bytes (as many as four for a single code point), it is not possible (in unextended Swift) to use a subscript with an integer index to access a character in a String. You have to use the rather unwieldy advancedBy method, which replaced the global advance function in Swift 2. There is a trick that can be used as an alternative, however: you can define an extension that allows a String to use a subscript that takes a type of Int. Depending on how the extension is written, this may work correctly only if you avoid characters like emoji and Chinese characters and the like.
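A minimal sketch of such an extension follows. The subscript label offset is a hypothetical choice, and the code is written with the index(_:offsetBy:) method of current Swift releases rather than Swift 2 syntax. This particular version steps through the string one character at a time, so each access takes time proportional to the offset, and an out-of-range offset stops the program at run time.

```swift
extension String {
    // Hypothetical convenience: the character at an integer offset.
    // Walks forward from startIndex, so the cost grows with the offset.
    subscript(offset offset: Int) -> Character {
        return self[index(startIndex, offsetBy: offset)]
    }
}

let text = "A string with characters"
let fifth = text[offset: 4]
print(fifth) // Prints: r
```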

ACCESSING SUBSTRINGS OF A STRING

Swift:

let str = "Hello cruel world"

let index = str.startIndex.advancedBy(6)

let endIndex = str.endIndex.advancedBy(-6)

let sub = str[Range(start: index, end: endIndex)]

print(sub) // Prints string "cruel"

Using NSString:

You can also get a substring by using the type casting operator as to have Swift treat the string as an NSString and execute the NSString substringFromIndex method:

var s = "HeyCharlie"

var sub = (s as NSString).substringFromIndex(2)

print(sub) // Prints: yCharlie

And you can get the substring from the beginning of the string up to, but not including, the character at an integer index:

var s = "HeyCharlie"

var sub = (s as NSString).substringToIndex(4)

print(sub) // Prints: HeyC

Warning: The numeric index refers to the sequence of 16-bit UTF-16 codes that NSString stores, not Unicode's idea of a perceivable character. If characters that take more than 16 bits are involved, such as the hamburger emoji, and the index points to something other than the beginning of that character, you will get gibberish.
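The effect can be seen without NSString by looking at a string's utf16 view, which holds the same kind of 16-bit values. In this sketch (current Swift syntax, illustrative names), keeping only the first 16-bit value splits the hamburger in half, so the decoded result is no longer the emoji.

```swift
let snack = "\u{1F354}!"            // hamburger followed by "!"
let snackUnits = Array(snack.utf16) // three 16-bit values for two characters

// Keep only the first 16-bit value, as an integer index of 1 would.
let firstUnitOnly = String(decoding: snackUnits.prefix(1), as: UTF16.self)

print(snackUnits.count)             // Prints: 3
print(firstUnitOnly == "\u{1F354}") // Prints: false
```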

Or in Swift:

let str = "Hello, world!"

let index = str.startIndex.advancedBy(4)

str.substringFromIndex(index) // Returns String "o, world!"

str.substringToIndex(index) // Returns String "Hell"

REPLACING SEQUENCES OF CHARACTERS IN A STRING WITH OTHER SEQUENCES

You can replace a sequence of characters (a substring) in a string with another sequence of characters:

var s = "Hey Charlie"

var s2 = s.stringByReplacingOccurrencesOfString("Charlie", withString: "George")

print(s2) // Prints: Hey George

Note that this will replace ALL occurrences of "Charlie" in the string with "George":

var s = "Hey Charlie, meet the other Charlie"

var s2 = s.stringByReplacingOccurrencesOfString("Charlie", withString: "George")

print(s2) // Prints: Hey George, meet the other George

GETTING NUMERIC AND BOOLEAN VALUES FROM STRING

Using Swift Directly

If you have a string consisting of the digits representing an integer, you can get the integer value of it with:

var s = "12345"

var number = Int(s)

print(number) // Prints: Optional(12345)

The Int initializer returns an optional integer, rather than a plain integer. We haven't discussed optional values much yet (see Chapter 9), but an optional integer is a type that can contain either an actual integer or no value, otherwise known as nil. To actually use an optional, you have to "unwrap" it:

var s = "12345"

var numberOptional = Int(s)

print(numberOptional) // Prints: Optional(12345)

if let n = numberOptional {

print("Value is \(n)") // Prints: Value is 12345

}

else {

print("No value, this is a nil")

}

This form of unwrapping is known as "optional binding" and unwraps it only if it has a value.
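Optional binding matters most when the text is not a valid number. The following is a short sketch, with a hypothetical helper parseCount and an arbitrary fallback value of 0:

```swift
// Hypothetical helper: parse an integer, falling back to 0 on bad input.
func parseCount(_ text: String) -> Int {
    if let n = Int(text) {
        return n   // text was a valid integer
    } else {
        return 0   // Int(text) returned nil
    }
}

print(parseCount("12345")) // Prints: 12345
print(parseCount("12a45")) // Prints: 0
print(Int("") == nil)      // Prints: true
```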

Using NSString

It's also possible to use the NSString property intValue. This is simpler than the direct Swift approach, but it is not as safe:

var s = "12345"

var m = (s as NSString).intValue

print(m) // Prints: 12345

If you have a string consisting of a floating point value, you can get it as a Float value:

var s = "3.8877"

var n = (s as NSString).floatValue

print(n) // Prints: 3.8877

// n is inferred to be Float

And as a Double value:

var s = "3.8877138786"

var n = (s as NSString).doubleValue

print(n) // Prints: 3.8877138786

// n is inferred to be Double

CONVERTING A STRING THAT HAS A BOOLEAN VALUE

If you have a string consisting of a Boolean name, you can convert it to a Boolean type, which is quite a different thing, although it doesn't look any different when you print it out:

var s = "true"

var p = (s as NSString).boolValue

print(p) // Prints: true
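Floating point and Boolean strings can also be parsed without NSString. This sketch assumes a current Swift toolchain, in which Double and Bool both have initializers that take a String. Like Int(s), they return an optional, and unlike the NSString properties they return nil for text they do not recognize. The variable names are illustrative.

```swift
let price = Double("3.8877")    // Optional(3.8877)
let flagOn = Bool("true")       // Optional(true)
let notABool = Bool("yes")      // nil: only "true" and "false" are accepted
let notANumber = Double("abc")  // nil

if let value = price, let on = flagOn {
    print("value = \(value), on = \(on)") // Prints: value = 3.8877, on = true
}
```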

Hands-On Exercises

Go to the following web address with a Macintosh or Windows PC to do the Hands-On Exercises.

For Chapter 7 exercises, go to

understandingswiftprogramming.com/7