Learning Scala (2015)

Part I. Core Scala

Chapter 7. More Collections

In Chapter 6 we were introduced to the Iterable root type and three of its immutable subtypes: the ordered collection List and the unordered collections Set and Map. These collections were labeled common because they are ubiquitous in modern programming languages, not to imply that they are basic and unadorned. In this chapter we will uncover Scala collections that may not be ubiquitous but are just as important.

We’ll start with mutable collections, which can probably be considered ubiquitous because more languages support them than support immutable collections. Then we’ll move on to arrays, streams, and other collections.

Mutable Collections

The List, Set, and Map immutable collections we are familiar with cannot be changed after they have been created (see the definition of “immutable”). They can, however, be transformed into new collections. For example, we can create an immutable map, and then transform it by removing one mapping and adding another:

scala> val m = Map("AAPL" -> 597, "MSFT" -> 40) 1

m: scala.collection.immutable.Map[String,Int] =

Map(AAPL -> 597, MSFT -> 40)

scala> val n = m - "AAPL" + ("GOOG" -> 521) 2

n: scala.collection.immutable.Map[String,Int] =

Map(MSFT -> 40, GOOG -> 521)

scala> println(m) 3

Map(AAPL -> 597, MSFT -> 40)

1

A new map with “AAPL” and “MSFT” keys.

2

Removing “AAPL” and adding “GOOG” gives us a different collection…

3

… while the original collection in “m” remains the same.

What you end up with is a completely new collection stored in “n”. The original collection, stored in the “m” value, remains untouched. And this is exactly the point of immutable data: to improve code stability and prevent bugs, data and data structures should not be mutable or change their state. As an example, data structures that are rigid and never change state are safer to use with concurrent code than data structures that may change at any point and are prone to corruption (e.g., reading a data structure while it is undergoing a state change).

However, there are times when you do want mutable data, and when it is arguably safe to use it. For example, creating a mutable data structure that is only used within a function, or one that is converted to immutability before being returned, are considered to be safe use cases. You may want to add elements to a list in the course of a series of “if” condition blocks, or add them in the course of iterating over a separate data structure, without having to store each transformation in a series of local values.
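As a minimal sketch of this “safe” pattern, the function below fills a mutable buffer inside its body and returns only an immutable List, so callers never see any mutable state. The function name and its inputs are hypothetical, chosen just for illustration, and it relies on the collection.mutable.Buffer type covered in this section:

```scala
// Hypothetical helper: collects the even numbers in `input`, plus 0 when asked.
def evensOf(input: Seq[Int], includeZero: Boolean): List[Int] = {
  val buf = collection.mutable.Buffer[Int]()  // mutable, but never leaves this function
  if (includeZero) buf += 0                   // added inside an "if" block
  for (i <- input if i % 2 == 0) buf += i     // added while iterating over another collection
  buf.toList                                  // converted to an immutable List before returning
}

val evens = evensOf(Seq(1, 2, 3, 4), includeZero = true)  // List(0, 2, 4)
```

Because the buffer is created and discarded within a single call, no other code can observe it changing.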

In this section we will explore three methods for building mutable collections.

Creating New Mutable Collections

The most straightforward way to modify collections is with a mutable collection type. See Table 7-1 for the mutable counterparts to the standard immutable List, Map, and Set types.

Table 7-1. Mutable collection types

| Immutable type | Mutable counterpart |
| --- | --- |
| collection.immutable.List | collection.mutable.Buffer |
| collection.immutable.Set | collection.mutable.Set |
| collection.immutable.Map | collection.mutable.Map |

Whereas the collection.immutable package is automatically added to the current namespace in Scala, the collection.mutable package is not. When creating mutable collections, make sure to include the full package name for the type.

The collection.mutable.Buffer type is a general-purpose mutable sequence, and supports adding elements to its beginning, middle, and end.

Here is an example of using it to build a list of integers starting from a single element:

scala> val nums = collection.mutable.Buffer(1)

nums: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1)

scala> for (i <- 2 to 10) nums += i

scala> println(nums)

ArrayBuffer(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Here is an example of using the same buffer but starting with an empty collection. Because there is no default value, we will have to specify the collection’s type with a type parameter (Int, in this case):

scala> val nums = collection.mutable.Buffer[Int]()

nums: scala.collection.mutable.Buffer[Int] = ArrayBuffer()

scala> for (i <- 1 to 10) nums += i

scala> println(nums)

ArrayBuffer(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
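Both examples only append to the end of the buffer. As a small sketch of the other positions a Buffer supports, the following block (with an illustrative value name of my own choosing) appends, prepends, and inserts into the middle of the same buffer:

```scala
val letters = {
  val buf = collection.mutable.Buffer('c')
  buf += 'd'           // append to the end:          c, d
  buf.prepend('a')     // add to the beginning:       a, c, d
  buf.insert(1, 'b')   // insert at index 1 (middle): a, b, c, d
  buf
}
```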

Building maps and sets is a similar process. Specifying the type parameter for a new set, or the key and value type parameters for a new map, is only required when creating an empty collection.

You can convert your mutable buffer back to an immutable list at any time with the toList method:

scala> println(nums)

ArrayBuffer(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> val l = nums.toList

l: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Likewise for sets and maps, use the toSet and toMap methods to convert these mutable collections to their immutable counterparts.
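Here is a brief sketch of the same build-then-convert approach for sets and maps. The value names are illustrative only; note that the empty map needs its key and value type parameters, whereas the set infers its element type from its initial contents:

```scala
val tickers: Set[String] = {
  val seen = collection.mutable.Set("AAPL")       // element type inferred from the initial value
  seen += "MSFT"
  seen += "AAPL"                                  // a duplicate, which the set ignores
  seen.toSet                                      // immutable Set
}

val prices: Map[String, Int] = {
  val m = collection.mutable.Map[String, Int]()   // empty, so both type parameters are required
  m += ("AAPL" -> 597)
  m("MSFT") = 40                                  // the update syntax also adds a mapping
  m -= "AAPL"
  m.toMap                                         // immutable Map
}
```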

Creating Mutable Collections from Immutable Ones

An alternative to creating mutable collections directly is to convert them from immutable collections. This is useful when you already have a starting immutable collection that you want to modify, or would just rather type “List()” instead of “collection.mutable.Buffer().”

The List, Map, and Set immutable collections can all be converted to the mutable collection.mutable.Buffer type with the toBuffer method. For lists this is obviously straightforward, because the buffer and list type are both sequences. Maps, as subtypes of Iterable, can be considered as sequences as well, and are converted to buffers as sequences of key-value tuples. Converting sets to buffers is trickier, however, because buffers do not honor the uniqueness constraint of sets. Fortunately, any duplicates in the buffer’s data will be removed when converted back to a Set.

Here is an example of converting an immutable map to a mutable buffer and then changing it back:

scala> val m = Map("AAPL" -> 597, "MSFT" -> 40)

m: scala.collection.immutable.Map[String,Int] =

Map(AAPL -> 597, MSFT -> 40)

scala> val b = m.toBuffer 1

b: scala.collection.mutable.Buffer[(String, Int)] =

ArrayBuffer((AAPL,597), (MSFT,40))

scala> b trimStart 1 2

scala> b += ("GOOG" -> 521) 3

res1: b.type = ArrayBuffer((MSFT,40), (GOOG,521))

scala> val n = b.toMap 4

n: scala.collection.immutable.Map[String,Int] =

Map(MSFT -> 40, GOOG -> 521)

1

The map, containing key-value pairs, is now a sequence of tuples.

2

trimStart removes one or more items from the start of the buffer.

3

After removing the “AAPL” entry we’ll add a “GOOG” entry.

4

This buffer of tuples is now an immutable map again.

The buffer methods toList and toSet can be used in addition to toMap to convert a buffer to an immutable collection.

Let’s try converting this buffer of 2-sized tuples to a List and to a Set. After all, there’s no reason that a collection of 2-sized tuples created from a map must be converted back to its original form.

To verify that Set imposes a uniqueness constraint, we’ll first add a duplicate entry to the buffer. Let’s see how this works out:

scala> b += ("GOOG" -> 521)

res2: b.type = ArrayBuffer((MSFT,40), (GOOG,521), (GOOG,521))

scala> val l = b.toList

l: List[(String, Int)] = List((MSFT,40), (GOOG,521), (GOOG,521))

scala> val s = b.toSet

s: scala.collection.immutable.Set[(String, Int)] = Set((MSFT,40), (GOOG,521))

The list “l” and set “s” were created successfully, with the list containing the duplicated entries and the set restricted to contain only unique entries.

The Buffer type is a good, general-purpose mutable collection, similar to a List but able to add, remove, and replace its contents. The conversion methods it supports, along with the toBuffer methods on its immutable counterparts, make it a useful mechanism for working with mutable data.

About the only drawback of a buffer is that it may be too broadly applicable. If all you need is to put together a collection iteratively, e.g., inside a loop, a builder may be a good choice instead.

Using Collection Builders

A Builder is a simplified form of a Buffer, restricted to generating its assigned collection type and supporting only append operations.

To create a builder for a specific collection type, invoke the type’s newBuilder method and include the type of the collection’s element. Invoke the builder’s result method to produce the final collection. Here is an example of creating a Set with a builder:

scala> val b = Set.newBuilder[Char]

b: scala.collection.mutable.Builder[Char,scala.collection.immutable.

Set[Char]] = scala.collection.mutable.SetBuilder@726dcf2c

scala> b += 'h' 1

res3: b.type = scala.collection.mutable.SetBuilder@d13d812

scala> b ++= List('e', 'l', 'l', 'o') 2

res4: b.type = scala.collection.mutable.SetBuilder@d13d812

scala> val helloSet = b.result 3

helloSet: scala.collection.immutable.Set[Char] = Set(h, e, l, o)

1

Adding a single item, one of two append operations.

2

Adding multiple items, the second of two append operations.

3

Unlike with buffers, a builder knows its immutable counterpart.

So, why use Builder versus Buffer or one of the mutable collection types? The Builder type is a good choice if you are only building a mutable collection iteratively in order to convert it to an immutable collection. If you need Iterable operations while building your mutable collection, or don’t plan on converting to an immutable collection, using one of the Buffer or other mutable collection types is a better match.
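To make the first case concrete, here is a minimal sketch of a builder used inside a loop, where the only goal is to hand back an immutable collection. The squares function is hypothetical and exists only for illustration:

```scala
// Hypothetical helper: the squares of 1..n, assembled with a builder.
def squares(n: Int): Vector[Int] = {
  val b = Vector.newBuilder[Int]
  for (i <- 1 to n) b += i * i   // append one element at a time
  b.result()                     // the builder produces the immutable Vector directly
}

val firstFour = squares(4)       // Vector(1, 4, 9, 16)
```

There is no intermediate toList or toVector call here, because the builder already knows which immutable collection it is producing.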

In this section we have investigated methods to convert between immutable and mutable collections, which are either unchangeable or fully modifiable. In the next section we’ll cover a “collection” that breaks these rules, being immutable in size but mutable in content.

Arrays

An Array is a fixed-size, mutable, indexed collection. It’s not officially a collection, because it isn’t in the “scala.collection” package and doesn’t extend from the root Iterable type (although it has all of the Iterable operations like map and filter). The Array type is actually just a wrapper around Java’s array type with an advanced feature called an implicit class allowing it to be used like a sequence. Scala provides the Array type for compatibility with JVM libraries and Java code and as a backing store for indexed collections, which really require an array to be useful.

Here are some examples of working with arrays, demonstrating their cell mutability and support for Iterable operations:

scala> val colors = Array("red", "green", "blue")

colors: Array[String] = Array(red, green, blue)

scala> colors(0) = "purple" 1

scala> colors 2

res0: Array[String] = Array(purple, green, blue)

scala> println("very purple: " + colors) 3

very purple: [Ljava.lang.String;@70cf32e3

scala> val files = new java.io.File(".").listFiles 4

files: Array[java.io.File] = Array(./Build.scala, ./Dependencies.scala,

./build.properties, ./JunitXmlSupport.scala, ./Repositories.scala,

./plugins.sbt, ./project, ./SBTInitialization.scala, ./target)

scala> val scala = files map (_.getName) filter(_ endsWith "scala")

scala: Array[String] = Array(Build.scala, Dependencies.scala,

JunitXmlSupport.scala, Repositories.scala, SBTInitialization.scala)

1

Use a zero-based index to replace any item in an Array.

2

The Scala REPL knows how to print an Array …

3

… but not println(), which can only call a type’s toString() method.

4

The listFiles method in java.io.File, a JDK class, returns an array that we can easily map and filter.

Java arrays do not override the toString() method inherent in all Java and Scala objects, and thus use the default implementation of printing out the type parameter and reference. Thus, calling toString() on an Array results in the unreadable output seen in the last example. Fortunately you won’t see this output with Scala collections, because they all override toString() to provide human-readable printouts of their contents and structure.
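If you do need to print an array’s contents, one workaround (a sketch, not the only approach) is to build the string yourself with mkString, or to convert the array to a Scala collection first:

```scala
val colors = Array("red", "green", "blue")

val joined   = colors.mkString(", ")                 // "red, green, blue"
val replLike = colors.mkString("Array(", ", ", ")")  // "Array(red, green, blue)"
val asList   = colors.toList.toString                // "List(red, green, blue)"
```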

It’s important to hear about and understand the Array type, but I don’t recommend using it in regular practice unless you need it for JVM code. There are many other fine sequences that you can use instead, as you’ll see in the next section.

Seq and Sequences

Seq is the root type of all sequences, including linked lists like List and indexed (direct-access) lists like Vector. The Array type, if it were a collection, could be considered an indexed sequence because its elements are directly accessible without traversal. As a root type, Seq itself cannot be instantiated, but you can invoke it as a shortcut for creating a List:

scala> val inks = Seq('C','M','Y','K')

inks: Seq[Char] = List(C, M, Y, K)

The Seq hierarchy of sequence collections appears in Figure 7-1, and Table 7-2 contains the descriptions for each of these types.

Figure 7-1. The sequence collections hierarchy

Table 7-2. Sequence types

| Name | Description |
| --- | --- |
| Seq | The root of all sequences. Shortcut for List(). |
| IndexedSeq | The root of indexed sequences. Shortcut for Vector(). |
| Vector | A list backed by an Array instance for indexed access. |
| Range | A range of integers. Generates its data on-the-fly. |
| LinearSeq | The root of linear (linked-list) sequences. |
| List | A singly linked list of elements. |
| Queue | A first-in-first-out (FIFO) list. |
| Stack | A last-in-first-out (LIFO) list. |
| Stream | A lazy list. Elements are added as they are accessed. |
| String | A collection of characters. |

The Vector type is implemented with an Array for storage. As an indexed sequence (since arrays are indexed), you can access items in a Vector directly by their index. By contrast, accessing the nth item of a List (a linked list) requires n – 1 steps from the head of its list. Java developers will recognize Vector as analogous to Java’s “ArrayList,” whereas C++ developers will (more easily) recognize it as being similar to the “Vector” template.

The Seq shortcut for List linked lists and the IndexedSeq shortcut for Vector indexed lists are only marginally useful, because the savings for writing them is one character and negative four characters, respectively. Unless you have a fondness for high-level types (e.g., Seq) over concrete implementations (e.g., List), you may find little reason to use them yourself.
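The following sketch shows both shortcuts alongside direct index access on a Vector. The value names are illustrative, and the comments describe the behavior I would expect from the standard library’s default implementations:

```scala
val viaSeq     = Seq(1, 2, 3)          // created as a List
val viaIndexed = IndexedSeq(1, 2, 3)   // created as a Vector

val v       = Vector("a", "b", "c", "d")
val third   = v(2)                     // direct access by zero-based index: "c"
val patched = v.updated(2, "C")        // a new Vector; `v` itself is unchanged
```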

Seeing the String type listed with sequences may be a surprise, but in Scala it is a valid collection just like the others. A “string” derives its name, after all, from being a string of characters, in this case a sequence of Char elements. The String type is an immutable collection, extending Iterable and supporting its operations, while also serving as a wrapper for Java strings and supporting such java.lang.String operations as split and trim.

Here’s an example of using String as a subtype of Iterable and as a java.lang.String wrapper, using methods from both types:

scala> val hi = "Hello, " ++ "worldly" take 12 replaceAll ("w","W")

hi: String = Hello, World

The ++ and take operations derive from Iterable and act on the sequence of characters, while replaceAll is a java.lang.String operation invoked as a Scala operator.
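A few more collection-style operations on strings, as a quick sketch (the sample word is arbitrary):

```scala
val word = "Mississippi"

val noS      = word.filter(_ != 's')   // "Miiippi"
val shouted  = word.map(_.toUpper)     // "MISSISSIPPI"
val distinct = word.distinct           // "Misp"
val sCount   = word.count(_ == 's')    // 4
val trimmed  = "  padded  ".trim       // a java.lang.String method: "padded"
```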

The last sequence we’ll discuss in this chapter is the Stream type, which builds itself as its elements are accessed. It’s a popular collection in functional programming languages, but it takes some extra time to learn so it has its own section. Take the time to try out the examples and get familiar with Stream because it can be a very helpful collection to know about.

Streams

The Stream type is a lazy collection, generated from one or more starting elements and a recursive function. Elements are added to the collection only when they are accessed for the first time, in contrast to other immutable collections that receive 100% of their contents at instantiation time. The elements that a stream generates are cached for later retrieval, ensuring that each element is only generated once. Streams can be unbounded, theoretically infinite collections where elements are only realized upon access. They can also be terminated with Stream.Empty, a counterpart to List.Nil.

Streams, like lists, are recursive data structures consisting of a head (the current element) and a tail (the rest of the collection). They can be built with a function that returns a new stream containing the head element and a recursive invocation of that function to build the tail. You can use Stream.cons to construct a new stream with the head and tail.

Here is an example function that builds and recursively generates a new stream. By incrementing the starting integer value, it will end up creating a collection of consecutively increasing integers:

scala> def inc(i: Int): Stream[Int] = Stream.cons(i, inc(i+1))

inc: (i: Int)Stream[Int]

scala> val s = inc(1)

s: Stream[Int] = Stream(1, ?)

We have our stream but it only contains our starting value (1) and a promise of future values (?). Let’s force it to build out the next four elements by “taking” them and retrieving the contents as a list:

scala> val l = s.take(5).toList

l: List[Int] = List(1, 2, 3, 4, 5)

scala> s

res1: Stream[Int] = Stream(1, 2, 3, 4, 5, ?)

We took the first five elements and retrieved them as a plain old list. Printing out the original stream instance shows that it now contains five elements and is ready to generate more. We could follow this up by taking 20, or 200, or 2,000 elements. The stream contains a recursive function call (specifically, a function value) that it can use to generate new elements without end.
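To see the caching at work, here is a rough sketch that counts how often the generating function actually runs. The counter and the function name are my own additions for illustration. Note also that in Scala 2.13 and later the Stream type is deprecated in favor of LazyList, so expect deprecation warnings on newer compilers:

```scala
var generated = 0   // counts how many times the generator function runs

def counted(i: Int): Stream[Int] = {
  generated += 1
  Stream.cons(i, counted(i + 1))
}

val s = counted(1)

val firstPass   = s.take(5).toList   // forces elements 1 through 5
val afterFirst  = generated

val secondPass  = s.take(5).toList   // the same elements, served from the stream's cache
val afterSecond = generated          // unchanged: nothing was generated a second time
```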

An alternate syntax for the Stream.cons operator is the slightly cryptic #:: operator, which we’ll just call the cons operator for streams. This performs the same function as Stream.cons except with right-associative notation, complementing the cons operator for lists, :: (see The Cons Operator).

Here’s the “inc” function again, using the cons operator #::. I’ve also renamed the parameter to “head” to better demonstrate its use as the head element of the new Stream instance:

scala> def inc(head: Int): Stream[Int] = head #:: inc(head+1)

inc: (head: Int)Stream[Int]

scala> inc(10).take(10).toList

res0: List[Int] = List(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)

WHERE IS THE STREAM BEING CONSTRUCTED?

Many developers find the stream cons operator syntax, #::, confusing because it lacks an explicit creation of the underlying collection. The confusion is indeed warranted, because an advanced feature called implicit conversion is used to generate the new Stream instance from the recursive function’s type, => Stream[A]. If you can accept that the recursive function invocation (inc(head+1) in the preceding example) magically generates a Stream that acts as the tail, then prefixing this magic tail with your head element to create a new Stream should be acceptable.

Let’s try creating a bounded stream. We’ll use two arguments to our recursive function, one specifying the new head element and another specifying the last element to add:

scala> def to(head: Char, end: Char): Stream[Char] = (head > end) match {

| case true => Stream.empty

| case false => head #:: to((head+1).toChar, end)

| }

to: (head: Char, end: Char)Stream[Char]

scala> val hexChars = to('A', 'F').take(20).toList

hexChars: List[Char] = List(A, B, C, D, E, F)

Using the new “to” function, we were able to create a bounded stream consisting of the letters used in writing hexadecimal numbers. The stream’s take operation only returned the available elements, ending after we placed Stream.empty to terminate the collection.

The recursive function is used to generate a new Stream and (in our examples thus far) derive the new head element each time. We have used one or two parameter functions in the examples thus far, but you could easily have a function that had zero or a dozen parameters. The important part is defining the head value for your new stream.
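As one more sketch, here is a two-parameter function in which the first parameter is the head and the second simply carries state forward to the next invocation. The function is an illustration of mine, generating the Fibonacci numbers:

```scala
// `a` becomes the head of the stream; `b` is carried along for the next step.
def fib(a: Int, b: Int): Stream[Int] = a #:: fib(b, a + b)

val firstEight = fib(0, 1).take(8).toList   // List(0, 1, 1, 2, 3, 5, 8, 13)
```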

All of the collections we have covered can hold zero, one, two, or more items, with the expectation that they can scale as far as your machine’s memory or the JVM environment will allow. Next, we’ll learn about a set of collections that can scale no larger than a single element, but are surprisingly applicable to situations where you never considered a collection would fit.

Monadic Collections

The last set of collections we’ll discuss in this chapter are the monadic ones, which support transformative operations like the ones in Iterable but can contain no more than one element. The term “monadic” applies in its Greek origins to mean a single unit, and in the category theory sense of a single link in a chain of operations.

We’ll start with the Option type, the one monadic collection that extends Iterable.

Option Collections

As a collection whose size will never be larger than one, the Option type represents the presence or absence of a single value. This potentially missing value (e.g., it was never initialized, or could not be calculated) can thus be wrapped in an Option collection and have its potential absence clearly advertised.

Some developers see Option as a safe replacement for null values, notifying users that the value may be missing and reducing the likelihood that its use will trigger a NullPointerException. Others see it as a safer way to build chains of operations, ensuring that only valid values will persist for the duration of the chain.

The Option type is itself unimplemented but relies on two subtypes for the implementation: Some, a type-parameterized collection of one element; and None, an empty collection. The None type has no type parameters because it never contains contents. You can work with these types directly, or invoke Option() to detect null values and let it choose the appropriate subtype.

Let’s try creating an Option with nonnull and null values:

scala> var x: String = "Indeed"

x: String = Indeed

scala> var a = Option(x)

a: Option[String] = Some(Indeed)

scala> x = null

x: String = null

scala> var b = Option(x)

b: Option[String] = None

You can use isDefined and isEmpty to check if a given Option is Some or None, respectively:

scala> println(s"a is defined? ${a.isDefined}")

a is defined? true

scala> println(s"b is not defined? ${b.isEmpty}")

b is not defined? true

Let’s use a more realistic example by defining a function that returns an Option value. In this case we will wrap the divide operator (/) with a check to prevent division by zero. Valid inputs will return a Some wrapper and invalid inputs will return None:

scala> def divide(amt: Double, divisor: Double): Option[Double] = { 1

| if (divisor == 0) None

| else Option(amt / divisor) 2

| }

divide: (amt: Double, divisor: Double)Option[Double]

scala> val legit = divide(5, 2)

legit: Option[Double] = Some(2.5) 3

scala> val illegit = divide(3, 0)

illegit: Option[Double] = None 4

1

The return type is the Option with a Double type parameter, ensuring that valid results will retain the correct type.

2

This will return a Some wrapper because the quotient will be a nonnull value.

3

With a valid divisor, our quotient comes wrapped in a Some.

4

With an invalid divisor we get None, the absence of a value.

A function that returns a value wrapped in the Option collection is signifying that it may not have been applicable to the input data, and as such may not have been able to return a valid result. It offers a clear warning to callers that its value is only potential, and ensures that its results will need to be carefully handled. In this way, Option provides a type-safe option for handling function results, far safer than the Java standard of returning null values to indicate missing data.
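The same idea applies when you call Java code that signals missing data with null. As a small sketch, the hypothetical setting function below wraps java.util.Map.get, which returns null for a missing key, so that callers receive an Option instead:

```scala
val settings: java.util.Map[String, String] = {
  val m = new java.util.HashMap[String, String]()
  m.put("theme", "dark")
  m
}

// Hypothetical helper: Option() turns the null for a missing key into None.
def setting(key: String): Option[String] = Option(settings.get(key))

val theme = setting("theme")   // Some(dark)
val font  = setting("font")    // None
```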

Scala’s collections use the Option type in this way to provide safe operations for handling the event of empty collections. For example, although the head operation works for non-empty lists, it will throw an error for empty lists. A safer alternative is headOption, which (as you may have guessed) returns the head element wrapped in an Option, ensuring that it will work even on empty lists.

Here is an example of calling headOption to safely handle empty collections:

scala> val odds = List(1, 3, 5)

odds: List[Int] = List(1, 3, 5)

scala> val firstOdd = odds.headOption

firstOdd: Option[Int] = Some(1)

scala> val evens = odds filter (_ % 2 == 0)

evens: List[Int] = List()

scala> val firstEven = evens.headOption

firstEven: Option[Int] = None

Another use of options in collections is in the find operation, a combination of filter and headOption that returns the first element that matches a predicate function. Here is an example of successful and unsuccessful collection searches with find:

scala> val words = List("risible", "scavenger", "gist")

words: List[String] = List(risible, scavenger, gist)

scala> val uppercase = words find (w => w == w.toUpperCase)

uppercase: Option[String] = None

scala> val lowercase = words find (w => w == w.toLowerCase)

lowercase: Option[String] = Some(risible)
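To illustrate the “combination of filter and headOption” description, this sketch runs the same predicate both ways and arrives at the same Option. The predicate is an arbitrary one of my choosing:

```scala
val words = List("risible", "scavenger", "gist")
val startsWithS = (w: String) => w.startsWith("s")

val viaFind   = words.find(startsWithS)                // Some(scavenger)
val viaFilter = words.filter(startsWithS).headOption   // Some(scavenger)
```

The results match, although find can stop at the first matching element whereas filter examines the entire list.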

In a way, we have used list reduction operations to reduce a collection down to a single Option. Because Option is itself a collection, however, we can continue to transform it.

For example, we could use filter and map to transform the “lowercase” result in a way that retains the value, or in a way that loses the value. Each of these operations is type-safe and will not cause null pointer exceptions:

scala> val filtered = lowercase filter (_ endsWith "ible") map (_.toUpperCase)

filtered: Option[String] = Some(RISIBLE)

scala> val exactSize = filtered filter (_.size > 15) map (_.size)

exactSize: Option[Int] = None

In the second example, the filter is inapplicable to the “RISIBLE” value and so it returns None. The ensuing map operation against None has no effect and simply returns None again.

This is a great example of the Option as a monadic collection, providing a single unit that can be executed safely (and type-safely) in a chain of operations. The operations will apply to present values (Some) and not apply to missing values (None), but the resulting type will still match the type of the final operation (an Option[Int] in the preceding example).
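Chains become more interesting when each step can itself fail. As a sketch, assuming a divide function like the one defined earlier in this section, flatMap (or an equivalent for-comprehension) passes a present value on to the next step and lets a missing value short-circuit the rest:

```scala
def divide(amt: Double, divisor: Double): Option[Double] =
  if (divisor == 0) None else Some(amt / divisor)

val ok     = divide(100, 4).flatMap(q => divide(q, 5))   // Some(5.0)
val broken = divide(100, 0).flatMap(q => divide(q, 5))   // None; the second divide never runs

// The same successful chain written as a for-comprehension
val viaFor = for {
  q1 <- divide(100, 4)
  q2 <- divide(q1, 5)
} yield q2                                               // Some(5.0)
```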

We have covered creating and transforming the Option collection. You may be wondering, however, what you do with an Option after you transform it into the desired value, type, or existence.

Extracting values from Options

The Option collection provides a safe mechanism and operations for storing and transforming values that may or may not be present. You shouldn’t be surprised that it also provides safe operations to extract its potential value.

For the curious, there is also an unsafe extraction operation, the get() method. If you call this for an Option that is actually a Some instance, you will successfully receive the value it contains. However, if you call get() on an instance of None, a “no such element” error will be triggered.

AVOID OPTION.GET()

Option.get() is unsafe and should be avoided, because it disrupts the entire goal of type-safe operations and can lead to runtime errors. If possible, use an operation like fold or getOrElse that allows you to define a safe default value.

We will focus on the safe operations for extracting Option values. The core strategy of these operations is to provide a framework for handling missing values, such as a replacement (aka “default”) value to use instead of the missing one, or a function that can either generate a replacement or raise an error condition.

Table 7-3 has a selection of these operations. The examples in the table could have been written with literal Option values like Some(10) or None, but these would not have helped to illustrate the challenges of working with potential data. Instead, the following examples call the function nextOption, which randomly returns either a valid or missing Option value each time. Try out this function and the examples in the REPL to see how Some and None change the result of these operations:

scala> def nextOption = if (util.Random.nextInt > 0) Some(1) else None

nextOption: Option[Int]

scala> val a = nextOption

a: Option[Int] = Some(1)

scala> val b = nextOption

b: Option[Int] = None

Table 7-3. Safe Option extractions

| Name | Example | Description |
| --- | --- | --- |
| fold | nextOption.fold(-1)(x => x) | Returns the value from the given function for Some (in this case, based on the embedded value) or else the starting value. The foldLeft, foldRight, and reduceXXX methods are also available for reducing an Option down to its embedded value or else a computed value. |
| getOrElse | nextOption getOrElse 5 or nextOption getOrElse {println("error!"); -1 } | Returns the value for Some or else the result of a by-name parameter (see By-Name Parameters) for None. |
| orElse | nextOption orElse nextOption | Doesn’t actually extract the value, but attempts to fill in a value for None. Returns this Option if it is nonempty, otherwise returns an Option from the given by-name parameter. |
| Match expressions | nextOption match { case Some(x) => x; case None => -1 } | Use a match expression to handle the value if present. The Some(x) expression extracts its data into the named value “x”, which can be used as the return value of the match expression or reused for further transformation. |
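Because nextOption is random, its results vary from run to run. For a repeatable comparison, here is a sketch that applies the same operations to one fixed Some and one fixed None. The value names are mine, chosen for illustration:

```scala
val present: Option[Int] = Some(1)
val missing: Option[Int] = None

val foldPresent = present.fold(-1)(x => x)   // 1
val foldMissing = missing.fold(-1)(x => x)   // -1

val orDefault   = missing getOrElse 5        // 5
val fallback    = missing orElse present     // Some(1)

val matched = missing match {
  case Some(x) => x
  case None    => -1
}                                            // -1
```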

The Option type is a great example of a monadic collection, being a singular and chainable unit. It is used throughout Scala’s collections library, for example in the find and headOption operations for sequences. It is also helpful in your own functions for input parameters and return values when you need to represent potential values. Many consider it a safer alternative to using null (i.e., the absence of an initialized value), as its potential nature is made clear, although its use cannot prevent all null pointer errors.

Option is a general-purpose monadic collection for potential values, able to contain any type of value as specified in its type parameters. We will now look at two monadic collections for specific purposes: Try for successful values and Future for eventual values.

Try Collections

The util.Try collection turns error handling into collection management. It provides a mechanism to catch errors that occur in a given function parameter, returning either the error or the result of the function if successful.

Scala provides the ability to raise errors by throwing exceptions, error types that may include a message or other information. Throwing an exception in your Scala code will disrupt your program’s flow and return control to the closest handler for that particular exception. Unhandled exceptions will terminate applications, although most Scala application frameworks and web containers take care to prevent this.

Exceptions may be thrown by your own code, by library methods that you invoke, or by the Java Virtual Machine (JVM). The JVM will throw a java.util.NoSuchElementException if you call None.get or Nil.head (the head of an empty list) or a java.lang.NullPointerException if you access a field or method of a null value.

To throw an exception, use the throw keyword with a new Exception instance. The text message provided to Exception is optional:

scala> throw new Exception("No DB connection, exiting...")

java.lang.Exception: No DB connection, exiting...

... 32 elided

To really test out exceptions, let’s create a function that will throw an exception based on the input criteria. We can then use it to trigger exceptions for testing:

scala> def loopAndFail(end: Int, failAt: Int): Int = {

| for (i <- 1 to end) {

| println(s"$i) ")

| if (i == failAt) throw new Exception("Too many iterations")

| }

| end

| }

loopAndFail: (end: Int, failAt: Int)Int

Let’s try loopAndFail with a larger iteration number than the check, ensuring we get an exception. This will demonstrate how the for-loop and the function overall get disrupted by an exception:

scala> loopAndFail(10, 3)

1)

2)

3)

java.lang.Exception: Too many iterations

at $anonfun$loopAndFail$1.apply$mcVI$sp(<console>:10)

at $anonfun$loopAndFail$1.apply(<console>:8)

at $anonfun$loopAndFail$1.apply(<console>:8)

at scala.collection.immutable.Range.foreach(Range.scala:160)

at .loopAndFail(<console>:8)

... 32 elided

The corollary to throwing exceptions is catching and handling them. To “catch” an exception, wrap the potentially offending code in the util.Try monadic collection.

NO TRY/CATCH BLOCKS?

Scala does support try {} .. catch {} blocks, where the catch block contains a series of case statements that attempt to match the thrown error. I recommend using util.Try() exclusively because it offers a safer, more expressive, and fully monadic approach to handling errors.
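
To make the comparison concrete, here is a minimal sketch of the same string-to-integer conversion written both ways. The names parseWithCatch and parseWithTry, and the -1 fallback value, are hypothetical and chosen only for illustration:

```scala
import scala.util.Try

object TryVersusCatch {
  // Classic try/catch: the catch block is a series of case clauses
  def parseWithCatch(s: String): Int =
    try {
      s.toInt
    } catch {
      case _: NumberFormatException => -1 // hypothetical fallback value
    }

  // The same call wrapped in util.Try: the outcome becomes a value
  // (a Success or a Failure) that the caller can chain or inspect
  def parseWithTry(s: String): Try[Int] = Try(s.toInt)
}
```

With the second version the caller decides what a failure means, rather than having a fallback baked into the function.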

The util.Try type, like Option, is unimplemented but has two implemented subtypes, Success and Failure. The Success type contains the return value of the attempted expression if no exception was thrown, and the Failure type contains the thrown Exception.

Let’s wrap some invocations of the loopAndFail function with util.Try and see what we get:

scala> val t1 = util.Try( loopAndFail(2, 3) ) 1

1)

2)

t1: scala.util.Try[Int] = Success(2) 2

scala> val t2 = util.Try{ loopAndFail(4, 2) } 3

1)

2)

t2: scala.util.Try[Int] = Failure(

java.lang.Exception: Too many iterations) 4

1

util.Try() takes a function parameter, so our invocation of loopAndFail is automatically converted to a function literal.

2

The function literal (our safe invocation of loopAndFail) exited safely, so we have a Success containing the return value.

3

Invoking util.Try with expression blocks (see Function Invocation with Expression Blocks) is also acceptable.

4

An exception was thrown in this function literal, so we have a Failure containing said exception.

Now we’ll look at how to handle potential errors. Because util.Try and its subtypes are also monadic collections, you can expect to find a number of thrilling and yet familiar methods for handling these situations. You may find that selecting the right error-handling approach (including whether to handle them at all) for your application will depend on its requirements and context, however. Few error-handling methods are generally applicable, in my experience.

Table 7-4 has a selection of strategies for handling errors. To better portray the inherent dichotomy of the success and failure states, let’s define a randomized error function for use in the examples:

scala> def nextError = util.Try{ 1 / util.Random.nextInt(2) }

nextError: scala.util.Try[Int]

scala> val x = nextError

x: scala.util.Try[Int] = Failure(java.lang.ArithmeticException:

/ by zero)

scala> val y = nextError

y: scala.util.Try[Int] = Success(1)

Now when you try out the following examples you’ll be able to test them with successes and failures.

Table 7-4. Error-handling methods with Try

Name

Example

Description

flatMap

nextError flatMap { _ => nextError }

In case of Success, invokes a function that also returns util.Try, thus mapping the current return value to a new, embedded return value (or an exception). Because our “nextError” demo function does not take an input, we’ll use an underscore to represent the unused input value from the current Success.

foreach

nextError foreach(x => println("success!" + x))

Executes the given function once in case of Success, or not at all in case of a Failure.

getOrElse

nextError getOrElse 0

Returns the embedded value in the Success or the result of a by-name parameter in case of a Failure.

orElse

nextError orElse nextError

The opposite of flatMap. In case of Failure, invokes a function that also returns a util.Try. With orElse you can potentially turn a Failure into a Success.

toOption

nextError.toOption

Convert your util.Try to Option, where a Success becomes Some and a Failure becomes None. Useful if you are more comfortable working with options, but the downside is you may lose the embedded Exception.

map

nextError map (_ * 2)

In case of Success, invokes a function that maps the embedded value to a new value.

Match expressions

nextError match { case util.Success(x) => x; case util.Failure(error) => -1 }

Use a match expression to handle a Success with a return value (stored in “x”) or a Failure with an exception (stored in “error”). Not shown: logging the error with a good logging framework, ensuring it gets noticed and tracked.

Do nothing

nextError

This is the easiest error-handling method of all and a personal favorite of mine. To use this method, simply allow the exception to propagate up the call stack until it gets caught or causes the current application to exit. This method may be too disruptive for certain sensitive cases, but ensures that thrown exceptions will never be ignored.
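
Because nextError is random, it can be hard to see exactly what each operation in the table does. Here is a small sketch that uses a deterministic stand-in, a hypothetical divide function that is not part of the original examples, so each result is predictable:

```scala
import scala.util.{Try, Success, Failure}

object TryOperations {
  // Hypothetical deterministic stand-in for nextError
  def divide(a: Int, b: Int): Try[Int] = Try(a / b)

  val ok: Try[Int] = divide(10, 2)  // Success(5)
  val bad: Try[Int] = divide(10, 0) // Failure(ArithmeticException)

  val mapped: Try[Int] = ok.map(_ * 2)                  // Success(10)
  val chained: Try[Int] = ok.flatMap(x => divide(x, 0)) // Failure
  val defaulted: Int = bad.getOrElse(0)                 // 0
  val recovered: Try[Int] = bad.orElse(ok)              // Success(5)
  val asOption: Option[Int] = bad.toOption              // None

  // A match expression can use both the value and the exception
  val matched: String = bad match {
    case Success(x) => s"value $x"
    case Failure(e) => s"failed: ${e.getMessage}"
  }
}
```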

A common error-handling task that many developers have to work with is validating numbers stored in strings. Here’s an example using the orElse operation to try to parse a number out of a string, and the foreach operation to print the result if successful:

scala> val input = " 123 "

input: String = " 123 "

scala> val result = util.Try(input.toInt) orElse util.Try(input.trim.toInt)

result: scala.util.Try[Int] = Success(123)

scala> result foreach { r => println(s"Parsed '$input' to $r!") }

Parsed ' 123 ' to 123!

scala> val x = result match {

| case util.Success(x) => Some(x)

| case util.Failure(ex) => {

| println(s"Couldn't parse input '$input'")

| None

| }

| }

x: Option[Int] = Some(123)

I’ll repeat the assertion that the best error-handling strategy will depend on your current requirements and context. The one error-handling method to avoid is to encounter an exception and ignore it, e.g., by replacing it with a default value. If an exception was thrown, it at least deserves to be reported and handled.
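
One possible way to follow this advice is to pass the exception’s message to a reporting function before converting the failure into an absent value. This is only a sketch: the parseReported name and its “report” parameter are hypothetical, standing in for whatever logging approach your application uses:

```scala
import scala.util.{Try, Success, Failure}

object ReportedParsing {
  // Hypothetical helper: reports the failure before returning None,
  // so the exception is never silently discarded
  def parseReported(input: String, report: String => Unit): Option[Int] =
    Try(input.trim.toInt) match {
      case Success(n) => Some(n)
      case Failure(e) =>
        report(s"Couldn't parse '$input': ${e.getMessage}")
        None
    }
}
```

Calling ReportedParsing.parseReported("abc", println) would print the reason for the failure and return None.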

Future Collections

The final monadic collection we’ll review is concurrent.Future, which initiates a background task. Like Option and Try, a future represents a potential value and provides safe operations to either chain additional operations or to extract the value. Unlike with Option and Try, a future’s value may not be immediately available, because the background task launched when creating the future could still be working.

By now you know that Scala code executes on the Java Virtual Machine, aka the JVM. What you may not know is that it also operates inside Java’s “threads,” lightweight concurrent processes in the JVM. By default Scala code runs in the JVM’s “main” thread, but can support running background tasks in concurrent threads. Invoking a future with a function will execute that function in a separate thread while the current thread continues to operate. A future is thus a monitor of a background Java thread in addition to being a monadic container of the thread’s eventual return value.

Fortunately, creating a future is a trivial task—just invoke it with a function you want to run in the background.

Let’s try creating a future with a function that prints a message. Before creating the future, it is necessary to specify the “context” in the current session or application for running functions concurrently. We’ll use the default “global” context, which makes use of Java’s thread library for this purpose:

scala> import concurrent.ExecutionContext.Implicits.global

import concurrent.ExecutionContext.Implicits.global

scala> val f = concurrent.Future { println("hi") }

hi

f: scala.concurrent.Future[Unit] =

scala.concurrent.impl.Promise$DefaultPromise@29852487

Our background task printed “hi” before the future could even be returned to the value. Let’s try another example that “sleeps” the background thread with Java’s Thread.sleep to make sure we get the future back while the background task is still running!

scala> val f = concurrent.Future { Thread.sleep(5000); println("hi") }

f: scala.concurrent.Future[Unit] =

scala.concurrent.impl.Promise$DefaultPromise@4aa3d36

scala> println("waiting")

waiting

scala> hi

The background task, after sleeping for 5 seconds (i.e., 5,000 milliseconds), printed the “hi” message. In the meantime, our code in the “main” thread had time to print a “waiting” message before the background task completed.

You can set callback functions or additional futures to execute when a future’s task completes. As an example, an API call could start an important but prolonged operation in the background while it returns control to the caller. You can also choose to wait, blocking the “main” thread until the background task completes. An already-asynchronous event such as a network file transfer could be started in a future while the “main” thread sleeps until the task completes or a “timeout” duration is reached.

Futures can be managed asynchronously (while the “main” thread continues to operate) or synchronously (with the “main” thread waiting for the task to complete). Because asynchronous operations are more efficient, allowing both the background and current threads to continue executing, we will review them first.

Handling futures asynchronously

Futures, in addition to spawning background tasks, can be treated as monadic collections. You can chain a function or another future to be executed following the completion of a future, passing the first future’s successful result to the new function or future.

A future handled this way will eventually return a util.Try containing either its function’s return value or an exception. In case of success (with a return value), the chained function or future will be passed the return value and converted into a future to return its own success or failure. In case of a failure (i.e., an exception was thrown), no additional functions or futures will be executed. In this way, the future-as-monadic-collection is just a link in a sequence of operations that carry an embedded value. This is similar to Try, which breaks the chain when a failure is reached, and Option, which breaks the chain when the value is no longer present.
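
The chain-breaking behavior can be observed with a short sketch. The FutureChain object, its “step” function, and the counter are hypothetical names for illustration. The sketch blocks with concurrent.Await.result only so that it can inspect the outcomes as plain values:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object FutureChain {
  // Returns the outcome of a successful chain, the outcome of a failed
  // chain, and the number of chained steps that actually ran
  def run(): (Try[Int], Try[Int], Int) = {
    val stepsRun = new AtomicInteger(0)
    def step(x: Int): Int = { stepsRun.incrementAndGet(); x + 1 }

    // Both steps run, because each stage succeeds
    val succeeded = Future(1).map(step).map(step)
    // Neither step runs, because the first stage throws
    val failed = Future[Int](throw new Exception("boom")).map(step).map(step)

    val ok = Try(Await.result(succeeded, 5.seconds))
    val bad = Try(Await.result(failed, 5.seconds))
    (ok, bad, stepsRun.get)
  }
}
```

The successful chain yields 3, the failed chain yields the original exception, and only two steps are counted in total.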

To receive the eventual result of a future, or of a chain of futures, you can specify a callback function. Your callback function receives the eventual successful value or the exception, freeing the original code that created the future to move on to other tasks.
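
As a minimal sketch of such a callback, the following hypothetical runAndReport helper starts a task in a future and registers an onComplete function that receives the util.Try holding the outcome. The helper name and its “report” parameter are assumptions made for this illustration:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Success, Failure}

object CallbackDemo {
  // Runs the task in the background and passes a description of its
  // outcome to `report` once the task completes
  def runAndReport(task: => Int)(report: String => Unit): Future[Int] = {
    val f = Future(task)
    f.onComplete {
      case Success(value) => report(s"Got $value")
      case Failure(error) => report(s"Failed: ${error.getMessage}")
    }
    f // returned immediately, before the task has necessarily finished
  }
}
```

A call such as CallbackDemo.runAndReport(6 * 7)(println) returns right away, and the message is printed from a background thread when the task finishes.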

Table 7-5 has a selection of operations for chaining futures and setting callback functions. As with the previous tables of operations, we’ll start with a randomized function that can provide us with a realistic test case. This function, nextFtr, will sleep and then either return a value or throw an exception. Its inner function “rand” makes it easier to set a sleep time (up to 5 seconds / 5,000 milliseconds) and determine whether to succeed or fail:

scala> import concurrent.ExecutionContext.Implicits.global

import concurrent.ExecutionContext.Implicits.global

scala> import concurrent.Future

import concurrent.Future

scala> def nextFtr(i: Int = 0) = Future {

| def rand(x: Int) = util.Random.nextInt(x)

|

| Thread.sleep(rand(5000))

| if (rand(3) > 0) (i + 1) else throw new Exception

| }

nextFtr: (i: Int)scala.concurrent.Future[Int]

IS THREAD.SLEEP() SAFE TO USE?

Some of the examples in this section on futures use the Java library method Thread.sleep to help demonstrate the concurrent and potentially delayed nature of running background tasks. However, actually using Thread.sleep in your own futures is a practice best avoided due to its inefficiencies. If you really need to put a future to sleep, you should consider using callback functions instead.
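
One way to read this advice, sketched here under some assumptions, is to schedule a callback on a timer and have it complete the future when the delay has passed. This sketch uses concurrent.Promise and Java’s scheduled executor, neither of which is covered in this chapter, and the Delayed object and its “after” function are hypothetical names:

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{Future, Promise}
import scala.concurrent.duration.FiniteDuration
import scala.util.Try

object Delayed {
  // One daemon timer thread shared by all delayed tasks
  // (an assumption of this sketch, not a recommendation)
  private val scheduler =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "delay-timer")
        t.setDaemon(true)
        t
      }
    })

  // Completes the returned future after `delay`, without parking a
  // thread in Thread.sleep. Note that `body` runs on the timer thread.
  def after[T](delay: FiniteDuration)(body: => T): Future[T] = {
    val promise = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit = { promise.complete(Try(body)); () }
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    promise.future
  }
}
```

No thread is blocked during the delay, so many delayed tasks can be pending at once at little cost.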

Table 7-5. Asynchronous future operations

Name

Example

Description

fallbackTo

nextFtr(1) fallbackTo nextFtr(2)

Chains the second future to the first and returns a new overall future. If the first is unsuccessful, the second is invoked.

flatMap

nextFtr(1) flatMap (x => nextFtr(x))

Chains the second future to the first and returns a new overall future. If the first is successful, its return value will be used to invoke the second.

map

nextFtr(1) map (_ * 2)

Chains the given function to the future and returns a new overall future. If the future is successful, its return value will be used to invoke the function.

onComplete

nextFtr() onComplete { _ getOrElse 0 }

After the future’s task completes, the given function will be invoked with a util.Try containing a value (if success) or an exception (if failure).

onFailure

nextFtr() onFailure { case _ => "Error!" }

If the future’s task throws an exception, the given function will be invoked with that exception.

onSuccess

nextFtr() onSuccess { case x => s"Got $x" }

If the future’s task completes successfully, the given function will be invoked with the return value.

Future.sequence

concurrent.Future sequence List(nextFtr(1), nextFtr(5))

Runs the futures in the given sequence concurrently, returning a new future. If all futures in the sequence are successful, a list of their return values will be returned. Otherwise the first exception that occurs across the futures will be returned.
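
Because nextFtr is random and slow, here is a sketch of several of these operations using a deterministic stand-in. The “ftr” and “outcome” functions are hypothetical helpers for this illustration, and “outcome” blocks with concurrent.Await.result only to expose each result as a plain value:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object FutureOperations {
  // Hypothetical deterministic stand-in for nextFtr
  def ftr(i: Int, fail: Boolean = false): Future[Int] =
    Future { if (fail) throw new Exception(s"task $i failed") else i + 1 }

  // Blocks only so this sketch can show the outcome as a value
  def outcome[T](f: Future[T]): Try[T] = Try(Await.result(f, 5.seconds))

  // fallbackTo: the first future fails, so the second one's value is used
  def fallback: Try[Int] = outcome(ftr(1, fail = true).fallbackTo(ftr(2)))

  // flatMap: the first result (2) is used to invoke the second future
  def chained: Try[Int] = outcome(ftr(1).flatMap(x => ftr(x)))

  // map: the result (2) is passed to the function
  def mapped: Try[Int] = outcome(ftr(1).map(_ * 2))

  // Future.sequence: all succeed, so we get a list of the results
  def all: Try[List[Int]] = outcome(Future.sequence(List(ftr(1), ftr(5))))

  // Future.sequence: one failure fails the overall future
  def oneFails: Try[List[Int]] =
    outcome(Future.sequence(List(ftr(1), ftr(5, fail = true))))
}
```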

The code examples we have used with futures should help to illustrate how to create and manage them. However, futures require creation, management, and extraction to be useful. Let’s try a more realistic example of futures that shows how to work with them from start to finish.

In this example we will use the OpenWeatherMap API (remember this from Exercises?) to check the current temperature (in Kelvin!) for two cities and report which one is warmer. Because calling a remote API can be time-intensive we will make the API calls in concurrent futures, running concurrently with our main thread:

scala> import concurrent.Future 1

import concurrent.Future

scala> def cityTemp(name: String): Double = {

| val url = "http://api.openweathermap.org/data/2.5/weather"

| val cityUrl = s"$url?q=$name"

| val json = io.Source.fromURL(cityUrl).mkString.trim 2

| val pattern = """.*"temp":([\d.]+).*""".r 3

| val pattern(temp) = json 4

| temp.toDouble

| }

cityTemp: (name: String)Double

scala> val cityTemps = Future sequence Seq( 5

| Future(cityTemp("Fresno")), Future(cityTemp("Tempe"))

| )

cityTemps: scala.concurrent.Future[Seq[Double]] =

scala.concurrent.impl.Promise$DefaultPromise@51e0301d

scala> cityTemps onSuccess {

| case Seq(x,y) if x > y => println(s"Fresno is warmer: $x K") 6

| case Seq(x,y) if y > x => println(s"Tempe is warmer: $y K")

| }

Tempe is warmer: 306.1 K

1

Okay, sometimes typing “concurrent.Future” too many times is a pain. The import command brings a package’s type into the current session’s namespace.

2

io.Source has many useful I/O operations for Scala applications.

3

Capturing the “temp” field in a JSON response.

4

Using Regex to produce a value from a capture group (see Regular expressions for a refresh on this topic).

5

By invoking Future.sequence, the sequence of futures is invoked concurrently and a list of their results is returned.

6

Pattern matching on sequences using a pattern guard (see Pattern Matching with Collections for an overview of using pattern matching with collections).

In this example we were able to make multiple concurrent calls to a remote API without blocking the main thread, i.e., the Scala REPL session. Calling a remote API and parsing its JSON result using regular expressions only took a few lines to implement (“few” = “fewer than a dozen” here), and executing this concurrently took up about the same amount of code.
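
One detail worth noting is that the extraction val pattern(temp) = json throws a scala.MatchError if the response has no “temp” field, such as when the API returns an error message. Here is a sketch of a more defensive variant that works on a JSON string directly, so it can be tried without a network call. The TempParser object and parseTemp function are hypothetical names, and the JSON text in the usage note is a made-up sample, not a real API response:

```scala
import scala.util.Try

object TempParser {
  // The same pattern used in cityTemp
  private val TempPattern = """.*"temp":([\d.]+).*""".r

  // Returns None instead of throwing when the field is missing
  // or the captured text is not a valid number
  def parseTemp(json: String): Option[Double] = json match {
    case TempPattern(temp) => Try(temp.toDouble).toOption
    case _ => None
  }
}
```

For example, TempParser.parseTemp("""{"main":{"temp":306.1}}""") returns Some(306.1), while a response without a “temp” field returns None.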

You should now have a good understanding of how to create futures and work with them asynchronously. In the next section we will cover what to do if you absolutely must wait for a future to complete.

Handling futures synchronously

Blocking a thread while waiting for a background thread to complete is a potentially resource-heavy operation. It should be avoided for high-traffic or high-performance applications in favor of using callback functions like onComplete or onSuccess. However, there are some times when you just need to block the current thread and wait for a background thread to complete, successfully or otherwise.

To block the current thread and wait for a future to complete, use concurrent.Await.result(), which takes the future and a maximum amount of time to wait. If the future completes in less time than the given duration, its result is returned, but a future that doesn’t complete in time will result in a java.util.concurrent.TimeoutException. This unwieldy exception may require using util.Try to manage timeout conditions safely, so be sure to choose durations long enough to minimize the chance of a timeout occurring.

To demonstrate the use of concurrent.Await.result, let’s use the “nextFtr” demonstration function we created for testing asynchronous operations (see Handling futures asynchronously). We’ll start by importing the contents of the “duration” package to get access to the Duration type for specifying time spans as well as the types for their units:

scala> import concurrent.duration._ 1

import concurrent.duration._

scala> val maxTime = Duration(10, SECONDS) 2

maxTime: scala.concurrent.duration.FiniteDuration = 10 seconds

scala> val amount = concurrent.Await.result(nextFtr(5), maxTime)

amount: Int = 6 3

scala> val amount = concurrent.Await.result(nextFtr(5), maxTime)

java.lang.Exception 4

at $anonfun$nextFtr$1.apply$mcI$sp(<console>:18)

at $anonfun$nextFtr$1.apply(<console>:15)

at $anonfun$nextFtr$1.apply(<console>:15)

...

1

The underscore (_) at the end imports every member of the given package into the current namespace.

2

SECONDS is a member of the concurrent.duration package and signifies that the given duration (10, in this case) is in seconds.

3

When “nextFtr()” returns a successful value, concurrent.Await will return it…

4

… but when “nextFtr()” throws an exception, the current thread will be disrupted.

While our first call to concurrent.Await.result gave us a successful call, the second one caused an exception that disrupted the Scala REPL. When working with synchronous operations, you may want to add your own util.Try wrapper to ensure that exceptions thrown in a future will not disrupt the current flow. Doing so is not a requirement, because allowing exceptions to propagate may be a valid design choice.
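
Here is a minimal sketch of such a wrapper. The awaitTry helper and the three demonstration functions are hypothetical names for this illustration. The wrapper captures both exceptions thrown inside the future and the timeout exception raised by concurrent.Await.result itself:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object SafeAwait {
  // Blocks for at most maxTime, capturing failures and timeouts in a Try
  def awaitTry[T](f: Future[T], maxTime: Duration): Try[T] =
    Try(Await.result(f, maxTime))

  // A successful task: Success(6)
  def succeeded: Try[Int] = awaitTry(Future(5 + 1), 10.seconds)

  // A task that throws: Failure holding the task's own exception
  def failed: Try[Int] =
    awaitTry(Future[Int](throw new Exception("task failed")), 10.seconds)

  // A task slower than the limit: Failure holding a TimeoutException
  // (Thread.sleep is used here for demonstration only)
  def timedOut: Try[Int] =
    awaitTry(Future { Thread.sleep(2000); 1 }, 100.millis)
}
```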

Summary

Mutable collections, well known and available in most programming languages, have the best of both worlds in Scala. They can be used as incremental buffers to expand collections one item at a time using buffers, builders, or other approaches, but also support the wide variety of operations available to immutable collections.

And collections are, especially as Scala broadly defines them, more than simple containers for application data. Monadic collections provide type-safe chainable operations and management for sensitive and complex situations such as missing data, error conditions, and concurrent processing.

In Scala, immutable, mutable, and monadic collections are indispensable building blocks and foundations for safe and expressive software development. They are ubiquitous in Scala code, and are generally applicable to a wide range of uses.

By learning and becoming familiar with the core operations of Iterable, and with the safe operation chaining of monadic collections, you can better leverage them as a core foundation for your applications in Scala.

This chapter concludes the Scala instructions for Part 1. In Part 2 we will cover object-oriented Scala, a core feature of this programming language, while continuing to use what we have learned thus far.

Exercises

1. The Fibonacci series starts with the numbers “1, 1” and then computes each successive element as the sum of the previous two elements. We’ll use this series to get familiarized with the collections in this chapter.

a. Write a function that returns a list of the first x elements in the Fibonacci series. Can you write this with a Buffer? Would a Builder be appropriate here?

b. Write a new Fibonacci function that adds new Fibonacci numbers to an existing list of numbers. It should take a list of numbers (List[Int]) and the count of new elements to add and return a new list (List[Int]). Although the input list and returned lists are immutable, you should be able to use a mutable list inside your function. Can you also write this function using only immutable lists? Which version, using mutable versus immutable collections, is more appropriate and readable?

c. The Stream collection is a great solution for creating a Fibonacci series. Create a stream that will generate a Fibonacci series. Use it to print out the first 100 elements in the series, in a formatted report of 10 comma-delimited elements per line.

d. Write a function that takes an element in the Fibonacci series and returns the following element in the series. For example, fibNext(8) should return 13. How will you handle invalid input such as fibNext(9)? What are your options for conveying the lack of a return value to callers?

2. In the example for Array collections (see Arrays) we used the java.io.File(<path>).listFiles operation to return an array of files in the current directory. Write a function that does the same thing for a directory, and converts each entry into its String representation using the toString method. Filter out any dot-files (files that begin with the . character) and print the rest of the files separated by a semicolon (;). Test this out in a directory on your computer that has a significant number of files.

3. Take the file listing from exercise 2 and print a report showing each letter in the alphabet followed by the number of files that start with that letter.

4. Write a function to return the product of two numbers that are each specified as a String, not a numeric type. Will you support both integers and floating-point numbers? How will you convey if either or both of the inputs are invalid? Can you handle the converted numbers using a match expression? How about with a for-loop?

5. Write a function to safely wrap calls to the JVM library method System.getProperty(<String>), avoiding raised exceptions or null results. System.getProperty(<String>) returns a JVM environment property value given the property’s name. For example, System.getProperty("java.home") will return the path to the currently running Java instance, while System.getProperty("user.timezone") returns the time zone property from the operating system. This method can be dangerous to use, however, because it may throw exceptions or return null for invalid inputs. Try invoking System.getProperty("") or System.getProperty("blah") from the Scala REPL to see how it responds.

Experienced Scala developers build their own libraries of functions that wrap unsafe code with Scala’s monadic collections. Your function should simply pass its input to the method and ensure that exceptions and null values are safely handled and filtered. Call your function with the example property names used here, including the valid and invalid ones, to verify that it never raises exceptions or returns null results.

6. Write a function that reports recent GitHub commits for a project. GitHub provides an RSS feed of recent commits for a given user, repository, and branch, containing XML that you can parse out with regular expressions. Your function should take the user, repository, and branch, read and parse the RSS feed, and then print out the commit information. This should include the date, title, and author of each commit.

You can use the following RSS URL to retrieve recent commits for a given repository and branch:

https://github.com/<user name>/<repo name>/commits/<branch name>.atom

Here is one way to grab the RSS feed as a single string:

scala> val u = "https://github.com/scala/scala/commits/2.11.x.atom"

u: String = https://github.com/scala/scala/commits/2.11.x.atom

scala> val s = io.Source.fromURL(u)

s: scala.io.BufferedSource = non-empty iterator

scala> val text = s.getLines.map(_.trim).mkString("")

text: String = <?xml version="1.0" encoding="UTF-8"?><feed xmlns=...

Working with the XML will be a bit tricky. You may want to use text.split(<token>) to split the text into the separate <entry> components, and then use regular expression capture groups (see Regular expressions) to parse out the <title> and other elements. You could also just try iterating through all the lines of the XML file, adding elements to a buffer as you find them, and then converting that to a new list.

Once you have completed this exercise (and there is a lot to do here), here are some additional features worth investigating:

a. Move the user, repo, and branch parameters into a tuple parameter.

b. Following exercise (a), have the function take a list of GitHub projects and print a report of each one’s commits, in order of specified project.

c. Following exercise (b), retrieve all of the projects’ commit data concurrently using futures, await the result (no more than 5 seconds), and then print a commit report for each project, in order of project specified.

d. Following exercise (c), mix the commits together and sort by commit date, then print your report with an additional “repo” column.

These additional features will take some time to implement, but are definitely worthwhile for learning and improving your Scala development skills.

Once you have finished these features, test out your commit report using entries from the following projects:

https://github.com/akka/akka/tree/master

https://github.com/scala/scala/tree/2.11.x

https://github.com/sbt/sbt/tree/0.13

https://github.com/scalaz/scalaz/tree/series/7.2.x

These projects are all active (as of 2014), so you should see an interesting mix of commit activity data in your report. It’s worthwhile to browse the repositories for these core open source Scala projects, or at least their documentation, to understand some of the excellent work being done.

7. Write a command-line script to call your GitHub commit report function from exercise 6 and print out the results. This will require a Unix shell; if you are on a Windows system you will need a compatible Unix environment such as Cygwin or VirtualBox (running a Unix virtual machine). You’ll also need to install SBT (Simple Build Tool), a build tool that supports dependency management and plug-ins and is commonly used by Scala projects. You can download SBT from http://www.scala-sbt.org/ for any environment, including an MSI Windows Installer version. SBT is also available from popular package managers. If you are using Homebrew on OS X you can install it with brew install sbt.

ISN’T SBT HARD TO LEARN?

Maybe. In this exercise we’ll only use it as a shell script launcher, so you can get comfortable with writing and executing shell scripts in Scala. We’ll cover how to write SBT-built scripts to manage your own projects in later chapters.

Here is an example SBT-based Scala script that reads the command-line arguments as a List and prints a greeting. The comment block starting with triple asterisks is reserved for SBT settings. In this script we are specifying that we want version 2.11.1 of the Scala language to be used:

#!/usr/bin/env sbt -Dsbt.main.class=sbt.ScriptMain

/***

scalaVersion := "2.11.1"

*/

def greet(name: String): String = s"Hello, $name!"

// Entry point for our script

args.toList match {

case List(name) => {

val greeting = greet(name)

println(greeting)

}

case _ =>

println("usage: HelloScript.scala <name>")

}

Copy this into a file titled HelloScript.scala, and change the permissions to be executable (chmod a+x HelloScript.scala in a Unix environment). Then you can run the script directly:

$ ./HelloScript.scala Jason

[info] Set current project to root-4926629s8acd7bce0b (in

build file:/Users/jason/.sbt/boot/4926629s8acd7bce0b/)

Hello, Jason!

Your commit report script will need to take multiple GitHub projects as arguments. To keep the arguments concise, you may want to combine each project’s input into a single string to be parsed, such as scala/scala/2.11.x.

The printout should be clean, well-formatted, and easily readable. Using fixed column widths could help, using the printf-style formatting codes in string interpolation (see String interpolation).
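
As a starting point for the formatting, here is a small sketch of fixed-width columns using the f string interpolator. The CommitReport object, the formatRow function, and the column widths are hypothetical choices for this illustration, not part of the exercise’s requirements:

```scala
object CommitReport {
  // Left-aligns the date in 10 characters and the author in 12,
  // truncating longer values so the columns stay aligned
  def formatRow(date: String, author: String, title: String): String =
    f"${date.take(10)}%-10s  ${author.take(12)}%-12s  $title"
}
```

Printing one formatRow result per commit gives a report whose titles all start in the same column, regardless of the length of each author’s name.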