Going to Collections - The Book of F#: Breaking Free with Managed Functional Programming (2014)

The Book of F#: Breaking Free with Managed Functional Programming (2014)

Chapter 6. Going to Collections

Programming tasks often require working with collections of data. The .NET Framework has always supported this scenario with constructs such as arraysand the ArrayList class, but it wasn’t until generics were introduced in .NET 2.0 that collection support really matured.

F# builds upon .NET’s legacy by not only supporting all of the existing collection types but also bringing a few of its own to the party. In this chapter, we’ll see the role a few of the classic collection types play in F# and then explore the F#-specific types. Along the way, we’ll see how the built-in collection modules add some functional flair and make working with both the traditional and F#-specific types a breeze.

Sequences

In .NET, sequence is an all-encompassing term for a collection of values that share a common type. More specifically, a sequence is any type that implements IEnumerable<'T>.

Nearly all of the major collection types in .NET are sequences. For instance, the generic collection types (like Dictionary<'TKey, 'TValue> andList<'T>) and even some types (like String) that aren’t typically thought of as collections implement IEnumerable<'T>. Conversely, the legacy collection types (like ArrayList and Hashtable) predate generics, so they implement only the nongeneric IEnumerable interface. Accordingly, they don’t enforce a single, common type, and they’re generally regarded as enumerable collections rather than sequences.

In F#, IEnumerable<'T> is often expressed as seq<'T> or 'T seq. Type annotations like values : 'A seq compile to IEnumerable<'A>, and any type that implements IEnumerable<'T> can be used wherever a sequence is expected. Because I Enumerable<'T> defines only the overloaded GetEnumerator method, sequences are inherently immutable. Be careful when using the specific collection types directly, however, because underlying implementations may be mutable.

Creating Sequences

Today’s .NET developers take working with sequences for granted, but before LINQ’s introduction, programming directly against IEnumerable<'T> was relatively rare. Instead, developers typically coded against specific collection types. LINQ’s IEnumerable<'T> extension methods brought the abstraction to the forefront, though, and taught developers that they didn’t always need to know anything about a collection other than that it implements the GetEnumerator method. Even with all of the goodness that LINQ gives us, it provides only a framework for working with IEnumerable<'T>; creating arbitrary sequences in LINQ still requires a method to create an instance of a specific sequence type.

F# takes the abstraction even further than LINQ by codifying sequence creation into the language through concepts like sequence and range expressions. While each sequence is ultimately still an implementation of IEnumerable<'T>, the compiler is free to provide its own implementations. The Seq module also includes several functions for creating new sequences.

Sequence Expressions

Sequence expressions allow you to create new sequences by iteratively applying other F# expressions and yielding (returning) the results into a new sequence. In some situations, particularly when you are working with large or computationally expensive collections, the sequence types used internally by sequence expressions are preferable to other collection types because they create values only as needed. These sequence types typically also hold only one value in memory at a time, making them ideal for large data sets.

NOTE

Sequence expressions are technically a built-in workflow called a computation expression. We’ll cover these constructs in detail in Chapter 12.

You create a sequence expression by enclosing one or more expressions within a sequence builder and using a do binding in conjunction with the yield keyword. For example, say you have a file named ArnoldMovies.txt that contains the following data:

The Terminator,1984

Predator,1987

Commando,1985

The Running Man,1987

True Lies,1994

Last Action Hero,1993

Total Recall,1990

Conan the Barbarian,1982

Conan the Destroyer,1984

Hercules in New York,1969

You can read each line of the text file into a sequence with a sequence expression like this:

let lines = seq { use r = new System.IO.StreamReader("ArnoldMovies.txt")

while not r.EndOfStream do yield r.ReadLine() }

Here, a while loop is used to iteratively read lines from a StreamReader, yielding a line for each iteration. (In some simpler sequence expressions—such as those using an enumerable for loop—do yield can be replaced with the -> operator, but for consistency I usually stick with do yield.)

If you wanted to write this sequence to the console, you could send it to the printfn function and use the default formatter (via the %A token), but only the first four values are included in the output, as shown here:

> lines |> printfn "%A";;

seq ["The Terminator,1984"; "Predator,1987"; "Commando,1985"; "The Running Man,1987"; ...]

val it : unit = ()

To print every value in the sequence, you need to force enumeration over the entire construct.

Range Expressions

Although range expressions resemble the slice expressions you learned about in Chapter 4 in that they use the .. operator, they’re actually specialized sequence expressions that allow you to create sequences over a range of values. Range expressions are similar to the Enumerable.Rangemethod but are a bit more powerful because they’re not restricted to integers. For instance, you can easily create a sequence containing the integers 0 through 10 like this:

seq { 0..10 }

Or you could create a sequence containing 0 through 10 as floats this way:

seq { 0.0..10.0 }

Likewise, you could create a sequence containing the characters a through z like this:

seq { 'a'..'z' }

In most cases, you can also include a value that identifies how many items to skip between values when generating the sequence. Creating a sequence containing the integral multiples of 10 from 0 through 100 is easy with the following expression:

seq { 0..10..100 }

This range expression form works only with numeric types, so you can’t use it with character data. For example, the following expression results in an error.

seq { 'a'..2..'z' }

Finally, you can create sequences with declining values by using a negative step value like this:

seq { 99..-1..0 }

Empty Sequences

When you need a sequence without any elements, you can turn to the Seqmodule’s generic empty function to create one for you. For instance, to create an empty string sequence, you could call Seq.empty like this:

> let emptySequence = Seq.empty<string>;;

val emptySequence : seq<string>

Alternatively, if you don’t need any particular type, you can let the compiler automatically generalize the sequence by omitting the type argument:

> let emptySequence = Seq.empty;;

val emptySequence : seq<'a>

Initializing a Sequence

Another module function, Seq.init, creates a sequence with up to a specified number of elements. For example, to create a sequence containing 10 random numbers, you could write:

> let rand = System.Random();;

val rand : System.Random

> Seq.init 10 (fun _ -> rand.Next(100));;

val it : seq<int> = seq [22; 34; 73; 42; ...]

Working with Sequences

The Seq module provides a number of functions for working with any sequence. The list of functions covered next is a sampling of the most useful functions in the Seq module, but it is by no means comprehensive.

While each of the functions discussed in the coming sections belongs to the Seq module, many have specialized counterparts in the other collection modules. In the interest of space, I’ll cover the common functions only once, but I strongly encourage you to explore the other modules and discover the right tools for your task.

WHEN IS A FUNCTION NOT A FUNCTION?

You may have noticed in both of the empty sequence examples that Seq.empty was invoked without any arguments. Seq.empty differs from every function we’ve encountered so far in that it behaves more like a basic value binding than a function. In fact, if you were to call Seq.empty with an argument, you’d get a compiler error telling you that the value (Seq.empty) is not a function and cannot be applied.

Why is Seq.empty called a function when the compiler claims otherwise? Because it, along with some other functions (such as Operators.typeof and Operators.typedefof), is a special-case value called a type function. Type functions are generally reserved for pure functions that compute values based on their type arguments, and therefore—despite being represented as methods in the compiled assemblies—they are treated as values within F# code.

Finding Sequence Length

You use Seq.length to determine how many elements a sequence contains like this:

seq { 0..99 } |> Seq.length

Be careful with Seq.length, though, because, depending on the underlying collection type, it can force enumeration of the entire sequence or otherwise impair performance. Consider the following code, which checks if a sequence is empty using Seq.length = 0:

seq { for i in 1..10 do

printfn "Evaluating %i" i

yield i }

|> Seq.length = 0

To determine the sequence’s length, the system must iterate over the sequence by calling the enumerator’s MoveNext method until it returns false. Each invocation of MoveNext involves doing whatever work is necessary to obtain the next value. In this case, getting the next value involves writing a string to the console, as shown here:

Evaluating 1

Evaluating 2

Evaluating 3

-- snip --

Evaluating 10

val it : bool = false

Writing some text to the console is trivial, but even so, it is unnecessary work since the result isn’t actually being used for anything. Going beyond this simple example, you can easily imagine each call to MoveNext triggering an expensive computation or database call. If you just need to determine whether the sequence has any elements, you should use the Seq.isEmpty function instead.

Seq.isEmpty checks whether a sequence contains any elements without forcing enumeration of the entire sequence. Consider the following code, which replaces Seq.length = 0 with Seq.isEmpty:

seq { for i in 1..10 do

printfn "Evaluating %i" i

yield i }

|> Seq.isEmpty

Because Seq.isEmpty returns false as soon as it finds an element, MoveNext is called only once, resulting in:

Evaluating 1

val it : bool = false

As you can see, although the sequence expression defines 10 elements, only the first one was printed because evaluation stopped as soon as the function found a value.

Iterating over Sequences

The Seq.iter function is the functional equivalent of the enumerable for loop in that it iterates over a sequence, applying a function to each element. For example, to print each element of a sequence containing the values 0 through 99, you could write:

> seq { 0..99 } |> Seq.iter (printfn "%i");;

0

1

2

-- snip --

97

98

99

val it : unit = ()

Transforming Sequences

Seq.map is similar to Seq.iter in that it applies a function to every element in a sequence, but unlike Seq.iter, it builds a new sequence with the results. For instance, to create a new sequence containing the squares of elements from a sequence, you could write:

> seq { 0..99 } |> Seq.map (fun i -> i * i);;

val it : seq<int> = seq [0; 1; 4; 9; ...]

Sorting Sequences

The Seq module defines several functions for sorting sequences. Each sorting function creates a new sequence, leaving the original unchanged.

The simplest sorting function, Seq.sort, orders the elements using a default comparison based on the IComparable<'T> interface. For instance, you can apply Seq.sort to a sequence of random integer values like this:

> let rand = System.Random();;

val rand : System.Random

> Seq.init 10 (fun _ -> rand.Next 100) |> Seq.sort;;

val it : seq<int> = seq [0; 11; 16; 19; ...]

For more complex sorting needs, you can use the Seq.sortBy function. In addition to the sequence to be sorted, it accepts a function that returns the value to sort upon for each element in the sequence.

For example, each movie listed in ArnoldMovies.txt in Sequence Expressions included the release year. If you wanted to sort the movies by their release years, you could revise the sequence expression to isolate the individual values as follows:

let movies =

seq { use r = new System.IO.StreamReader("ArnoldMovies.txt")

while not r.EndOfStream do

let l = r.ReadLine().Split(',')

yield ① l.[0], int l.[1] }

At ① the sequence expression now yields tuples containing each movie title and release year. We can send the sequence to Seq.sortBy along with the snd function (to get the year) like this:

> movies |> Seq.sortBy snd;;

val it : seq<string * int> =

seq

[("Hercules in New York", 1969); ("Conan the Barbarian", 1982);

("The Terminator", 1984); ("Conan the Destroyer", 1984); ...]

Alternatively, to sort the movies by title, you can replace snd with fst.

> seq { use r = new System.IO.StreamReader(fileName)

while not r.EndOfStream do

let l = r.ReadLine().Split(',')

yield l.[0], int l.[1] }

|> Seq.sortBy fst;;

val it : seq<string * int> =

seq

[("Commando", 1985); ("Conan the Barbarian", 1982);

("Conan the Destroyer", 1984); ("Hercules in New York", 1969); ...]

Filtering Sequences

When you want to work only with elements that meet certain criteria, you can use the Seq.filter function to create a new sequence containing only those elements. For example, continuing with the movie theme, you can get the movies released prior to 1984 like this:

> movies |> Seq.filter (fun (_, year) -> year < 1985);;

val it : seq<string * int> =

seq

[("The Terminator", 1984); ("Conan the Barbarian", 1982);

("Conan the Destroyer", 1984); ("Hercules in New York", 1969)]

Aggregating Sequences

The Seq module provides a number of functions for aggregating the elements in a sequence. The most flexible (and complex) of the aggregation functions is Seq.fold, which iterates over a sequence, applying a function to each element and returning the result as an accumulator value. For example, Seq.fold makes it easy to compute the sum of a sequence’s elements:

> seq { 1 .. 10 } |> Seq.fold(fun s c -> s + c)0;;

val it : int = 55

This example shows just one way to add the values 1 through 10. The function that Seq.fold uses for aggregation ① accepts two values: an aggregation value (essentially a running total), and the current element. We also need to give the fold function an initial aggregation value ②, which we do with 0. As fold executes, it applies the aggregation function to each element in the sequence and returns the new aggregation value for use in the next iteration.

Because the addition operator function itself satisfies the requirements for the aggregation function, we can simplify the previous expression like this:

> seq { 1..10 } |> Seq.fold (+) 0;;

val it : int = 55

A slightly more specialized aggregation function is Seq.reduce. The reduce function is very much like the fold function except that the aggregation value that’s passed through the computation is always the same type as the sequence’s elements, whereas fold can transform the data to another type. The reduce function also differs from fold in that it doesn’t accept an initial aggregation value. Instead, reduce initializes the aggregation value to the first value in the sequence. To see Seq.reduce in action, we can rewrite the previous expression as follows:

> seq { 1 .. 10 } |> Seq.reduce (+);;

val it : int = 55

As expected, the result of adding the items in the sequence is the same regardless of whether we use Seq.fold or Seq.reduce.

Seq.fold and Seq.reduce aren’t the only ways to calculate aggregate values from a sequence; some common aggregations like summations and averages have functions of their own. For example, rather than using Seq.reduce to calculate the sum of the elements like we did previously, we can use Seq.sum:

> seq { 1..10 } |> Seq.sum;;

val it : int = 55

Similarly, to compute the average, you can use Seq.average like this:

> seq { 1.0..10.0 } |> Seq.average;;

val it : float = 5.5

One thing to note about Seq.average is that it works only with types that support division by an integer. If you try to use it with a sequence of integers, you’ll receive the following error:

> seq { 1..10 } |> Seq.average;;

seq { 1..10 } |> Seq.average;;

-----------------^^^^^^^^^^^

stdin(2,18): error FS0001: The type 'int' does not support the operator 'DivideByInt'

Like Seq.sort, the Seq.sum and Seq.average functions have the Seq.sumBy and Seq.averageBy counterparts that accept a function that lets you identify which value should be used in the calculation. The syntax for these functions is the same as Seq.sortBy, so I’ll leave it to you to experiment a bit more with the Seq module.

Arrays

F# arrays are the same construct as traditional .NET arrays. They contain a fixed number of values (each of the same type) and are zero-based. Although an array binding itself is immutable, individual array elements are mutable, so you need to be careful that you don’t introduce unwanted side effects. That said, the mutable nature of arrays makes them more desirable in some situations than other collection constructs because no further allocations are required to change element values.

Creating Arrays

F# provides a number of ways to create new arrays and control each element’s initial value, using both native syntax and module functions.

Array Expressions

One of the most common ways to create an array is with an array expression. Array expressions consist of a semicolon-delimited list of values enclosed between the [| and |] tokens. For instance, you can create an array of strings like this (if you place each value on a separate line, you can omit the semicolons):

> let names = [| "Rose"; "Martha"; "Donna"; "Amy"; "Clara" |];;

val names : string [] = [|"Rose"; "Martha"; "Donna"; "Amy"; "Clara"|]

Finally, you can generate an array by enclosing a sequence expression between [| and |]. Unlike with the sequence builder, however, the array will be fully constructed when the array expression is evaluated. Compare this example with the corresponding one from the sequence expression discussion:

> let lines = [| use r = new System.IO.StreamReader("ArnoldMovies.txt")

while not r.EndOfStream do yield r.ReadLine() |];;

val lines : string [] =

[|"The Terminator,1984"; "Predator,1987"; "Commando,1985";

"The Running Man,1987"; "True Lies,1994"; "Last Action Hero,1993";

"Total Recall,1990"; "Conan the Barbarian,1982";

"Conan the Destroyer,1984"; "Hercules in New York,1969"|]

As you can see, the default array print formatter prints every element (it caps the output at 100 elements) rather than printing only the first four.

Empty Arrays

Should you need to create an empty array, you can use an empty pair of square brackets:

let emptyArray = [| |]

The downside of this approach is that, depending on context, you may need to include a type annotation to ensure that the compiler doesn’t automatically generalize the array. Such a definition would look something like this:

let emptyArray : int array = [| |];;

In the preceding example, the type annotation, int array, is an English-like syntax. If you prefer a more traditional form, you could use int[] instead. Without the type annotation, the compiler would define the array as 'a [].

Another way to create an empty array is with the Array.empty function. Just like its counterpart in the Seq module, Array.empty is a type function, so you invoke it without any arguments to create a zero-length array. To create an empty string array with this function, you simply write:

Array.empty<string>

If you prefer to let the compiler infer the underlying type or automatically generalize it, you can omit the type parameter.

Initializing Arrays

To quickly create an array where all elements are initialized to the underlying type’s default value, you can use Array.zeroCreate. Suppose you know that you need an array of five strings, but you don’t yet know what values will be stored in each element. You could create the array like this:

> let stringArray = Array.zeroCreate<string> 5;;

val stringArray : string [] = [|null; null; null; null; null|]

Because Array.zeroCreate uses the underlying type’s default value, it’s possible that the elements will be initialized to null like they were here. If null is valid for the type and you’re creating arrays like this, you’ll need to code against NullReferenceExceptions.

Alternatively, Array.init lets you initialize each element to a specific value. Array.init is the array-specific equivalent of Seq.init. Its syntax is the same, but it creates and returns an array instead. For instance, to create a new array where the elements are initialized to the empty string, you could write:

> let stringArray = Array.init 5 (fun _ -> "");;

val stringArray : string [] = [|""; ""; ""; ""; ""|]

Here, the supplied function only returns the empty string, but your initialization function could easily have more complicated logic, allowing you to compute a different value for each element.

Working with Arrays

Working with arrays in F# is similar to working with them in other .NET languages, but F# extends their usefulness with constructs like slice expressions and the Array module.

Accessing Elements

Individual array elements are accessible through an indexed property. For instance, to retrieve the fourth element from the lines array defined previously, you’d write:

> lines.[3];;

val it : string = "The Running Man,1987"

You can combine the indexer syntax with the assignment operator to change individual elements of an array. For instance, to replace Last Action Hero, you could write:

lines.[5] <- "Batman & Robin,1997"

If you prefer a more functional approach to retrieving and mutating array elements, the Array module has you covered with the get and set functions. In the following example we’ll create an array, change the second element’s value, retrieve the new value, and write it to the console.

> let movies = [| "The Terminator"; "Predator"; "Commando" |];;

val movies : string [] = [|"The Terminator"; "Predator"; "Commando"|]

> Array.set movies 1 "Batman & Robin"

Array.get movies 1 |> printfn "%s";;

Batman & Robin

val it : unit = ()

Finally, arrays also support slice expressions. As noted in Chapter 4, slice expressions let you easily retrieve a range of values from a collection like this:

> lines.[1..3];;

val it : string [] =

[|"Predator,1987"; "Commando,1985"; "The Running Man,1987"|]

Copying Arrays

You can easily copy the elements from one array to a new array with Array.copy. Here, we create an array containing the numbers 1 through 10 and immediately copy them to another.

[| 1..10 |] |> Array.copy

Behind the scenes, Array.copy is a wrapper around the CLR’s Array.Clone method, which creates a shallow copy of the source array. Array.copy offers the added benefit of automatically downcasting the object instance returned by Clone to the appropriate array type; that is, passing an integer array directly to Array.Clone will give you an obj instance, whereas passing that same array to Array.copy will give you an instance of int array.

Sorting Arrays

Arrays can be sorted like any other sequence, but the Array module provides a few specialized sorting functions to take advantage of the fact that individual array elements are mutable. Unfortunately, each of these functions returns unit instead of the sorted array, so they’re not particularly effective in pipelining or composition chains.

The first in-place sorting function, sortInPlace, sorts an array with the default comparison mechanism. The following snippet shows how to sort an array of random integers.

> let r = System.Random()

let ints = Array.init 5 (fun _ -> r.Next(-100, 100));;

val r : System.Random

val ints : int [] = [|-94; 20; 13; -99; 0|]

> ints |> Array.sortInPlace;;

val it : unit = ()

> ints;;

val it : int [] = [|-99; -94; 0; 13; 20|]

If you need more control over how sorting is performed, you can turn to the sortInPlaceBy or sortInPlaceWith functions. The sortInPlaceBy function lets you provide a transformation function that’s used in the sorting process. The sortInPlaceWith function accepts a comparison function that returns an integer where less than zero means the first value is greater than the second, greater than zero means that the first value is less than the second value, and zero means the first and second values are equal.

To better understand both approaches, consider the following array containing some movies and their release years as tuples.

let movies = [| ("The Terminator", "1984")

("Predator", "1987")

("Commando", "1985")

("Total Recall", "1990")

("Conan the Destroyer", "1984") |]

The easiest way to sort by year is to just project the year value via sortInPlaceBy like this:

> movies |> Array.sortInPlaceBy (fun (_, y) -> y)

movies;;

val it : (string * string) [] =

[|("The Terminator", "1984"); ("Conan the Destroyer", "1984");

("Commando", "1985"); ("Predator", "1987"); ("Total Recall", "1990")|]

Alternatively, we can directly compare two elements with sortInPlaceWith:

> movies |> Array.sortInPlaceWith (fun (_, y1) (_, y2) -> if y1 < y2 then -1

elif y1 > y2 then 1

else 0)

movies;;

val it : (string * string) [] =

[|("The Terminator", "1984"); ("Conan the Destroyer", "1984");

("Commando", "1985"); ("Predator", "1987"); ("Total Recall", "1990")|]

As you can see, sortInPlaceBy allows you to sort according to the default equality semantics for a particular element’s underlying type, whereas sortInPlaceWith allows you to essentially define your own equality semantics for each element in the array.

Multidimensional Arrays

All of the arrays we’ve looked at so far have been one-dimensional. While it’s also possible to create multidimensional arrays, it’s a bit more complicated because there’s no direct syntactic support. For two-dimensional arrays, you can pass a sequence of sequences (typically either arrays or lists) to the array2D operator. To create arrays with more than two dimensions, you need to use either the Array3D.init or Array4D.init functions. Multidimensional arrays have modules (like Array2D and Array3D) that contain specialized subsets of those defined in the Arraymodule.

NOTE

The maximum number of dimensions F# supports is four.

Suppose you wanted to represent the movies from the previous sections as a two-dimensional array instead of as an array of tuples. You could write something like the following, which passes an array of arrays to the array2D operator:

let movies = array2D [| [| "The Terminator"; "1984" |]

[| "Predator"; "1987" |]

[| "Commando"; "1985" |]

[| "The Running Man"; "1987" |]

[| "True Lies"; "1994" |]

[| "Last Action Hero"; "1993" |]

[| "Total Recall"; "1990" |]

[| "Conan the Barbarian"; "1982" |]

[| "Conan the Destroyer"; "1984" |]

[| "Hercules in New York"; "1969" |] |]

You can access any value in the two-dimensional array with the familiar indexer syntax. For instance, to get Commando’s release year you’d write movies.[2, 1], which would return 1985. Much more interesting, though, is what you can do with slice expressions.

Slice expressions make it easy to create new arrays containing subsets of data from the source. For instance, you can slice the movies array vertically to create new arrays containing only the movie titles or release years like this:

> movies.[0..,0..0];;

val it : string [,] = [["The Terminator"]

["Predator"]

["Commando"]

["The Running Man"]

-- snip --]

> movies.[0..,1..1];;

val it : string [,] = [["1984"]

["1987"]

["1985"]

["1987"]

-- snip --]

You can also slice arrays horizontally to create new arrays containing only a few rows:

> movies.[1..3,0..];;

val it : string [,] = [["Predator"; "1987"]

["Commando"; "1985"]

["The Running Man"; "1987"]]

Multidimensional arrays are useful when the data has a nice, rectangular shape, but they don’t work when even a single row has a different number of items. Consider what happens if we try to include a director name in the two-dimensional movies array (for brevity, we’ll just work with three titles here).

> let movies = array2D [| [| "The Terminator"; "1984"; "James Cameron" |]

[| "Predator"; "1987"; "John McTiernan" |]

[| "Commando"; "1985" |] |];;

System.ArgumentException: The arrays have different lengths.

Parameter name: vals

-- snip --

Stopped due to error

Of course, one possible solution would be to provide an empty string as the third element in the row that’s missing the director name. Alternatively, you can use a jagged array.

Jagged Arrays

Jagged arrays are arrays of arrays. Unlike multidimensional arrays, jagged arrays don’t require a rectangular structure. To convert the preceding failing example, we just need to remove the call to the array2D function.

> let movies = [| [| "The Terminator"; "1984"; "James Cameron" |]

[| "Predator"; "1987"; "John McTiernan" |]

[| "Commando"; "1985" |] |];;

val movies : string [] [] =

[|[|"The Terminator"; "1984"; "James Cameron"|];

[|"Predator"; "1987"; "John McTiernan"|]; [|"Commando"; "1985"|]|]

As you might expect, since movies is now a jagged array, you need to use a different syntax to access each element. You also need to code a bit more defensively when using jagged arrays because there’s no guarantee that a particular index will be valid for any given row. That said, you can get the director name from the second row like this:

> movies.[1].[2];;

val it : string = "John McTiernan"

ANY WAY YOU SLICE IT

F# 3.1 features a few extensions to array slicing that aren’t covered here but do prove useful. Array slicing in F# 3.0 requires slices to have the same dimensions as the source array. Under F# 3.1 this restriction has been removed, so you can create a one-dimensional slice from a two-dimensional array, and so on.

Lists

Lists are used extensively in F# development. When .NET developers discuss lists, they typically mean the generic List<'T> class. Although it’s possible (and sometimes even desirable) to use the generic list in F#, the language defines another immutable construct based on singly linked lists. In F#, lists created with the list syntax compile to instances of the FSharpList<'T> class found in the Microsoft.FSharp.Collections namespace, and that’s the kind of list we’ll be covering in this section.

Aside from both List<'T> and FSharpList<'T> being generic sequence types (they both implement IEnumerable<'T>), they have little in common and cannot be used interchangeably. You need to be careful to not mix list types when working in multilanguage solutions.

NOTE

You can use the generic List<'T> class directly by opening the System.Collections.Generic namespace or through the built-in ResizeArray<'T> type abbreviation.

Creating Lists

Creating lists in F# is so similar to creating arrays that I won’t spend much time explaining the various forms here. The only notable syntactic difference between creating arrays and lists is the brace style. To create a new list, you enclose semicolon-delimited values, range expressions, or list sequence expressions between square brackets ([]) like this:

> let names = [ "Rose"; "Martha"; "Donna"; "Amy"; "Clara" ];;

val names : string list = ["Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

> let numbers = [ 1..11 ];;

val numbers : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11]

To create an empty list, you can use either List.empty or a pair of empty brackets.

Working with Lists

Although there are some similarities between working with F# lists and List<'T>, they’re mostly syntactic and deal with accessing individual known elements. Beyond that, F# lists are quite unique, especially because of their head and tail structure, which lends itself well to functional programming and to recursive techniques in particular.

Accessing Elements

When you want to get the element at a particular position, you can use the familiar indexer syntax just like you would with an array. Alternatively, you can use List.nth to get the same result:

> List.nth [ 'A'..'Z' ] 3;;

val it : char = 'D'

What’s more interesting (and often more useful) than accessing a particular element by index is a list’s head and tail. A list’s head is simply its first element, whereas its tail is all elements except the head. You can get a list’s head and tail through the Head or Tail properties or theList.head or List.tail module functions. Here’s an example using the module functions:

> let names = [ "Rose"; "Martha"; "Donna"; "Amy"; "Clara" ];;

val names : string list = ["Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

> List.head names;;

val it : string = "Rose"

> List.tail names;;

val it : string list = ["Martha"; "Donna"; "Amy"; "Clara"]

NOTE

Pattern matching is another way to get the head and tail, but we’ll save that discussion for Chapter 7.

Why would you want to get only the first element or everything else? Recursion. If you had to iterate over a list using indexes, you’d need to track both the list and the current position. By separating a list into head and tail components, you’re free to operate against the head and then iterate with the tail.

Consider this function, which returns a Boolean value indicating whether a list contains a particular value (much like the List.exists module function).

let rec contains fn l =

if l = [] then false

else fn(List.head l) || contains fn (List.tail l)

The contains function accepts both a function for testing the elements and a list to scan. The first thing contains does is check whether the supplied list is empty. If the list is empty, contains immediately returns false; otherwise, it tests the list’s head with the provided function or recursively calls contains with both the function and the list’s tail.

Now let’s test for a few values, starting with an empty list:

> [] |> contains (fun n -> n = "Rose");;

val it : bool = false

You can see that contains correctly returns false when the list is empty, but what about a populated list?

> let names = [ "Rose"; "Martha"; "Donna"; "Amy"; "Clara" ];;

val names : string list = ["Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

> names |> contains (fun n -> n = "Amy");;

val it : bool = true

> names |> contains (fun n -> n = "Rory");;

val it : bool = false

The contains function recursively walked the list, examining each element with the supplied function and passing the tail to contains if the element didn’t match.

Combining Lists

Even though F# lists are immutable, we can still construct new lists from existing ones. F# provides two primary mechanisms: the cons operator (::) and list concatenation with the @ operator.

The cons operator (so named because it constructs a new list) essentially prepends an item to an existing list like this:

> let names = [ "Rose"; "Martha"; "Donna"; "Amy"; "Clara" ]

let newNames = "Ace" :: names;;

val names : string list = ["Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

val newNames : string list =

["Ace"; "Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

The cons operator doesn’t make any changes to the existing list. Instead, it simply creates a new list with its head set to the new value and tail set to the existing list. The cons operator can add only a single item to the list, but since it’s at the beginning of the list it’s a quick operation. If you want to combine two lists, you’ll need to turn to list concatenation.

To concatenate two lists, you can use either the list concatenation operator (@) or the List.append module function, as follows:

> let classicNames = [ "Susan"; "Barbara"; "Sarah Jane" ]

let modernNames = [ "Rose"; "Martha"; "Donna"; "Amy"; "Clara" ];;

val classicNames : string list = ["Susan"; "Barbara"; "Sarah Jane"]

val modernNames : string list = ["Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

> classicNames @ modernNames;;

val it : string list =

["Susan"; "Barbara"; "Sarah Jane"; "Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

> List.append classicNames modernNames;;

val it : string list =

["Susan"; "Barbara"; "Sarah Jane"; "Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

There’s no difference between the list created with the concatenation operator and the list created by List.append. Internally, List.append wraps the append operator so they’re functionally equivalent.

To combine more than two lists at once, you can pass a sequence of lists to List.concat like this:

> List.concat [[ "Susan"; "Sarah Jane" ]

[ "Rose"; "Martha" ]

["Donna"; "Amy"; "Clara"]];;

val it : string list =

["Susan"; "Sarah Jane"; "Rose"; "Martha"; "Donna"; "Amy"; "Clara"]

Now, what started as three independent lists was combined into a single list containing each item.

Sets

In F#, a set is an immutable collection of unique values whose order is not preserved. F# sets closely correlate to mathematical sets (think Venn diagrams) and provide a number of operations useful for comparing sets.

Creating Sets

There are no syntactic niceties like special bracket formats for creating sets, so if you want to use one, you’ll need to rely on either the type constructor or some of the Set module functions (like Set.ofList, which creates a set from an F# list). For instance, to create a set containing the letters of the alphabet, you could write:

> let alphabet = [ 'A'..'Z' ] |> Set.ofList;;

val alphabet : Set<char> =

set ['A'; 'B'; 'C'; 'D'; 'E'; 'F'; 'G'; 'H'; 'I'; ...]

The Set<'T> class defines methods to add and remove values from a set, but because F# sets are immutable, both of these methods return new sets and leave the original intact. The Add method can be useful for populating a new set from an empty one, like so:

> let vowels = Set.empty.Add('A').Add('E').Add('I').Add('O').Add('U');;

val vowels : Set<char> = set ['A'; 'E'; 'I'; 'O'; 'U']

Of course, creating sets in this manner is a more object-oriented approach than is typical in F#.

Working with Sets

Because sets are so closely related to mathematical sets, the Set module provides several functions for performing a variety of set operations like finding unions, intersections, and differences, and even determining if two sets are related as subsets or supersets.

Unions

To find the union of two sets—that is, those elements contained within either the first or second set—you use the Set.union function as follows:

> let set1 = [ 1..5 ] |> Set.ofList

let set2 = [ 3..7 ] |> Set.ofList

Set.union set1 set2;;

val set1 : Set<int> = set [1; 2; 3; 4; 5]

val set2 : Set<int> = set [3; 4; 5; 6; 7]

val it : Set<int> = set [1; 2; 3; 4; 5; 6; 7]

Here, set1 contains the integers one through five, while set2 contains the integers three through seven. Because the union of two sets contains each distinct value found in either set, the union of set1 and set2 is the range of integers from one through seven.

The Set<'T> class also defines a custom + operator you can use to find the union of two sets:

> set1 + set2;;

val it : Set<int> = set [1; 2; 3; 4; 5; 6; 7]

Intersections

The Set.intersect function returns a new set containing only the elements found in both sets. For example, if you have a set containing the values one through five, and another set containing the values three through seven, you’d find the intersection like this:

> let set1 = [ 1..5 ] |> Set.ofList

let set2 = [ 3..7 ] |> Set.ofList

Set.intersect set1 set2;;

val set1 : Set<int> = set [1; 2; 3; 4; 5]

val set2 : Set<int> = set [3; 4; 5; 6; 7]

val it : Set<int> = set [3; 4; 5]

The resulting intersection set contains only the three values common to both set1 and set2—in this case, 3, 4, and 5.

Differences

While the intersection contains all elements common to both sets, the difference contains those elements found only in the first set. You can find the difference between two sets with the Set.difference function.

> let set1 = [ 1..5 ] |> Set.ofList

let set2 = [ 3..7 ] |> Set.ofList

Set.difference set1 set2;;

val set1 : Set<int> = set [1; 2; 3; 4; 5]

val set2 : Set<int> = set [3; 4; 5; 6; 7]

val it : Set<int> = set [1; 2]

Here, the first set contains two elements not found in the second, 1 and 2; therefore, the difference set contains only those values.

Just as with intersections, the Set<'T> class defines a custom – operator that returns a set containing the difference between two sets.

> set1 - set2;;

val it : Set<int> = set [1; 2]

Subsets and Supersets

The Set module makes it easy to determine whether two sets are related as subsets or supersets through four functions: isSubset, isProperSubset, isSuperset, and isProperSuperset. The difference between basic subset/superset and proper subset/supersets is that proper subsets/supersets require at least one additional element not present in the opposite set. The following sets illustrate:

> let set1 = [ 1..5 ] |> Set.ofList

let set2 = [ 1..5 ] |> Set.ofList;;

val set1 : Set<int> = set [1; 2; 3; 4; 5]

val set2 : Set<int> = set [1; 2; 3; 4; 5]

Because both set1 and set2 contain the same values, set1 can be considered a superset of set2. Conversely, set2 can be considered a subset of set1. For the same reason, however, set2 cannot be a proper subset of set1, as shown in the following snippet.

> Set.isSuperset set1 set2;;

val it : bool = true

> Set.isProperSuperset set1 set2;;

val it : bool = false

> Set.isSubset set2 set1;;

val it : bool = true

> Set.isProperSubset set2 set1;;

val it : bool = false

To make set2 a proper subset of set1, we need to redefine set1 to include at least one more value.

> let set1 = [ 0..5 ] |> Set.ofList;;

val set1 : Set<int> = set [0; 1; 2; 3; 4; 5]

Now, if we test for subsets and supersets again, we should see that set2 is both a subset and proper subset of set1.

> Set.isSuperset set1 set2;;

val it : bool = true

> Set.isProperSuperset set1 set2;;

val it : bool = true

> Set.isSubset set2 set1;;

val it : bool = true

> Set.isProperSubset set2 set1;;

val it : bool = true

Maps

The Map type represents an unordered, immutable dictionary (a map of keys to values) and provides many of the same capabilities as the generic Dictionary<'TKey, 'TValue> class.

NOTE

Although the Map<'Key, 'Value> class and the associated Map module provide methods for adding and removing entries, as an immutable construct, maps make sense only when the underlying entries won’t change. Adding and removing entries from a map requires creating a new map instance and copying the data from the source instance, so it is significantly slower than modifying a mutable dictionary.

Creating Maps

As with sets, F# doesn’t provide any direct syntactic support for creating maps, so the type constructor or Map module functions are required to create them, too. Regardless of the approach you choose, maps are always based on a sequence of tuples consisting of both the key and the mapped value. Here, a list of states and their respective capitals is passed to the type’s constructor:

> let stateCapitals =

Map [("Indiana", "Indianapolis")

("Michigan", "Lansing")

("Ohio", "Columbus")

("Kentucky", "Frankfort")

("Illinois", "Springfield")];;

val stateCapitals : Map<string, string> =

map

[("Illinois", "Springfield"); ("Indiana", "Indianapolis");

("Kentucky", "Frankfort"); ("Michigan", "Lansing"); ("Ohio", "Columbus")]

Working with Maps

Because maps are like immutable dictionaries, interacting with them is similar to Dictionary<'TKey, 'TValue>.

Finding Values

Like the generic dictionary, the Map type provides an indexed property for accessing a value via a known key. For instance, using the stateCapitals map, we can find Indiana’s capital like this:

> stateCapitals.["Indiana"];;

val it : string = "Indianapolis"

The Map.find function lets us do the same thing functionally.

> stateCapitals |> Map.find "Indiana";;

val it : string = "Indianapolis"

The biggest problem with both of the preceding approaches is that they’ll throw a KeyNotFoundException when the key isn’t present in the map. To avoid the exception, you can see if the map contains a particular key with the Map.containsKey function. If you wanted to test whetherstateCapitals included Washington, you could write this:

> stateCapitals |> Map.containsKey "Washington";;

val it : bool = false

Finally, if you prefer to test for the key and get the mapped value in a single operation you can turn to the Map.tryFind function, which returns an option indicating whether the key was found and the associated value, as shown here:

> stateCapitals |> Map.tryFind "Washington";;

val it : string option = None

> stateCapitals |> Map.tryFind "Indiana";;

val it : string option = Some "Indianapolis"

Finding Keys

Occasionally, you may need to find a key based on its mapped value. The Map module provides two functions for this: findKey and tryFindKey. Like their value-finding counterparts, the difference between findKey and tryFindKey is that findKey throws KeyNotFoundExceptionwhen it can’t find a value that satisfies the criteria, whereas tryFindKey does not.

To look up a key, you pass a function that accepts both the key and its mapped value and returns a Boolean indicating whether the value matches your criteria. For instance, to find a state by its capital using Map.tryFindKey, you could write:

> stateCapitals |> Map.tryFindKey (fun k v -> v = "Indianapolis");;

val it : string option = Some "Indiana"

> stateCapitals |> Map.tryFindKey (fun k v -> v = "Olympia");;

val it : string option = None

As you can see, tryFindKey returns an option, so you’ll need to test for Some and None accordingly.

Converting Between Collection Types

Sometimes you’ll have an instance of one collection type but you really need a different one. For instance, you might be working with an F# list but want to apply a function that works only with arrays. Each of the collection modules includes several functions that make converting between many of the other collection types easy.

In each module, the conversion functions are named according to the conversion direction and target type. For instance, to convert a sequence to an array, you could pass the sequence to either Seq.toArray or Array.ofSeq like this:

> seq { 1..10 } |> Seq.toArray;;

val it : int [] = [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]

> seq { 1..10 } |> Array.ofSeq;;

val it : int [] = [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]

Similarly, to convert from a list to a sequence, you could pass the list to either List.toSeq or Seq.ofList. The Set and Map modules let you convert to and from sequences, arrays, and maps according to the same conventions.

Although most of the conversion functions create a new collection, some of them work by casting. For example, Seq.ofList simply casts the source list to seq<'t> (remember, FSharpList<'T> implements IEnumerable<'T>, so it’s a valid conversion), whereas List.ofArraycreates a new array and populates it with the list’s values. If there’s ever a question as to whether the resulting collection is a type conversion or a new object, you can inspect them with the static obj.ReferenceEquals method as shown here:

> let l = [ 1..10 ]

obj.ReferenceEquals(l, Seq.ofList l);;

val l : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]

val it : bool = ① true

> let a = [| 1..10 |]

obj.ReferenceEquals(a, List.ofArray a);;

val a : int [] = [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]

val it : bool = ② false

The preceding snippet shows the result of calling both Seq.ofList and List.ofArray. You can see that ① Seq.ofList returns the same object, whereas List.ofArray ② returns a new object.

Summary

Working with data collections is something virtually every nontrivial application must do. F# lets you work with all of the traditional .NET collections like arrays and generic lists but also adds several other types like the F# list, sets, and maps, which are more suitable for functional programming.

In many regards, working with data collections in F# is more streamlined than in traditional .NET development because language features like sequence expressions, range expressions, and slice expressions make it easier to not only create collections, but also get at individual elements.

Finally, the various collection modules like Seq, Array, and List provide an easy mechanism for performing many common tasks with their respective collection types.