Arrays, Hashes, and Other Enumerables - THE RUBY WAY, Third Edition (2015)

THE RUBY WAY, Third Edition (2015)

Chapter 8. Arrays, Hashes, and Other Enumerables

All parts should go together without forcing. You must remember that the parts you are reassembling were disassembled by you. Therefore, if you can’t get them together again, there must be a reason. By all means, do not use a hammer.

—IBM maintenance manual (1925)

Simple variables are not adequate for real-life programming. Every modern language supports more complex forms of structured data and also provides mechanisms for creating new abstract data types.

Historically, arrays are the earliest known and most widespread of the complex data structures. Long ago, in FORTRAN, they were called subscripted variables. Today, they have changed somewhat, but the basic idea is the same in all languages.

In the last few decades, the hash has also become an extremely popular programming tool. Like the array, it is an indexed collection of data items; unlike the array, it may be indexed by any arbitrary object. (In Ruby, as in most languages, array elements are accessed by a numerical index.)

Finally, we’ll take a more general look at the Enumerable module itself and how it works. Arrays and hashes both mix in this module, as can any other class for which this functionality makes sense.

But let’s not get ahead of ourselves. We will begin with arrays.

8.1 Working with Arrays

Arrays in Ruby are indexed by integers and are zero-based, just like C arrays. There the resemblance ends, however.

A Ruby array is dynamic. It is possible (but not necessary) to specify its size when you create it. After creation, it can grow as needed without any intervention by the programmer.

A Ruby array is heterogeneous in the sense that it can store multiple data types rather than just one type. Actually, it stores object references rather than the objects themselves, except in the case of immediate values such as Fixnums.

An array keeps track of its own length so that we don’t have to waste our time calculating it or keeping an external variable in sync with the array. Iterators also are defined so that, in practice, we rarely need to know the array length anyway.

Finally, the Array class in Ruby provides arrays with many useful functions for accessing, searching, concatenating, and otherwise manipulating arrays. We’ll spend the remainder of this section exploring the built-in functionality and expanding on it.

8.1.1 Creating and Initializing an Array

The [] syntax or the class method [] is used to create an array; the data items listed in the brackets are used to populate the array. The three ways of creating an array are shown in the following lines (note that arrays a, b, and c will all be populated identically):

a = Array.[](1,2,3,4)
b = Array[1,2,3,4]
c = [1,2,3,4]

There is also a class method called new that can take zero, one, or two parameters. The first parameter is the initial size of the array (number of elements). The second parameter is the initial value of each of the elements. Here is an example:

d = Array.new # Create an empty array
e = Array.new(3) # [nil, nil, nil]
f = Array.new(3, "blah") # ["blah", "blah", "blah"]

Look carefully at the last line of the preceding example. A common error is to think that the objects in the array are distinct. Actually, they are three references to the same object. Therefore, if you change that object (as opposed to replacing it with another object), you will change all elements of the array. To avoid this behavior, use a block. Then that block is evaluated once for each element, and all elements are different objects:

f[0].upcase! # f is now: ["BLAH", "BLAH", "BLAH"]
g = Array.new(3) { "blah" } # ["blah", "blah", "blah"]
g[0].upcase! # g is now: ["BLAH", "blah", "blah"]

Finally, the Array() method takes one argument and wraps it in an array if necessary. This is often useful when a variable might be an array or might be a single item because it converts single values into one-element arrays:

h = Array(1) # [1] non-arrays are wrapped in an array
i = Array([1]) # [1] arrays stay arrays
i = Array(nil) # [] nil is ignored
j = Array([nil]) # [nil] arrays containing nil are left alone

8.1.2 Accessing and Assigning Array Elements

Element reference and assignment are done using the instance methods [] and []=, respectively. Each can take an integer parameter, a pair of integers (start and length), or a range. A negative index counts backward from the end of the array, starting at -1.

The special instance method at works like the simple case of element reference. Because it can take only a single-integer parameter, it is slightly faster:

a = [1, 2, 3, 4, 5, 6]
b = a[0] # 1
c = a.at(0) # 1
d = a[-2] # 5
e = a.at(-2) # 5
f = a[9] # nil
g = a.at(9) # nil
h = a[3, 3] # [4, 5, 6]
i = a[2..4] # [3, 4, 5]
j = a[2...4] # [3, 4]

a[1] = 8 # [1, 8, 3, 4, 5, 6]
a[1,3] = [10, 20, 30] # [1, 10, 20, 30, 5, 6]
a[0..3] = [2, 4, 6, 8] # [2, 4, 6, 8, 5, 6]
a[-1] = 12 # [2, 4, 6, 8, 5, 12]

Note in the following example how a reference beyond the end of the array causes the array to grow. Note also that a subarray can be replaced with more elements than were originally there, also causing the array to grow:

k = [2, 4, 6, 8, 10]
k[1..2] = [3, 3, 3] # [2, 3, 3, 3, 8, 10]
k[7] = 99 # [2, 3, 3, 3, 8, 10, nil, 99]

Finally, we should mention that an array assigned to a single element actually inserts that element as a nested array (unlike an assignment to a range):

m = [1, 3, 5, 7, 9]
m[2] = [20, 30] # [1, 3, [20, 30], 7, 9]

# On the other hand...
m = [1, 3, 5, 7, 9]
m[2..2] = [20, 30] # [1, 3, 20, 30, 7, 9]

The method slice is simply an alias for the [] method:

x = [0, 2, 4, 6, 8, 10, 12]
a = x.slice(2) # 4
b = x.slice(2, 4) # [4, 6, 8, 10]
c = x.slice(2..4) # [4, 6, 8]

The special methods first and last return the first and last elements of an array, respectively, and they return nil if the array is empty:

x = %w[alpha beta gamma delta epsilon]
a = x.first # "alpha"
b = x.last # "epsilon"

We have seen that some of the element-referencing techniques actually can return an entire subarray. There are other ways to access multiple elements, which we’ll look at now.

The method values_at takes a list of indices and returns an array consisting of only those elements. It can be used where a range cannot (when the elements are not all contiguous):

x = [10, 20, 30, 40, 50, 60]
y = x.values_at(0, 1, 4) # [10, 20, 50]
z = x.values_at(0..2, 5) # [10, 20, 30, 60]

8.1.3 Finding an Array’s Size

The method length or its alias size gives the number of elements in an array. (As always, this is one greater than the index of the last item.)

x = ["a", "b", "c", "d"]
a = x.length # 4
b = x.size # 4

8.1.4 Comparing Arrays

Comparing arrays is tricky. If you do it at all, do it with caution.

The instance method <=>, also called the spaceship operator, is used to compare arrays. It works the same as in the other contexts in which it is used, returning either -1 (meaning “less than”), 0 (meaning “equal”), or 1 (meaning “greater than”). The methods == and != depend on this method.

Arrays are compared in an “elementwise” manner; the first two elements that are not equal determine the inequality for the whole comparison. (Thus, preference is given to the leftmost elements, just as if we were comparing two long integers “by eye,” looking at one digit at a time.)

a = [1, 2, 3, 9, 9]
b = [1, 2, 4, 1, 1]
c = a <=> b # -1 (meaning a < b)

If all elements are equal, the arrays are equal. If one array is longer than another, and they are equal up to the length of the shorter array, the longer array is considered to be greater:

d = [1, 2, 3]
e = [1, 2, 3, 4]
f = [1, 2, 3]
if d < e # false
puts "d is less than e"
end
if d == f
puts "d equals f" # Prints "d equals f"
end

Because the Array class does not mix in the Comparable module, the usual operators <, >, <=, and >= are not defined for arrays. But we can easily define them ourselves if we choose:

class Array

def <(other)
(self <=> other) == -1
end

def <=(other)
(self < other) or (self == other)
end

def >(other)
(self <=> other) == 1
end

def >=(other)
(self > other) or (self == other)
end

end

However, it would be easier simply to include Comparable ourselves:

class Array
include Comparable
end

Having defined these new operators, we can use them as you would expect:

if a < b
print "a < b" # Prints "a < b"
else
print "a >= b"
end
if d < e
puts "d < e" # Prints "d < e"
end

It is conceivable that comparing arrays will result in the comparison of two elements for which <=> is undefined or meaningless. The following code results in an ArgumentError because the comparison 3 <=> "x" is problematic:

g = [1, 2, 3]
h = [1, 2, "x"]
if g < h # Error!
puts "g < h" # No output
end

However, in case you are still not confused, equal and not-equal will still work in this case. This is because two objects of different types are naturally considered to be unequal, even though we can’t say which is greater or less than the other:

if g != h # No problem here.
puts "g != h" # Prints "g != h"
end

Finally, it is conceivable that two arrays containing mismatched data types will still compare with < and > operators. In the case shown here, we get a result before we stumble across the incomparable elements:

i = [1, 2, 3]
j = [1, 2, 3, "x"]
if i < j # No problem here.
puts "i < j" # Prints "i < j"
end

8.1.5 Sorting an Array

The easiest way to sort an array is to use the built-in sort method, as follows:

words = %w(the quick brown fox)
list = words.sort # ["brown", "fox", "quick", "the"]
# Or sort in place:
words.sort! # ["brown", "fox", "quick", "the"]

This method assumes that all the elements in the array are comparable with each other. Sorting a mixed array such as [1, 2, "three", 4] gives an error.

In a case like this one, you can use the block form of the same method call. The following example assumes that there is at least a to_s method for each element (to convert it to a string):

a = [1, 2, "three", "four", 5, 6]
b = a.sort {|x,y| x.to_s <=> y.to_s}
# b is now [1, 2, 5, 6, "four", "three"]

Of course, such an ordering (in this case, depending on ASCII) may not be meaningful. If you have such a heterogeneous array, you may want to ask yourself why you are sorting it in the first place or why you are storing different types of objects.

This technique works because the block returns an integer (-1, 0, or 1) on each invocation. When a -1 is returned, meaning that x is less than y, the two elements are swapped. Therefore, to sort in descending order, we could simply swap the order of the comparison:

x = [1, 4, 3, 5, 2]
y = x.sort {|a,b| b <=> a} # [5, 4, 3, 2, 1]

The block style can also be used for more complex sorting. Let’s suppose that we want to sort a list of book and movie titles in a certain way: We ignore case, we ignore spaces entirely, and we want to ignore certain kinds of embedded punctuation. Here, we present a simple example. (Both English teachers and computer programmers will be equally confused by this kind of alphabetizing.)

titles = ["Starship Troopers",
"A Star is Born",
"Star Wars",
"Star 69",
"The Starr Report"]
sorted = titles.sort do |x,y|
# Delete leading articles
a = x.sub(/^(a |an |the )/i, "")
b = y.sub(/^(a |an |the )/i, "")
# Delete spaces and punctuation
a.delete!(" .,-?!")
b.delete!(" .,-?!")
# Convert to uppercase
a.upcase!
b.upcase!
# Compare a and b
a <=> b
end

# sorted is now:
# [ "Star 69", "A Star is Born", "The Starr Report"
# "Starship Troopers", "Star Wars"]

This example is not overly useful, and it could certainly be written more compactly. The point is that any arbitrarily complex set of operations can be performed on two operands to compare them in a specialized way. (Note, however, that we left the original operands untouched by manipulating copies of them.) This general technique can be useful in many situations—for example, sorting on multiple keys or sorting on keys that are computed at runtime.

The Enumerable module also has a sort_by method (which of course is mixed into Array). This is important to understand.

The sort_by method employs what Perl people call a Schwartzian transform (after Randal Schwartz). Rather than sort based on the array elements themselves, we apply a function to the values and sort based on the results.

For a contrived example, imagine that we had a list of files and wanted to sort them by size. A straightforward way would be as follows:

files = files.sort {|x,y| File.size(x) <=> File.size(y) }

However, there are two problems here. First, this seems slightly verbose. We should be able to condense it a little.

Second, this results in multiple disk accesses, each of which is a fairly expensive operation (compared to simple in-memory operations). To make it worse, we are doing many of these operations more than once.

Using sort_by addresses both these issues. Here is the “right” way to do it:

files = files.sort_by {|x| File.size(x) }

In the preceding example, each key is computed only once and is then stored internally as part of a key/data tuple. Then the items in the array are sorted based on the stored values returned by the block. For smaller arrays, this may actually decrease efficiency, but might be worth it for more readable code.

There is no sort_by! method. However, you could always write your own.

What about a multikey sort? Imagine that we had an array of objects and needed to sort them based on three of their attributes: name, age, and height. The fact that arrays are comparable means that this technique will work:

list = list.sort_by {|x| [x.name, x.age, x.height] }

Of course, you’re not limited to simple array elements like these. Any arbitrary expression could be an array element.

8.1.6 Selecting from an Array by Criteria

Sometimes we want to locate an item or items in an array much as though we were querying a table in a database. There are several ways to do this; the ones we outline here are all mixed in from the Enumerable module.

The detect method will find at most a single element. It takes a block (into which the elements are passed sequentially) and returns the first element for which the block evaluates to a value that tests true:

x = [5, 8, 12, 9, 4, 30]
# Find the first multiple of 6
x.detect {|e| e % 6 == 0 } # 12
# Find the first multiple of 7
x.detect {|e| e % 7 == 0 } # nil

Of course, the objects in the array can be of arbitrary complexity, as can the test in the block.

The find method is a synonym for detect; the method find_all is a variant that returns multiple elements as opposed to a single element. Finally, the method select is a synonym for find_all:

# Continuing the above example...
x.find {|e| e % 2 == 0} # 8
x.find_all {|e| e % 2 == 0} # [8, 12, 4, 30]
x.select {|e| e % 2 == 0} # [8, 12, 4, 30]

The grep method invokes the relationship operator (that is, the case equality operator) to match each element against the pattern specified. In its simplest form, it returns an array containing the matched elements. Because the relationship operator (===) is used, the so-called pattern need not be a regular expression. (The name grep, of course, comes from the UNIX command, historically related to the old editor command g/re/p.)

a = %w[January February March April May]
a.grep(/ary/) # ["January, "February"]
b = [1, 20, 5, 7, 13, 33, 15, 28]
b.grep(12..24) # [20, 13, 15]

There is a block form that effectively transforms each result before storing it in the array; the resulting array contains the return values of the block rather than the values passed into the block:

# Continuing above example...
# Let's store the string lengths
a.grep(/ary/) {|m| m.length} # [7, 8]
# Let's square each value
b.grep(12..24) {|n| n*n} # {400, 169, 225}

The reject method is complementary to select. It excludes each element for which the block evaluates to true. The in-place mutator reject! is also defined:

c = [5, 8, 12, 9, 4, 30]
d = c.reject {|e| e % 2 == 0} # [5, 9]
c.reject! {|e| e % 3 == 0}
# c is now [5, 8, 4]

The min and max methods may be used to find the minimum and maximum values, respectively, in an array. There are two forms of each of these; the first form uses the “default” comparison, whatever that may be in the current situation (as defined by the <=> method). The second form uses a block to do a customized comparison:

a = %w[Elrond Galadriel Aragorn Saruman Legolas]
b = a.min # "Aragorn"
c = a.max # "Saruman"
d = a.min {|x,y| x.reverse <=> y.reverse} # "Elrond"
e = a.max {|x,y| x.reverse <=> y.reverse} # "Legolas"

Suppose we wanted to find the index of the minimum or maximum element (assuming it is unique). We could use the index method for tasks such as this:

# Continuing above example...
i = a.index a.min # 2
j = a.index a.max # 3

This same technique can be used in other similar situations. However, if the element is not unique, the first one in the array will naturally be the one found.

8.1.7 Using Specialized Indexing Functions

The internals of a language handle the mapping of array indices to array elements through an indexing function. Because the methods that access array elements can be overridden, we can in effect index an array in any way we want.

For example, in the following code, we implement a one-based array rather than a zero-based array:

class Array1 < Array

def [](index)
if index > 0
super(index - 1)
else
raise IndexError
end
end

def []=(index, obj)
if index > 0
super(index - 1, obj)
else
raise IndexError
end
end

end

x = Array1.new

x[1]=5
x[2]=3
x[0]=1 # Error
x[-1]=1 # Error

Note that the negative indexing (from the end of an array) is disallowed here. And be aware that if this were a real-life solution, there would be other changes to make, such as the slice method and others. But this gives the general idea.

A similar approach can be used to implement multidimensional arrays (as we will see in Section 8.1.11, “Using Multidimensional Arrays”).

It is also possible to implement something like a triangular matrix, as shown here. This is like a special case of a two-dimensional array in which element x,y is always the same as element y,x (so that only one need be stored). This is sometimes useful, for example, in storing an undirected graph (as we will see toward the end of this chapter).

class TriMatrix

def initialize
@store = []
end

def [](x, y)
if x > y
index = (x * x + x)/2 + y
@store[index]
else
raise IndexError
end
end

def []=(x, y, v)
if x > y
index = (x * x + x)/2 + y
@store[index] = v
else
raise IndexError
end
end

end


t = TriMatrix.new

t[3, 2] = 1
puts t[3, 2] # 1
puts t[2, 3] # IndexError

In the preceding example, we chose to implement the matrix so that the row number must be greater than or equal to the column number; we also could have coded it so that the same pair of indices simply mapped to the same element. These design decisions will depend on your use of the matrix.

It would have been possible to inherit from Array, but we thought this solution was easier to understand. The indexing formula is a little complex, but 10 minutes with pencil and paper should convince anyone it is correct. Enhancements probably could be made to this class to make it truly useful, but we will leave that to you.

Also, it is possible to implement a triangular matrix as an array containing arrays that increase in size as the row number gets higher. This is similar to what we do in Section 8.1.11, “Using Multidimensional Arrays.” The only tricky part would be to make sure that a row does not accidentally grow past its proper size.

8.1.8 Implementing a Sparse Matrix

Sometimes we need an array that has very few elements defined; the rest of its elements can be undefined (or more often zero). This so-called sparse matrix has historically been a waster of memory that led people to seek indirect ways of implementing it.

Of course, in most cases, a Ruby array will suffice because modern architectures typically have large amounts of memory. An unassigned element will have the value nil, which takes only a few bytes to store.

But on the other hand, assigning an array element beyond the previous bounds of the array also creates all the nil elements in between. For example, if elements 0 through 9 are defined, and we suddenly assign to element 1000, we have in effect caused elements 10 through 999 to spring into being as nil values. If this is unacceptable, you might consider another alternative.

The alternative we have to suggest, however, does not involve arrays at all. If we really need a sparse matrix, a hash might be the best solution. See Section 8.2.14, “Using a Hash as a Sparse Matrix.”

8.1.9 Using Arrays as Mathematical Sets

Most languages do not directly implement sets (Pascal being one exception). But Ruby arrays have some features that make them usable as sets. We’ll present these here and add a few of our own.

For more serious needs, however, consider using the Set class from the standard library rather than actual arrays. These are covered in Chapter 9, “More Advanced Data Structures.”

An array isn’t a perfect fit for representing a set because an array can have duplicate entries. If you specifically want to treat the array as a set, you can remove these (using uniq or uniq!).

The two most basic operations performed on sets are union and intersection. These are accomplished by the | (or) and & (and) operators, respectively. In accordance with the idea that a set does not contain duplicates, any duplicates will be removed. (This may be contrary to your expectations if you are used to array union and intersection operations in some other language.)

a = [1, 2, 3, 4, 5]
b = [3, 4, 5, 6, 7]
c = a | b # [1, 2, 3, 4, 5, 6, 7]
d = a & b # [3, 4, 5]

# Duplicates are removed...
e = [1, 2, 2, 3, 4]
f = [2, 2, 3, 4, 5]
g = e & f # [2, 3, 4]

The concatenation operator (+) can be used for set union, but it does not remove duplicates.

The - method is a “set difference” operator that produces a set with all the members of the first set except the ones appearing in the second set (see Section 8.1.12, “Finding Elements in One Array But Not Another”).

a = [1, 2, 3, 4, 5]
b = [4, 5, 6, 7]
c = a - b # [1, 2, 3]
# Note that the extra items 6 and 7 are irrelevant.

To “accumulate” sets, it is possible to use the |= operator; as expected, a |= b simply means a = a | b. Likewise &= can progressively “narrow down” the elements of a set.

There is no exclusive-or defined for arrays, but we can make our own easily. In set terms, this corresponds to elements that are in the union of two sets but not in the intersection:

class Array

def ^(other)
(self | other) - (self & other)
end

end

x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
z = x ^ y # [1, 2, 6, 7]

To check for the presence of an element in a set, we can use the method include? (or member?, which is essentially an alias mixed in from Comparable):

x = [1, 2, 3]
puts "yes" if x.include?(2) # Prints "yes"

Of course, this is a little backward from what we are used to in mathematics, where the operator resembling a Greek epsilon denotes set membership. It is backward in the sense that the set is on the left rather than on the right; we are not asking “Is this element in this set?” but rather “Does this set contain this element?”

Many people will not be bothered by this at all. But if you are used to Pascal or Python (or you have ingrained mathematical inclinations), you may want a different way. We present an option in the following code:

class Object

def in(other)
other.include? self
end

end

x = [1, 2, 3]
if 2.in x
puts "yes" # Prints "yes"
else
puts "no"
end

I personally have championed an in operator for Ruby. This would be similar to the operator in Pascal or Python or even SQL. The idea has its advantages (and in is already a reserved word), but it has been received with mixed popularity. It may or may not ever be part of Ruby. (The Elixr language, which is heavily inspired by Ruby, does implement it.)

Now let’s look at subsets and supersets. How do we tell whether a set is a subset or a superset of another? There are no built-in methods, but we can do it this way:

class Array

def subset?(other)
self.each do |x|
return false if !(other.include? x)
end
true
end

def superset?(other)
other.subset?(self)
end

end

a = [1, 2, 3, 4]
b = [2, 3]
c = [2, 3, 4, 5]

flag1 = c.subset? a # false
flag2 = b.subset? a # true
flag3 = c.superset? b # true

Note that we’ve chosen the “natural” ordering—that is, x.subset? y means “Is x a subset of y?” rather than vice versa.

To detect the null set (or empty set), we simply detect the empty array. The empty? method does this.

The concept of set negation (or complement) depends on the concept of a universal set. Because in practical terms this varies from one application or situation to another, the best way is the simplest: Define the universe; then do a set difference.

universe = [1, 2, 3, 4, 5, 6]
a = [2, 3]
b = universe - a # complement of a = [1, 4, 5, 6]

Of course, if you really felt the need, you could define a unary operator (such as - or ~) to do this.

You can iterate through a set just by iterating through the array. The only difference is that the elements will come out in order, which you may not want. To iterate randomly, see Section 8.1.18, “Iterating over an Array.”

Finally, we may sometimes want to compute the powerset of a set. This is simply the set of all possible subsets (including the null set and the original set itself). Ruby provides an array method that returns each combination of array elements with length n. The powerset is comprised of the results for every n, from zero to the length of the array:

class Array

def powerset
(0..length).flat_map{|n| combination(n).to_a }
end

end

x = [1, 2, 3]
y = x.powerset
# y is now:
# [[], [1], [2], [3], [1,2], [1,3], [2,3], [1,2,3]]

The flat_map method uses map to get arrays of combinations for each length n, and then combines each result array into a single array that is returned.

8.1.10 Randomizing an Array

Sometimes we want to scramble an array into a random order. The first example that might come to mind is a card game, but there are other circumstances such as presenting a list of questions to a user in a random order.

To accomplish this task, we can use the shuffle method:

x = [1, 2, 3, 4, 5]
y = x.shuffle # [3, 2, 4, 1 ,5]
x.shuffle! # x is now [3, 5, 4, 1, 2]

If we wanted simply to pick an array element at random (without disallowing duplicates), we could use sample:

x = [1, 2, 3, 4, 5]
x.sample # 4
x.sample(2) # [5, 1]

Finally, we should remember that we can generate a predictable sequence (for example, in unit testing) simply by seeding with a known seed using srand (see Section 5.27, “Generating Random Numbers”).

8.1.11 Using Multidimensional Arrays

For very simple cases, it can be possible to simply nest arrays to create additional dimensions. The following example presents a way of handling multidimensional arrays by overloading the [] and []= methods to map elements onto a nested array. The class Array3 presented here handles three-dimensional arrays in a rudimentary fashion, but it is far from complete:

class Array3

def initialize
@store = [[[]]]
end

def [](a,b,c)
if @store[a]==nil ||
@store[a][b]==nil ||
@store[a][b][c]==nil
return nil
else
return @store[a][b][c]
end
end

def []=(a,b,c,x)
@store[a] = [[]] if @store[a]==nil
@store[a][b] = [] if @store[a][b]==nil
@store[a][b][c] = x
end

end

x = Array3.new
x[0,0,0] = 5
x[0,0,1] = 6
x[1,2,3] = 99

puts x[1,2,3]

Note that all we really gain here is the convenience of a “comma” notation [x,y,z] instead of the more C-like [x][y][z]. If the C-style notation is acceptable to you, you can just use nested arrays in Ruby. Another minor benefit is the prevention of the situation in which nil is the receiver for the bracket method.

For more complex multidimensional arrays, there is a Matrix standard library, as mentioned in Section 5.10, “Matrix Manipulation.” However, the performance of the standard library is likely insufficient for serious numerical work. Instead, use the nmatrix gem, an optimized multidimensional array library created by the SciRuby project. For more information, see SciRuby.com.

8.1.12 Finding Elements in One Array But Not Another

Finding elements in one array but not another is simpler in Ruby than in many languages. It is a simple “set difference” problem:

text = %w[the magic words are squeamish ossifrage]
dictionary = %w[an are magic the them these words]
# Find potential misspellings
unknown = text - dictionary # ["squeamish", "ossifrage"]

8.1.13 Transforming or Mapping Arrays

The collect method (part of Enumerable) is a useful tool that proves to be a time and labor saver in many circumstances. If you are a Smalltalk programmer, this may be more intuitive than if you come from a C background.

This method simply operates on each element of an array in some arbitrary way to produce a new array. In other words, it “maps” an array onto another array (hence the synonym map):

x = %w[alpha bravo charlie delta echo foxtrot]
# Get the initial letters
a = x.collect {|w| w[0..0]} # %w[a b c d e f]
# Get the string lengths
b = x.collect {|w| w.length} # [5, 5, 7, 5, 4, 7]
# map is a shorter alias
c = x.map {|w| w.length} # [5, 5, 7, 5, 4, 7]

The in-place variant collect! (or map!) is also defined:

x.collect! {|w| w.upcase }
# x is now %w[ALPHA BRAVO CHARLIE DELTA ECHO FOXTROT]
x.map! {|w| w.reverse }
# x is now %w[AHPLA OVARB EILRAHC ATLED OHCE TORTXOF]

8.1.14 Removing nil Values from an Array

The compact method (or its in-place version compact!) removes nil values from an array, leaving the rest untouched:

a = [1, 2, nil, 3, nil, 4, 5]
b = a.compact # [1, 2, 3, 4, 5]
a.compact! # a is now [1, 2, 3, 4, 5]

8.1.15 Removing Specific Array Elements

It is easy to delete elements from a Ruby array, and there are many ways to do it. If you want to delete one specific element by index, delete_at is a good way:

a = [10, 12, 14, 16, 18]
a.delete_at(3) # Returns 16
# a is now [10, 12, 14, 18]
a.delete_at(9) # Returns nil (out of range)

If you want to delete all instances of a certain piece of data, delete will do the job. It returns the value of the objects deleted or nil if it was not found:

b = %w(spam spam bacon spam eggs ham spam)
b.delete("spam") # Returns "spam"
# b is now ["bacon", "eggs", "ham"]
b.delete("caviar") # Returns nil

The delete method also accepts a block. This may be a little counterintuitive; all that happens is that the block is evaluated (potentially performing a wide range of operations) if the object is not found, and the value of the block is returned:

c = ["alpha", "beta", "gamma", "delta"]
c.delete("delta") { "Nonexistent" }
# Returns "delta" (block is never evaluated)
c.delete("omega") { "Nonexistent" }
# Returns "Nonexistent"

The delete_if passes every element into the supplied block and deletes the elements for which the block evaluates to true. It behaves similarly to reject!, except that the latter can return nil when the array remains unchanged:

email = ["job offers", "greetings", "spam", "news items"]
# Delete four-letter words
email.delete_if {|x| x.length==4 }
# email is now ["job offers", "greetings", "news items"]

The slice! method accesses the same elements as slice but deletes them from the array as it returns their values:

x = [0, 2, 4, 6, 8, 10, 12, 14, 16]
a = x.slice!(2) # 4
# x is now [0, 2, 6, 8, 10, 12, 14, 16]
b = x.slice!(2,3) # [6, 8, 10]
# x is now [0, 2, 12, 14, 16]
c = x.slice!(2..3) # [12, 14]
# x is now [0, 2, 16]

The shift and pop methods can be used for deleting array elements (for more about their intended uses, see Section 9.2, “Working with Stacks and Queues”):

x = [1, 2, 3, 4, 5]
x.pop # Delete the last element
# x is now [1, 2, 3, 4]
x.shift # Delete the first element
# x is now [2, 3, 4]

The reject method takes a block and produces a new array without the elements for which the block returns true:

arr = [1,2,3,4,5,6,7,8]
odd = arr.reject {|x| x % 2 == 0 } # [1,3,5,7]

Finally, the clear method deletes all the elements in an array. It is equivalent to assigning an empty array to the variable but is marginally more efficient.

x = [1, 2, 3]
x.clear
# x is now []

8.1.16 Concatenating and Appending onto Arrays

Frequently we want to append an element or another array onto an array. You can do this in many ways with a Ruby array.

The “append” operator (<<) appends an object onto an array; the return value is the array itself, so that these operations can be “chained.”

x = [1, 5, 9]
x << 13 # x is now [1, 5, 9, 13]
x << 17 << 21 # x is now [1, 5, 9, 13, 17, 21]

Similar to the append operator are the unshift and push methods, which add to the beginning and end of an array, respectively (see Section 8.1.17, “Using an Array as a Stack or Queue”):

x = [1, 5, 9]
x.push *[2, 6, 10] # x is now [1, 5, 9, 2, 6, 10]
x.unshift 3 # x is now [3, 1, 5, 9, 2, 6, 10]

Arrays may be concatenated with the concat method or by using the + and += operators:

x = [1,2]
y = [3,4]
z = [5,6]
b = y + z # [3,4,5,6]
b += x # [3,4,5,6,1,2]
z.concat y # z is now [5,6,3,4]

Bear in mind that +, concat, and even += always create a new array object. Also bear in mind that while << adds to the existing array, it appends a new array element (which may itself be an array):

a = [1,2]
b = [3,4]
a += b # [1,2,3,4]

a = [1,2]
b = [3,4]
a << b # [1,2,[3,4]]

a = [1,2]
b = [3,4]
a = a.concat(b) # [1,2,3,4]

8.1.17 Using an Array as a Stack or Queue

The basic stack operations are push and pop, which add and remove items at the end of an array, respectively. The basic queue operations are shift (which removes an item from the beginning of an array) and unshift (which adds an element to the beginning). The append operator <<can also be used to add an item to the end of an array (basically a synonym for push).

Don’t get confused. The shift and unshift methods work on the beginning of an array; the push, pop, and << methods work on the end.

For a better discussion of this topic, see Section 9.2, “Working with Stacks and Queues.”

8.1.18 Iterating over an Array

The Array class has the standard iterator each, as is to be expected. However, it also has other useful iterators.

The reverse_each method iterates in reverse order. It is equivalent to using reverse and then each, but it is faster:

words = %w(Son I am able she said)
str = ""
words.reverse_each { |w| str += "#{w} "}
# str is now "said she able am I Son "

If we only want to iterate over the indices, we can use each_index. Saying x.each_index is equivalent to saying (0..(x.size-1)).each (that is, iterating over the range of indices).

The chainable iterator with_index (mixed in from Enumerable) adds the element index to an existing iterator:

x = ["alpha", "beta", "gamma"]
x.each.with_index do |x, i|
puts "Element #{i} is #{x}"
end
# Produces three lines of output

Suppose that we wanted to iterate over an array in random order. We show here the iterator random_each (which simply uses the shuffle method):

class Array

def random_each
self.shuffle.each {|x| yield x}
end

end

dwarves = %w(Sleepy Dopey Happy Sneezy Grumpy Bashful Doc)
list = ""
dwarves.random_each {|x| list += "#{x} "}
# list is now: "Bashful Dopey Sleepy Happy Grumpy Doc Sneezy "
# (Your mileage may vary.)

8.1.19 Interposing Delimiters to Form a String

Frequently we will want to insert delimiters in between array elements in a “fencepost” fashion; that is, we want to put delimiters between the elements but not before the first one or after the last one. The method join will do this, as will the * operator:

been_there = ["Veni", "vidi", "vici."]
journal = been_there.join(", ") # "Veni, vidi, vici."

letters = ["Phi", "Mu", "Alpha"]
musicians = letters.join(" ") # "Phi Mu Alpha"

people = ["Bob", "Carol", "Ted", "Alice"]
movie = people * " and " # "Bob and Carol and Ted and Alice"

Note that if we really need to treat the last element differently, perhaps by inserting the word and, we can do it manually:

list = %w[A B C D E F]
with_commas = list[0..-2] * ", " + ", and " + list[-1]
# with_commas is now "A, B, C, D, E, and F"

8.1.20 Reversing an Array

To reverse the order of an array, use the reverse or reverse! method:

inputs = ["red", "green", "blue"]
outputs = inputs.reverse # ["green","blue","red"]
priorities = %w(eat sleep code)
priorities.reverse! # ["code","sleep","eat"]

8.1.21 Removing Duplicate Elements from an Array

If you want to remove duplicate elements from an array, the uniq method (or its in-place mutator uniq!) will do the job:

breakfast = %w[spam spam eggs ham eggs spam]
lunch = breakfast.uniq # ["spam","eggs","ham"]
breakfast.uniq! # breakfast has changed now

8.1.22 Interleaving Arrays

Suppose that we wanted to take two arrays and “interleave” them so that the new array contains an array of paired elements for each of the two original ones. That’s what the zip method in Enumerable does:

a = [1, 2, 3, 4]
b = ["a", "b", "c", "d"]
c = a.zip(b)
# c is now [[1,"a"], [2,"b"], [3,"c"], [4,"d"]]
# Use flatten if you want to eliminate the nesting
d = c.flatten
# d is now [1, "a", 2, "b", 3, "c", 4, "d"]

If a block is specified, the output subarrays will be passed successively into the block:

a.zip(b) {|x1, x2| puts x2 + "-" + x1.to_s }
# Prints: a-1
# b-2
# c-3
# d-4
# and returns nil

8.1.23 Counting Frequency of Values in an Array

Strings have a count method to discover how many times certain letters are used in the string, but arrays have no similar method. Therefore, here is a method that counts the occurrences of each data item in an Array:

class Array

def counts
each_with_object(Hash.new(0)){|x,h| h[x] += 1 }
end

end

meal = %w[spam spam eggs ham eggs spam]
items = meal.counts
# items is {"ham" => 1, "spam" => 3, "eggs" => 2}
spams = items["spam"] # 3

Note that a hash is returned. No pun intended.

8.1.24 Inverting an Array to Form a Hash

An array is used to associate an integer index with a piece of data. But what if you want to invert that association—that is, associate the data with the index, producing a hash? The following method will do just that:

class Array

def invert
each_with_object({}).with_index{|(x,h),i| h[x] = i }
end

end

a = ["red","yellow","orange"]
h = a.invert # {"orange"=>2, "yellow"=>1, "red"=>0}

8.1.25 Synchronized Sorting of Multiple Arrays

Suppose we wanted to sort an array, but we had other arrays that corresponded with this one on an element-for-element basis. In other words, we don’t want to get them out of sync.

The solution presented here sorts an array and gathers the resulting set of indices. The list of indices (itself an array) can be applied to any other array to put its elements in the same order:

class Array

def sort_index
d=[]
each.with_index{|x,i| d[i] = [x,i] }
if block_given?
d.sort{|x,y| yield x[0], y[0] }.collect{|x| x[1] }
else
d.sort.collect{|x| x[1] }
end
end

def sort_with(ord = [])
return nil if self.length != ord.length
values_at(*ord)
end

end


a = [21, 33, 11, 34, 36, 24, 14]
b = a.sort_index
a2 = a.sort_with(b)
c = a.sort_index {|x,y| x%2 <=> y%2 }
a3 = a.sort_with(c)

p a # [21, 33, 11, 34, 36, 24, 14]
p b # [2, 6, 0, 5, 1, 3, 4]
p a2 # [11, 14, 21, 24, 33, 34, 36]
p c # [6, 5, 4, 3, 2, 1, 0]
p a3 # [14, 24, 36, 34, 11, 33, 21]

8.1.26 Establishing a Default Value for New Array Elements

When an array grows and new (unassigned) elements are created, these default to nil values:

a = Array.new
a[0]="x"
a[3]="y"
# a is now ["x", nil, nil, "y"]

What if we want to set those new elements to some other value? As a specific instance of a general principle, we offer the ZeroArray class, which defaults new unassigned elements to 0:

class ZeroArray < Array

def [](x)
if x > size
(size+1..x).each{|i| self[i] = 0 }
end
v = super(x)
end

def []=(x, v)
max = size
super(x, v)
if size - max > 1
(max..size-2).each{|i| self[i] = 0 }
end
end
end


num = ZeroArray.new
num[1] = 1
num[2] = 4
num[5] = 25
# num is now [0, 1, 4, 0, 0, 25]

8.2 Working with Hashes

Hashes are known in some circles as associative arrays, dictionaries, hash maps, and various other names. Python, Java, and Perl programmers in particular will already be familiar with this data structure.

Think of an array as an entity that creates an association between an index x and a data item y. A hash creates a similar association with at least two exceptions. First, for an array, x is always an integer; for a hash, it need not be. Second, an array is an ordered data structure; we typically think of a hash as having no ordering. (That said, Ruby started tracking the ordering of hashes in version 1.9.)

A hash key can be of any arbitrary type. As a side effect, this makes a hash intrinsically a nonsequential data structure. In an array, we know that element 4 follows element 3; but in a hash, the key may be of a type that does not define a real predecessor or successor. However, hashes “remember” their insertion order since Ruby 1.9, in effect making a hash a kind of ordered data structure.

You may think of a hash as an array with a specialized index, or as a database “synonym table” with two fields stored in memory. Regardless of how you perceive it, a hash is a powerful and useful programming construct.

8.2.1 Creating a New Hash

As with Array, the special class method [] is used to create a hash. The data items listed in the brackets are used to form the mapping of the hash.

Six ways of calling this method are shown here (hashes a1 through c2 will all be populated identically):

a1 = Hash.[]("flat", 3, "curved", 2)
a2 = Hash.[]("flat" => 3,"curved" => 2)
b1 = Hash["flat", 3, "curved", 2]
b2 = Hash["flat" => 3,"curved" => 2]
c1 = {"flat", 3, "curved", 2}
c2 = {"flat" => 3,"curved" => 2}
# For a1, b1, and c1: There must be
# an even number of elements.

There is an alternate “convenience” syntax for a hash literal in the special case where the keys are symbols, added in Ruby 1.9. The preceding colon is omitted and the “arrow” is replaced with a colon:

h1 = {:alpha => 123, :beta => 456}
h2 = {alpha: 123, beta: 456}
h1 == h2 # true

There is also a class method called new that can take a parameter specifying a default value. Note that this default value is not actually part of the hash; it is simply a value returned in place of nil:

d = Hash.new # Create an empty hash
e = Hash.new(99) # Create an empty hash
f = Hash.new("a" => 3) # Create an empty hash
e["angled"] # 99
e.inspect # {}
f["b"] # {"a" => 3} (default value is
# actually a hash itself)
f.inspect # {}

Finally, there is a to_h method available on Array. It converts any array of two-element arrays into a hash of keys and values:

g = [["a", 1]].to_h # {"a" => 1}

8.2.2 Specifying a Default Value for a Hash

The default value of a hash is an object referenced in place of nil in the case of a missing key. This is useful if you plan to use methods with the hash value that are not defined for nil. It can be assigned upon creation of the hash or at a later time using the default= method.

All missing keys point to the same default value object, so changing the default value has a side effect:

a = Hash.new("missing") # default value object is "missing"
a["hello"] # "missing"
a.default = "nothing"
a["hello"] # "nothing"
a["good"] << "bye" # "nothingbye"
a.default # "nothingbye"

In contrast to default, a block can allow each missing key to have its own default. A common idiom is a hash where the default value is an array, so that items can be added without having to explicitly check for a value and create an empty array:

a = Hash.new{|h, key| h[key] = [] } # default value is a new []
a["hello"] # []
a["good"] << "bye" # {"good" => ["bye"]}

There is also an instance method called fetch that raises an IndexError exception if the key does not exist in the Hash object. It takes a second parameter that serves as a default value. Also, fetch optionally accepts a block to produce a default value in case the key is not found. This is similar to default values created by a block:

a = {"flat" => 3, "curved" => 2, "angled" => 5}
a.fetch("pointed") # IndexError
a.fetch("curved", "na") # 2
a.fetch("x", "na") # "na"
a.fetch("flat") {|x| x.upcase } # 3
a.fetch("pointed") {|x| x.upcase } # "POINTED"

8.2.3 Accessing and Adding Key-Value Pairs

Hash has class methods [] and []=, just as Array has; they are used much the same way, except that they accept only one parameter. The parameter can be any object, not just a string (although string objects are commonly used):

a = {}
a["flat"] = 3 # {"flat" => 3}
a.[]=("curved", 2) # {"flat" => 3, "curved" => 2}
a.store("angled", 5) # {"flat" => 3, "curved" => 2, "angled" => 5}

The method store is simply an alias for the []= method, both of which take two arguments, as shown in the preceding example.

The [] method is similar to the fetch method, except that it does not raise an IndexError on missing keys, and returns nil instead:

a.fetch("flat") # 3
a.[]("flat") # 3
a["flat"] # 3
a["bent"] # nil

Suppose that we are not sure whether the Hash object exists, and we want to avoid clearing an existing hash. One way is to check whether the hash is defined:

a = {} unless defined? a
a["flat"] = 3

But a more idiomatic way is as follows:

a ||= {}
a["flat"] = 3
# Or even:
(a ||= {})["flat"] = 3

The same problem can be applied to individual keys, where you want to assign a value only if the key does not exist:

a = Hash.new(99)
a[2] # 99
a # {}
a[2] ||= 5 # 99
a # {}
b = Hash.new
b # {}
b[2] # nil
b[2] ||= 5 # 5
b # {2 => 5}

Note that nil may be used as either a key or an associated value:

b = {}
b[2] # nil
b[3] = nil
b # {3 => nil}
b[2].nil? # true
b[3].nil? # true
b[nil] = 5
b # {3 => nil, nil => 5}
b[nil] # 5
b[b[3]] # 5

8.2.4 Deleting Key-Value Pairs

Key-value pairs of a Hash object can be deleted using clear, delete, delete_if, reject, reject!, and shift.

Use clear to remove all key-value pairs. This is essentially the same as assigning a new empty hash but marginally faster.

Use shift to remove an unspecified key-value pair. This method returns the pair as a two-element array, or nil if no keys are left:

a = {1 => 2, 3 => 4}
b = a.shift # [1, 2]
# a is now {3 => 4}

Use delete to remove a specific key-value pair. It accepts a key and returns the value associated with the key removed (if found). If not found, the default value is returned. It also accepts a block to produce a unique default value rather than just a reused object reference:

a = {1 => 1, 2 => 4, 3 => 9, 4 => 16}
a.delete(3) # 9
# a is now {1 => 1, 2 => 4, 4 => 16}
a.delete(5) # nil in this case
a.delete(6) { "not found" } # "not found"

Use delete_if, reject, or reject! in conjunction with the required block to delete all keys for which the block evaluates to true. The method reject uses a copy of the hash, and reject! returns nil if no changes were made.

8.2.5 Iterating Over a Hash

The Hash class has the standard iterator each, as is to be expected. It also has each_key, each_pair, and each_value (each is an alias for each_pair).

{"a" => 3,"b" => 2}.each do |key, val|
print val, " from ", key, "; " # 3 from a; 2 from b;
end

The other two provide only one or the other of key or value to the block:

{"a" => 3,"b" => 2}.each_key do |key|
print "key = #{key};" # Prints: key = a; key = b;
end

{"a" => 3, "b" => 2}.each_value do |value|
print "val = #{value};" # Prints: val = 3; val = 2;
end

As of Ruby 1.9, iteration order is the same as insertion order.

8.2.6 Inverting a Hash

Inverting a hash in Ruby is trivial with the invert method:

a = {"fred" => "555-1122", "jane" => "555-7779"}
b = a.invert
b["555-7779"] # "jane"

Because hashes have unique keys, there is potential for data loss when doing this: Duplicate associated values will be converted to a unique key using only one of the associated keys as its value. There is no predictable way to tell which one will be used.

8.2.7 Detecting Keys and Values in a Hash

Determining whether a key has been assigned can be done with has_key? or any one of its aliases: include?, key?, and member?.

a = {"a" => 1, "b" => 2}
a.has_key? "c" # false
a.include? "a" # true
a.key? 2 # false
a.member? "b" # true

You can also use empty? to see whether any keys at all are left in the hash. Likewise, length or its alias size can be used to determine how many there are:

a.empty? # false
a.length # 2

Alternatively, you can test for the existence of an associated value using has_value? or value?:

a.has_value? 2 # true
a.value? 99 # false

8.2.8 Extracting Hashes into Arrays

To convert the entire hash into an array, use the to_a method. The resulting array will contain two-element arrays of key and value pairs:

h = {"a" => 1, "b" => 2}
h.to_a # [["a", 1], ["b", 2]]

It is also possible to convert only the keys or only the values of the hash into an array:

h.keys # ["a","b"]
h.values # [1,2]

Finally, you can extract an array of values selectively based on a list of keys, using the method. This works for hashes much as the method of the same name works for arrays. (Also, as for arrays, values_at replaces the obsolete methods indices and indexes.)

h = {1 => "one", 2 => "two", 3 => "three",
4 => "four", "cinco" => "five"}
h.values_at(3, "cinco", 4) # ["three","five","four"]
h.values_at(1,3) # ["one","three"]

8.2.9 Selecting Key-Value Pairs by Criteria

The Hash class mixes in the Enumerable module, so you can use detect (find), select (find_all), grep, min, max, and reject as with arrays.

The detect method (alias find) finds a single key-value pair. It takes a block (into which the pairs are passed one at a time) and returns the first pair for which the block evaluates to true:

names = {"fred" => "jones", "jane" => "tucker",
"joe" => "tucker", "mary" => "SMITH"}
# Find a tucker
names.detect {|k,v| v == "tucker" } # ["joe","tucker"]
# Find a capitalized surname
names.find {|k,v| v == v.upcase } # ["mary", "SMITH"]

Of course, the objects in the hash can be of arbitrary complexity, as can the test in the block, but comparisons between differing types can cause problems.

The select method (alias find_all) returns multiple matches as opposed to a single match:

names.select {|k,v| v == "tucker" }
# [["joe", "tucker"], ["jane", "tucker"]]
names.find_all {|k,v| k.count("r")>0}
# [["mary", "SMITH"], ["fred", "jones"]]

8.2.10 Sorting a Hash

Hashes are by their nature not ordered according to the value of their keys or associated values. In performing a sort on a hash, Ruby converts the hash to an array and then sorts that array. The result is naturally an array:

names = {"Jack" => "Ruby", "Monty" => "Python",
"Blaise" => "Pascal", "Minnie" => "Perl"}
list = names.sort
# list is now:
# [["Blaise", "Pascal"], ["Jack", "Ruby"],
# ["Minnie","Perl"], ["Monty","Python"]]

Here is how to convert such an array back to a hash:

list_hash = list.to_h

See also Section 8.2.12, “Creating a Hash from an Array.”

8.2.11 Merging Two Hashes

Merging hashes may be useful sometimes. Ruby’s merge method merges the entries of two hashes, forming a third hash and overwriting any previous duplicates:

dict = {"base" => "foundation", "pedestal" => "base"}
added = {"base" => "non-acid", "salt" => "NaCl"}
new_dict = dict.merge(added)
# {"base" => "non-acid", "pedestal" => "base", "salt" => "NaCl"}

An alias for merge is update.

If a block is specified, it can contain logic to deal with collisions. For example, here we assume that if we have two keys colliding, we use the value that is less than the other (alphabetically, numerically, or however):

dict = {"base" => "foundation", "pedestal" => "base"}
added = {"base" => "non-acid", "salt" => "NaCl"}
new_dict = dict.merge(added) {|key,old,new| old < new ? old : new }
# {"salt" => "NaCl", "pedestal" => "base", "base" => "foundation"}

The second result using the block is thus different from the previous example. Also be aware that the mutator methods merge! and update! will change the receiver in place.

8.2.12 Creating a Hash from an Array

The easiest way to create a hash from an array is with the to_h method on an array of two-element arrays. It’s also possible to use the bracket method on the Hash class, with either two-element arrays or a single array with an even number of elements:

pairs = [[2, 3], [4, 5], [6, 7]]
array = [2, 3, 4, 5, 6, 7]
h1 = pairs.to_h # {2 => 3, 4 => 5, 6 => 7}
h2 = Hash[pairs] # {2 => 3, 4 => 5, 6 => 7}
h3 = Hash[*array] # {2 => 3, 4 => 5, 6 => 7}

8.2.13 Finding Difference or Intersection of Hash Keys

Because the keys of a hash can be extracted as a separate array, the extracted arrays of different hashes can be manipulated using the Array class methods & and - to produce the intersection and difference of the keys. The matching values can be generated with the each method performed on a third hash representing the merge of the two hashes (to ensure all keys can be found in one place):

a = {"a" => 1,"b" => 2,"z" => 3}
b = {"x" => 99,"y" => 88,"z" => 77}
intersection = a.keys & b.keys
difference = a.keys - b.keys
c = a.merge(b)
inter = {}
intersection.each {|k| inter[k] = c[k] }
# inter is {"z" => 77}
diff={}
difference.each {|k| diff[k] = c[k] }
# diff is {"a" => 1, "b" => 2}

8.2.14 Using a Hash as a Sparse Matrix

Often we want to use an array or matrix that is nearly empty. We could store it in the conventional way, but this is often wasteful of memory. A hash provides a way to store only the values that actually exist.

In the following example, we assume that the nonexistent values should default to zero:

values = Hash.new(0)
values[1001] = 5
values[2010] = 7
values[9237] = 9
x = values[9237] # 9
y = values[5005] # 0

Obviously, in the preceding example, an array would have created more than 9,000 unused elements. This may not be acceptable.

What if we want to implement a sparse matrix of two or more dimensions? All we need do is use arrays as the hash keys:

cube = Hash.new(0)
cube[[2000,2000,2000]] = 2
z = cube[[36,24,36]] # 0

In this case, we see that literally billions of array elements would need to be created if this three-dimensional matrix were to be complete.

8.2.15 Implementing a Hash with Duplicate Keys

Purists would likely say that if a hash has duplicate keys, it isn’t really a hash. We don’t want to argue. Call it what you will; there might be occasions where you want a data structure that offers the flexibility and convenience of a hash but allows duplicate key values.

We offer a partial solution in Listing 8.1. It is partial for two reasons: First, we have not bothered to implement all the functionality that could be desired but only a good representative subset. Second, the inner workings of Ruby are such that a hash literal is always an instance of the Hashclass, and even if we were to inherit from Hash, a hash literal would not be allowed to contain duplicates.

Listing 8.1 Hash with Duplicate Keys


class HashDup

def initialize(*all)
all.flatten!
raise IndexError if all.size % 2 != 0
@store = {}
all.each do |key, val|
self.store(key, val)
end
end

def store(k, v)
if @store.has_key?(k)
@store[k] += [v]
else
@store[k] = [v]
end
end

def [](key)
@store[key]
end

def []=(key,value)
self.store(key,value)
end

def to_s
@store.to_s
end

def to_a
@store.to_a
end

def inspect
@store.inspect
end

def keys
result=[]
@store.each do |k,v|
result += ([k]*v.size)
end
result
end

def values
@store.values.flatten
end

def each
@store.each {|k,v| v.each {|y| yield k, y }}
end

alias each_pair each

def each_key
self.keys.each {|k| yield k }
end

def each_value
self.values.each {|v| yield v }
end

def has_key? k
self.keys.include? k
end

def has_value? v
self.values.include? v
end

def length
self.values.size
end

alias size length

def delete(k)
val = @store[k]
@store.delete k
val
end

def delete(k,v)
@store[k] -= [v] if @store[k]
v
end

# Other methods omitted here...

end


# This won't work... dup key will ignore
# first occurrence.
h = {1 => 1, 2 => 4, 3 => 9, 4 => 16, 2 => 0}

# This will work...
h = HashDup.new(1,1, 2,4, 3,9, 4,16, 2,0)

k = h.keys # [4, 1, 2, 2, 3]
v = h.values # [16, 1, 4, 0, 9]

n = h.size # 5

h.each {|k,v| puts "#{k} => #{v}"}
# Prints:
# 4 => 16
# 1 => 1
# 2 => 4
# 2 => 0
# 3 => 9


As long as you stay away from the hash-literal notation, this problem is doable. In Listing 8.1, we implement a class that has a “store” (@store), which is a simple hash; each value in the hash is an array. We control access to the hash in such a way that when we find ourselves adding a key that already exists, we add the value to the existing array of items associated with that key.

What should size return? Obviously the “real” number of key-value pairs including duplicates. Likewise, the keys method returns a value potentially containing duplicates. The iterators behave as expected; as with a normal hash, there is no predicting the order in which the pairs will be visited.

Besides the usual delete, we have implemented a delete_pair method. The former deletes all values associated with a key; the latter deletes only the specified key-value pair. (Note that it would have been difficult to make a single method like delete(k,v=nil) because nil is a valid value for any hash.)

For brevity, we have not implemented the entire class, and, frankly, some of the methods such as invert would require some design decisions as to what their behavior should be. The interested reader can flesh out the rest as needed.

8.2.16 Other Hash Operations

Hashes normally are indexed based on the value of the key. If we want to, we can alter the hash to store by the object identity instead. The method compare_by_identity will force the hash to this special state, and compare_by_identity? will tell which state a hash is in:

h1 = { "alpha" => 1, :echo => 2, 35 => 3 }
h2 = h1.dup.compare_by_identity

h1.compare_by_identity? # false
h2.compare_by_identity? # true

a1 = h1.values_at("alpha", :echo, 35) # [1, 2, 3]
a2 = h2.values_at("alpha", :echo, 35) # [nil, 2, 3]

The reason for the preceding behavior, of course, is that symbols and integers are immediate values in Ruby (as are true, false, and nil). Two quoted strings, however, will create different objects with different object IDs.

8.3 Enumerables in General

What makes a collection enumerable? Largely it is just the fact of being a collection. The module Enumerable has the requirement that the default iterator each should be defined. Sequence as such is not an issue because even an unordered collection can have an iterator.

Additionally, if the methods min, max, and sort are to be used, the collection must have a comparison method (<=>). This is fairly obvious.

So an enumerable is just a collection that can be traversed, and thus searched, and then possibly sorted. As a rule of thumb, any user-defined collection that does not subclass an existing core class should probably mix in the Enumerable module.

Bear in mind that what we say about one enumerable applies in effect to all of them. The actual data structure could be an array, a hash, or a tree, to name a few.

There are, of course, some nuances of behavior. An array is a collection of individual items, whereas a hash is a collection of paired key-value associations. Naturally there will be differences in their behavior.

Many of the methods we looked at for arrays and/or hashes (such as map and find) really originate here in the Enumerable module. In many cases, it was difficult to determine how to cover this material. Any confusion or inaccuracy should be considered my fault.

The array is the most common and representative collection that mixes in this module. Therefore, by default, I will use it as an example.

8.3.1 The inject Method

The inject method comes to Ruby via Smalltalk. Its behavior is interesting, if a little difficult to grasp at first sight.

This method relies on the fact that frequently we will iterate through a list and “accumulate” a result that changes as we iterate. The most common example might be finding the sum of a list of numbers. Whatever the operation, there is usually an “accumulator” of some kind (for which we supply an initial value) and a function or operation we apply (represented in Ruby as a block).

For a trivial example or two, suppose that we have this array of numbers and we want to find the sum of all of them:

nums = [3,5,7,9,11,13]
sum = nums.inject(0) {|x,n| x+n }

Note how we start with an accumulator of 0 (the “addition identity”). Then the block gets the current accumulated value and the current value from the list passed in. In each case, the block takes the previous sum and adds the current item to it.

Obviously, this is equivalent to the following piece of code:

sum = 0
nums.each {|n| sum += n }

So the abstraction level is only slightly higher. If inject never fits nicely in your brain, don’t use it. But if you get over the initial confusion, you might find yourself inventing new and elegant ways to use it.

The accumulator value is optional. If it is omitted, the first item is used as the accumulator and is then omitted from iteration:

sum = nums.inject {|x,n| x+n }

# Means the same as:

sum = nums[0]
nums[1..-1].each {|n| sum += n }

A similar example is finding the product of the numbers. Note that the accumulator, if given, must be 1 because that is the “multiplication identity.”

prod = nums.inject(1) {|x,n| x*n }

# or:

prod = nums.inject {|x,n| x*n }

The following slightly more complex example takes a list of words and finds the longest word in the list:

words = %w[ alpha beta gamma delta epsilon eta theta ]
longest_word = words.inject do |best,w|
w.length > best.length ? w : best
end
# return value is "epsilon"

8.3.2 Using Quantifiers

The quantifiers any? and all? make it easier to test the nature of a collection. Each of these takes a block (which of course tests true or false).

nums = [1,3,5,8,9]

# Are any of these numbers even?
flag1 = nums.any? {|x| x % 2 == 0 } # true

# Are all of these numbers even?
flag2 = nums.all? {|x| x % 2 == 0 } # false

In the absence of a block, these simply test the truth value of each element. That is, a block {|x| x } is added implicitly.

flag1 = list.all? # list contains no falses or nils
flag1 = list.any? # list contains at least one true value (non-nil
# or non-false)

Bear in mind that the all? quantifier works in the true mathematical sense. That is, an empty collection will return a true value (because there is no element for which the block is evaluated).

8.3.3 The partition Method

As the saying goes, “There are two kinds of people in the world—those who divide people into two kinds, and those who don’t.” The partition method doesn’t deal with people (unless we can encode them as Ruby objects), but it does divide a collection into two parts.

When partition is called and passed a block, the block is evaluated for each element in the collection. The truth value of each result is then evaluated, and a pair of arrays (inside another array) is returned. All the elements resulting in true go in the first array; the others go in the second:

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

odd_even = nums.partition {|x| x % 2 == 1 }
# [[1,3,5,7,9],[2,3,4,6,8]]

under5 = nums.partition {|x| x < 5 }
# [[1,2,3,4],[5,6,7,8,9]]

squares = nums.partition {|x| Math.sqrt(x).to_i**2 == x }
# [[1,4,9],[2,3,5,6,7,8]]

If we wanted to partition into more than two groups, we could use group_by, which returns a hash, with one key for each result of the given block:

nums = [1,2,3,4,5,6,7,8,9]
mod3 = nums.group_by {|x| x % 3 }
# { 0 => [3,6,9], 1 => [1,4,7], 2 => [2,5,8] }

words = %w[ area arboreal brick estrous clear
donor ether filial patina ]
vowels = words.group_by {|x| x.count("aeiou") } # How many vowels?
# {1 => ["brick"], 2 => ["clear", "donor", "ether"],
# 3 => ["area", "estrous", "filial", "patina"], 4 => ["arboreal"]}

initials = words.group_by {|x| x[0..0] } # By first letter
# {"a" => ["area", "arboreal"], "b" => ["brick"],
# "c" => ["clear"], "d" => ["donor"], "f" => ["filial"],
# "p" => ["patina"], "e" =>["estrous", "ether"]}

8.3.4 Iterating by Groups

In every case we’ve seen so far, we iterate over a list a single item at a time. However, there might be times when we want to grab these in pairs or triples or some other quantity.

The iterator each_slice takes a parameter, n, and iterates over that many elements at a time. If there are not enough items left to form a slice, the last slice will be smaller in size.

arr = [1,2,3,4,5,6,7,8,9,10]
arr.each_slice(3) do |triple|
puts triple.join(",")
end

# Output:
# 1,2,3
# 4,5,6
# 7,8,9
# 10

There is also the possibility of iterating with a “sliding window” of the given size with the each_cons iterator. (If this name seems unintuitive, it is part of the heritage of LISP. Think of it as meaning consecutive.) In this case, the slices will always be the same size.

arr = [1,2,3,4,5,6,7,8,9,10]
arr.each_cons(3) do |triple|
puts triple.join(",")
end

# Output:
# 1,2,3
# 2,3,4
# 3,4,5
# 4,5,6
# 5,6,7
# 6,7,8
# 7,8,9
# 8,9,10

8.3.5 Converting to Arrays or Sets

Every enumerable can in theory be converted trivially to an array (by using to_a). For example, a hash results in a nested array of pairs:

hash = {1 => 2, 3 => 4, 5 => 6}
arr = hash.to_a # [[5, 6], [1, 2], [3, 4]]

The method entries is an alias for the to_a method.

If the set library has been required, you can use the to_set method, which works as expected. See Section 9.1, “Working with Sets,” for a discussion of sets.

require 'set'
hash = {1 => 2, 3 => 4, 5 => 6}
set = hash.to_set # #<Set: {[1, 2], [3, 4], [5, 6]}>

8.3.6 Using Enumerator Objects

An enumerator is basically an object that can be used for external or internal iteration. In internal iteration, we simply iterate over each item in the collection and execute the block for each item in sequence; external iteration means that the code can grab the next item in sequence “on demand.”

To understand external versus internal iteration, think of getline as providing an external iterator onto an IO object. You call it at will, and it provides you with data. Contrast that with the internal iterator each_line, which simply passes each line in succession into the code block.

Sometimes internal iterators are not appropriate to the problem at hand. There is always a valid solution, but it may not always be convenient. Sometimes an external iterator is more convenient.

The iterator method each does not require a block. If one is not specified, the method returns an enumerator object:

items = [1,2,3,4]
enum = items.each
enum.each {|x| puts x } # Prints numbers one per line

Obviously, in the preceding code fragment, converting to an enumerator is a waste of effort. However, here is an example of external iteration. Imagine we have an array of numbers and names. Each number tells how many words (separate array elements) are in the next name. We want to extract the names and print them. With internal iteration, this is inconvenient; but in this example, it is easy:

people = [2, "George ", "Washington",
3, "Edgar ", "Allan ", "Poe",
2, "John ", "Glenn",
4, "Little ", "Red ", "Riding ", "Hood",
1, "Sting"]
enum = people.each
loop do
count = enum.next # Grab next item from array
count.times { print enum.next }
puts
end

Notice that we extract an item from the array with the next method. For simplicity, I embedded spaces in the names, and there is no error checking here.

You may be wondering how an apparently infinite loop ever terminates. The “magic” here is that when we try to take an item that isn’t there, we get a nil value, but something else happens. The enumerator raises a StopIteration exception, which is implicitly caught. If this happens other than in a loop, it should be caught explicitly. Otherwise, the program will terminate with that exception.

The methods each_slice and each_cons also return enumerator objects:

deck = (1..52).sort_by { rand } # shuffle
dealer = deck.each_slice(5)
hands = []
4.times do |i|
hands << dealer.next } # not actually how a human deals
end

sequence = %w[In Xanadu did Kublai Khan
a stately pleasure dome decree]
search_term = %w[stately pleasure dome]
enum = sequence.each_cons(search_term.length)
index = 0

loop do
triplet = enum.next
break if triplet == search_term
index += 1
end

if index < sequence.length - 1
puts "Search term found at position #{index}"
else
puts "Search term not found"
end

There is a rewind method that “resets” the internal state to the beginning of the enumerable sequence:

list = [10, 20, 30, 40, 50]
enum = list.each
puts enum.next # 10
puts enum.next # 20
puts enum.next # 30
enum.rewind
puts enum.next # 10

The with_index method is a simple (internal) iterator. It can be used with another enumerator, and it returns an iterator itself:

list = [10, 20, 30, 40, 50]
list.each.with_index {|x,i| puts "list[#{i}] = #{x}"] }
# or...
enum = list.each.with_index
loop { x, i = enum2.next; puts "list[#{i}] = #{x}" } # Same result

8.4 More on Enumerables

There are many other methods on Enumerable, and I cover most of them here. For convenience, I’ve divided them a little arbitrarily into four areas: searching and selecting, counting and comparing, iterating, and (finally) extracting and converting.

8.4.1 Searching and Selecting

In addition to other methods I’ve covered, such as select and reject, there is a method called find_index, which is the “generic” equivalent of an index on an array (in fact, for an array, both these methods will return the same result). This method will do a search for the first object equal to the one passed as a parameter and return the (zero-based) index for that object. Here is an example:

array = [10, 20, 30, 40, 50, 30, 20]
location = array.find_index(30) # result is 2

Not every enumerable will necessarily be able to return a meaningful result (for example, a set, which is unordered). In this case, nil will be returned.

The methods first and last will return the first or last n items of the collection (defaulting to 1):

array = [14, 17, 23, 25, 29, 35]
head = array.first # 14
tail = array.last # 35
front = array.first(2) # [14, 17]
back = array.last(3) # [25, 29, 35]

There are also two quantifiers: one? and none?. The one? method is easy to understand; the code block must evaluate to true exactly once for it to return true:

array = [1, 3, 7, 10, 15, 17, 21]
array.one? {|x| x % 2 == 0 } # true (one even number)
array.one? {|x| x > 16 } # false
[].one? {|x| true } # empty array always returns false

But none? might be a little less intuitive. The rule is, it returns true if the block never evaluates to true.

array = [1, 3, 7, 10, 15, 17, 21]
array.none? {|x| x > 50 } # true (block was false for all elements)
array.none? {|x| x == 1 } # false
[].none? {|x| true } # true (block is not run on empty array)

If the code block is omitted, every item in the collection is tested for truth or falsehood, and they must all test false for none? to return true.

8.4.2 Counting and Comparing

The count method is easy to understand. It may take a parameter, a code block, or neither. In the degenerate case, it simply returns the size of the collection. If an object is specified as a parameter, the number of instances of that object will be counted (using ==). Given a block, it counts the number of items in the collection for which the block returns true.

days = %w[Sunday Monday Tuesday Wednesday Thursday Friday Saturday]
days.count # 7 in all
days.count("Saturday") # 1 (there's only one Saturday!)
days.count {|x| x.length == 6 } # 3 are six letters long

We have already seen the min and max methods. These depend on the existence of the spaceship comparison method <=> (or the inclusion of the module Comparable). There is also a method called minmax that returns both at once in an array:

days = %w[Sunday Monday Tuesday Wednesday Thursday Friday Saturday]
days.minmax # ["Friday", "Wednesday"]

If we want to compare items by some complex metric rather than a straightforward comparison, the variants min_by, max_by, and minmax_by will take an arbitrary code block (by analogy with sort_by, which we saw earlier in this chapter):

days = %w[Sunday Monday Tuesday Wednesday Thursday Friday Saturday]
days.min_by {|x| x.length } # "Sunday" (though there are others)
days.max_by {|x| x.length } # "Wednesday"
days.minmax_by {|x| x.reverse } # ["Friday", "Thursday"]

The final result contains the first and last elements of the array after it has been sorted in reverse alphabetical order.

8.4.3 Iterating

We’ve seen almost every form of iteration already. But one that we have missed is cycle, which can iterate over the collection more than once or even “infinitely.” The parameter specifies the number of cycles, defaulting to infinity.

months = %w[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec]
months.cycle(2) {|m| puts m } # loops 24 times in all
months.cycle {|m| puts m } # a true infinite loop

The each_with_object method works similarly to inject, but it conveniently returns the same object passed into it. This avoids the situation in which you must explicitly return the accumulator at the end of the block, which is ugly. (This example comes from David Alan Black.)

h = Hash.new(0)
result = [1,2,3,4].inject(h) {|acc,n| acc[n] += 1; acc } # Ugly

h = Hash.new(0)
result = [1,2,3,4].each_with_object(h) do |n, acc|
acc[n] += 1 } # Better
end

8.4.4 Extracting and Converting

Any enumerable collection can be extracted into an array by using to_a or entries, as we have already seen. There are also ways of intelligently extracting certain parts of the collection into an array.

An example is take and its companion take_while. These methods return a list of items from the front of a collection. In this example, we start with a hash; each “item” in the collection is a key-value pair, returned as a subarray. For another example, we do these same operations on arrays:

hash = {1 => 2, 3 => 6, 4 => 8, 5 => 10, 7 => 14}
arr1 = hash.take(2) # [[1,2], [3,6]]
arr2 = hash.take_while {|k,v| v <= 8 } # [[1,2], [3,6], [4,8]]
arr3 = arr1.take(1) # [[1,2]]
arr4 = arr2.take_while {|x| x[0] < 4 } # [[1,2], [3,6]]

The drop method is complementary to take. It ignores (drops) items on the front of the collection and returns the remaining items. There is a drop_while also.

hash = {1 => 2, 3 => 6, 4 => 8, 5 => 10, 7 => 14}
arr1 = hash.drop(2) # [[4,8], [5,10], [7 => 14]]
arr2 = hash.take_while {|k,v| v <= 8 } # [[5,10], [7 => 14]]

The reduce method is also similar in spirit to inject. It applies a binary operation (specified by a symbol) to each pair of items in the collection, or it may take a block instead. If an initial value is specified for the accumulator, it will be used; otherwise, the first value in the collection will be used. So the basic variations are these:

range = 3..6
# symbol
range.reduce(:*) # 3*4*5*6 = 360
# initial value, symbol
range.reduce(2, :*) # 2*3*4*5*6 = 720
# initial value, block
range.reduce(10) {|acc, item| acc += item } # 10+3+4+5+6 = 28
# block
range.reduce {|acc, item| acc += item } # 3+4+5+6 = 28

Note that it’s impossible to specify both a binary operator symbol and a block together.

Enumerables can also be converted to JSON format (if the json library is required) or to a set (if the set library is required):

require 'set'
require 'json'

array = [3,4,5,6]
p array.to_set # #<Set: {3, 4, 5, 6}>
p array.to_json # "[3,4,5,6]"

8.4.5 Lazy Enumerators

Each Enumerable covered in this chapter (and others, such as Range) also allows its iteration methods to be called without a block. In that case, an instance of Enumerator is returned. That Enumerator can be iterated on later or combined with other Enumerators, such aswith_index, for additional functionality (as we saw in Section 8.1.18, “Iterating over an Array”).

More than just combining, though, the Enumerable method lazy returns a special type of Enumerator that calculates the next item only when it is requested. This makes it possible to iterate over groups that are too big to store, like every odd number between 1 and infinity:

enum = (1..Float::INFINITY).each # an Enumerator
lazy = enum.lazy # a LazyEnumerator of integers
odds = lazy.select(&:odd?) # a LazyEnumerator of odd integers

odds.first(5) # [1, 3, 5, 7, 9]
odds.next # 1
odds.next # 3

Lazy enumerators provide some new ways to save memory and time while iterating over big groups, so I encourage you to read the LazyEnumerator class documentation.

8.5 Conclusion

We’ve taken a good look here at arrays, hashes, and enumerables in general. We’ve seen some similarities between arrays and hashes—most of which are due to the fact that both mix in Enumerable—as well as some differences. We’ve looked at converting between arrays and hashes, and we’ve seen some interesting ways of extending their standard behavior.

We’ve examined common methods of iteration, such as each_slice and each_cons. We’ve also seen how enumerator objects work.

In Chapter 9, we again look at high-level data structures. But these will be slightly higher level and may not necessarily be provided as part of the Ruby core or standard libraries. These include sets, stacks, queues, trees, and graphs.