Collections central: Enumerable and Enumerator - Built-in classes and modules - The Well-Grounded Rubyist, Second Edition (2014)


Part 2. Built-in classes and modules

Chapter 10. Collections central: Enumerable and Enumerator

This chapter covers

· Mixing Enumerable into your classes

· The use of Enumerable methods in collection objects

· Strings as quasi-enumerable objects

· Sorting enumerables with the Comparable module

· Enumerators

All collection objects aren’t created equal—but an awful lot of them have many characteristics in common. In Ruby, common characteristics among many objects tend to reside in modules. Collections are no exception: collection objects in Ruby typically include the Enumerable module.

Classes that use Enumerable enter into a kind of contract: the class has to define an instance method called each, and in return, Enumerable endows the objects of the class with all sorts of collection-related behaviors. The methods behind these behaviors are defined in terms of each. In some respects, you might say the whole concept of a “collection” in Ruby is pegged to the Enumerable module and the methods it defines on top of each.

You’ve already seen a bit of each in action. Here, you’ll see a lot more. Keep in mind, though, that although every major collection class partakes of the Enumerable module, each of them has its own methods too. The methods of an array aren’t identical to those of a set; those of a range aren’t identical to those of a hash. And sometimes collection classes share method names but the methods don’t do exactly the same thing. They can’t always do the same thing; the whole point is to have multiple collection classes but to extract as much common behavior as possible into a common module.

You can mix Enumerable into your own classes:

class C
  include Enumerable
end

By itself, that doesn’t do much. To tap into the benefits of Enumerable, you must define an each instance method in your class:

class C
  include Enumerable
  def each
    # relevant code here
  end
end

At this point, objects of class C will have the ability to call any instance method defined in Enumerable.

In addition to the Enumerable module, in this chapter we’ll look at a closely related class called Enumerator. Enumerators are objects that encapsulate knowledge of how to iterate through a particular collection. By packaging iteration intelligence in an object that’s separate from the collection itself, enumerators add a further and powerful dimension to Ruby’s already considerable collection-manipulation facilities.

Let’s start by looking more closely at each and its role as the engine for enumerable behavior.

10.1. Gaining enumerability through each

Any class that aspires to be enumerable must have an each method whose job is to yield items to a supplied code block, one at a time.

Exactly what each does will vary from one class to another. In the case of an array, each yields the first element, then the second, and so forth. In the case of a hash, it yields key/value pairs in the form of two-element arrays. In the case of a file handle, it yields one line of the file at a time. Ranges iterate by first deciding whether iterating is possible (which it isn’t, for example, if the start point is a float) and then pretending to be an array. And if you define an each in a class of your own, it can mean whatever you want it to mean—as long as it yields something. So each has different semantics for different classes. But however each is implemented, the methods in the Enumerable module depend on being able to call it.

You can get a good sense of how Enumerable works by writing a small, proof-of-concept class that uses it. The following listing shows such a class: Rainbow. This class has an each method that yields one color at a time. Because the class mixes in Enumerable, its instances are automatically endowed with the instance methods defined in that module.

Listing 10.1. An Enumerable class and its deployment of the each method

class Rainbow
  include Enumerable
  def each
    yield "red"
    yield "orange"
    yield "yellow"
    yield "green"
    yield "blue"
    yield "indigo"
    yield "violet"
  end
end

Every instance of Rainbow will know how to iterate through the colors. In the simplest case, we can use the each method:

r = Rainbow.new
r.each do |color|
  puts "Next color: #{color}"
end

The output of this simple iteration is as follows:

Next color: red
Next color: orange
Next color: yellow
Next color: green
Next color: blue
Next color: indigo
Next color: violet

But that’s just the beginning. Because Rainbow mixed in the Enumerable module, rainbows are automatically endowed with a whole slew of methods built on top of the each method.

Here’s an example: find, which returns the first element in an enumerable object for which the code block provided returns true. Let’s say we want to find the first color that begins with the letter y. We can do it with find, like this:
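The sketch below repeats the Rainbow class so that it runs on its own. Testing with String#start_with? is just one option; a regular expression such as /^y/ would work too.

```ruby
# Rainbow as in listing 10.1, repeated so this snippet is self-contained
class Rainbow
  include Enumerable
  def each
    yield "red"
    yield "orange"
    yield "yellow"
    yield "green"
    yield "blue"
    yield "indigo"
    yield "violet"
  end
end

r = Rainbow.new
# find stops at the first color for which the block is truthy
y_color = r.find {|color| color.start_with?("y") }
puts "First color starting with 'y' is #{y_color}."
```

Running it prints "First color starting with 'y' is yellow."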

find works by calling each. each yields items, and find uses the code block we’ve given it to test those items one at a time for a match. When each gets around to yielding yellow, find runs it through the block and it passes the test. The variable y_color therefore receives the value yellow. Notice that there’s no need to define find. It’s part of Enumerable, which we’ve mixed in. It knows what to do and how to use each to do it.

Defining each, together with mixing in Enumerable, buys you a great deal of functionality for your objects. Much of the searching and querying functionality you see in Ruby arrays, hashes, and other collection objects comes directly from Enumerable. If you want to know which methods Enumerable provides, ask it:

>> Enumerable.instance_methods(false).sort
=> [:all?, :any?, :chunk, :collect, :collect_concat, :count, :cycle, :detect,
:drop, :drop_while, :each_cons, :each_entry, :each_slice, :each_with_index,
:each_with_object, :entries, :find, :find_all, :find_index, :first,
:flat_map, :grep, :group_by, :include?, :inject, :lazy, :map, :max, :max_by,
:member?, :min, :min_by, :minmax, :minmax_by, :none?, :one?, :partition,
:reduce, :reject, :reverse_each, :select, :slice_before, :sort, :sort_by,
:take, :take_while, :to_a, :to_h, :zip]

Thanks to the false argument, the list includes only the methods defined in the Enumerable module itself. Each of these methods is built on top of each.

In the sections that follow, you’ll see examples of many of these methods. Some of the others will crop up in later chapters. The examples throughout the rest of this chapter will draw on all four of the major collection classes (Array, Hash, Range, and Set) more or less arbitrarily. Chapter 9 introduced you to these classes individually. Armed with a sense of what makes each of them tick, you’re in a good position to study what they have in common.

Some of the methods in Ruby’s enumerable classes are actually overridden in those classes. For example, you’ll find implementations of map, select, sort, and other Enumerable instance methods in the source-code file array.c; the Array class doesn’t simply provide an each method and mix in Enumerable (though it does do that, and it gains behaviors that way). These overrides exist either because a given class requires special behavior for a given Enumerable method, or for the sake of efficiency. We’re not going to scrutinize all the overrides. The main point here is to explore the ways in which all of the collection classes share behaviors and interface.

In what follows, we’ll look at several categories of methods from Enumerable. We’ll start with some Boolean methods.

10.2. Enumerable Boolean queries

A number of Enumerable methods return true or false depending on whether one or more elements match certain criteria. Given an array states, containing the names of all the states in the United States of America, here’s how you might perform some of these Boolean queries:

# Does the array include Louisiana?
>> states.include?("Louisiana")
=> true
# Do all states include a space?
>> states.all? {|state| state =~ / / }
=> false

# Does any state include a space?
>> states.any? {|state| state =~ / / }
=> true
# Is there one, and only one, state with "West" in its name?
>> states.one? {|state| state =~ /West/ }
=> true
# Are there no states with "East" in their names?
>> states.none? {|state| state =~ /East/ }
=> true
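To run these queries yourself you need a states array. The sketch below assumes a short, hand-picked subset of state names rather than all fifty; with this subset the answers come out the same as in the session above.

```ruby
# A hypothetical subset of state names, not the full list of fifty
states = ["Louisiana", "New York", "West Virginia", "Texas", "North Carolina"]

puts states.include?("Louisiana")             # true
puts states.all?  {|state| state =~ / / }     # false: "Louisiana" has no space
puts states.any?  {|state| state =~ / / }     # true
puts states.one?  {|state| state =~ /West/ }  # true: only "West Virginia"
puts states.none? {|state| state =~ /East/ }  # true
```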

If states were, instead, a hash with state names as keys and abbreviations as values, you could run similar tests, although you’d need to adjust for the fact that Hash#each yields both a key and a value each time through. The Hash#include? method checks for key inclusion, as you saw in chapter 9, but the other methods in the previous example receive key/value pairs, so their blocks can take two parameters:
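A sketch of the hash versions, assuming a small hypothetical states hash (name => abbreviation). Each pair arrives as a two-element array, which Ruby unpacks into the two block parameters.

```ruby
# Hypothetical sample data: state name => postal abbreviation
states = {
  "Louisiana"     => "LA",
  "New York"      => "NY",
  "West Virginia" => "WV",
  "Texas"         => "TX"
}

# include? on a hash checks keys, not values
p states.include?("Louisiana")   # true
p states.include?("LA")          # false

# The other queries see each key/value pair
p states.all?  {|state, abbr| state =~ / / }     # false
p states.any?  {|state, abbr| state =~ / / }     # true
p states.one?  {|state, abbr| state =~ /West/ }  # true
p states.none? {|state, abbr| abbr == "CA" }     # true
```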

In all of these cases, you could grab an array via states.keys and perform the tests on that array directly:

# Do all states include a space?
>> states.keys.all? {|state| state =~ / / }
=> false

Generating the entire keys array in advance, rather than walking through the hash that’s already there, is slightly wasteful of memory. Still, the new array contains the key objects that already exist, so it only “wastes” the memory devoted to wrapping the keys in an array. The memory taken up by the keys themselves doesn’t increase.

Hashes iterate with two-element arrays

When you iterate through a hash with each or any other built-in iterator, the hash is yielded to your code block one key/value pair at a time—and the pairs are two-element arrays. You can, if you wish, provide just one block parameter and capture the whole little array:

hash.each {|pair| ... }

In such a case, you’ll find the key at pair[0] and the value at pair[1]. Normally, it makes more sense to grab the key and value in separate block parameters. But all that’s happening is that the two are wrapped up in a two-element array, and that array is yielded. If you want to operate on the data in that form, you may.
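A quick way to see the two-element arrays for yourself, using a throwaway hash:

```ruby
hash = { "a" => 1, "b" => 2 }

# One block parameter: each pair arrives as a two-element array
hash.each {|pair| puts "key=#{pair[0]} value=#{pair[1]}" }

# Two block parameters: Ruby unpacks that same array into key and value
hash.each {|key, value| puts "key=#{key} value=#{value}" }

# Collecting the yielded objects shows the arrays explicitly
pairs = hash.map {|pair| pair }
p pairs   # [["a", 1], ["b", 2]]
```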

What about sets and ranges? Set iteration works much like array iteration for Boolean query (and most other) purposes: if states were a set, you could run exactly the same queries as the ones in the example with the same results. With ranges, enumerability gets a little trickier.

It’s more meaningful to view some ranges as enumerable (as collections of items that you can step through) than others. The include? method works for any range. But the other Boolean Enumerable methods force the enumerability issue: if the range can be expressed as a list of discrete elements, then those methods work; but if it can’t, as with a range of floats, then calling any of them raises a TypeError:
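Here is a sketch contrasting an integer range with a float range. The variable names are illustrative; the error is rescued so the script keeps running.

```ruby
int_range   = 1..10
float_range = 1.0..2.0

# include? works on both kinds of range
p int_range.include?(5)          # true
p float_range.include?(1.5)      # true

# An integer range can be enumerated, so one? and none? work
p int_range.one?  {|n| n == 5 }  # true
p int_range.none? {|n| n > 10 }  # true

# A float range refuses to iterate: each raises TypeError
error = begin
  float_range.one? {|n| n == 1.5 }
  nil
rescue TypeError => e
  e
end
puts "#{error.class}: #{error.message}"
```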

Given a range spanning two integers, you can run tests like one? and none? because the range can easily slip into behaving like a collection: in effect, the range 1..10 adopts the API of the corresponding array, [1,2,3,4,5,6,7,8,9,10].

But a range between two floats can’t behave like a finite collection of discrete values. It’s meaningless to produce “each” float inside a range. The range has the each method, but the method is written in such a way as to refuse to iterate over floats. (The fact that the error is TypeError rather than NoMethodError indicates that the each method exists but can’t function on this range.)

You can use a float as a range’s end point and still get enumeration, as long as the start point is an integer. When you call each (or one of the methods built on top of each), the range behaves like a collection of integers starting at the start point and ending at the end point rounded down to the nearest integer. That integer is considered to be included in the range, whether the range is inclusive or exclusive (because, after all, the official end point is a float that’s higher than the integer below it).
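A small sketch of that rounding-down behavior, with an integer start point and a non-integer float end point:

```ruby
# Integer start, float end: iteration stops at the last integer below 3.5
inclusive = (1..3.5).to_a
exclusive = (1...3.5).to_a

p inclusive   # [1, 2, 3]
p exclusive   # [1, 2, 3] (3 is still below 3.5, so it stays)
```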

In addition to answering various true/false questions about their contents, enumerable objects excel at performing search and select operations. We’ll turn to those now.

10.3. Enumerable searching and selecting

It’s common to want to filter a collection of objects based on one or more selection criteria. For example, if you have a database of people registering for a conference, and you want to send payment reminders to the people who haven’t paid, you can filter a complete list based on payment status. Or you might need to narrow a list of numbers to only the even ones. And so forth; the use cases for selecting elements from enumerable objects are unlimited.

The Enumerable module provides several facilities for filtering collections and for searching collections to find one or more elements that match one or more criteria. We’ll look at several filtering and searching methods here. All of them are iterators: they all expect you to provide a code block. The code block is the selection filter. You define your selection criteria (your tests for inclusion or exclusion) inside the block. The return value of the entire method may, depending on which method you’re using and on what it finds, be one object, an array (possibly empty) of objects matching your criteria, or nil, indicating that the criteria weren’t met.

We’ll start with a one-object search using find and then work our way through several techniques for deriving a multiple-object result set from an enumerable query.

10.3.1. Getting the first match with find

find (also available as the synonymous detect) locates the first element in an array for which the code block, when called with that element as an argument, returns true. For example, to find the first number greater than 5 in an array of integers, you can use find like this:

>> [1,2,3,4,5,6,7,8,9,10].find {|n| n > 5 }
=> 6

find iterates through the array, yielding each element in turn to the block. If the block returns anything with the Boolean value of true, the element yielded “wins,” and find stops iterating. If find fails to find an element that passes the code-block test, it returns nil. (Try changing n > 5 to n > 100 in the example, and you’ll see.) It’s interesting to ponder the case where your array has nil as one of its elements, and your code block looks for an element equal to nil:

[1,2,3,nil,4,5,6].find {|n| n.nil? }

In these circumstances, find always returns nil, whether the search succeeds or fails! That means the test is useless; you can’t tell whether it succeeded. You can work around this situation with other techniques, such as the include? method, with which you can find out whether an array has nil as an element. You can also provide a “nothing found” function (a Proc object) as an argument to find, in which case that function will be called if the find operation fails. We haven’t looked at Proc objects in depth yet, although you’ve seen some examples of them in connection with the handling of code blocks. For future reference, here’s an example of how to supply find with a failure-handling function:
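A sketch of such a call. The names failure, over_ten, and over_four are illustrative; the lambda is just a Proc that find calls when nothing matches.

```ruby
# A Proc to call when find comes up empty; it supplies a fallback value
failure = lambda { 11 }

# No element exceeds 10, so find calls failure and returns its result
over_ten = [1,2,3,4,5,6].find(failure) {|n| n > 10 }
p over_ten    # 11

# When the search succeeds, the failure Proc is never called
over_four = [1,2,3,4,5,6].find(failure) {|n| n > 4 }
p over_four   # 5

# include? answers the nil question that find cannot
p [1,2,3,nil,4,5,6].include?(nil)   # true
```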

In this example, the anonymous function (the Proc object) returns 11, so even if there’s no number greater than 10 in the array, you get one anyway. (You’ll see lambdas and Proc objects up close in chapter 14.)

Although find always returns one object, find_all, also known as select, always returns an array, as does its negative equivalent reject.

The dominance of the array

Arrays serve generically as the containers for most of the results that come back from enumerable selecting and filtering operations, whether or not the object being selected from or filtered is an array. There are some exceptions to this quasi-rule, but it holds true widely.

The plainest way to see it is by creating an enumerable class of your own and watching what you get back from your select queries. Look again at the Rainbow class in listing 10.1. Now look at what you get back when you perform some queries:

>> r = Rainbow.new
=> #<Rainbow:0x45b708>
>> r.select {|color| color.size == 6 }
=> ["orange", "yellow", "indigo", "violet"]
>> r.map {|color| color[0,3] }
=> ["red", "ora", "yel", "gre", "blu", "ind", "vio"]
>> r.drop_while {|color| color.size < 5 }
=> ["orange", "yellow", "green", "blue", "indigo", "violet"]

In every case, the result set comes back in an array.

The array is the most generic container and therefore the logical candidate for the role of universal result format. A few exceptions arise. A hash returns a hash from a select or reject operation. Sets return arrays from map, but you can call map! on a set to change the elements of the set in place. For the most part, though, enumerable selection and filtering operations come back to you inside arrays.

10.3.2. Getting all matches with find_all (a.k.a. select) and reject

find_all (the same method as select) returns a new collection containing all the elements of the original collection that match the criteria in the code block, not just the first such element (as with find). If no matching elements are found, find_all returns an empty collection object.

In the general case (for example, when you use Enumerable in your own classes), the “collection” returned by select will be an array. Ruby makes special arrangements for hashes, though: if you select on a hash, you get back a hash. This is enhanced behavior that isn’t strictly part of Enumerable. Sets don’t get the same treatment from select, which hands back an array; to filter a set and keep it a set, use the in-place select!.
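To see the hash special case in action, here’s a sketch with a made-up price list:

```ruby
prices = { "apple" => 1.20, "melon" => 3.50, "kiwi" => 0.40 }

# Hash#select and Hash#reject hand back hashes, not arrays
cheap  = prices.select {|fruit, price| price < 1.0 }
pricey = prices.reject {|fruit, price| price < 1.0 }
p cheap.class    # Hash
p cheap.keys     # ["kiwi"]
p pricey.keys    # ["apple", "melon"]

# map, by contrast, still returns an array
p prices.map {|fruit, price| fruit.upcase }   # ["APPLE", "MELON", "KIWI"]
```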

We’ll stick to array examples here:
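A sketch of two such queries on an array of the integers 1 through 10, assigned to a (the same array the reject example later in this section relies on):

```ruby
a = [1,2,3,4,5,6,7,8,9,10]

# find_all returns every matching element, in order
p a.find_all {|item| item > 5 }   # [6, 7, 8, 9, 10]

# select is the same method; no matches means an empty array, not nil
p a.select {|item| item > 10 }    # []
```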

The first find_all operation returns an array of all the elements that pass the test in the block: all elements that are greater than 5. The second operation also returns an array, this time of all the elements in the original array that are greater than 10. There aren’t any, so an empty array is returned.

(Arrays, hashes, and sets have a bang version, select!, that reduces the collection permanently to only those elements that passed the selection test. There’s no find_all! synonym; you have to use select!.)

Just as you can select items, so you can reject items, meaning that you find out which elements of an array do not return a true value when yielded to the block. Using the a array from the previous example, you can do this to get the array minus any and all elements that are greater than 5:

>> a.reject {|item| item > 5 }
=> [1, 2, 3, 4, 5]

(Once again there’s a bang, in-place version, reject!, specifically for arrays, hashes, and sets.)
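A sketch of the in-place versions on an array. Note that both bang methods return nil when they make no change, which matters if you chain on their return value.

```ruby
a = [1,2,3,4,5,6,7,8,9,10]

# select! keeps only the matching elements and modifies a itself
a.select! {|n| n.even? }
p a                    # [2, 4, 6, 8, 10]

# reject! removes the matching elements in place
a.reject! {|n| n > 6 }
p a                    # [2, 4, 6]

# Nothing matches here, so reject! changes nothing and returns nil
unchanged = a.reject! {|n| n > 100 }
p unchanged            # nil
```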

If you’ve ever used the command-line utility grep, the next method will ring a bell. If you haven’t, you’ll get the hang of it anyway.

10.3.3. Selecting on threequal matches with grep

The Enumerable#grep method lets you select from an enumerable object based on the case equality operator, ===. The most common application of grep is the one that corresponds most closely to the common operation of the command-line utility of the same name, pattern matching for strings:

>> colors = %w{ red orange yellow green blue indigo violet }
=> ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
>> colors.grep(/o/)
=> ["orange", "yellow", "indigo", "violet"]

But the generality of === lets you do some fancy things with grep:
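A sketch of the kind of grepping described next; the contents of miscellany are assumed for illustration.

```ruby
miscellany = [75, "hello", 10.5, "goodbye"]

# String === obj is true only for strings
p miscellany.grep(String)    # ["hello", "goodbye"]

# Range#=== tests whether the value falls inside the range
p miscellany.grep(50..100)   # [75]

# Any class or module works the same way, e.g. Numeric
p miscellany.grep(Numeric)   # [75, 10.5]
```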

String === object is true for the two strings in the array, so an array of those two strings is what you get back from grepping for String. Ranges implement === as an inclusion test. The range 50..100 includes 75; hence the result from grepping miscellany for that range.

In general, the statement enumerable.grep(expression) is functionally equivalent to this:

enumerable.select {|element| expression === element }

In other words, it selects for a truth value based on calling ===. In addition, grep can take a block, in which case it yields each element of its result set to the block before returning the results:

>> colors = %w{ red orange yellow green blue indigo violet }
=> ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
>> colors.grep(/o/) {|color| color.capitalize }
=> ["Orange", "Yellow", "Indigo", "Violet"]

The full grep syntax

enumerable.grep(expression) {|item| ... }

thus operates in effect like this:

enumerable.select {|item| expression === item}.map {|item| ... }
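A quick check of that equivalence, reusing the colors array from earlier:

```ruby
colors = %w{ red orange yellow green blue indigo violet }

# grep with a block: select on ===, then transform each match
via_grep = colors.grep(/o/) {|color| color.capitalize }

# The same result spelled out with select and map
via_select_map = colors.select {|color| /o/ === color }.map {|color| color.capitalize }

p via_grep                     # ["Orange", "Yellow", "Indigo", "Violet"]
p via_grep == via_select_map   # true
```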

Again, you’ll mostly see (and probably mostly use) grep as a pattern-based string selector. But keep in mind that grepping is pegged to case equality (===) and can be used accordingly in a variety of situations.

Whether carried out as select or grep or some other operation, selection scenarios often call for grouping of results into clusters or categories. The Enumerable #group_by and #partition methods make convenient provisions for exactly this kind of grouping.

10.3.4. Organizing selection results with group_by and partition

A group_by operation on an enumerable object takes a block and returns a hash. The block is executed for each object. For each unique block return value, the result hash gets a key; the value for that key is an array of all the elements of the enumerable for which the block returned that value.

An example should make the operation clear:

>> colors = %w{ red orange yellow green blue indigo violet }
=> ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
>> colors.group_by {|color| color.size }
=> {3=>["red"], 6=>["orange", "yellow", "indigo", "violet"],
5=>["green"], 4=>["blue"]}

The block {|color| color.size } returns an integer for each color. The hash returned by the entire group_by operation is keyed to the various sizes (3, 4, 5, 6), and the values are arrays containing all the strings from the original array that are of the size represented by the respective keys.

The partition method is similar to group_by, but it splits the elements of the enumerable into two arrays based on whether the code block returns true for the element. There’s no hash, just an array of two arrays. The two arrays are always returned in true/false order.

Consider a Person class, where every person has an age. The class also defines an instance method teenager?, which is true if the person’s age is between 13 and 19, inclusive:

class Person
  attr_accessor :age
  def initialize(options)
    self.age = options[:age]
  end
  def teenager?
    (13..19) === age
  end
end

Now let’s generate an array of people:

people = 10.step(25,3).map {|i| Person.new(:age => i) }

This code does an iteration from 10 to 25 in steps of 3 (10, 13, 16, 19, 22, 25), passing each of the values to the block in turn. Each time through, a new Person is created with the age corresponding to the increment. Thanks to map, the person objects are all accumulated into an array, which is assigned to people. (The chaining of the iterator map to the iterator step is made possible by the fact that step returns an enumerator. You’ll learn more about enumerators presently.)

We’ve got our six people; now let’s partition them into teens and non-teens:

teens = people.partition {|person| person.teenager? }

The teens array has the following content:

[[#<Person:0x000001019d1a50 @age=13>, #<Person:0x000001019d19d8 @age=16>,
#<Person:0x000001019d1988 @age=19>], [#<Person:0x000001019d1ac8 @age=10>,
#<Person:0x000001019d1910 @age=22>, #<Person:0x000001019d1898 @age=25>]]

Note that this is an array containing two subarrays. The first contains those people for whom person.teenager? returned true; the second is the non-teens.

We can now use the information, for example, to find out how many teens and non-teens we have:

puts "#{teens[0].size} teens; #{teens[1].size} non-teens"

The output from this statement reflects the fact that half of our people are teens and half aren’t:

3 teens; 3 non-teens
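Because partition always hands back exactly two arrays, parallel assignment is a common way to receive them. Here is a self-contained sketch that reuses the Person class; the variable names teens and others are illustrative.

```ruby
class Person
  attr_accessor :age
  def initialize(options)
    self.age = options[:age]
  end
  def teenager?
    (13..19) === age
  end
end

people = 10.step(25, 3).map {|i| Person.new(:age => i) }

# partition returns [matches, non_matches], so unpack both at once
teens, others = people.partition {|person| person.teenager? }

puts "#{teens.size} teens; #{others.size} non-teens"   # 3 teens; 3 non-teens
p teens.map(&:age)    # [13, 16, 19]
p others.map(&:age)   # [10, 22, 25]
```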

Let’s look now at some “element-wise” operations—methods that involve relatively fine-grained manipulation of specific collection elements.

10.4. Element-wise enumerable operations

Collections are born to be traversed, but they also contain special-status individual objects: the first or last in the collection, and the greatest (largest) or least (smallest). Enumerable objects come with several tools for element handling along these lines.

10.4.1. The first method

Enumerable#first, as the name suggests, returns the first item encountered when iterating over the enumerable:

>> [1,2,3,4].first
=> 1
>> (1..10).first
=> 1
>> {1 => 2, "one" => "two"}.first
=> [1, 2]

The object returned by first is the same as the first object you get when you iterate through the parent object. In other words, it’s the first thing yielded by each. In keeping with the fact that hashes yield key/value pairs in two-element arrays, taking the first element of a hash gives you a two-element array containing the first pair that was inserted into the hash (or the first key inserted and its new value, if you’ve changed that value at any point):
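A sketch of that behavior with a small hash whose keys are deliberately not in sorted order:

```ruby
hash = { 3 => "three", 1 => "one", 2 => "two" }

# first returns the earliest-inserted pair, not the smallest key
p hash.first      # [3, "three"]

# Changing an existing key's value keeps that key's original position
hash[3] = "trois"
p hash.first      # [3, "trois"]

# With an argument, first returns an array of the first n pairs
p hash.first(2)   # [[3, "trois"], [1, "one"]]
```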

Perhaps the most noteworthy point about Enumerable#first is that there’s no Enumerable#last. That’s because finding the end of the iteration isn’t as straightforward as finding the beginning. Consider a case where the iteration goes on forever. Here’s a little Die class (die as in the singular of dice). It iterates by rolling the die forever and yielding the result each time:

class Die
  include Enumerable
  def each
    loop do
      yield rand(6) + 1
    end
  end
end

The loop uses the method Kernel#rand. Called with no argument, this method generates a random floating-point number n such that 0 <= n < 1. With an argument i, it returns a random integer n such that 0 <= n < i. Thus rand(6) produces an integer in the range (0..5). Adding one to that number gives a number between 1 and 6, which corresponds to what you get when you roll a die.

But the main point is that Die#each goes on forever. If you’re using the Die class, you have to make provisions to break out of the loop. Here’s a little game where you win as soon as the die turns up 6:

puts "Welcome to 'You Win If You Roll a 6'!"
d = Die.new
d.each do |roll|
  puts "You rolled a #{roll}."
  if roll == 6
    puts "You win!"
    break
  end
end

A typical run might look like this:

Welcome to 'You Win If You Roll a 6'!
You rolled a 3.
You rolled a 2.
You rolled a 2.
You rolled a 1.
You rolled a 6.
You win!

The triviality of the game aside, the point is that it would be meaningless to call last on your die object, because there’s no last roll of the die. Unlike taking the first element, taking the last element of an enumerable has no generalizable meaning.

For the same reason—the unreachability of the end of the enumeration—an enumerable class with an infinitely yielding each method can’t do much with methods like select and map, which don’t return their results until the underlying iteration is complete. Occasions for infinite iteration are, in any event, few; but observing the behavior and impact of an endless each can be instructive for what it reveals about the more common, finite case.

Keep in mind, though, that some enumerable classes do have a last method: notably, Array and Range. Moreover, all enumerables have a take method, a kind of generalization of first, and a companion method called drop.
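Unlike last, first and take are safe on an endless enumerable because they stop asking each for values once they have enough. A sketch using the Die class from earlier, repeated so it runs standalone (the rolls are random, so only their count and range are predictable):

```ruby
class Die
  include Enumerable
  def each
    loop do
      yield rand(6) + 1
    end
  end
end

d = Die.new

# Both calls break out of the endless each after enough values
three_rolls = d.first(3)
two_rolls   = d.take(2)

p three_rolls   # three random values, each between 1 and 6
p two_rolls     # two random values, each between 1 and 6
```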

10.4.2. The take and drop methods

Enumerables know how to “take” a certain number of elements from the beginning of themselves and conversely how to “drop” a certain number of elements. The take and drop operations basically do the same thing—they divide the collection at a specific point—but they differ in what they return:
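For example, with a short array of state abbreviations (the same states array the take_while and drop_while calls in this section assume):

```ruby
states = %w{ NJ NY CT MA VT FL }

p states.take(2)   # ["NJ", "NY"]
p states.drop(2)   # ["CT", "MA", "VT", "FL"]

# take and drop split the collection at the same point
p states.take(2) + states.drop(2) == states   # true
```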

When you take elements, you get those elements. When you drop elements, you get the original collection minus the elements you’ve dropped. You can constrain the take and drop operations by providing a block and using the variant forms take_while and drop_while, which determine the size of the “take” not by an integer argument but by the truth value of the block:

>> states.take_while {|s| /N/.match(s) }
=> ["NJ", "NY"]
>> states.drop_while {|s| /N/.match(s) }
=> ["CT", "MA", "VT", "FL"]

The take and drop operations are a kind of hybrid of first and select. They’re anchored to the beginning of the iteration and terminate once they’ve satisfied the quantity requirement or encountered a block failure.

You can also determine the minimum and maximum values in an enumerable collection.

10.4.3. The min and max methods

The min and max methods do what they sound like they’ll do:

>> [1,3,5,4,2].max
=> 5
>> %w{ Ruby C APL Perl Smalltalk }.min
=> "APL"

Minimum and maximum are determined by the <=> (spaceship comparison operator) logic, which for the array of strings puts "APL" first in ascending order. If you want to perform a minimum or maximum test based on nondefault criteria, you can provide a code block:

>> %w{ Ruby C APL Perl Smalltalk }.min {|a,b| a.size <=> b.size }
=> "C"

A more streamlined block-based approach, though, is to use min_by or max_by, which perform the comparison implicitly:
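A sketch of the by-size comparison using min_by and max_by:

```ruby
langs = %w{ Ruby C APL Perl Smalltalk }

# min_by/max_by compare the block's results, not the elements themselves
p langs.min_by {|lang| lang.size }   # "C"
p langs.max_by {|lang| lang.size }   # "Smalltalk"
```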

There’s also a minmax method (and the corresponding minmax_by method), which gives you a pair of values, one for the minimum and one for the maximum:

>> %w{ Ruby C APL Perl Smalltalk }.minmax
=> ["APL", "Smalltalk"]
>> %w{ Ruby C APL Perl Smalltalk }.minmax_by {|lang| lang.size }
=> ["C", "Smalltalk"]

Keep in mind that the min/max family of enumerable methods is always available, even when using it isn’t a good idea. You wouldn’t want to do this, for example:

die = Die.new
puts die.max

The infinite loop with which Die#each is implemented won’t allow a maximum value ever to be determined. Your program will hang.

In the case of hashes, min and max compare the [key, value] pairs; because keys are unique, the keys effectively determine the ordering. If you want to use values, the *_by members of the min/max family can help you:
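A sketch with a small, hypothetical hash of state names and abbreviations:

```ruby
state_hash = { "New York" => "NY", "Maine" => "ME", "Alaska" => "AK", "Alabama" => "AL" }

# Plain min/max compare whole [key, value] pairs, so the key decides
p state_hash.min   # ["Alabama", "AL"]
p state_hash.max   # ["New York", "NY"]

# To order by value instead, return the abbreviation from the block
p state_hash.min_by {|name, abbr| abbr }   # ["Alaska", "AK"]
p state_hash.max_by {|name, abbr| abbr }   # ["New York", "NY"]
```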

And of course you can, if you wish, perform calculations inside the block that involve both the key and the value.

At this point, we’ve looked at examples of each methods and how they link up to a number of methods that are built on top of them. It’s time now to look at some methods that are similar to each but a little more specialized. The most important of these is map. In fact, map is important enough that we’ll look at it separately in its own section. First, let’s discuss some other each relatives.

10.5. Relatives of each

Enumerable makes several methods available to you that are similar to each, in that they go through the whole collection and yield elements from it, not stopping until they’ve gone all the way through (and in one case, not even then!). Each member of this family of methods has its own particular semantics and niche. The methods include reverse_each, each_with_index, each_slice, each_cons, cycle, and inject. We’ll look at them in that order.

10.5.1. reverse_each

The reverse_each method does what it sounds like it will do: it iterates backwards through an enumerable. For example, the code

[1,2,3].reverse_each {|e| puts e * 10 }

produces this output:

30
20
10

You have to be careful with reverse_each: don’t use it on an infinite iterator, since the concept of going in reverse depends on the concept of knowing what the last element is, which is a meaningless concept in the case of an infinite iterator. Try calling reverse_each on an instance of the Die class shown earlier, but be ready to hit Ctrl-C to get out of the infinite loop.

10.5.2. The each_with_index method (and each.with_index)

Enumerable#each_with_index differs from each in that it yields an extra item each time through the collection: namely, an integer representing the ordinal position of the item. This index can be useful for labeling objects, among other purposes:
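A sketch that labels a short list of presidents; the names array is the one assumed by the with_index example later in this section.

```ruby
names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]

lines = []
names.each_with_index do |pres, i|
  # i starts at 0, so add 1 for a human-friendly numbering
  lines << "#{i+1}. #{pres}"
end
puts lines
# 1. George Washington
# 2. John Adams
# 3. Thomas Jefferson
# 4. James Madison
```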

An anomaly is involved in each_with_index: every enumerable object has it, but not every enumerable object has knowledge of what an index is. You can see this by asking enumerables to perform an each_index (as opposed to each_with_index) operation. The results vary from one enumerable to another:

>> %w{a b c }.each_index {|i| puts i }
0
1
2
=> ["a", "b", "c"]

Arrays, then, have a fundamental sense of an index. Hashes don’t, although they do support iterating with an index via each_with_index:

>> letters = {"a" => "ay", "b" => "bee", "c" => "see" }
=> {"a"=>"ay", "b"=>"bee", "c"=>"see"}
>> letters.each_with_index {|(key,value),i| puts i }
0
1
2
=> {"a"=>"ay", "b"=>"bee", "c"=>"see"}
>> letters.each_index {|(key,value),i| puts i }
NoMethodError: undefined method `each_index' for {"a"=>"ay",
"b"=>"bee", "c"=>"see"}:Hash

We could posit that a hash’s keys are its indexes and that the ordinal numbers generated by the each_with_index iteration are extra or meta-indexes. It’s an interesting theoretical question; but in practice it doesn’t end up mattering much, because it’s extremely unusual to need to perform an each_with_index operation on a hash.

Enumerable#each_with_index works, but there’s a more flexible alternative: the #with_index method of the enumerator you get back from calling each. You’ve already seen this technique in chapter 9:

>> array = %w{ red yellow blue }
=> ["red", "yellow", "blue"]

>> array.each.with_index do |color, i|
?> puts "Color number #{i} is #{color}."
>> end

It’s as simple as changing an underscore to a period ... though there’s a little more to it under the hood, as you’ll see when you learn more about enumerators a little later. (See section 10.11.2 for more on with_index.) Using with_index also buys you some functionality: you can provide an argument that will be used as the first index value, thus avoiding the need to add one to the index in a case like the previous list of presidents:

>> names.each.with_index(1) do |pres, i|
?> puts "#{i} #{pres}"
>> end
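Here is a self-contained sketch of the same idea, using a hypothetical names array. It also shows map.with_index(1), which returns the labeled strings directly instead of collecting them by hand:

```ruby
names = ["George Washington", "John Adams", "Thomas Jefferson"]  # hypothetical data

# with_index(1) starts counting at 1, so no "i + 1" arithmetic is needed
lines = []
names.each.with_index(1) do |pres, i|
  lines << "#{i} #{pres}"
end

# The same offset works on a blockless map, which returns the new array
numbered = names.map.with_index(1) { |pres, i| "#{i} #{pres}" }

puts lines
# 1 George Washington
# 2 John Adams
# 3 Thomas Jefferson
```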

Another subfamily of each relatives is the pair of methods each_slice and each_cons.

10.5.3. The each_slice and each_cons methods

The each_slice and each_cons methods are specializations of each that walk through a collection a certain number of elements at a time, yielding an array of that many elements to the block on each iteration. The difference between them is that each_slice handles each element only once, whereas each_cons takes a new grouping at each element and thus produces overlapping yielded arrays.

Here’s an illustration of the difference:

>> array = [1,2,3,4,5,6,7,8,9,10]
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>> array.each_slice(3) {|slice| p slice }
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
=> nil
>> array.each_cons(3) {|cons| p cons }
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]
[5, 6, 7]
[6, 7, 8]
[7, 8, 9]
[8, 9, 10]
=> nil

The each_slice operation yields the collection progressively in slices of size n (or less than n, if fewer than n elements remain). By contrast, each_cons moves through the collection one element at a time and at each point yields an array of n elements, stopping when the last element in the collection has been yielded once.
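If you want the slices or overlapping groups as data rather than printed output, you can call either method without a block (which returns an enumerator, as section 10.9.3 explains) and collect the results with to_a. A short sketch:

```ruby
array = (1..10).to_a

# Without a block, both methods return enumerators; to_a gathers what they'd yield
slices = array.each_slice(3).to_a
conses = array.each_cons(3).to_a

p slices       # prints [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
p conses.size  # prints 8: one window starting at each of the first 10 - 3 + 1 elements
```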

Yet another generic way to iterate through an enumerable is with the cycle method.

10.5.4. The cycle method

Enumerable#cycle yields all the elements in the object again and again in a loop. If you provide an integer argument, the loop will be run that many times. If you don’t, it will be run infinitely.
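A minimal sketch of both forms. Because an unbounded cycle never ends on its own, the infinite case here calls cycle without a block (getting an enumerator) and takes only the first few elements:

```ruby
colors = %w{ red green }

# With an argument, cycle runs through the collection that many times
twice = []
colors.cycle(2) { |c| twice << c }
p twice                   # prints ["red", "green", "red", "green"]

# Without an argument the cycle is infinite, so ask for only what you need;
# first(5) stops the iteration after five elements
p colors.cycle.first(5)   # prints ["red", "green", "red", "green", "red"]
```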

You can use cycle to decide dynamically how many times you want to iterate through a collection—essentially, how many each-like runs you want to perform consecutively. Here’s an example involving a deck of playing cards:
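The deck listing itself is missing from this copy. The following is a sketch reconstructed from the description in the next paragraph; the constant names SUITS and RANKS and the “rank of suit” string format are assumptions chosen to fit that description, not necessarily the book’s exact code:

```ruby
class PlayingCard
  # Assumed constant names: four suits and thirteen ranks
  SUITS = %w{ clubs diamonds hearts spades }
  RANKS = %w{ 2 3 4 5 6 7 8 9 10 J Q K A }

  class Deck
    attr_reader :cards

    # n is the number of decks to combine; it defaults to a single deck
    def initialize(n = 1)
      @cards = []
      # Go through the suits n times...
      SUITS.cycle(n) do |s|
        # ...and through the ranks exactly once per suit (cycle(1) acts like each)
        RANKS.cycle(1) do |r|
          @cards << "#{r} of #{s}"
        end
      end
    end
  end
end

deck = PlayingCard::Deck.new
puts deck.cards.size    # prints 52
puts deck.cards.first   # prints "2 of clubs"
```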

The class PlayingCard defines constants representing suits and ranks, whereas the PlayingCard::Deck class models the deck. The cards are stored in an array in the deck’s @cards instance variable, available also as a reader attribute. Thanks to cycle, it’s easy to arrange for the possibility of combining two or more decks. Deck.new takes an argument, defaulting to 1. If you override the default, the process by which the @cards array is populated is augmented.

For example, this command produces a double deck of cards containing two of each card for a total of 104:

deck = PlayingCard::Deck.new(2)

That’s because the method cycles through the suits twice, cycling through the ranks once per suit iteration. The ranks cycle is always done only once; cycle(1) is, in effect, another way of saying each. For each suit/rank combination, a new card, represented by a descriptive string, is inserted into the deck.

Last on the each-family method tour is inject, also known as reduce.

10.5.5. Enumerable reduction with inject

The inject method (a.k.a. reduce and similar to “fold” methods in functional languages) works by initializing an accumulator object and then iterating through a collection (an enumerable object), performing a calculation on each iteration and resetting the accumulator, for purposes of the next iteration, to the result of that calculation.

The classic example of injecting is the summing up of numbers in an array. Here’s how to do it:

>> [1,2,3,4].inject(0) {|acc,n| acc + n }
=> 10

And here’s how it works:

1. The accumulator is initialized to 0, courtesy of the 0 argument to inject.

2. The first time through the iteration—the code block—acc is 0, and n is set to 1 (the first item in the array). The result of the calculation inside the block is 0 + 1, or 1.

3. The second time through, acc is set to 1 (the block’s result from the previous time through), and n is set to 2 (the second element in the array). The block therefore evaluates to 3.

4. The third time through, acc and n are 3 (previous block result) and 3 (next value in the array). The block evaluates to 6.

5. The fourth time through, acc and n are 6 and 4. The block evaluates to 10. Because this is the last time through, the value from the block serves as the return value of the entire call to inject. Thus the entire call evaluates to 10, as shown by irb.

If you don’t supply an argument to inject, it uses the first element in the enumerable object as the initial value for acc. In this example, that would produce the same result, because the first iteration added 0 to 1 and set acc to 1 anyway.
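A short sketch contrasting the two forms, plus the symbol shorthand that inject (and its alias reduce) accepts in place of a block. The empty-array lines show where the initial value does make a difference:

```ruby
numbers = [1, 2, 3, 4]

with_seed    = numbers.inject(0) { |acc, n| acc + n }   # acc starts at 0
without_seed = numbers.inject { |acc, n| acc + n }      # acc starts at 1, the first element
p with_seed      # prints 10
p without_seed   # prints 10

# A method name given as a symbol stands in for the block
p numbers.inject(:+)      # prints 10
p numbers.reduce(1, :*)   # prints 24: a product, seeded with 1

# With no seed and no elements, there is nothing to start from
p [].inject(:+)      # prints nil
p [].inject(0, :+)   # prints 0
```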

Here’s a souped-up example, with some commentary printed out on each iteration so that you can see what’s happening:

>> [1,2,3,4].inject do |acc,n|
puts "adding #{acc} and #{n}...#{acc+n}"
acc + n
end
adding 1 and 2...3
adding 3 and 3...6
adding 6 and 4...10
=> 10

The puts statement is a pure side effect (and, on its own, evaluates to nil), so you still have to end the block with acc + n to make sure the block evaluates to the correct value.

We’ve saved perhaps the most important relative of each for last: Enumerable#map.

10.6. The map method

The map method (also callable as collect) is one of the most powerful and important enumerable or collection operations available in Ruby. You’ve met it before (in chapter 6), but there’s more to see, especially now that we’re inside the overall topic of enumerability.

Whatever enumerable it starts with, map always returns an array. The returned array is always the same size as the original enumerable. Its elements consist of the accumulated result of calling the code block on each element in the original object in turn.
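To see the “always returns an array” rule with receivers that aren’t arrays, here is a quick sketch mapping a range and a hash:

```ruby
# A range in, an array out
doubled = (1..3).map { |n| n * 2 }
p doubled   # prints [2, 4, 6]

# A hash in, an array out: the block receives each key/value pair
pairs = { "a" => 1, "b" => 2 }.map { |key, value| "#{key}=#{value}" }
p pairs     # prints ["a=1", "b=2"]
```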

For example, here’s how you map an array of names to their uppercase equivalents:

>> names = %w{ David Yukihiro Chad Amy }
=> ["David", "Yukihiro", "Chad", "Amy"]
>> names.map {|name| name.upcase }
=> ["DAVID", "YUKIHIRO", "CHAD", "AMY"]

The new array is the same size as the original array, and each of its elements corresponds to the element in the same position in the original array. But each element has been run through the block.

Using a symbol argument as a block

You can use a symbol such as :upcase with a & in front of it in method-argument position, and the result will be the same as if you used a code block that called the method with the same name as the symbol on each element. Thus you could rewrite the block in the last example, which calls upcase on each element, like this:

names.map(&:upcase)

You’ll see an in-depth explanation of this idiom when you read about callable objects in chapter 14.

It may be obvious, but it’s important to note that what matters about map is its return value.

10.6.1. The return value of map

The return value of map, and the usefulness of that return value, is what distinguishes map from each. The return value of each doesn’t matter. You’ll almost never see this:

result = array.each do |x|
  # code here...
end

Why? Because each returns its receiver. You might as well do this:

result = array
array.each do |x|
  # code here...
end

On the other hand, map returns a new object: a mapping of the original object to a new object. So you’ll often see—and do—things like this:

result = array.map do |x|
  # code here...
end

The difference between map and each is a good reminder that each exists purely for the side effects from the execution of the block. The value returned by the block each time through is discarded. That’s why each returns its receiver; it doesn’t have anything else to return, because it hasn’t saved anything. map, on the other hand, maintains an accumulator array of the results from the block.
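A minimal sketch that puts the two return values side by side. equal? checks object identity, confirming that each hands back the very same array:

```ruby
array = [1, 2, 3]

each_result = array.each { |x| x * 10 }   # block values are discarded
map_result  = array.map  { |x| x * 10 }   # block values are collected

p each_result.equal?(array)   # prints true: each returns its receiver
p map_result                  # prints [10, 20, 30]: a brand-new array
```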

This doesn’t mean that map is better or more useful than each. It means they’re different in some important ways. But the semantics of map do mean that you have to be careful about the side effects that make each useful.

Be careful with block evaluation

Have a look at this code, and see if you can predict what the array result will contain when the code is executed:

array = [1,2,3,4,5]
result = array.map {|n| puts n * 100 }

The answer is that result will be this:

[nil, nil, nil, nil, nil]

Why? Because the return value of puts is always nil. That’s all map cares about. Yes, the five values represented by n * 100 will be printed to the screen, but that’s because the code in the block gets executed. The result of the operation—the mapping itself—is all nils because every call to this particular block will return nil.

There’s an in-place version of map for arrays and sets: map! (a.k.a. collect!).

10.6.2. In-place mapping with map!

Consider again the names array:

names = %w{ David Yukihiro Chad Amy }

To change the names array in place, run it through map!, the destructive version of map:
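The listing for this step is missing here. A sketch of what it presumably looked like, using the &:upcase shorthand described earlier in this section:

```ruby
names = %w{ David Yukihiro Chad Amy }

# map! replaces each element with the block's result, inside the same array object
same_object = names.map!(&:upcase)

p names                       # prints ["DAVID", "YUKIHIRO", "CHAD", "AMY"]
p same_object.equal?(names)   # prints true: map! returns the receiver itself
```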

The map! method of Array is defined in Array, not in Enumerable. Because map operations generally return arrays, whatever the class of their receiver may be, doing an in-place mapping doesn’t make sense unless the object is already an array. It would be difficult, for example, to imagine what an in-place mapping of a range would consist of. But the Set#map! method does an in-place mapping of a set back to itself—which makes sense, given that a set is in many respects similar to an array.

We’re going to look next at a class that isn’t enumerable: String. Strings are a bit like ranges in that they do and don’t behave like collections. In the case of ranges, their collection-like properties are enough that the class warrants the mixing in of Enumerable. In the case of strings, Enumerable isn’t in play; but the semantics of strings, when you treat them as iterable sequences of characters or bytes, are similar enough to enumerable semantics that we’ll address them here.

10.7. Strings as quasi-enumerables

You can iterate through the raw bytes or the characters of a string using convenient iterator methods that treat the string as a collection of bytes, characters, code points, or lines. Each of these four ways of iterating through a string has an each-style method associated with it. To iterate through bytes, use each_byte:

str = "abcde"
str.each_byte {|b| p b }

The output of this code is

97
98
99
100
101

If you want each character, rather than its byte code, use each_char:

str = "abcde"
str.each_char {|c| p c }

This time, the output is

"a"
"b"
"c"
"d"
"e"

Iterating by code point provides character codes (integers) at the rate of exactly one per character:

>> str = "100\u20ac"
=> "100€"
>> str.each_codepoint {|cp| p cp }
49
48
48
8364

Compare this last example with what happens if you iterate over the same string byte by byte:

>> str.each_byte {|b| p b }
49
48
48
226
130
172

Due to the encoding, the number of bytes is greater than the number of code points (or the number of characters, which is equal to the number of code points).
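You can confirm the mismatch without iterating at all by asking the string for its character count and its byte count. The \u escape makes this a UTF-8 string, in which the euro sign occupies three bytes:

```ruby
str = "100\u20ac"   # "100€"

p str.length       # prints 4: characters (and code points)
p str.codepoints   # prints [49, 48, 48, 8364]
p str.bytesize     # prints 6: bytes
```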

Finally, if you want to go line by line, use each_line:

str = "This string\nhas three\nlines"
str.each_line {|l| puts "Next line: #{l}" }

The output of this example is

Next line: This string
Next line: has three
Next line: lines

The string is split at the end of each line—or, more strictly speaking, at every occurrence of the current value of the global variable $/. If you change this variable, you’re changing the delimiter for what Ruby considers the next line in a string:

str = "David!Alan!Black"
$/ = "!"
str.each_line {|l| puts "Next line: #{l}" }

Now Ruby’s concept of a “line” will be based on the ! character:

Next line: David!
Next line: Alan!
Next line: Black
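Because $/ is a global variable, changing it affects every other piece of code that relies on it, so if you do change it, it’s wise to restore it (for example, to "\n") afterward. String#each_line also accepts the separator as an argument, which sidesteps the global entirely. A minimal sketch:

```ruby
str = "David!Alan!Black"

# Pass the separator directly instead of reassigning $/
lines = str.each_line("!").to_a
p lines   # prints ["David!", "Alan!", "Black"]
```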

Even though Ruby strings aren’t enumerable in the technical sense (String doesn’t include Enumerable), the language thus provides you with the necessary tools to traverse them as character, byte, code point, and/or line collections when you need to.

The four each-style methods described here operate by creating an enumerator. You’ll learn more about enumerators in section 10.9. The important lesson for the moment is that you’ve got another set of options if you simply want an array of all bytes, characters, code points, or lines: drop the each_ and pluralize the method name. For example, here’s how you’d get an array of all the bytes in a string:

string = "Hello"
p string.bytes

The output is

[72, 101, 108, 108, 111]

You can do likewise with the methods chars, codepoints, and lines.
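A quick sketch of the other three array-returning methods on a small two-line string:

```ruby
string = "Hi\nyou"

p string.chars       # prints ["H", "i", "\n", "y", "o", "u"]
p string.codepoints  # prints [72, 105, 10, 121, 111, 117]
p string.lines       # prints ["Hi\n", "you"]
```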

We’ve searched, transformed, filtered, and queried a variety of collection objects using an even bigger variety of methods. The one thing we haven’t done is sort collections. We’ll do that next.

10.8. Sorting enumerables

If you have a class, and you want to be able to arrange multiple instances of it in order, you need to do the following:

1. Define a comparison method for the class (<=>).

2. Place the multiple instances in a container, probably an array.

3. Sort the container.

The key point is that although the ability to sort is granted by Enumerable, your class doesn’t have to mix in Enumerable. Rather, you put your objects into a container object that does mix in Enumerable. That container object, as an enumerable, has two sorting methods, sort and sort_by, which you can use to sort the collection.

In the vast majority of cases, the container into which you place objects you want sorted will be an array. Sometimes it will be a hash, in which case the result will be an array (an array of two-element key/value pair arrays, sorted by a key or some other criterion).

Normally, you don’t have to create an array of items explicitly before you sort them. More often, you sort a collection that your program has already generated automatically. For instance, you may perform a select operation on a collection of objects and sort the ones you’ve selected. The manual stuffing of lists of objects into square brackets to create array examples in this section is therefore a bit contrived. But the goal is to focus directly on techniques for sorting, and that’s what we’ll do.

Here’s a simple sorting example involving an array of integers:

>> [3,2,5,4,1].sort
=> [1, 2, 3, 4, 5]

Doing this is easy when you have numbers or even strings (where a sort gives you alphabetical order). The array you put them in has a sorting mechanism, and the integers or strings have some knowledge of what it means to be in order.

But what if you want to sort, say, an array of Painting objects?

>> [pa1, pa2, pa3, pa4, pa5].sort

For paintings to have enough knowledge to participate in a sort operation, you have to define the spaceship operator (see section 7.6.2): Painting#<=>. Each painting will then know what it means to be greater or less than another painting, and that will enable the array to sort its contents. Remember, it’s the array you’re sorting, not each painting; but to sort the array, its elements have to have a sense of how they compare to each other. (You don’t have to mix in the Comparable module; you just need the spaceship method. We’ll come back to Comparable shortly.)

Let’s say you want paintings to sort in increasing order of price, and let’s assume paintings have a price attribute. Somewhere in your Painting class you would do this:

def <=>(other_painting)
self.price <=> other_painting.price
end

Now any array of paintings you sort will come out in price-sorted order:

price_sorted = [pa1, pa2, pa3, pa4, pa5].sort

Ruby applies the <=> test to these elements, two at a time, building up enough information to perform the complete sort.
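To see the whole division of labor in one place, here is a minimal, self-contained sketch. The Painting constructor, the title attribute, and the sample titles, prices, and years are hypothetical; only the <=> method follows the text. Note that the class doesn’t include Comparable or Enumerable:

```ruby
class Painting
  attr_reader :title, :price, :year

  def initialize(title, price, year)
    @title, @price, @year = title, price, year
  end

  # Compare paintings by price; Array#sort relies on this method
  def <=>(other_painting)
    self.price <=> other_painting.price
  end
end

pa1 = Painting.new("Harbor", 300, 1921)
pa2 = Painting.new("Orchard", 100, 1898)
pa3 = Painting.new("Skyline", 200, 1950)

price_sorted = [pa1, pa2, pa3].sort
p price_sorted.map(&:title)   # prints ["Orchard", "Skyline", "Harbor"]
```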

A more fleshed-out account of the steps involved might go like this:

1. Teach your objects how to compare themselves with each other, using <=>.

2. Put those objects inside an enumerable object (probably an array).

3. Ask that object to sort itself. It does this by asking the objects to compare themselves to each other with <=>.

If you keep this division of labor in mind, you’ll understand how sorting operates and how it relates to Enumerable. But what about Comparable?

10.8.1. Where the Comparable module fits into enumerable sorting (or doesn’t)

When we first encountered the spaceship operator, it was in the context of including Comparable and letting that module build its various methods (>, <, and so on) on top of <=>. But in prepping objects to be sortable inside enumerable containers, all we’ve done is define <=>; we haven’t mixed in Comparable.

The whole picture fits together if you think of it as several separate, layered techniques:

· If you define <=> for a class, then instances of that class can be put inside an array or other enumerable for sorting.

· If you don’t define <=>, you can still sort objects if you put them inside an array and provide a code block telling the array how it should rank any two of the objects. (This is discussed next in section 10.8.2.)

· If you define <=> and also include Comparable in your class, then you get sortability inside an array and you can perform all the comparison operations between any two of your objects (>, <, and so on), as per the discussion of Comparable in chapter 7.

In other words, the <=> method is useful both for classes whose instances you wish to sort and for classes whose instances you wish to compare with each other in a more fine-grained way using the full complement of comparison operators.

Back we go to enumerable sorting—and, in particular, to the variant of sorting where you provide a code block instead of a <=> method to specify how objects should be compared and ordered.

10.8.2. Defining sort-order logic with a block

In cases where no <=> method is defined for the objects you’re sorting, you can supply a block on the fly to indicate how you want them sorted. If there is a <=> method, you can override it for the current sort operation by providing a block.

Let’s say, for example, that you’ve defined Painting#<=> in such a way that it sorts by price, as earlier. But now you want to sort by year. You can force a year-based sort by using a block:

year_sort = [pa1, pa2, pa3, pa4, pa5].sort do |a,b|
a.year <=> b.year
end

The block takes two arguments, a and b. This enables Ruby to use the block as many times as needed to compare one painting with another. The code inside the block does a <=> comparison between the respective years of the two paintings. For this call to sort, the code in the block is used instead of the code in the <=> method of the Painting class.

You can use this code-block form of sort to handle cases where your objects don’t have a <=> method and therefore don’t know how to compare themselves to each other. It can also come in handy when the objects being sorted are of different classes and by default don’t know how to compare themselves to each other. Integers and strings, for example, can’t be compared directly: an expression like "2" <=> 4 returns nil, and calling sort without a block on an array that mixes strings and integers raises an ArgumentError. But if you do a conversion first, you can pull it off:

>> ["2",1,5,"3",4,"6"].sort {|a,b| a.to_i <=> b.to_i }
=> [1, "2", "3", 4, 5, "6"]

The elements in the sorted output array are the same as those in the input array: a mixture of strings and integers. But they’re ordered as they would be if they were all integers. Inside the code block, both strings and integers are normalized to integer form with to_i. As far as the sort engine is concerned, it’s performing a sort based on a series of integer comparisons. It then applies the order it comes up with to the original array.
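A short sketch of both halves of that story: the failed blockless sort and the successful block-based one. The exact wording of the error message varies between Ruby versions, so the code only prints whatever message it gets:

```ruby
mixed = ["2", 1, 5, "3", 4, "6"]

p("2" <=> 4)   # prints nil: String and Integer have no common ordering

failed = false
begin
  mixed.sort   # no block, so sort needs <=> to work between strings and integers
rescue ArgumentError => e
  failed = true
  puts "sort failed: #{e.message}"
end

# Normalizing both sides with to_i gives sort a consistent ordering
sorted = mixed.sort { |a, b| a.to_i <=> b.to_i }
p sorted   # prints [1, "2", "3", 4, 5, "6"]
```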

sort with a block can thus help you where the existing comparison methods won’t get the job done. And there’s an even more concise way to sort a collection with a code block: the sort_by method.

10.8.3. Concise sorting with sort_by

Like sort, sort_by is an instance method of Enumerable. The main difference is that sort_by always takes a block, and it only requires that you show it how to treat one item in the collection. sort_by figures out that you want to do the same thing to both items every time it compares a pair of objects.

The previous array-sorting example can be written like this, using sort_by:
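The listing is missing from this copy; given the surrounding text, it was presumably a one-line sort_by call along these lines. The symbol shorthand from section 10.6 works here too:

```ruby
mixed = ["2", 1, 5, "3", 4, "6"]

# Show sort_by how to prepare one element; it applies that to every element
by_block  = mixed.sort_by { |a| a.to_i }
by_symbol = mixed.sort_by(&:to_i)

p by_block   # prints [1, "2", "3", 4, 5, "6"]
```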

All we have to do in the block is show (once) what action needs to be performed to prep each object for the sort operation. We don’t have to call to_i on two objects; nor do we need to use the <=> method explicitly.

In addition to the Enumerable module, and still in the realm of enumerability, Ruby provides a class called Enumerator. Enumerators add a whole dimension of collection manipulation power to Ruby. We’ll look at them in depth now.

10.9. Enumerators and the next dimension of enumerability

Enumerators are closely related to iterators, but they aren’t the same thing. An iterator is a method that yields one or more values to a code block. An enumerator is an object, not a method.

At heart, an enumerator is a simple enumerable object. It has an each method, and it employs the Enumerable module to define all the usual methods—select, inject, map, and friends—directly on top of its each.

The twist in the plot, though, is how the enumerator’s each method is engineered.

An enumerator isn’t a container object. It has no “natural” basis for an each operation, the way an array does (start at element 0; yield it; go to element 1; yield it; and so on). The each iteration logic of every enumerator has to be explicitly specified. After you’ve told it how to do each, the enumerator takes over from there and figures out how to do map, find, take, drop, and all the rest.

An enumerator is like a brain in a science-fiction movie, sitting on a table with no connection to a body but still able to think. It just needs an “each” algorithm, so that it can set into motion the things it already knows how to do. And this it can learn in one of two ways: either you call Enumerator.new with a code block, so that the code block contains the each logic you want the enumerator to follow; or you create an enumerator based on an existing enumerable object (an array, a hash, and so forth) in such a way that the enumerator’s each method draws its elements, for iteration, from a specific method of that enumerable object.

We’ll start by looking at the code block approach to creating enumerators. But most of the rest of the discussion of enumerators will focus on the second approach, where you “hook up” an enumerator to an iterator on another object. (If you find the block-based technique difficult to follow, no harm will come if you skim section 10.9.1 for now and focus on section 10.9.2.) Which techniques you use and how you combine them will ultimately depend on your exact needs in a given situation.

10.9.1. Creating enumerators with a code block

Here’s a simple example of the instantiation of an enumerator with a code block:

e = Enumerator.new do |y|
y << 1
y << 2
y << 3
end

Now, first things first: what is y?

y is a yielder, an instance of Enumerator::Yielder, automatically passed to your block. Yielders encapsulate the yielding scenario that you want your enumerator to follow. In this example, what we’re saying is when you (the enumerator) get an each call, please take that to mean that you should yield 1, then 2, then 3. The << method (in infix operator position, as usual) serves to instruct the yielder as to what it should yield. (You can also write y.yield(1) and so forth, although the similarity of the yield method to the yield keyword might be more confusing than it’s worth.) Upon being asked to iterate, the enumerator consults the yielder and makes the next move—the next yield—based on the instructions that the yielder has stored.

What happens when you use e, the enumerator? Here’s an irb session where it’s put through its paces (given that the code in the example has already been executed):
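The irb session itself is missing here. The following sketch puts e through a few Enumerable methods (the particular methods chosen are illustrative); each result is derived from the three values the yielder was told to yield:

```ruby
e = Enumerator.new do |y|
  y << 1
  y << 2
  y << 3
end

p e.to_a                   # prints [1, 2, 3]
p e.map { |x| x * 10 }     # prints [10, 20, 30]
p e.select { |x| x > 1 }   # prints [2, 3]
p e.take(2)                # prints [1, 2]
p e.each_slice(2).to_a     # prints [[1, 2], [3]]
```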

The enumerator e is an enumerating machine. It doesn’t contain objects; it has code associated with it—the original code block—that tells it what to do when it’s addressed in terms that it recognizes as coming from the Enumerable module.

The enumerator iterates once for every time that << (or the yield method) is called on the yielder. If you put calls to << inside a loop or other iterator inside the code block, you can introduce just about any iteration logic you want. Here’s a rewrite of the previous example, using an iterator inside the block:

e = Enumerator.new do |y|
(1..3).each {|i| y << i }
end

The behavior of e will be the same, given this definition, as it is in the previous examples. We’ve arranged for << to be called three times; that means e.each will do three iterations. Again, the behavior of the enumerator can be traced ultimately to the calls to << inside the code block with which it was initialized.

Note in particular that you don’t yield from the block; that is, you don’t write the block as a series of yield calls (yield 1, yield 2, yield 3).

Rather, you populate your yielder (y, in the first examples) with specifications for how you want the iteration to proceed at such time as you call an iterative method on the enumerator.

Every time you call an iterator method on the enumerator, the code block gets executed once. Any variables you initialize in the block are initialized once at the start of each such method call. You can trace the execution sequence by adding some verbosity and calling multiple methods:

e = Enumerator.new do |y|
puts "Starting up the block!"
(1..3).each {|i| y << i }
puts "Exiting the block!"
end
p e.to_a
p e.select {|x| x > 2 }

The output from this code is

Starting up the block!
Exiting the block!
[1, 2, 3]
Starting up the block!
Exiting the block!
[3]

You can see that the block is executed once for each iterator called on e.

It’s also possible to involve other objects in the code block for an enumerator. Here’s a somewhat abstract example in which the enumerator performs a calculation involving the elements of an array while removing those elements from the array permanently:

a = [1,2,3,4,5]
e = Enumerator.new do |y|
total = 0
until a.empty?
total += a.pop
y << total
end
end

Now let’s look at the fate of poor a, in irb:

>> e.take(2)
=> [5, 9]
>> a
=> [1, 2, 3]
>> e.to_a
=> [3, 5, 6]
>> a
=> []

The take operation produces a result array of two elements (the value of total for two successive iterations) and leaves a with three elements. Calling to_a on e, at this point, causes the original code block to be executed again, because the to_a call isn’t part of the same iteration as the call to take. Therefore, total starts again at 0, and the until loop is executed with the result that three values are yielded and a is left empty.

It’s not fair to ambush a separate object by removing its elements as a side effect of calling an enumerator. But the example shows you the mechanism—and it also provides a reasonable segue into the other half of the topic of creating enumerators: creating enumerators whose each methods are tied to specific methods on existing enumerable objects.

10.9.2. Attaching enumerators to other objects

The other way to endow an enumerator with each logic is to hook the enumerator up to another object—specifically, to an iterator (often each, but potentially any method that yields one or more values) on another object. This gives the enumerator a basis for its own iteration: when it needs to yield something, it gets the necessary value by triggering the next yield from the object to which it is attached, via the designated method. The enumerator thus acts as part proxy, part parasite, defining its own each in terms of another object’s iteration.

You create an enumerator with this approach by calling enum_for (a.k.a. to_enum) on the object from which you want the enumerator to draw its iterations. You provide as the first argument the name of the method onto which the enumerator will attach its each method. This argument defaults to :each, although it’s common to attach the enumerator to a different method, as in this example:

names = %w{ David Black Yukihiro Matsumoto }
e = names.enum_for(:select)

Specifying :select as the argument means that we want to bind this enumerator to the select method of the names array. That means the enumerator’s each will serve as a kind of front end to array’s select:
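The example that belonged here is missing from this copy. A sketch of the idea: the block given to e.each is handed on to names.select, so each returns the filtered array (the include?("a") test is just an illustrative filter):

```ruby
names = %w{ David Black Yukihiro Matsumoto }
e = names.enum_for(:select)

# e.each passes its block through to names.select and returns select's result
result = e.each { |n| n.include?("a") }
p result   # prints ["David", "Black", "Matsumoto"]
```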

You can also provide further arguments to enum_for. Any such arguments are passed through to the method to which the enumerator is being attached. For example, here’s how to create an enumerator for inject so that when inject is called on to feed values to the enumerator’s each, it’s called with a starting value of "Names: ":

>> e = names.enum_for(:inject, "Names: ")
=> #<Enumerator: ["David", "Black", "Yukihiro", "Matsumoto"]:inject("Names: ")>
>> e.each {|string, name| string << "#{name}..." }
=> "Names: David...Black...Yukihiro...Matsumoto..."

But be careful! That starting string "Names: " has had some names added to it, but it’s still alive inside the enumerator. That means if you run the same inject operation again, it adds to the same string (the line in the output in the following code is broken across two lines to make it fit):

>> e.each {|string, name| string << "#{name}..." }
=> "Names: David...Black...Yukihiro...Matsumoto...
David...Black...Yukihiro...Matsumoto..."

When you create the enumerator, the arguments you give it for the purpose of supplying its proxied method with arguments are the arguments—the objects—it will use permanently. So watch for side effects. (In this particular case, you can avoid the side effect by adding strings—string + "#{name}..."—instead of appending to the string with <<, because the addition operation creates a new string object. Still, the cautionary tale is generally useful.)
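A minimal sketch of the + variant mentioned above. Because each step builds a new string, the stored starting string "Names: " is never modified, and repeated runs give the same result:

```ruby
names = %w{ David Black Yukihiro Matsumoto }
e = names.enum_for(:inject, "Names: ")

# + returns a new string each time, leaving the stored seed untouched
first  = e.each { |string, name| string + "#{name}..." }
second = e.each { |string, name| string + "#{name}..." }

p first             # prints "Names: David...Black...Yukihiro...Matsumoto..."
p first == second   # prints true
```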

Note

You can call Enumerator.new(obj, method_name, arg1, arg2...) as an equivalent to obj.enum_for(method_name, arg1, arg2...). But using this form of Enumerator.new is discouraged. Use enum_for for the method-attachment scenario and Enumerator.new for the block-based scenario described in section 10.9.1.

Now you know how to create enumerators of both kinds: the kind whose knowledge of how to iterate is conveyed to it in a code block, and the kind that gets that knowledge from another object. Enumerators are also created implicitly when you make blockless calls to certain iterator methods.

10.9.3. Implicit creation of enumerators by blockless iterator calls

By definition, an iterator is a method that yields one or more values to a block. But what if there’s no block?

The answer is that most built-in iterators return an enumerator when they’re called without a block. Here’s an example from the String class: the each_byte method (see section 10.7). First, here’s a classic iterator usage of the method, without an enumerator but with a block:

>> str = "Hello"
=> "Hello"
>> str.each_byte {|b| puts b }
72
101
108
108
111
=> "Hello"

each_byte iterates over the bytes in the string and returns its receiver (the string). But if you call each_byte with no block, you get an enumerator:

>> str.each_byte
=> #<Enumerator: "Hello":each_byte>

The enumerator you get is equivalent to what you would get if you did this:

>> str.enum_for(:each_byte)

You’ll find that lots of iterator methods, including each and Enumerable methods such as map, select, and each_slice, return enumerators when you call them without a block. The main use case for these automatically returned enumerators is chaining: calling another method immediately on the enumerator. We’ll look at chaining as part of the coverage of enumerator semantics in the next section.
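A brief sketch of blockless calls and chaining, before the fuller treatment in the next section:

```ruby
str = "Hello"

# A blockless iterator call hands back an enumerator...
enum = str.each_byte
p enum.class   # prints Enumerator

# ...which you can chain straight into another method
p str.each_byte.map { |b| b + 1 }   # prints [73, 102, 109, 109, 112]

# A blockless map chained with with_index: the mapping also sees each index
p %w{ a b c }.map.with_index { |s, i| "#{i}#{s}" }   # prints ["0a", "1b", "2c"]
```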

10.10. Enumerator semantics and uses

Now that you know how enumerators are wired and how to create them, we’re going to look at how they’re used—and why they’re used.

Perhaps the hardest thing about enumerators, because it’s the most difficult to interpret visually, is how things play out when you call the each method. We’ll start by looking at that; then, we’ll examine the practicalities of enumerators, particularly the ways in which an enumerator can protect an object from change and how you can use an enumerator to do fine-grained, controlled iterations. We’ll then look at how enumerators fit into method chains in general and we’ll see a couple of important specific cases.

10.10.1. How to use an enumerator’s each method

An enumerator’s each method is hooked up to a method on another object, possibly a method other than each. If you use it directly, it behaves like that other method, including with respect to its return value.

This can produce some odd-looking results where calls to each return filtered, sorted, or mapped collections.
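For example (a sketch with an illustrative array of animal names), an enumerator attached to an array’s map method returns a mapped array from its each, and one attached to select returns a filtered array:

```ruby
array = %w{ cat dog rabbit }

e = array.map                               # blockless map: enumerator tied to Array#map
result = e.each {|animal| animal.capitalize }
p result                                    # ["Cat", "Dog", "Rabbit"]
p array                                     # ["cat", "dog", "rabbit"] (unchanged)

# The same idea with select: each returns the filtered array.
short = array.select.each {|animal| animal.length == 3 }
p short                                     # ["cat", "dog"]
```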

There’s nothing mysterious here. The enumerator isn’t the same object as the array; it has its own ideas about what each means. Still, the overall effect of connecting an enumerator to the map method of an array is that you get an each operation with an array mapping as its return value. The usual each iteration of an array, as you’ve seen, exists principally for its side effects and returns its receiver (the array). But an enumerator’s each serves as a kind of conduit to the method from which it pulls its values and behaves the same way in the matter of return value.

Another characteristic of enumerators that you should be aware of is the fact that they perform a kind of un-overriding of methods in Enumerable.

The un-overriding phenomenon

If a class defines each and includes Enumerable, its instances automatically get map, select, inject, and all the rest of Enumerable’s methods. All those methods are defined in terms of each.

But sometimes a given class has already overridden Enumerable’s version of a method with its own. A good example is Hash#select. The standard, out-of-the-box select method from Enumerable always returns an array, whatever the class of the object using it might be. A select operation on a hash, on the other hand, returns a hash:

>> h = { "cat" => "feline", "dog" => "canine", "cow" => "bovine" }
=> {"cat"=>"feline", "dog"=>"canine", "cow"=>"bovine"}
>> h.select {|key,value| key =~ /c/ }
=> {"cat"=>"feline", "cow"=>"bovine"}

So far, so good (and nothing new). And if we hook up an enumerator to the select method, it gives us an each method that works like that method:

>> e = h.enum_for(:select)
=> #<Enumerator: {"cat"=>"feline", "dog"=>"canine", "cow"=>"bovine"}:select>
>> e.each {|key,value| key =~ /c/ }
=> {"cat"=>"feline", "cow"=>"bovine"}

But what about an enumerator hooked up not to the hash’s select method but to the hash’s each method? We can get one by using to_enum and letting the target method default to each:

>> e = h.to_enum
=> #<Enumerator: {"cat"=>"feline", "dog"=>"canine", "cow"=>"bovine"}:each>

Hash#each, called with a block, returns the hash. The same is true of the enumerator’s each—because it’s just a front end to the hash’s each. The blocks in these examples are empty because we’re only concerned with the return values:

>> h.each { }
=> {"cat"=>"feline", "dog"=>"canine", "cow"=>"bovine"}
>> e.each { }
=> {"cat"=>"feline", "dog"=>"canine", "cow"=>"bovine"}

So far, it looks like the enumerator’s each is a stand-in for the hash’s each. But what happens if we use this each to perform a select operation?

>> e.select {|key,value| key =~ /c/ }
=> [["cat", "feline"], ["cow", "bovine"]]

The answer, as you can see, is that we get back an array, not a hash.

Why? If e.each is pegged to h.each, how does the return value of e.select get unpegged from the return value of h.select?

The key is that the call to select in the last example is a call to the select method of the enumerator, not the hash. And the select method of the enumerator is built directly on the enumerator’s each method. In fact, the enumerator’s select method is Enumerable#select, which always returns an array. The fact that Hash#select doesn’t return an array is of no interest to the enumerator.

In this sense, the enumerator is adding enumerability to the hash, even though the hash is already enumerable. It’s also un-overriding Enumerable#select; the select provided by the enumerator is Enumerable#select, even if the hash’s select wasn’t. (Technically it’s not an un-override, but it does produce the sensation that the enumerator is occluding the select logic of the original hash.)

The lesson is that it’s important to remember that an enumerator is a different object from the collection from which it siphons its iterated objects. Although this difference between objects can give rise to some possibly odd results, like select being rerouted through the Enumerable module, it’s definitely beneficial in at least one important way: accessing a collection through an enumerator, rather than through the collection itself, protects the collection object from change.

10.10.2. Protecting objects with enumerators

Consider a method that expects, say, an array as its argument. (Yes, it’s a bit un-Ruby-like to focus on the object’s class, but you’ll see that that isn’t the main point here.)

def give_me_an_array(array)

If you pass an array object to this method, the method can alter that object:

array << "new element"

If you want to protect the original array from change, you can duplicate it and pass along the duplicate—or you can pass along an enumerator instead:

give_me_an_array(array.to_enum)

The enumerator will happily allow for iterations through the array, but it won’t absorb changes. (Enumerators don’t define <<, so trying to call it on one raises a NoMethodError.) In other words, an enumerator can serve as a kind of gateway to a collection object such that it allows iteration and examination of elements but disallows destructive operations.
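Putting the fragments above together, here is a runnable sketch (the sample arrays are illustrative). Passing the array itself lets the method change it; passing an enumerator leaves the original untouched, because the << call fails.

```ruby
def give_me_an_array(array)
  array << "new element"
end

# Passing the array itself: the method changes the caller's object.
original = %w{ a b c }
give_me_an_array(original)
p original          # ["a", "b", "c", "new element"]

# Passing an enumerator: << isn't defined on Enumerator, so nothing changes.
protected_list = %w{ a b c }
refused = false
begin
  give_me_an_array(protected_list.to_enum)
rescue NoMethodError
  refused = true
end
p refused           # true
p protected_list    # ["a", "b", "c"]
```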

The deck of cards code from section 10.5.4 provides a nice opportunity for some object protection. In that code, the Deck class has a reader attribute cards. When a deck is created, its @cards instance variable is initialized to an array containing all the cards. There’s a vulnerability here: What if someone gets hold of the @cards array through the cards reader attribute and alters it?

deck = PlayingCard::Deck.new
deck.cards << "JOKER!!"

Ideally, we’d like to be able to read from the cards array but not alter it. (We could freeze it with the freeze method, which prevents further changes to objects, but we’ll need to change the deck inside the Deck class when it’s dealt from.) Enumerators provide a solution. Instead of a reader attribute, let’s make the cards method return an enumerator:

class PlayingCard
  SUITS = %w{ clubs diamonds hearts spades }
  RANKS = %w{ 2 3 4 5 6 7 8 9 10 J Q K A }
  class Deck
    def cards
      @cards.to_enum
    end
    def initialize(n=1)
      @cards = []
      SUITS.cycle(n) do |s|
        RANKS.cycle(1) do |r|
          @cards << "#{r} of #{s}"
        end
      end
    end
  end
end

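It’s still possible to pry into the @cards array and mess it up if you’re determined. But the enumerator provides a significant amount of protection: read-only operations such as first and count still work, while an attempt to call << on the enumerator raises a NoMethodError.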
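Here is a sketch of that protection in action. The Deck class is repeated so the snippet runs on its own; the joker string is the one from the earlier example.

```ruby
class PlayingCard
  SUITS = %w{ clubs diamonds hearts spades }
  RANKS = %w{ 2 3 4 5 6 7 8 9 10 J Q K A }
  class Deck
    # Hand out an enumerator rather than the array itself.
    def cards
      @cards.to_enum
    end
    def initialize(n=1)
      @cards = []
      SUITS.cycle(n) do |s|
        RANKS.cycle(1) do |r|
          @cards << "#{r} of #{s}"
        end
      end
    end
  end
end

deck = PlayingCard::Deck.new
p deck.cards.first     # "2 of clubs"
p deck.cards.count     # 52

joker_refused = false
begin
  deck.cards << "JOKER!!"
rescue NoMethodError
  joker_refused = true
end
p joker_refused        # true
p deck.cards.count     # still 52
```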

Of course, if you want the calling code to be able to address the cards as an array, returning an enumerator may be counterproductive. (And at least one other technique protects objects under circumstances like this: return @cards.dup.) But if it’s a good fit, the protective qualities of an enumerator can be convenient.

Because enumerators are objects, they have state. Furthermore, they use their state to track their own progress so you can stop and start their iterations. We’ll look now at the techniques for controlling enumerators in this way.

10.10.3. Fine-grained iteration with enumerators

Enumerators maintain state: they keep track of where they are in their enumeration. Several methods make direct use of this information. Consider this example:

names = %w{ David Yukihiro }
e = names.to_enum
puts e.next
puts e.next
e.rewind
puts e.next

The output from these commands is

David
Yukihiro
David

The enumerator allows you to move in slow motion, so to speak, through the enumeration of the array, stopping and restarting at will. In this respect, it’s like one of those editing tables where a film editor cranks the film manually. Unlike a projector, which you switch on and let do its thing, the editing table allows you to influence the progress of the film as it proceeds.
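A small sketch of stopping and restarting, using the same names array. Calling next after the last element raises StopIteration; rewind starts over; and Kernel#loop rescues StopIteration, so it ends quietly after the last element.

```ruby
names = %w{ David Yukihiro }
e = names.to_enum

puts e.next          # David
puts e.next          # Yukihiro

# Asking for more than the enumerator has raises StopIteration.
finished = false
begin
  e.next
rescue StopIteration
  finished = true
end
puts "Finished: #{finished}"   # Finished: true

# rewind moves the enumerator back to the start.
e.rewind
seen = []
loop { seen << e.next }        # loop stops when next raises StopIteration
p seen                         # ["David", "Yukihiro"]
```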

This point also sheds light on the difference between an enumerator and an iterator. An enumerator is an object, and can therefore maintain state. It remembers where it is in the enumeration. An iterator is a method. When you call it, the call is atomic; the entire call happens, and then it’s over. Thanks to code blocks, there is of course a certain useful complexity to Ruby method calls: the method can call back to the block, and decisions can be made that affect the outcome. But it’s still a method. An iterator doesn’t have state. An enumerator is an enumerable object.

Interestingly, you can use an enumerator on a non-enumerable object. All you need is for your object to have a method that yields something so the enumerator can adopt that method as the basis for its own each method. As a result, the non-enumerable object becomes, in effect, enumerable.

10.10.4. Adding enumerability with an enumerator

An enumerator can add enumerability to objects that don’t have it. It’s a matter of wiring: if you hook up an enumerator’s each method to any iterator, then you can use the enumerator to perform enumerable operations on the object that owns the iterator, whether that object considers itself enumerable or not.

When you hook up an enumerator to the String#each_byte method, you’re effectively adding enumerability to an object (a string) that doesn’t have it, in the sense that String doesn’t mix in Enumerable. You can achieve much the same effect with classes of your own. Consider the following class, which doesn’t mix in Enumerable but does have one iterator method:

module Music
  class Scale
    NOTES = %w{ c c# d d# e f f# g a a# b }

    def play
      NOTES.each {|note| yield note }
    end
  end
end

Given this class, it’s possible to iterate through the notes of a scale

scale = Music::Scale.new
scale.play {|note| puts "Next note is #{note}" }

with the result

Next note is c
Next note is c#
Next note is d

and so forth. But the scale isn’t technically an enumerable. The standard methods from Enumerable won’t work because the class Music::Scale doesn’t mix in Enumerable and doesn’t define each:

scale.map {|note| note.upcase }

The result is

NoMethodError: undefined method `map' for #<Music::Scale:0x3b0aec>

Now, in practice, if you wanted scales to be fully enumerable, you’d almost certainly mix in Enumerable and change the name of play to each. But you can also make a scale enumerable by hooking it up to an enumerator.

To create an enumerator for the scale object, tied in to the play method, call enum_for on the scale and pass :play as the method name, storing the result in a variable called enum.
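In code, that’s a single enum_for call. The Scale class is repeated in this sketch so it runs on its own, and first and include? are just two examples of the Enumerable methods the enumerator gains.

```ruby
module Music
  class Scale
    NOTES = %w{ c c# d d# e f f# g a a# b }

    def play
      NOTES.each {|note| yield note }
    end
  end
end

scale = Music::Scale.new
enum = scale.enum_for(:play)   # enum's each now runs scale.play

p enum.class                   # Enumerator
p enum.first(3)                # ["c", "c#", "d"]
p enum.include?("a#")          # true
```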

The enumerator, enum, has an each method; that method performs the same iteration that the scale’s play method performs. Furthermore, unlike the scale, the enumerator is an enumerable object; it has map, select, inject, and all the other standard methods from Enumerable. If you use the enumerator, you get enumerable operations on a fundamentally non-enumerable object:

p enum.map {|note| note.upcase }
p enum.select {|note| note.include?('f') }

The first line’s output is

["C", "C#", "D", "D#", "E", "F", "F#", "G", "A", "A#", "B"]

and the second line’s output is

["f", "f#"]

An enumerator, then, attaches itself to a particular method on a particular object and uses that method as the foundation method—the each—for the entire enumerable toolset.

Attaching an enumerator to a non-enumerable object like the scale object is a good exercise because it illustrates the difference between the original object and the enumerator so sharply. But in the vast majority of cases, the objects for which enumerators are created are themselves enumerables: arrays, hashes, and so forth. Most of the examples in what follows will involve enumerable objects (the exception being strings). In addition to taking us into the realm of the most common practices, this will allow us to look more broadly at the possible advantages of using enumerators.

Throughout, keep in mind the lesson of the Music::Scale object and its enumerator: an enumerator is an enumerable object whose each method operates as a kind of siphon, pulling values from an iterator defined on a different object.

We’ll conclude our examination of enumerators with a look at techniques that involve chaining enumerators and method calls.

10.11. Enumerator method chaining

Method chaining is a common technique in Ruby programming. It’s common in part because it’s so easy. Want to print out a comma-separated list of uppercased names beginning with A through N? Just string a few methods together: select the matching names, map each one to uppercase, and join the results with commas.
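One way to write that chain, as a sketch with an illustrative names array (the range check on the first letter is one of several ways to express "A through N"):

```ruby
names = %w{ Alice Nigel Zoe Margaret Ben }

# select keeps names starting with A..N, map upcases them,
# and join builds the comma-separated string.
list = names.select {|name| ("A".."N").cover?(name[0]) }.map(&:upcase).join(", ")
puts list   # ALICE, NIGEL, MARGARET, BEN
```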

The left-to-right, conveyor-belt style of processing data is powerful and, for the most part, straightforward. But it comes at a price: the creation of intermediate objects. Method chaining usually creates a new object for every link in the chain. In a select/map/join chain like this one, assuming that names is an array of strings, Ruby ends up creating two more arrays (one as the output of select, one from map) and a string (from join).

Enumerators don’t solve all the problems of method chaining. But they do mitigate the problem of creating intermediate objects in some cases. And enumerator-based chaining has some semantics unto itself that it’s good to get a handle on.

10.11.1. Economizing on intermediate objects

Remember that many methods from the Enumerable module return an enumerator if you call them without a block. In most such cases, there’s no reason to chain the enumerator directly to another method. names.each.inject, for example, might as well be names.inject. Similarly, names.map.select doesn’t buy you anything over names.select. The map enumerator doesn’t have any knowledge of what function to map to; therefore, it can’t do much other than pass the original array of values down the chain.

But consider names.each_slice(2). The enumerator generated by this expression does carry some useful information; it knows that it’s expected to produce two-element-long slices of the names array. If you place it inside a method chain, it has an effect:

>> names = %w{ David Black Yukihiro Matsumoto }
=> ["David", "Black", "Yukihiro", "Matsumoto"]
>> names.each_slice(2).map do |first, last|
"First name: #{first}, last name: #{last}\n"
end
=> ["First name: David, last name: Black\n",
"First name: Yukihiro, last name: Matsumoto\n"]

The code block attached to the map operation gets handed items from the names array two at a time, because of the each_slice(2) enumerator. The enumerator can proceed in “lazy” fashion: rather than create an entire array of two-element slices in memory, it can create the slices as they’re needed by the map operation.

Enumerator literacy

One consequence of the way enumerators work, and of their being returned automatically from blockless iterator calls, is that it takes a little practice to read enumerator code correctly. Consider this snippet, which returns an array of integers:

string = "An arbitrary string"
string.each_byte.map {|b| b + 1 }

Probably not useful business logic...but the point is that it looks much like string.each_byte is returning an array. The presence of map as the next operation, although not conclusive evidence of an array, certainly evokes the presence of a collection on the left.

Let’s put it another way. Judging by its appearance, you might expect that if you peel off the whole map call, you’ll be left with a collection.

In fact, string.each_byte returns an enumerator. The key is that an enumerator is a collection. It’s an enumerable object as much as an array or a hash is. It just may take a little getting used to.

Enumerable methods that take arguments and return enumerators, like each_slice, are candidates for this kind of compression or optimization. Even if an enumerable method doesn’t return an enumerator, you can create one for it, incorporating the argument so that it’s remembered by the enumerator. You’ve seen an example of this technique already, approached from a slightly different angle, in section 10.9.2:

e = names.enum_for(:inject, "Names: ")

The enumerator remembers not only that it’s attached to the inject method of names but also that it represents a call to inject with an argument of "Names: ".

In addition to the general practice of including enumerators in method chains, the specialized method with_index—one of the few that the Enumerator class implements separately from those in Enumerable—adds considerable value to enumerations.

10.11.2. Indexing enumerables with with_index

In the days when Rubyists used the each_with_index method, a number of us lobbied for a corresponding map_with_index method. We never got it, but we ended up with something even better. Enumerators have a with_index method that adds numerical indexing, as a second block parameter, to any enumeration. To do the letter/number mapping, you chain with_index onto a blockless call to map; the block then receives each letter together with its index.
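A sketch of the letter/number mapping with map.with_index, plus the optional starting offset that with_index accepts:

```ruby
pairs = ('a'..'z').map.with_index {|letter, i| [letter, i] }
p pairs.first(3)    # [["a", 0], ["b", 1], ["c", 2]]
p pairs.last        # ["z", 25]

# with_index(1) starts counting at 1 instead of 0.
numbered = ('a'..'c').map.with_index(1) {|letter, i| "#{i}: #{letter}" }
p numbered          # ["1: a", "2: b", "3: c"]
```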

Note that it’s map.with_index (two methods, chained), not map_with_index (a composite method name). And with_index can be chained to any enumerator. Remember the musical scale from section 10.10.4? Let’s say we enumerator-ize the play method:

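def play
  return NOTES.to_enum unless block_given?   # no block: return an enumerator
  NOTES.each {|note| yield note }            # block given: iterate as before
end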

The original example of walking through the notes will now work without the creation of an intermediate enumerator.

scale.play {|note| puts "Next note: #{note}" }

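And now this will work too: call play without a block and chain with_index(1) onto the enumerator it returns, so that the numbering starts at 1.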
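A runnable sketch of that call. It assumes play returns an enumerator when called without a block and yields when given one; the block_given? check below is one way to get both behaviors.

```ruby
module Music
  class Scale
    NOTES = %w{ c c# d d# e f f# g a a# b }

    # Yield notes when given a block; otherwise hand back an enumerator.
    def play
      return NOTES.to_enum unless block_given?
      NOTES.each {|note| yield note }
    end
  end
end

scale = Music::Scale.new
scale.play.with_index(1) {|note, i| puts "Note #{i}: #{note}" }
```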

The output will be a numbered list of notes:

Note 1: c
Note 2: c#
Note 3: d
# etc.

Thus the with_index method generalizes what would otherwise be a restricted functionality.

We’ll look at one more enumerator chaining example, which nicely pulls together several enumerator and iteration techniques and also introduces a couple of new methods you may find handy.

10.11.3. Exclusive-or operations on strings with enumerators

Running an exclusive-or (or XOR) operation on a string means XOR-ing each of its bytes with some value. XOR-ing a byte is a bitwise operation: each byte is represented by an integer, and the result of the XOR operation is an exclusive-or-ing of that integer with another number.

If your string is "a", for example, it contains one byte with the value 97. The binary representation of 97 is 1100001. Let’s say we want to XOR it with the character #, which has an ASCII value of 35, or 100011 in binary. Looking at it purely numerically, and not in terms of strings, we’re doing 97 ^ 35, or 1100001 ^ 100011 in binary terms. An XOR produces a result that, in binary representation (that is, in terms of its bits), contains a 1 where either of the source numbers, but not both, contained a 1, and a 0 where both of the source numbers contain the same value, whether 0 or 1. In the case of our two numbers, the XOR operation produces 1000010, or 66.
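The same arithmetic can be checked directly in Ruby with Integer#^ and Integer#to_s(2), as in this short sketch:

```ruby
a   = "a".ord        # 97
key = "#".ord        # 35

puts a.to_s(2)       # 1100001
puts key.to_s(2)     # 100011

x = a ^ key
puts x               # 66
puts x.to_s(2)       # 1000010
puts x ^ key         # 97: XOR-ing again restores the original
```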

A distinguishing property of bitwise XOR operations is that if you perform the same operation twice, you get back the original value. In other words, (a ^ b) ^ b == a. Thus if we xor 66 with 35, we get 97. This behavior makes xor-ing strings a useful obfuscation technique, especially if you xor a long string byte for byte against a second string. Say your string is "This is a string." If you xor it character for character against, say, #%.3u, repeating the xor string as necessary to reach the length of the original string, you get the rather daunting result wMG@UJV\x0ERUPQ\\Z\eD\v. If you xor that monstrosity against #%.3u again, you get back "This is a string."

Now let’s write a method that will do this. We’ll add it to the String class, not necessarily the best way to go about changing the functionality of core Ruby objects (as you’ll see in chapter 13), but expedient for purposes of illustration. The following listing shows the instance method String#^.

Listing 10.2. An exclusive-or method for strings
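A sketch of String#^ that matches the walkthrough below: a two-line body that builds an endlessly cycling enumerator over the key’s bytes, then maps each byte of the receiver against the next key byte and packs the result. It assumes a non-empty key, since cycling over an empty key produces nothing and kenum.next would raise StopIteration.

```ruby
class String
  def ^(key)
    # Enumerator over the key's bytes that starts over when it reaches the end.
    kenum = key.each_byte.cycle
    # XOR each byte of self with the next key byte, then pack bytes into a string.
    each_byte.map {|byte| byte ^ kenum.next }.pack("C*")
  end
end
```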

The method takes one argument: the string that will be used as the basis of the xor operation (the key). We have to deal with cases where the key is shorter than the original string by looping through the key as many times as necessary to provide enough characters for the whole operation. That’s where enumerators come in.

The variable kenum is bound to an enumerator based on chaining two methods off the key string: each_byte, which itself returns an enumerator traversing the string byte by byte, and cycle, which iterates over and over again through a collection, resuming at the beginning when it reaches the end. The enumerator kenum embodies both of these operations: each iteration through it provides another byte from the string; and when it’s finished providing all the bytes, it goes back to the beginning of the string and iterates over the bytes again. That’s exactly the behavior we want, to make sure we’ve got enough bytes to match whatever string we’re xor-ing, even if it’s a string that’s longer than the key. In effect, we’ve made the key string infinitely long.

Now comes the actual xor operation. Here, we use each_byte to iterate over the bytes of the string that’s being xor’ed. The enumerator returned by each_byte gets chained to map. Inside the map block, each byte of the original string is xor’ed with the “next” byte from the enumerator that’s cycling infinitely through the bytes of the key string. The whole map operation, then, produces an array of xor’ed bytes. All that remains is to put those bytes back into a result string.

Enter the pack method. This method turns an array into a string, interpreting each element of the array in a manner specified by the argument. In this case, the argument is "C*", which means treat each element of the array as an 8-bit unsigned integer, that is, a single byte (that’s the “C”), and process all of them (that’s the “*”). Packing the array into a string is thus the equivalent of transforming each array element into a one-byte character and then doing a join on the whole array.

Now we can xor strings. Here’s what the process looks like:

>> str = "Nice little string."
=> "Nice little string."
>> key = "secret!"
=> "secret!"
>> x = str ^ key
=> "=\f\x00\x17E\x18H\a\x11\x0F\x17E\aU\x01\f\r\x15K"
>> orig = x ^ key
=> "Nice little string."

As you can see, XOR-ing twice with the same key gets you back to the original string. And it’s all thanks to a two-line method that uses three enumerators!

Forcing an encoding

The String#^ as implemented in the previous snippet is vulnerable to encoding issues: if you xor, say, a UTF-8 string against an ASCII string twice, you’ll get back a string encoded in ASCII-8BIT. To guard against this, add a call to force_encoding:

each_byte.map {|byte| byte ^ kenum.next }.pack("C*").
force_encoding(self.encoding)

This will ensure that the byte sequence generated by the mapping gets encoded in the original string’s encoding.
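A sketch of the full method with the force_encoding call added, followed by a round trip on a UTF-8 string containing a non-ASCII character (the sample string and key are illustrative). Without force_encoding, the result of pack("C*") would be tagged ASCII-8BIT rather than UTF-8.

```ruby
class String
  def ^(key)
    kenum = key.each_byte.cycle
    # Re-tag the packed bytes with the receiver's encoding.
    each_byte.map {|byte| byte ^ kenum.next }.pack("C*").
      force_encoding(self.encoding)
  end
end

str = "café au lait"          # UTF-8 string with a multibyte character
key = "k3y"
round_trip = (str ^ key) ^ key

p round_trip.encoding          # #<Encoding:UTF-8>
p round_trip == str            # true
```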

Enumerators add a completely new tool to the already rich Ruby toolkit for collection management and iteration. They’re conceptually and technically different from iterators, but if you try them out on their own terms, you’re sure to find uses for them alongside the other collection-related techniques you’ve seen.

We’ll conclude our look at enumerators with a variant called a lazy enumerator.

10.12. Lazy enumerators

Lazy enumerators make it easy to enumerate selectively over infinitely large collections. To illustrate what this means, let’s start with a case where an operation tries to enumerate over an infinitely large collection and gets stuck. What if you want to know the first 10 multiples of 3? To use an infinite collection we’ll create a range that goes from 1 to the special value Float::INFINITY. Using such a range, a first approach to the task at hand might be

(1..Float::INFINITY).select {|n| n % 3 == 0 }.first(10)

But this line of code runs forever. The select operation never finishes, so the chained-on first command never gets executed.

You can get a finite result from an infinite collection by using a lazy enumerator. Calling the lazy method directly on a range object will produce a lazy enumerator over that range:

>> (1..Float::INFINITY).lazy
=> #<Enumerator::Lazy: 1..Infinity>

You can then wire this lazy enumerator up to select, creating a cascade of lazy enumerators:

>> (1..Float::INFINITY).lazy.select {|n| n % 3 == 0 }
=> #<Enumerator::Lazy: #<Enumerator::Lazy: 1..Infinity>:select>

Since we’re now lazily enumerating, it’s possible to grab result sets from our operations without waiting for the completion of infinite tasks. Specifically, we can now ask for the first 10 results from the select test on the infinite list, and the infinite list is happy to enumerate only as much as is necessary to produce those 10 results:

>> (1..Float::INFINITY).lazy.select {|n| n % 3 == 0 }.first(10)
=> [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
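To see that the lazy select examines only as many numbers as it needs, you can count the calls to its block. In this sketch, checked is a counter added for illustration:

```ruby
checked = 0
lazy_multiples = (1..Float::INFINITY).lazy.select do |n|
  checked += 1          # count how many numbers the select block examines
  n % 3 == 0
end

multiples = lazy_multiples.first(10)
p multiples             # [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
p checked               # 30: enumeration stopped right after the 10th match
```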

As a variation on the same theme, you can create the lazy select enumerator and then use take on it. This allows you to choose how many multiples of 3 you want to see without hard-coding the number. Note that you have to call force on the result of take; otherwise you’ll end up with yet another lazy enumerator, rather than an actual result set:

>> my_enum = (1..Float::INFINITY).lazy.select {|n| n % 3 == 0 }
=> #<Enumerator::Lazy: #<Enumerator::Lazy: 1..Infinity>:select>
>> my_enum.take(5).force
=> [3, 6, 9, 12, 15]
>> my_enum.take(10).force
=> [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

Lazy enumerators are a somewhat specialized tool, and you probably won’t need them too often. But they’re very handy if you have an infinite collection and want to deal only with a finite result set from operations on that collection.

10.12.1. FizzBuzz with a lazy enumerator

The FizzBuzz problem, in its classic form, involves printing out the integers from 1 to 100 ... except you apply the following rules:

· If the number is divisible by 15, print "FizzBuzz".

· Else if the number is divisible by 3, print "Fizz".

· Else if the number is divisible by 5, print "Buzz".

· Else print the number.

You can use a lazy enumerator to write a version of FizzBuzz that can handle any range of numbers. Here’s what it might look like:

def fb_calc(i)
  case 0
  when i % 15
    "FizzBuzz"
  when i % 3
    "Fizz"
  when i % 5
    "Buzz"
  else
    i.to_s
  end
end

def fb(n)
  (1..Float::INFINITY).lazy.map {|i| fb_calc(i) }.first(n)
end

Now you can examine, say, the FizzBuzz output for the first 15 positive integers like this:

p fb(15)

The output will be

["1", "2", "Fizz", "4", "Buzz", "Fizz", "7", "8", "Fizz", "Buzz", "11",
"Fizz", "13", "14", "FizzBuzz"]

Without creating a lazy enumerator on the range, the map operation would go on forever. Instead, the lazy enumeration ensures that the whole process will stop once we’ve got what we want.

10.13. Summary

In this chapter you’ve seen

· The Enumerable module and its instance methods

· Using Enumerable in your own classes

· Enumerator basics

· Creating enumerators

· Iterating over strings

· Lazy enumerators

This chapter focused on the Enumerable module and the Enumerator class, two entities with close ties. First, we explored the instance methods of Enumerable, which are defined in terms of an each method and which are available to your objects as long as those objects respond to each and your class mixes in Enumerable. Second, we looked at enumerators, objects that encapsulate the iteration process of another object, binding themselves—specifically, their each methods—to a designated method on another object and using that parasitic each-binding to deliver the full range of enumerable functionality.

Enumerators can be tricky. They build entirely on Enumerable; and in cases where an enumerator gets hooked up to an object that has overridden some of Enumerable’s methods, it’s important to remember that the enumerator will have its own ideas of what those methods are. It’s not a general-purpose proxy to another object; it siphons off values from one method on the other object.

One way or another, be it through the use of enumerators or the use of the more classic Ruby style of iteration and collection management, you’ll almost certainly use the enumeration-related facilities of the language virtually every time you write a Ruby program. It’s worth getting to know Enumerable intimately; it’s as powerful a unit of functionality as there is anywhere in Ruby.

We’ll turn next to the subject of regular expressions and pattern matching. As you’ll see, there’s some payoff to looking at both strings and collection objects prior to studying regular expressions: a number of pattern-matching methods performed on strings return their results to you in collection form and therefore lend themselves to iteration. Looking at regular expressions will help you develop a full-featured toolkit for processing strings and bodies of text.