Symbols and Ranges - THE RUBY WAY, Third Edition (2015)


Chapter 6. Symbols and Ranges

I hear, and I forget. I see, and I remember. I do, and I understand.

—Confucius

Two fairly Rubyesque objects are symbols and ranges. They are covered together in this chapter not because they are related but because there is not so much to say about them.

The Ruby concept of a symbol is sometimes difficult to grasp. If you are familiar with the concept of “atoms” in LISP, you can think of Ruby symbols as being similar. But rather than give a lengthy and abstruse definition, I will concentrate on what can be done with a symbol and how it can be used. After all, the question “What is a number?” could have a complex answer, but we all understand how to use and manipulate numbers.

Ranges are simpler. A range is simply a representation of a group or collection delimited by its endpoints. Similar constructs exist in Pascal, PHP, and even SQL.

Let’s look at symbols and ranges in greater detail, and see how we can use them in everyday Ruby code.

6.1 Symbols

A symbol in Ruby is an instance of the class Symbol. The syntax is simple in the typical case: a colon followed by an identifier.

A symbol is like a string in that it corresponds to a sequence of characters. It is unlike a string in that each symbol has only one instance (as is also true of a Fixnum). Therefore, the choice between the two can affect memory use and performance. For example, in the following code, the string "foo" is stored as three separate objects in memory, but the symbol :foo is stored as a single object (referenced more than once):

array = ["foo", "foo", "foo", :foo, :foo, :foo]
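One way to see this is to count distinct object identities in each half of the array. This sketch assumes frozen string literals are not enabled (no magic comment or command-line flag); under that default, each "foo" literal typically produces its own String object, while every :foo is the same object.

```ruby
array = ["foo", "foo", "foo", :foo, :foo, :foo]

strings = array.grep(String)
symbols = array.grep(Symbol)

# Count how many distinct objects (by object_id) each group contains
string_objects = strings.map(&:object_id).uniq.size
symbol_objects = symbols.map(&:object_id).uniq.size

puts "distinct String objects: #{string_objects}"
puts "distinct Symbol objects: #{symbol_objects}"   # always 1
```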

Some people are confused by the leading colon on a symbol name. There is no need for confusion; it’s a simple matter of syntax. Strings, arrays, and hashes have both beginning and ending delimiters; a symbol has only a beginning delimiter. Think of it as a unary delimiter rather than a binary one. You may consider the syntax strange at first, but there is no mystery. Internally, Ruby represents each symbol with a number, instead of the symbol’s characters. Older versions of Ruby exposed this number through to_i, but there is little need for it.

According to Jim Weirich, a symbol is “an object that has a name.” Austin Ziegler prefers to say “an object that is a name.” In any case, there is a one-to-one correspondence between symbols and names. What kinds of things do we need to apply names to? Such things as variables, methods, and arbitrary constants.

One common use of symbols is to represent the name of a variable or method. For example, we know that if we want to add a read/write attribute to a class, we can do it this way:

class SomeClass
  attr_accessor :whatever
end

This is equivalent to the following:

class SomeClass
  def whatever
    @whatever
  end
  def whatever=(val)
    @whatever = val
  end
end

In other words, the symbol :whatever tells the attr_accessor method that the “getter” and “setter” (as well as the instance variable) will all be given names corresponding to that symbol.

You might well ask why we couldn’t use a string instead. As it happens, we could. Many or most core methods that expect symbols are content to get strings instead:

attr_reader :alpha
attr_reader "beta" # This is also legal
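The following sketch checks both claims: attr_accessor generates a getter, a setter, and the instance variable, and attr_reader is equally happy with a string. The class name follows the text; the attribute values are arbitrary.

```ruby
class SomeClass
  attr_accessor :whatever   # defines whatever and whatever=
  attr_reader "beta"        # a string works here too
end

obj = SomeClass.new
obj.whatever = 42
puts obj.whatever                         # 42
p obj.instance_variables                  # [:@whatever]
p SomeClass.public_instance_methods(false).sort
```

Note that @beta does not appear among the instance variables; attr_reader only defines the method, and the variable springs into existence when something assigns to it.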

In fact, a symbol is “like” a string in that it corresponds to a sequence of characters. This leads some people to say that “a symbol is just an immutable string.” However, the Symbol class does not inherit from String, and the typical operations we might apply to a string are not necessarily applicable to symbols.
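A few quick checks make the difference concrete. Symbol is not a subclass of String, so methods such as gsub are missing; some string-like methods (upcase, for example) do exist on Symbol, but they return symbols. The usual remedy is to convert with to_s first.

```ruby
sym = :foobar

p Symbol.ancestors.include?(String)   # false: no inheritance from String
p sym.respond_to?(:gsub)              # false: gsub is a String method
p sym.upcase                          # :FOOBAR, still a Symbol
p sym.to_s.gsub("o", "0")             # "f00bar": convert first, then use String methods
```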

Another misunderstanding is to think that symbols necessarily correspond directly to identifiers. This leads some people to talk of “the symbol table” (as they would in referring to an assembled object program). But this is not really a useful concept; although symbols are certainly stored in a kind of table internally, Ruby does not expose the table itself as something we can manipulate (Symbol.all_symbols merely returns an array of the symbols that currently exist), and we as programmers don’t care that it is there.

What is more, symbols need not even look like identifiers. Typically they do, whatever that means; but they can also contain punctuation if they are enclosed in quotes. These are also valid Ruby symbols:

sym1 = :"This is a symbol"
sym2 = :"This is, too!"
sym3 = :")(*&^%$" # and even this

You could even use such symbols to name methods, but then you would need such techniques as define_method and send to define and call them. Instance variables are stricter: instance_variable_set and instance_variable_get reject names that are not valid identifiers. In general, such a thing is not recommended.
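Here is a small sketch of both cases; the class Oddball is purely hypothetical. A method can carry an arbitrary symbol as its name, though ordinary call syntax cannot express it, so we call it with send. An instance variable name, by contrast, is validated and rejected with a NameError.

```ruby
class Oddball
  # define_method accepts any symbol, even one that is not a legal identifier
  define_method(:"This is, too!") { "called an odd method" }
end

obj = Oddball.new
result = obj.send(:"This is, too!")   # no ordinary call syntax for this name
puts result

# Instance variable names, on the other hand, must look like @identifier
ivar_rejected =
  begin
    obj.instance_variable_set(:"@not valid", 1)
    false
  rescue NameError
    true
  end
puts "instance variable name rejected: #{ivar_rejected}"
```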

6.1.1 Symbols as Enumerations

Languages such as Pascal and later versions of C have the concept of an enumerated type. Ruby doesn’t have type checking, but symbols are frequently useful for their mnemonic value; we might represent directions as :north, :south, :east, and :west. If we’re going to refer to those values repeatedly, we could store them in a constant:

Directions = [:north, :south, :east, :west]

If these were strings rather than symbols, storing them in a constant would save memory, but each symbol exists only once in object space anyhow. (Symbols, like Fixnums, are stored as immediate values.)
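A minimal sketch of the enumeration idea follows. The heading method is a hypothetical helper that accepts only one of the “enumerated” values; it relies on each symbol being a single object, so the membership test is cheap and the constant holds the very same :north that appears elsewhere in the program.

```ruby
Directions = [:north, :south, :east, :west]

# Hypothetical helper: accept only one of the enumerated symbols
def heading(direction)
  unless Directions.include?(direction)
    raise ArgumentError, "unknown direction: #{direction.inspect}"
  end
  "heading #{direction}"
end

puts heading(:east)                   # heading east
p :north.equal?(Directions.first)     # true: the very same object

up_rejected =
  begin
    heading(:up)
    false
  rescue ArgumentError
    true
  end
p up_rejected                         # true
```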

6.1.2 Symbols as Metavalues

Frequently, we use exceptions as a way of avoiding return codes. But if you prefer to use return codes, you can. The fact that Ruby’s methods are not limited to a single return type makes it possible to pass back “out-of-band” values.

We frequently have need for such values. At one time, the ASCII NUL character was considered to be not a character at all. C has the idea of the NULL pointer, Pascal has the nil pointer, SQL has NULL, and so on. Ruby, of course, has nil.

The trouble with such metavalues is that they keep getting absorbed into the set of valid values. Everyone today considers NUL a true ASCII character. And in Ruby, nil isn’t really a non-object; it can be stored and manipulated. Thus, we have minor annoyances such as hash[key] returning nil; did it return nil because the key was not found or because the key is really associated with a nil?
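The ambiguity is easy to reproduce. In this sketch, both lookups return nil, and only key? (or fetch with a default) tells the two cases apart. Passing a symbol such as :not_found as the fetch default is one way of using a symbol as an out-of-band marker; the name :not_found is arbitrary.

```ruby
hash = { present_but_nil: nil }

p hash[:present_but_nil]                    # nil
p hash[:missing]                            # nil as well: indistinguishable
p hash.key?(:present_but_nil)               # true
p hash.key?(:missing)                       # false
p hash.fetch(:missing, :not_found)          # :not_found
p hash.fetch(:present_but_nil, :not_found)  # nil: the key does exist
```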

The point here is that symbols can sometimes be used as good metavalues. Imagine a method that somehow grabs a string from the network (perhaps via HTTP or something similar). If we want, we can return non-string values to indicate exceptional occurrences:

str = get_string
case str
when String
  # Proceed normally
when :eof
  # end of file, socket closed, whatever
when :error
  # I/O or network error
when :timeout
  # didn't get a reply
end

Is this really “better” or clearer than using exceptions? Not necessarily. But it is a technique to keep in mind, especially when you want to deal with conditions that may be “edge cases” but not necessarily errors.
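To make the fragment above runnable, here is a sketch in which get_string is a hypothetical stand-in for a network read: it takes a source array of canned results (an assumption, since the original takes no arguments) and returns either a String or one of the metavalue symbols.

```ruby
# Hypothetical canned results: strings are data, symbols are metavalues
RESULTS = ["hello", :timeout, "world", :eof]

# Hypothetical stand-in for a network read
def get_string(source)
  source.shift || :eof
end

def describe(str)
  case str
  when String   then "data: #{str}"
  when :eof     then "end of input"
  when :error   then "I/O or network error"
  when :timeout then "no reply"
  end
end

source = RESULTS.dup
log = []
loop do
  str = get_string(source)
  log << describe(str)
  break if str == :eof
end
puts log
```

The timeout is handled as just another case rather than as an exception, which suits conditions that are unusual but not really errors.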

6.1.3 Symbols, Variables, and Methods

Probably the best known use of symbols is in defining attributes on a class:

class MyClass
  attr_reader :alpha, :beta
  attr_writer :gamma, :delta
  attr_accessor :epsilon
  # ...
end

Bear in mind that there is some code at work here. For example, attr_accessor uses the symbol name to determine the name of the instance variable and the reader and writer methods. That does not mean that there is always an exact correspondence between that symbol and that instance variable name. For example, if we use instance_variable_set, we have to specify the exact name of the variable, including the at sign:

instance_variable_set(:@foo, "str") # Works
instance_variable_set(:foo, "str") # NameError: not a valid instance variable name

In short, a symbol passed into the attr family of methods is just an argument, and these methods create instance variables and methods as needed, based on the value of that symbol. (The writer has an equal sign appended to the end, and the instance variable name has an at sign added to the front.) In other cases, the symbol must exactly correspond to the identifier it is referencing.

In most if not all cases, methods that expect symbols can also take strings. The reverse is not necessarily true.
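The following sketch, built around a hypothetical Widget class, shows both halves of that statement. instance_variable_set wants the exact name (at sign included) and accepts it as either a symbol or a string; going the other way, String#+ expects a string and will not accept a symbol in its place.

```ruby
class Widget
  attr_accessor :foo   # creates foo and foo=, backed by @foo
end

w = Widget.new
w.instance_variable_set(:@foo, "str")            # exact name, at sign included
puts w.foo                                       # str
w.instance_variable_set("@foo", "via a string")  # a string works too
puts w.foo                                       # via a string

ivar_error =
  begin
    w.instance_variable_set(:foo, "str")         # no at sign
    false
  rescue NameError
    true
  end

concat_error =
  begin
    "foo" + :bar                                 # a symbol is not a string
    false
  rescue TypeError
    true
  end

p [ivar_error, concat_error]                     # [true, true]
```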

6.1.4 Converting to/from Symbols

Strings and symbols can be freely interconverted with the to_s and to_sym methods:

a = "foobar"
b = :foobar
a == b.to_s # true
b == a.to_sym # true

If you’re doing metaprogramming, the following method might prove useful sometimes while experimenting:

class Symbol
  def +(other)
    (self.to_s + other.to_s).to_sym
  end
end

The preceding method allows us to concatenate symbols (or append a string onto a symbol). It’s generally bad form to change the behavior of core classes in code that others will use, but it can be a powerful tool for understanding by experimentation.

The following is an example that uses it; this trivial piece of code accepts a symbol and tries to tell us whether it represents an accessor (that is, a reader and writer both exist):

class Object
  def accessor?(sym)
    # Use && here: "return a and b" would return a alone
    respond_to?(sym) && respond_to?(sym+"=")
  end
end
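Putting the two pieces together, the sketch below repeats both definitions so it can run on its own and tries accessor? on a hypothetical Point class: x has a reader and a writer, while y has only a reader.

```ruby
class Symbol
  def +(other)
    (self.to_s + other.to_s).to_sym
  end
end

class Object
  def accessor?(sym)
    respond_to?(sym) && respond_to?(sym + "=")   # sym + "=" is a Symbol, e.g. :x=
  end
end

# Hypothetical class for illustration
class Point
  attr_accessor :x
  attr_reader :y
end

pt = Point.new
p pt.accessor?(:x)   # true: both x and x= exist
p pt.accessor?(:y)   # false: there is no y=
```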

There is a clever usage of symbols that I’ll mention here. When we do a map operation, sometimes a complex block may be attached. But in many cases, we are simply calling a method on each element of the array or collection:

list = words.map {|x| x.capitalize }

In such a case, it may seem we are doing a little too much punctuation for the benefit we’re getting. The Symbol class defines a to_proc method. This ensures that any symbol can be coerced into a proc object. A proc is effectively a method that can be manipulated like an object: assigned to variables, or called at will. On a Symbol, the proc returned by the method will simply call the method named by the symbol—in other words, it will send the symbol itself as a message to the object. The method might be defined like this:

def to_proc
  proc {|obj, *args| obj.send(self, *args) }
end

With this method in place, we can rewrite our original code fragment:

list = words.map(&:capitalize)

It’s worth spending a minute understanding how this works. The map method ordinarily takes only a block (no other parameters). The ampersand notation allows us to pass a proc instead of an explicit attached block if we want. Because we use the ampersand on an object that isn’t a proc, the interpreter tries to call to_proc on that object. The resulting proc takes the place of an explicit block so that map will call it repeatedly, once for each element in the array. Now, why does self make sense as the thing passed as a message to the array element? It’s because a proc is a closure and therefore remembers the context in which it was created. At the time it was created, self referred to the symbol on which to_proc was called.
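To check the explanation, the sketch below writes the proc-building logic as a stand-alone method, symbol_proc (a hypothetical name; it is not part of Ruby), and compares it with an explicit block and with the built-in Symbol#to_proc.

```ruby
# Hypothetical stand-alone version of the to_proc logic shown above
def symbol_proc(sym)
  proc { |obj, *args| obj.send(sym, *args) }
end

words = %w[alpha beta gamma]

explicit   = words.map { |x| x.capitalize }
via_symbol = words.map(&:capitalize)               # built-in Symbol#to_proc
via_sketch = words.map(&symbol_proc(:capitalize))  # our sketch

p explicit                 # ["Alpha", "Beta", "Gamma"]
p via_symbol == explicit   # true
p via_sketch == explicit   # true

# The proc sends the symbol as a message to its first argument
p :upcase.to_proc.call("ruby")   # "RUBY"
```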

Next, we look at ranges, which are straightforward but surprisingly handy.

6.2 Ranges

Ranges are fairly intuitive, but they do have a few confusing uses and qualities. A numeric range is one of the simplest:

digits = 0..9
scale1 = 0..10
scale2 = 0...10

The .. operator is inclusive of its endpoint, and the ... operator is exclusive of its endpoint. (This may seem unintuitive to you; if so, just memorize this fact.) Therefore, digits and scale2, shown in the preceding example, are effectively the same as far as integers are concerned.
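The qualification “as far as integers are concerned” matters. The two ranges enumerate the same integers, but a non-integer such as 9.5 lies inside 0...10 and outside 0..9:

```ruby
digits = 0..9
scale2 = 0...10

p digits.to_a == scale2.to_a   # true: the same ten integers
p digits.size                  # 10
p scale2.size                  # 10
p digits.include?(9.5)         # false: 9.5 is past the end 9
p scale2.include?(9.5)         # true: 9.5 is below the excluded end 10
```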

But ranges are not limited to integers or numbers. The beginning and end of a range may be any Ruby object. However, not all ranges are meaningful or useful, as we shall see.

The primary operations you might want to do on a range are to iterate over it, convert it to an array, or determine whether it includes a given object. Let’s look at all the ramifications of these and other operations.

6.2.1 Open and Closed Ranges

We call a range “closed” if it includes its end, and “open” if it does not:

r1 = 3..6 # a closed range
r2 = 3...6 # an open range
a1 = r1.to_a # [3,4,5,6]
a2 = r2.to_a # [3,4,5]

There is no way to construct a range that excludes its beginning point. This is arguably a limitation of the language.

6.2.2 Finding Endpoints

The first and last methods return the left and right endpoints of a range. Synonyms are begin and end (which are normally keywords but may be called as methods when there is an explicit receiver).

r1 = 3..6
r2 = 3...6
r1a, r1b = r1.first, r1.last # 3, 6
r1c, r1d = r1.begin, r1.end # 3, 6
r2a, r2b = r2.begin, r2.end # 3, 6

The exclude_end? method tells us whether the endpoint is excluded:

r1.exclude_end? # false
r2.exclude_end? # true
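One subtlety worth seeing in code: for an exclusive range, last reports the stored endpoint, not the largest value actually in the range. The sketch contrasts it with the last element of to_a.

```ruby
r1 = 3..6
r2 = 3...6

p [r1.first, r1.last]                  # [3, 6]
p [r2.first, r2.last]                  # [3, 6]: last is the stored endpoint
p r2.to_a.last                         # 5: the largest value actually included
p [r1.exclude_end?, r2.exclude_end?]   # [false, true]
```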

6.2.3 Iterating Over Ranges

Typically, it’s possible to iterate over a range. For this to work, the class of the endpoints must define a meaningful succ method:

(3..6).each {|x| puts x } # prints four lines
# (parens are necessary)

So far, so good. But be very cautious when dealing with String ranges! It is possible to iterate over ranges of strings because the String class defines a succ method, but it is of limited usefulness. You should use this kind of feature only in well-known, isolated circumstances because the succ method for strings is not defined with exceptional rigor. (It is “intuitive” rather than lexicographic; therefore, some strings have a successor that is surprising or meaningless.)

r1 = "7".."9"
r2 = "y".."ab"
r1.each {|x| puts x } # Prints three lines
r2.each {|x| puts x } # Prints no output!

The preceding examples look similar but work differently. The reason lies partly in the fact that in the second range, the endpoints are strings of different length. Because "y".succ is "z", "z".succ is "aa", and "aa".succ is "ab", we might expect this range to cover the strings "y", "z", "aa", and "ab", but what really happens?

When we try to iterate over r2, we start with a value of "y" and enter a loop that terminates when the current value passes the endpoint on the right. But because "y" and "ab" are strings, they are compared as such, that is, lexicographically, and we find that the left endpoint is greater than the right endpoint. Therefore, we don’t loop at all.

(Older versions of Ruby treated a range such as "7".."10" the same way. Recent versions special-case strings made up entirely of digits and iterate over them as decimal numbers, so that particular range now yields "7" through "10". The general hazard remains.)
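The sketch below separates the two ingredients: succ on its own happily walks from "y" to "ab", but a range from "y" to "ab" yields nothing, because "y" sorts after "ab".

```ruby
# succ alone gets from "y" to "ab" in three steps...
seq = ["y"]
seq << seq.last.succ until seq.last == "ab"
p seq                    # ["y", "z", "aa", "ab"]

# ...but the endpoints are compared lexicographically first
p("y" <=> "ab")          # 1: "y" sorts after "ab"
empty = ("y".."ab").to_a
p empty                  # []
```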

What about floating point ranges? We can construct them, and we can certainly test membership in them, which makes them useful. But we can’t iterate over them because there is no succ method. Here is an example:

fr = 2.0..2.2
fr.each {|x| puts x } # error! (TypeError: can't iterate from Float)

Why isn’t there a floating point succ method? It would be theoretically possible to increment the floating point number by epsilon each time. But this would be highly architecture dependent, it would result in a frighteningly high number of iterations for even “small” ranges, and it would be of limited usefulness.
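A short sketch of both sides: membership tests on a Float range work, because they only need comparisons, while each fails with a TypeError.

```ruby
fr = 2.0..2.2

p fr.include?(2.1)   # true: membership needs only comparisons
p fr.include?(2.5)   # false

each_failed =
  begin
    fr.each { |x| puts x }
    false
  rescue TypeError
    true               # Float has no succ, so each cannot proceed
  end
p each_failed        # true
```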

6.2.4 Testing Range Membership

Ranges are not much good if we can’t determine whether an item lies within a given range. As it turns out, the include? method makes this easy:

r1 = 23456..34567
x = 14142
y = 31416
r1.include?(x) # false
r1.include?(y) # true

The method member? is an alias.

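But how does this work internally? How does the interpreter determine whether an item is in a given range? For numeric ranges, it makes this determination simply by comparing the item with the endpoints (so that range membership depends on the existence of a meaningful <=> operator). The cover? method works this way for any kind of range; include?, by contrast, may step through a non-numeric range with succ instead, and it treats string ranges specially.

Therefore, for a numeric range, to say (a..b).include?(x) is equivalent to saying x >= a and x <= b; for any range, (a..b).cover?(x) means exactly that.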
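To make the comparison rule concrete, the sketch defines a hypothetical helper, by_endpoints?, that uses nothing but the endpoints and exclude_end?, and checks that it agrees with include? on numeric ranges. Range#cover? performs the same comparison-only test for any kind of endpoint.

```ruby
# Hypothetical helper: membership by comparing with the endpoints only
def by_endpoints?(range, x)
  return false if x < range.begin
  range.exclude_end? ? x < range.end : x <= range.end
end

r1 = 23456..34567
r2 = 1...10

[14142, 31416, 23456, 34567].each do |x|
  raise "mismatch for #{x}" unless r1.include?(x) == by_endpoints?(r1, x)
end

p by_endpoints?(r2, 10)   # false, just like r2.include?(10)
p r2.cover?(9.99)         # true
```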

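Once again, beware of string ranges. The cover? method, which tests membership purely by comparing with the endpoints, compares strings lexicographically and can give a misleading answer:

s1 = "2".."5"
str = "28"
s1.cover?(str) # true (misleading!)
s1.include?(str) # false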

6.2.5 Converting to Arrays

When we convert a range to an array, the interpreter simply applies succ repeatedly until the end is reached, appending each item onto an array that is returned:

r = 3..12
arr = r.to_a # [3,4,5,6,7,8,9,10,11,12]

This naturally won’t work with Float ranges. It may sometimes work with String ranges, but this should be avoided because the results will not always be obvious or meaningful.
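The succ-based conversion can be sketched in a few lines. The helper succ_to_a is hypothetical and simplified; Ruby’s own implementation handles special cases (strings, for example) differently. It does, however, show why a backward range or a Float range gives no useful array.

```ruby
# Hypothetical, simplified version of to_a for ranges with succ-able endpoints
def succ_to_a(range)
  result = []
  current = range.begin
  # Keep going while below the end, or at the end when the end is included
  while (current <=> range.end) < 0 || (!range.exclude_end? && current == range.end)
    result << current
    current = current.succ
  end
  result
end

p succ_to_a(3..12)                    # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
p succ_to_a(3..12) == (3..12).to_a    # true
p succ_to_a(3...6)                    # [3, 4, 5]
p succ_to_a(6..3)                     # []: the loop never starts
```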

6.2.6 Backward Ranges

Does a backward range make any sense? Yes and no. For example, this is a perfectly valid range:

r = 6..3
x = r.begin # 6
y = r.end # 3
flag = r.exclude_end? # false

As you see, we can determine its starting and ending points and whether the end is included in the range. However, that is nearly all we can do with such a range.

arr = r.to_a # []
r.each {|x| p x} # No iterations
y = 5
r.include?(y) # false (for any value of y)

Does that mean that backward ranges are necessarily “evil” or useless? Not at all. It is still useful, in some cases, to have the endpoints encapsulated in a single object.

In fact, arrays and strings frequently take “backward ranges” because these are zero-indexed from the left but “minus one”-indexed from the right. Therefore, we can use expressions like these:

string = "flowery"
str1 = string[0..-2] # "flower"
str2 = string[1..-2] # "lower"
str3 = string[-5..-3] # "owe" (actually a forward range)
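Negative indices are simply counted back from the length, as this sketch confirms: an index of -n refers to position length - n. A backward range such as 6..3 still enumerates nothing, but its endpoints remain available.

```ruby
string = "flowery"
len = string.length   # 7

# A negative index -n means position len - n
p string[0..-2] == string[0..(len - 2)]            # true: both "flower"
p string[-5..-3] == string[(len - 5)..(len - 3)]   # true: both "owe"

r = 6..3
p r.to_a               # []: nothing to enumerate
p [r.begin, r.end]     # [6, 3]: endpoints still accessible
```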

6.2.7 The Flip-Flop Operator

When the range operator is used in a condition, it is treated specially. This usage of .. is called the flip-flop operator because it is essentially a toggle that keeps its own state rather than a true range.

This trick, apparently originating with Perl, is useful. But understanding how it works takes a little effort.

Imagine we had a Ruby source file with embedded docs between =begin and =end tags. How would we extract and output only those sections? (Our state toggles between “inside” a section and “outside” a section, hence the flip-flop concept.) The following piece of code, while perhaps unintuitive, will work:

file.each_line do |line|
  puts line if (line=~/=begin/)..(line=~/=end/)
end

How can this work? The magic all happens in the flip-flop operator.

First, realize that this “range” is preserving a state internally, but this fact is hidden. When the left endpoint becomes true, the range itself returns true; it then remains true until the right endpoint becomes true (it is still true on that final pass), after which it toggles to false.
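Here is the =begin/=end example run over a small in-memory array instead of a file (the lines are made up). Note that both delimiter lines are selected: the flip-flop turns on at =begin and is still true on the =end line before switching off.

```ruby
lines = ["code 1\n", "=begin\n", "doc A\n", "=end\n", "code 2\n"]

selected = []
lines.each do |line|
  # Flip-flop: on at =begin, stays on through the =end line, then off
  selected << line if (line =~ /=begin/)..(line =~ /=end/)
end
p selected   # ["=begin\n", "doc A\n", "=end\n"]
```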

This kind of feature might be used in some cases, such as parsing section-oriented config files, selecting ranges of items from lists, and so on.

However, I personally don’t like the syntax, and others are also dissatisfied with it. Removing it has been discussed publicly at bugs.ruby-lang.org/issues/5400, and Matz himself has said that it will eventually be removed.

So what’s wrong with the flip-flop? Here is my opinion.

First, in the preceding example, take a line with the value =begin. As a reminder, the =~ operator does not return true or false as we might expect; it returns the position of the match (a Fixnum) or nil if there was no match. So then the expressions in the range evaluate to 0 and nil, respectively.

However, if we try to construct a range from 0 to nil, Ruby versions before 2.6 give us an error because it is nonsensical (since 2.6, 0..nil is an “endless” range, which makes no more sense here):

range = 0..nil # error before Ruby 2.6!

Furthermore, bear in mind that in Ruby, only false and nil evaluate to false; everything else evaluates as true. So a Range object, taken as a value, never evaluates as false.

puts "hello" if x..y
# Prints "hello" for any valid range x..y

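And again, suppose we stored these values in variables and then used the variables in the condition:

file.each_line do |line|
  start = line=~/=begin/
  stop = line=~/=end/
  puts line if start..stop
end

Perhaps surprisingly, this behaves just like the original. Because start..stop still appears directly in the condition, Ruby treats it as a flip-flop, and the two variables (reassigned for every line) serve as its conditions. What makes it a flip-flop is where the .. appears, not the values involved.

What if we put the range itself in a variable? This doesn’t work, because now .. builds an ordinary Range object, and any Range object counts as true in a condition; the test is always true. (Depending on the Ruby version, building a range such as 0..nil or nil..0 may instead raise an error.)

file.each_line do |line|
  range = (line=~/=begin/)..(line=~/=end/)
  puts line if range
end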

To understand this, we have to understand that the entire range (with both endpoints) is reevaluated each time the loop is run, but the internal state is also factored in. The flip-flop operator is therefore not a true range at all. The fact that it looks like a range but is not is considered a bad thing by some.

Finally, think of the endpoints of the flip-flop. They are reevaluated every time, but this reevaluation cannot be captured in a variable that can be substituted. In effect, the flip-flop’s endpoints are like procs. They are not values; they are code. The fact that something that looks like an ordinary expression is really a proc is also undesirable.

Having said all that, the functionality is still useful. Can we write a class that encapsulates this function without being so cryptic and magical? As it turns out, this is not difficult. Listing 6.1 introduces a simple class called Transition that mimics the behavior of the flip-flop.

Listing 6.1 The Transition Class


class Transition
  A, B = :A, :B
  T, F = true, false

  # state,p1,p2 => newstate, result
  Table = {[A,F,F]=>[A,F], [B,F,F]=>[B,T],
           [A,T,F]=>[B,T], [B,T,F]=>[B,T],
           [A,F,T]=>[A,F], [B,F,T]=>[A,T],
           [A,T,T]=>[A,T], [B,T,T]=>[A,T]}

  def initialize(proc1, proc2)
    @state = A
    @proc1, @proc2 = proc1, proc2
  end

  def check?
    p1 = @proc1.call ? T : F
    p2 = @proc2.call ? T : F
    @state, result = *Table[[@state,p1,p2]]
    return result
  end
end


In the Transition class, we use a simple state machine to manage transitions. We initialize it with a pair of procs (the same ones used in the flip-flop). We do lose a little convenience in that any variables (such as line) used in the procs must already be in scope. But we now have a solution with no “magic” in it, where all expressions behave as they do any other place in Ruby.
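The sketch below exercises the proc-based Transition on a made-up array of lines. The class is repeated so the example is self-contained. The local variable line is created before the procs so that they close over it; the loop then reassigns it for each element.

```ruby
class Transition
  A, B = :A, :B
  T, F = true, false

  # state,p1,p2 => newstate, result
  Table = {[A,F,F]=>[A,F], [B,F,F]=>[B,T],
           [A,T,F]=>[B,T], [B,T,F]=>[B,T],
           [A,F,T]=>[A,F], [B,F,T]=>[A,T],
           [A,T,T]=>[A,T], [B,T,T]=>[A,T]}

  def initialize(proc1, proc2)
    @state = A
    @proc1, @proc2 = proc1, proc2
  end

  def check?
    p1 = @proc1.call ? T : F
    p2 = @proc2.call ? T : F
    @state, result = *Table[[@state, p1, p2]]
    result
  end
end

lines = ["code 1\n", "=begin\n", "doc A\n", "=end\n", "code 2\n"]
line = nil   # must exist before the procs are created so they capture it
trans = Transition.new(proc { line =~ /=begin/ }, proc { line =~ /=end/ })

selected = []
lines.each do |l|
  line = l   # update the variable the procs see
  selected << line if trans.check?
end
p selected   # ["=begin\n", "doc A\n", "=end\n"]
```

The result matches the flip-flop version, including both delimiter lines.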

Here’s a slight variant on the same solution. Let’s change the initialize method to take two arbitrary objects (such as regular expressions) that will be tested against each item with ===:

def initialize(flag1, flag2)
  @state = A
  @flag1, @flag2 = flag1, flag2
end

def check?(item)
  p1 = (@flag1 === item) ? T : F
  p2 = (@flag2 === item) ? T : F
  @state, result = *Table[[@state, p1, p2]]
  return result
end

The case equality operator (===) is used to check the relationship of the starting and ending flags with the item being tested.

Here is how we use the new version:

trans = Transition.new(/=begin/, /=end/)
file.each_line do |line|
  puts line if trans.check?(line)
end

I do recommend an approach like this, which is more explicit and less magical.
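Here is a self-contained version of the ===-based variant, run over a made-up array of lines. Because the flags only need to respond to ===, ranges work as well as regular expressions; the second, hypothetical use switches on at 3 and off at 6.

```ruby
class Transition
  A, B = :A, :B
  T, F = true, false

  # state,p1,p2 => newstate, result
  Table = {[A,F,F]=>[A,F], [B,F,F]=>[B,T],
           [A,T,F]=>[B,T], [B,T,F]=>[B,T],
           [A,F,T]=>[A,F], [B,F,T]=>[A,T],
           [A,T,T]=>[A,T], [B,T,T]=>[A,T]}

  def initialize(flag1, flag2)
    @state = A
    @flag1, @flag2 = flag1, flag2
  end

  def check?(item)
    p1 = (@flag1 === item) ? T : F
    p2 = (@flag2 === item) ? T : F
    @state, result = *Table[[@state, p1, p2]]
    result
  end
end

lines = ["code 1\n", "=begin\n", "doc A\n", "=end\n", "code 2\n"]
trans = Transition.new(/=begin/, /=end/)
selected = lines.select { |line| trans.check?(line) }
p selected   # ["=begin\n", "doc A\n", "=end\n"]

# Any objects with a meaningful === will do, e.g. ranges
stepper = Transition.new(3..3, 6..6)
steps = (1..8).select { |n| stepper.check?(n) }
p steps      # [3, 4, 5, 6]
```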

6.2.8 Custom Ranges

Let’s look at an example of a range made up of some arbitrary object. Listing 6.2 shows a simple class to handle Roman numerals.

Listing 6.2 A Roman Numeral Class


class Roman
  include Comparable

  I,IV,V,IX,X,XL,L,XC,C,CD,D,CM,M =
    1, 4, 5, 9, 10, 40, 50, 90, 100, 400, 500, 900, 1000

  Values = %w[M CM D CD C XC L XL X IX V IV I]

  def Roman.encode(value)
    return "" if value == 0   # base case: test value, not self
    Values.each do |letters|
      rnum = const_get(letters)
      if value >= rnum
        return letters + encode(value - rnum)
      end
    end
    ""
  end

  def Roman.decode(rvalue)
    sum = 0
    letters = rvalue.split('')
    letters.each_with_index do |letter,i|
      this = const_get(letter)
      that = const_get(letters[i+1]) rescue 0
      op = that > this ? :- : :+
      sum = sum.send(op,this)
    end
    sum
  end

  def initialize(value)
    case value
    when String
      @roman = value
      @decimal = Roman.decode(@roman)
    when Symbol
      @roman = value.to_s
      @decimal = Roman.decode(@roman)
    when Numeric
      @decimal = value
      @roman = Roman.encode(@decimal)
    end
  end

  def to_i
    @decimal
  end

  def to_s
    @roman
  end

  def succ
    Roman.new(@decimal+1)
  end

  def <=>(other)
    self.to_i <=> other.to_i
  end
end

def Roman(val)
  Roman.new(val)
end


I’ll cover a few highlights of this class first. It can be constructed using a string or a symbol (representing a Roman numeral) or a Fixnum (representing an ordinary Hindu-Arabic decimal number). Internally, conversion is performed, and both forms are stored. There is a “convenience method” called Roman, which simply is a shortcut to calling the Roman.new method. The class-level methods encode and decode handle conversion to and from Roman form, respectively.

For simplicity, I haven’t done any error checking. I also assume that the Roman letters are uppercase.

The to_i method naturally returns the decimal value, and the to_s method predictably returns the Roman form. We define succ to be the next Roman number—for example, Roman(:IV).succ would be Roman(:V).

We implement the comparison operator by comparing the decimal equivalents in a straightforward way. We include the Comparable module so that we get the less-than and greater-than operators (which depend on the existence of the comparison method <=>).

Notice the gratuitous use of symbols in this fragment:

op = that > this ? :- : :+
sum = sum.send(op,this)

In the preceding fragment, we’re actually choosing which operation (denoted by a symbol) to perform—addition or subtraction. This code fragment is just a short way of saying the following:

if that > this
  sum -= this
else
  sum += this
end

The second fragment is longer but arguably clearer.
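To compare the two styles side by side, the sketch below runs the operations that decoding MCML (1950) would perform, first by sending each operator symbol and then with an explicit if/else. The steps array is made up for illustration.

```ruby
# Each pair is [operation, operand]; the symbol names the method to send
steps = [[:+, 1000], [:-, 100], [:+, 1000], [:+, 50]]

total = steps.reduce(0) { |sum, (op, n)| sum.send(op, n) }

# The same computation written out with if/else
total2 = 0
steps.each do |op, n|
  if op == :-
    total2 -= n
  else
    total2 += n
  end
end

p total    # 1950
p total2   # 1950
```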

Because this class has both a succ method and a full set of relational operators, we can use it in a range. The following sample code demonstrates this:

require 'roman'

y1 = Roman(:MCMLXVI)
y2 = Roman(:MMIX)
range = y1..y2 # 1966..2009
range.each {|x| puts x} # 44 lines of output

epoch = Roman(:MCMLXX)
range.include?(epoch) # true

doomsday = Roman(2038)
range.include?(doomsday) # false

Roman(:V) == Roman(:IV).succ # true
Roman(:MCM) < Roman(:MM) # true
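The requirements are small enough to show in isolation. The hypothetical Rung class below defines only succ and <=> (plus Comparable), and that is enough for iteration, include?, and the relational operators.

```ruby
# Hypothetical minimal class usable as a range endpoint
class Rung
  include Comparable
  attr_reader :n

  def initialize(n)
    @n = n
  end

  def succ
    Rung.new(@n + 1)
  end

  def <=>(other)
    n <=> other.n
  end

  def to_s
    "rung #{n}"
  end
end

ladder = Rung.new(1)..Rung.new(4)
p ladder.map(&:to_s)                                # ["rung 1", "rung 2", "rung 3", "rung 4"]
p ladder.include?(Rung.new(3))                      # true
p ladder.include?(Rung.new(9))                      # false
p Rung.new(2).between?(Rung.new(1), Rung.new(4))    # true, from Comparable
```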

6.3 Conclusion

In this chapter, we’ve seen what symbols are in Ruby and how they are used. We’ve seen both standard and user-defined uses of symbols.

We’ve also looked at ranges in depth. We’ve seen how to convert them to arrays, how to use them as array or string indices, how to iterate over them, and other such operations. We’ve looked in detail at the flip-flop operator (and an alternative to the existing syntax). Finally, we’ve seen in detail how to construct a class so that it works well with range operators.

That ends our discussion of symbols and ranges. However, because they are commonly used in Ruby (and are extremely useful), you’ll see more of them in incidental code throughout the rest of the book.