Ruby in Review - THE RUBY WAY, Third Edition (2015)

Chapter 1. Ruby in Review

Language shapes the way we think and determines what we can think about.

—Benjamin Lee Whorf

It is worth remembering that a new programming language is sometimes viewed as a panacea, especially by its adherents. But no one language will supplant all the others; no one tool is unarguably the best for every possible task. There are many different problem domains in the world and many possible constraints on problems within those domains.

Above all, there are different ways of thinking about these problems, stemming from the diverse backgrounds and personalities of the programmers themselves. For these reasons, there is no foreseeable end to the proliferation of languages. And as long as there is a multiplicity of languages, there will be a multiplicity of personalities defending and attacking them. In short, there will always be “language wars”; in this book, however, we do not intend to participate in them.

Yet in the constant quest for newer and better program notations, we have stumbled across ideas that endure, that transcend the context in which they were created. Just as Pascal borrowed from Algol, just as Java borrowed from C, so will every language borrow from its predecessors.

A language is both a toolbox and a playground; it has a practical side, but it also serves as a test bed for new ideas that may or may not be widely accepted by the computing community.

One of the most far-reaching of these ideas is the concept of object-oriented programming (OOP). Although many would argue that the overall significance of OOP is evolutionary rather than revolutionary, no one can say that it has not had an impact on the industry. Twenty-five years ago, object orientation was for the most part an academic curiosity; today it is a universally accepted paradigm.

In fact, the ubiquitous nature of OOP has led to a significant amount of “hype” in the industry. In a classic paper of the late 1980s, Roger King observed, “If you want to sell a cat to a computer scientist, you have to tell him it’s object-oriented.” Additionally, there are differences of opinion about what OOP really is, and even among those who are essentially in agreement, there are differences in terminology.

It is not our purpose here to contribute to the hype. We do find OOP to be a useful tool and a meaningful way of thinking about problems; we do not claim that it cures cancer.

As for the exact nature of OOP, we have our pet definitions and favorite terminology; but we make these known only to communicate effectively, not to quibble over semantics.

We mention all this because it is necessary to have a basic understanding of OOP to proceed to the bulk of this book and understand the examples and techniques. Whatever else might be said about Ruby, it is definitely an object-oriented language.

1.1 An Introduction to Object Orientation

Before talking about Ruby specifically, it is a good idea to talk about object-oriented programming in the abstract. These first few pages review those concepts with only cursory references to Ruby before we proceed to a review of the Ruby language itself.

1.1.1 What Is an Object?

In object-oriented programming, the fundamental unit is the object. An object is an entity that serves as a container for data and also controls access to the data. Associated with an object is a set of attributes, which are essentially no more than variables belonging to the object. (In this book, we will loosely use the ordinary term variable for an attribute.) Also associated with an object is a set of functions that provide an interface to the functionality of the object, called methods.

It is essential that any OOP language provide encapsulation. As the term is commonly used, it means first that the attributes and methods of an object are associated specifically with that object, or bundled with it; second, it means that the scope of those attributes and methods is by default the object itself (an application of the principle of data hiding).

An object is considered to be an instance or manifestation of an object class (usually simply called a class). The class may be thought of as the blueprint or pattern; the object itself is the thing created from that blueprint or pattern. A class is often thought of as an abstract type—a more complex type than, for example, an integer or character string.

When an object (an instance of a class) is created, it is said to be instantiated. Some languages have the notion of an explicit constructor and destructor for an object—functions that perform whatever tasks are needed to initialize an object and (respectively) to “destroy” it. We may as well mention prematurely that Ruby has what might be considered a constructor but certainly does not have any concept of a destructor (because of its well-behaved garbage collection mechanism).

Occasionally a situation arises in which a piece of data is more “global” in scope than a single object, and it is inappropriate to put a copy of the attribute into each instance of the class. For example, consider a class called MyDogs, from which three objects are created: fido, rover, and spot. For each dog, there might be such attributes as age and date of vaccination. But suppose that we want to store the owner’s name (the owner of all the dogs). We could certainly put it in each object, but that is wasteful of memory and at the very least a misleading design. Clearly the owner_name attribute belongs not to any individual object but to the class itself. When it is defined that way (and the syntax varies from one language to another), it is called a class attribute (or class variable).

Of course, there are many situations in which a class variable might be needed. For example, suppose that we wanted to keep a count of how many objects of a certain class had been created. We could use a class variable that was initialized to zero and incremented with every instantiation; the class variable would be associated with the class and not with any particular object. In scope, this variable would be just like any other attribute, but there would only be one copy of it for the entire class and the entire set of objects created from that class.
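The instance counter just described might be sketched in Ruby as follows. The class name Dog and its name attribute are assumptions made for illustration; the @@ notation for class variables is covered later in the chapter.

```ruby
# Hypothetical class for illustration only
class Dog
  @@count = 0            # class variable: one copy shared by the class and all its objects

  attr_reader :name      # instance attribute: one copy per object

  def initialize(name)
    @name = name
    @@count += 1         # incremented on every instantiation
  end

  # Class method: controls access to the class variable
  def self.count
    @@count
  end
end

Dog.new("Fido")
Dog.new("Rover")
Dog.new("Spot")
puts Dog.count   # prints 3
```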

To distinguish between class attributes and ordinary attributes, the latter are sometimes explicitly called object attributes (or instance attributes). We use the convention that any attribute is assumed to be an instance attribute unless we explicitly call it a class attribute.

Just as an object’s methods are used to control access to its attributes and provide a clean interface to them, so is it sometimes appropriate or necessary to define a method associated with a class. A class method, not surprisingly, controls access to the class variables and also performs any tasks that might have classwide effects rather than merely objectwide. As with data attributes, methods are assumed to belong to the object rather than the class unless stated otherwise.

It is worth mentioning that there is a sense in which all methods are class methods. We should not suppose that when 100 objects are created, we actually copy the code for the methods 100 times! But the rules of scope assure us that each object method operates only on the object whose method is being called, providing us with the necessary illusion that object methods are associated strictly with their objects.

1.1.2 Inheritance

We come now to one of the real strengths of OOP, which is inheritance. Inheritance is a mechanism that allows us to extend a previously existing entity by adding features to create a new entity. In short, inheritance is a way of reusing code. (Easy, effective code reuse has long been the Holy Grail of computer science, resulting in the invention decades ago of parameterized subroutines and code libraries. OOP is only one of the later efforts in realizing this goal.)

Typically we think of inheritance at the class level. If we have a specific class in mind, and there is a more general case already in existence, we can define our new class to inherit the features of the old one. For example, suppose that we have a class named Polygon that describes convex polygons. If we then find ourselves dealing with a Rectangle class, we can inherit from Polygon so that Rectangle has all the attributes and methods that Polygon has. For example, there might be a method that calculates perimeter by iterating over all the sides and adding their lengths. Assuming that everything was implemented properly, this method would automatically work for the new class; the code would not have to be rewritten.

When a class B inherits from a class A, we say that B is a subclass of A, or conversely A is the superclass of B. In slightly different terminology, we may say that A is a base class or parent class, and B is a derived class or child class.

A derived class, as we have seen, may treat a method inherited from its base class as if it were its own. On the other hand, it may redefine that method entirely if it is necessary to provide a different implementation; this is referred to as overriding a method. In addition, most languages provide a way for an overridden method to call its namesake in the parent class; that is, the method foo in B knows how to call method foo in A if it wants to. (Any language that does not provide this feature is under suspicion of not being truly object oriented.) Essentially the same is true for data attributes.
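As a rough sketch of these ideas in Ruby, the Polygon and Rectangle classes might look like the following. The method names perimeter and describe, and the choice to store side lengths in an array, are assumptions made for this example.

```ruby
class Polygon
  def initialize(sides)
    @sides = sides            # array of side lengths
  end

  def perimeter
    @sides.sum                # add up the lengths of all the sides
  end

  def describe
    "polygon with #{@sides.size} sides"
  end
end

class Rectangle < Polygon
  def initialize(width, height)
    super([width, height, width, height])   # reuse the parent's initializer
  end

  # Override describe, but call the namesake in the parent via super
  def describe
    "rectangle, which is a " + super
  end
end

rect = Rectangle.new(3, 4)
puts rect.perimeter   # inherited unchanged from Polygon; prints 14
puts rect.describe    # prints: rectangle, which is a polygon with 4 sides
```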

The relationship between a class and its superclass is interesting and important; it is usually described as the is-a relationship, because a Square “is a” Rectangle, and a Rectangle “is a” Polygon, and so on. Thus, if we create an inheritance hierarchy (which tends to exist in one form or another in any OOP language), we see that the more specific entity “is a” subclass of the more general entity at any given point in the hierarchy. Note that this relationship is transitive—in the previous example, we easily see that a Square “is a” Polygon. Note also that the relationship is not commutative—we know that every Rectangle is a Polygon, but not every Polygon is a Rectangle.

This brings us to the topic of multiple inheritance (MI). It is conceivable that a new class could inherit from more than one class. For example, the classes Dog and Cat can both inherit from the class Mammal, and Sparrow and Raven can inherit from WingedCreature. But what if we want to define a Bat? It can reasonably inherit from both the classes Mammal and WingedCreature. This corresponds well with our real-life experience in which things are not members of just one category but of many non-nested categories.

MI is probably the most controversial area in OOP. One camp will point out the potential for ambiguity that must be resolved. For example, if Mammal and WingedCreature both have an attribute called size (or a method called eat), which one will be referenced when we refer to it from a Bat object? Another related difficulty is the diamond inheritance problem, so called because of the shape of its inheritance diagram, with both superclasses inheriting from a single common superclass. For example, imagine that Mammal and WingedCreature both inherit from Organism; the hierarchy from Organism to Bat forms a diamond. But what about the attributes that the two intermediate classes both inherit from their parent? Does Bat get two copies of each of them? Or are they merged back into single attributes because they come from a common ancestor in the first place?

These are both issues for the language designer rather than the programmer. Different OOP languages deal with the issues differently. Some provide rules allowing one definition of an attribute to “win out,” or a way to distinguish between attributes of the same name, or even a way of aliasing or renaming the identifiers. This in itself is considered by many to be an argument against MI—the mechanisms for dealing with name clashes and the like are not universally agreed upon but are language dependent. C++ offers a minimal set of features for dealing with ambiguities; those of Eiffel are probably better, and those of Perl are different from both.

The alternative, of course, is to disallow MI altogether. This is the approach taken by such languages as Java and Ruby. This sounds like a drastic compromise; however, as we shall see later, it is not as bad as it sounds. We will look at a viable alternative to traditional MI, but we must first discuss polymorphism, yet another OOP buzzword.

1.1.3 Polymorphism

Polymorphism is the term that perhaps inspires the most semantic disagreement in the field. Everyone seems to know what it is, but everyone has a different definition. (In recent years, “What is polymorphism?” has become a popular interview question. If it is asked of you, I recommend quoting an expert such as Bertrand Meyer or Bjarne Stroustrup; that way, if the interviewer disagrees, his beef is with the expert and not with you.)

The literal meaning of polymorphism is “the ability to take on multiple forms or shapes.” In its broadest sense, this refers to the ability of different objects to respond in different ways to the same message (or method invocation).

Damian Conway, in his book Object-Oriented Perl, distinguishes meaningfully between two kinds of polymorphism. The first, inheritance polymorphism, is what most programmers are referring to when they talk about polymorphism.

When a class inherits from its superclass, we know (by definition) that any method present in the superclass is also present in the subclass. Thus, a chain of inheritance represents a linear hierarchy of classes that can respond to the same set of methods. Of course, we must remember that any subclass can redefine a method; that is what gives inheritance its power. If I call a method on an object, typically it will be either the one it inherited from its superclass or a more appropriate (more specialized) method tailored for the subclass.

In statically typed languages such as C++, inheritance polymorphism establishes type compatibility down the chain of inheritance (but not in the reverse direction). For example, if B inherits from A, a pointer to an A object can also point to a B object; but the reverse is not true. This type compatibility is an essential OOP feature in such languages—indeed it almost sums up polymorphism—but polymorphism certainly exists in the absence of static typing (as in Ruby).

The second kind of polymorphism Conway identifies is interface polymorphism. This does not require any inheritance relationship between classes; it only requires that the interfaces of the objects have methods of a certain name. The treatment of such objects as being the same “kind” of thing is thus a kind of polymorphism (though in most writings, it is not explicitly referred to as such).
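A minimal sketch of interface polymorphism in Ruby follows. The classes Duck and Robot and the method speak are hypothetical; the point is only that the two classes share no inheritance relationship beyond Object.

```ruby
# Two unrelated classes that happen to share a method name
class Duck
  def speak
    "Quack"
  end
end

class Robot
  def speak
    "Beep"
  end
end

def greet(thing)
  thing.speak      # works for any object that responds to speak
end

puts greet(Duck.new)    # prints Quack
puts greet(Robot.new)   # prints Beep
```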

Readers familiar with Java will recognize that it implements both kinds of polymorphism. A Java class can extend another class, inheriting from it via the extends keyword; or it may implement an interface, acquiring a known set of methods (which must then be implemented) via the implements keyword. Because of syntax requirements, the Java compiler can determine at compile time whether a method can be invoked on a particular object.

Ruby supports interface polymorphism but in a different way, providing modules whose methods may be mixed in to existing classes (interfacing to user-defined methods that are expected to exist). This, however, is not the way modules are usually used. A module consists of methods and constants that may be used as though they were actual parts of that class or object; when a module is mixed in via the include statement, this is considered to be a restricted form of multiple inheritance. According to the language designer, Yukihiro Matsumoto (often called Matz), it can be viewed as single inheritance with implementation sharing. This is a way of preserving the benefits of MI without suffering all the consequences.
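The standard Comparable module gives a compact illustration of a mixin that interfaces to a user-defined method expected to exist. In this sketch the Version class is hypothetical; Comparable itself and its reliance on the <=> method are part of core Ruby.

```ruby
class Version
  include Comparable          # mixes in <, <=, ==, >, >=, between?, and clamp

  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # The one method that Comparable expects the class to supply
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

puts Version.new(1, 2) < Version.new(1, 10)     # prints true
puts Version.ancestors.include?(Comparable)     # prints true
```

The class still has exactly one superclass (Object), yet it behaves as though the comparison methods were its own.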

1.1.4 A Few More Terms

In languages such as C++, there is the concept of abstract classes—classes that must be inherited from and cannot be instantiated on their own. This concept does not exist in the more dynamic Ruby language, although if the programmer really wants, it is possible to fake this kind of behavior by forcing the methods to be overridden. Whether this is useful is left as an exercise for the reader.
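One common way to fake an abstract class is sketched here, under the assumption that raising NotImplementedError from the base method is an acceptable signal. The Shape and Square classes are hypothetical.

```ruby
class Shape
  # "Abstract" method: subclasses are expected to override it
  def area
    raise NotImplementedError, "#{self.class} must implement area"
  end

  def describe
    "#{self.class.name} with area #{area}"
  end
end

class Square < Shape
  def initialize(side)
    @side = side
  end

  def area
    @side * @side
  end
end

puts Square.new(3).describe   # prints: Square with area 9

begin
  Shape.new.area
rescue NotImplementedError => e
  puts e.message              # prints: Shape must implement area
end
```

Note that nothing prevents Shape.new itself from succeeding; the failure appears only when the unimplemented method is called.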

The creator of C++, Bjarne Stroustrup, also identifies the concept of a concrete type. This is a class that exists only for convenience; it is not designed to be inherited from, nor is it expected that there will ever be another class derived from it. In other words, the benefits of OOP are basically limited to encapsulation. Ruby does not specifically support this concept through any special syntax (nor does C++), but it is naturally well suited for the creation of such classes.

Some languages are considered to be more “pure” OO than others. (We also use the term radically object oriented.) This refers to the concept that every entity in the language is an object; every primitive type is represented as a full-fledged class, and variables and constants alike are recognized as object instances. This is in contrast to such languages as Java, C++, and Eiffel. In these, the more primitive data types (especially constants) are not first-class objects, though they may sometimes be treated that way with “wrapper” classes. Arguably there are languages that are more radically object oriented than Ruby, but they are relatively few.

Most OO languages are static; the methods and attributes belonging to a class, the global variables, and the inheritance hierarchy are all defined at compile time. Perhaps the largest conceptual leap for a Ruby programmer is that these are all handled dynamically in Ruby. Definitions and even inheritance can happen at runtime—in fact, we can truly say that every declaration or definition is actually executed during the running of the program. Among many other benefits, this obviates the need for conditional compilation and can produce more efficient code in many circumstances.
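To make the idea of executed definitions concrete, here is a small sketch. The constant FANCY_TITLES and the Report class are assumptions invented for this example.

```ruby
FANCY_TITLES = false        # hypothetical switch, an assumption for this sketch

class Report
  # This if runs while the class body is being executed, so only
  # one of the two definitions ever takes effect.
  if FANCY_TITLES
    def title
      "*** REPORT ***"
    end
  else
    def title
      "Report"
    end
  end
end

# Reopening the class later adds a method to it at runtime
class Report
  def footer
    "End of #{title}"
  end
end

report = Report.new
puts report.title    # prints Report
puts report.footer   # prints End of Report
```

This is the sort of choice that a statically compiled language would typically make with conditional compilation.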

This sums up the whirlwind tour of OOP. Throughout the rest of the book, we have tried to make consistent use of the terms introduced here. Let’s proceed now to a brief review of the Ruby language itself.

1.2 Basic Ruby Syntax and Semantics

In the previous pages, we have already seen that Ruby is a pure, dynamic OOP language. Let’s look briefly at some other attributes before summarizing the syntax and semantics.

Ruby is an agile language. It is “malleable” and encourages frequent, easy (manual) refactoring.

Ruby is an interpreted language. Of course, there may be later implementations of a Ruby compiler for performance reasons, but we maintain that an interpreter yields great benefits not only in rapid prototyping but also in the shortening of the development cycle overall.

Ruby is an expression-oriented language. Why use a statement when an expression will do? This means, for instance, that code becomes more compact as the common parts are factored out and repetition is removed.

Ruby is a very high-level language (VHLL). One principle behind the language design is that the computer should work for the programmer rather than vice versa. The “density” of Ruby means that sophisticated and complex operations can be carried out with relative ease as compared to lower-level languages.

Let’s start by examining the overall look and feel of the language and some of its terminology. We’ll briefly examine the nature of a Ruby program before looking at examples.

To begin with, Ruby is essentially a line-oriented language—more so than languages such as C but not so much as antique languages such as FORTRAN. Tokens can be crowded onto a single line as long as they are separated by whitespace as needed. Statements may share a single line if they are separated by semicolons; this is the only time the terminating semicolon is really needed. A line may be continued to the next line by ending it with a backslash or by letting the parser know that the statement is not complete—for example, by ending a line with a comma.
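These layout rules can be seen in a short sketch. The method name layout_demo is hypothetical and exists only to keep the example self-contained.

```ruby
def layout_demo
  a = 1; b = 2; c = 3      # three statements on one line, separated by semicolons

  total = a + b \
          + c              # the backslash above continues the statement onto this line

  list = [a,
          b,
          c]               # a trailing comma tells the parser that more is coming

  [total, list]
end

p layout_demo   # prints [6, [1, 2, 3]]
```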

There is no main program as such; execution proceeds in general from top to bottom. In more complex programs, there may be numerous definitions at the top, followed by the (conceptual) main program at the bottom; but even in that case, execution proceeds from the top down because definitions in Ruby are executed.

1.2.1 Keywords and Identifiers

The keywords (or reserved words) in Ruby typically cannot be used for other purposes. These are as follows:

BEGIN END alias and begin

break case class def defined?

do else elsif end ensure

false for if in module

next nil not or redo

rescue retry return self super

then true undef unless until

when while yield

Variables and other identifiers normally start with an alphabetic letter or a special modifier. The basic rules are as follows:

• Local variables (and pseudovariables such as self and nil) begin with a lowercase letter or an underscore.

• Global variables begin with $ (a dollar sign).

• Instance variables (within an object) begin with @ (an at sign).

• Class variables (within a class) begin with @@ (two at signs).

• Constants begin with capital letters.

• For purposes of forming identifiers, the underscore (_) may be used as a lowercase letter.

• Special variables starting with a dollar sign (such as $1 and $/) are set by the Ruby interpreter itself.

Here are some examples of each of these:

Local variables: alpha, _ident, some_var

Pseudovariables: self, nil, __FILE__

Constants: K9chip, Length, LENGTH

Instance variables: @foobar, @thx1138, @NOT_CONST

Class variables: @@phydeaux, @@my_var, @@NOT_CONST

Global variables: $beta, $B12vitamin, $NOT_CONST
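As a quick check of these naming rules, the defined? keyword reports how Ruby classifies an identifier. The names used in this sketch ($app_name, Sample, LIMIT, and so on) are hypothetical.

```ruby
$app_name = "demo"            # global variable

class Sample
  LIMIT = 10                  # constant
  @@instances = 0             # class variable

  def initialize
    @id = 42                  # instance variable
    @@instances += 1
  end

  def kinds
    local = 1                 # local variable
    [defined?(local), defined?(@id), defined?($app_name), defined?(LIMIT)]
  end
end

p Sample.new.kinds
# prints ["local-variable", "instance-variable", "global-variable", "constant"]
```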

1.2.2 Comments and Embedded Documentation

Comments in Ruby begin with a pound sign (#) outside a string or character constant and proceed to the end of the line:

x = y + 5 # This is a comment.
# This is another comment.
puts "# But this isn't."

Comments immediately before definitions typically document the thing that is about to be defined. This embedded documentation can often be retrieved from the program text by external tools. Typical documentation comments can run to several comment lines in a row.

# The purpose of this class
# is to cure cancer
# and instigate world peace
class ImpressiveClass
  # ...
end

Given two lines starting with =begin and =end, everything between those lines (inclusive) is treated as a comment. (These can’t be preceded by whitespace.)

=begin
Everything on lines
inside here will be a
comment as well.
=end

1.2.3 Constants, Variables, and Types

In Ruby, variables do not have types, but the objects they refer to do have types. The simplest data types are character, numeric, and string.

Numeric constants are mostly intuitive, as are strings. Generally, a double-quoted string is subject to additional interpretation, and a single-quoted string is more “as is,” allowing only an escaped backslash.

In double-quoted strings, we can do “interpolation” of variables and expressions, as shown here:

a = 3
b = 79
puts "#{a} times #{b} = #{a*b}" # 3 times 79 = 237

For more information on literals (numbers, strings, regular expressions, and so on), refer to later chapters.

There is a special kind of string worth mentioning, primarily useful in small scripts used to glue together larger programs. The command output string is sent to the operating system as a command to be executed, whereupon the output of the command is substituted back into the string. The simple form of this string uses the grave accent (sometimes called a back-tick or back-quote) as a beginning and ending delimiter; the more complex form uses the %x notation:

`whoami`
`ls -l`
%x[grep -i meta *.html | wc -l]

Regular expressions in Ruby look similar to character strings, but they are used differently. The usual delimiter is a slash character.

For those familiar with Perl, regular expression handling is similar in Ruby. Incidentally, we’ll use the abbreviation regex throughout the remainder of the book; many people abbreviate it as regexp, but that is not as pronounceable. For details on regular expressions, see Chapter 3, “Working with Regular Expressions.”
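A brief sketch of regex usage follows. The method name first_number and the sample strings are assumptions; the =~ operator, the $~ variable, and match? are standard Ruby.

```ruby
def first_number(text)
  # =~ returns the index of the first match, or nil if there is none
  return nil unless text =~ /\d+/
  $~[0]          # $~ holds the MatchData for the last successful match
end

puts first_number("Room 101, floor 3")               # prints 101
p first_number("no digits here")                     # prints nil
puts "2015-03-01".match?(/\A\d{4}-\d{2}-\d{2}\z/)    # prints true
```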

Arrays in Ruby are a powerful construct; they may contain data of any type or may even mix types. As we shall see in Chapter 8, “Arrays, Hashes, and Other Enumerables,” all arrays are instances of the class Array and thus have a rich set of methods that can operate on them. An array constant is delimited by brackets; the following are all valid array expressions:

[1, 2, 3]
[1, 2, "buckle my shoe"]
[1, 2, [3,4], 5]
["alpha", "beta", "gamma", "delta"]

The second example shows an array containing both integers and strings; the third example in the preceding code shows a nested array, and the fourth example shows an array of strings. As in most languages, arrays are zero based; for instance, in the last array in the preceding code, "gamma" is element number 2. Arrays are dynamic and do not need to have a size specified when they are created.

Because the array of strings is so common (and so inconvenient to type), a special syntax has been set aside for it, similar to what we have seen already:

%w[alpha beta gamma delta]
%w(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
%w/am is are was were be being been/

Such a shorthand is frequently called “syntax sugar” because it offers a more convenient alternative to another syntactic form. In this case, the quotes and commas are not needed; only whitespace separates the individual elements. In the case of an element that contains whitespace, of course, this would not work.

An array variable can use brackets to index into the array. The resulting expression can be both examined and assigned to:

val = myarray[0]
print stats[j]
x[i] = x[i+1]
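The following sketch pulls these array features together. The method name array_demo and the variable names are hypothetical.

```ruby
def array_demo
  words = %w[alpha beta gamma delta]   # same as ["alpha", "beta", "gamma", "delta"]
  third = words[2]                     # zero based, so index 2 is "gamma"
  words[5] = "zeta"                    # assigning past the end grows the array
  [third, words]
end

third, words = array_demo
puts third    # prints gamma
p words       # prints ["alpha", "beta", "gamma", "delta", nil, "zeta"]
```

Notice that the skipped position (index 4) is filled with nil when the array grows.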

Another powerful construct in Ruby is the hash, also known in other circles as an associative array or dictionary. A hash is a set of associations between paired pieces of data; it is typically used as a lookup table or a kind of generalized array in which the index need not be an integer. Each hash is an instance of the class Hash.

A hash constant is typically represented between delimiting braces, with the symbol => separating the individual keys and values. The key can be thought of as an index where the corresponding value is stored. There is no restriction on types of the keys or the corresponding values. Here are some hashes:

{1 => 1, 2 => 4, 3 => 9, 4 => 16, 5 => 25, 6 => 36}
{"cat" => "cats", "ox" => "oxen", "bacterium" => "bacteria"}
{"odds" => [1,3,5,7], "evens" => [2,4,6,8]}
{"foo" => 123, [4,5,6] => "my array", "867-5309" => "Jenny"}

Hashes also have an additional syntax that creates keys that are instances of the Symbol class (which is explained further in later material):

{hydrogen: 1, helium: 2, carbon: 6}

A hash variable can have its contents accessed by essentially the same bracket notation that arrays use:

print phone_numbers["Jenny"]
plurals["octopus"] = "octopi"
atomic_numbers[:helium] #=> 2
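Here is a small runnable sketch of hash creation and access. The method name hash_demo is hypothetical, and the key :unobtainium is deliberately one that is not present.

```ruby
def hash_demo
  atomic_numbers = { hydrogen: 1, helium: 2 }   # symbol keys, short syntax
  atomic_numbers[:carbon] = 6                   # add a new association
  [
    atomic_numbers[:helium],        # look up an existing key
    atomic_numbers[:unobtainium],   # a missing key gives nil by default
    atomic_numbers.keys             # keys are kept in insertion order
  ]
end

p hash_demo                                # prints [2, nil, [:hydrogen, :helium, :carbon]]
puts({ helium: 2 } == { :helium => 2 })    # prints true
```

The last line shows that the two hash syntaxes produce equal hashes; the short form is simply a convenience for symbol keys.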

It should be stressed, however, that both arrays and hashes have many methods associated with them; these methods give them their real usefulness. The section “OOP in Ruby,” later in the chapter, will expand on this a little more.

1.2.4 Operators and Precedence

Now that we have established our most common data types, let’s look at Ruby’s operators. They are arranged here in order from highest to lowest precedence:

[Table: Ruby operators in order of precedence, highest to lowest]

Some of the preceding symbols serve more than one purpose; for example, the operator << is a bitwise left shift but is also an append operator (for arrays, strings, and so on) and a marker for a here-document. Likewise, the + is for numeric addition as well as for string concatenation. As we shall see later, many of these operators are just shortcuts for method names.
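A few of these points can be checked directly. In this sketch the method name operator_demo is hypothetical; the operators themselves are standard.

```ruby
def operator_demo
  [
    2 + 3 * 4,          # * binds more tightly than +, giving 14
    2.+(3),             # + is really a method called on 2, giving 5
    1 << 3,             # bitwise left shift, giving 8
    [1, 2] << 3,        # append to an array, giving [1, 2, 3]
    "foo" + "bar"       # string concatenation, giving "foobar"
  ]
end

p operator_demo   # prints [14, 5, 8, [1, 2, 3], "foobar"]
```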

Now we have defined most of our data types and many of the possible operations on them. Before going any further, let’s look at a sample program.

1.2.5 A Sample Program

In a tutorial, the first program is always Hello, world! But in a whirlwind tour like this one, let’s start with something slightly more advanced. Here is a small interactive console-based program to convert between Fahrenheit and Celsius temperatures:

print "Please enter a temperature and scale (C or F): "
STDOUT.flush
str = gets
exit if str.nil? || str.empty?
str.chomp!
temp, scale = str.split(" ")

abort "#{temp} is not a valid number." if temp !~ /-?\d+/

temp = temp.to_f
case scale
when "C", "c"
  f = 1.8*temp + 32
when "F", "f"
  c = (5.0/9.0)*(temp-32)
else
  abort "Must specify C or F."
end

if f.nil?
  puts "#{c} degrees C"
else
  puts "#{f} degrees F"
end

Here are some examples of running this program. These show that the program can convert from Fahrenheit to Celsius, convert from Celsius to Fahrenheit, and handle an invalid scale or an invalid number:

Please enter a temperature and scale (C or F): 98.6 F
37.0 degrees C

Please enter a temperature and scale (C or F): 100 C
212.0 degrees F

Please enter a temperature and scale (C or F): 92 G
Must specify C or F.

Please enter a temperature and scale (C or F): junk F
junk is not a valid number.

Now, as for the mechanics of the program: We begin with a print statement, which is actually a call to the Kernel method print, to write to standard output. This is an easy way of leaving the cursor “hanging” at the end of the line.

Following this, we call gets (get string from standard input), assigning the value to str. We then do a chomp! to remove the trailing newline.

Note that any apparently “free-standing” function calls such as print and gets are actually methods of Object (probably originating in Kernel). In the same way, chomp! is a method called with str as a receiver. Method calls in Ruby usually can omit the parentheses; for example, print "foo" is the same as print("foo").

The variable str refers to (or informally, it “holds”) a character string, but there is no reason it could not hold some other type instead. In Ruby, data have types, but variables do not. A variable springs into existence as soon as the interpreter sees an assignment to that variable; there are no “variable declarations” as such.

The exit is a call to a method that terminates the program. On this same line there is a control structure called an if-modifier. This is like the if statement that exists in most languages, but backwards; it comes after the action, does not permit an else, and does not require closing. As for the condition, we are checking two things: Does str have a value (is it non-nil), and is it a nonempty string? In the case of an immediate end-of-file, gets returns nil and the first condition will hold; the second condition guards against an empty string. (Note that a line consisting only of a newline arrives as "\n", which is not empty until it has been chomped.)

The || operator has the same effect as or, but is preferred because it has higher precedence and produces less-confusing results. The same statement could be written this way:

exit if not str or not str[0]
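The precedence difference matters most when an assignment is involved. A minimal sketch (the method name precedence_demo is hypothetical):

```ruby
def precedence_demo
  a = nil || "default"      # parsed as a = (nil || "default")
  b = nil or "default"      # parsed as (b = nil) or "default"
  [a, b]
end

p precedence_demo   # prints ["default", nil]
```

Because or binds more loosely than assignment, b never receives the fallback value; this is the kind of confusing result that leads many programmers to prefer ||.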

The reason these tests work is that a variable can have a nil value, and nil evaluates to false in Ruby. In fact, nil and false evaluate as false, and everything else evaluates as true. Specifically, the null string "" and the number 0 do not evaluate as false.
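This rule is easy to verify with a sketch such as the following, where truthy? is a hypothetical helper method:

```ruby
def truthy?(value)
  value ? true : false
end

p [nil, false, 0, "", [], 0.0].map { |v| truthy?(v) }
# prints [false, false, true, true, true, true]
```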

The next statement performs a chomp! operation on the string (to remove the trailing newline). The exclamation point as a suffix serves as a warning that the operation actually changes the value of its receiver rather than just returning a value. The exclamation point is used in many such instances to remind the programmer that a method has a side effect or is more “dangerous” than its unmarked counterpart. The method chomp, for example, returns the chomped string but does not modify its receiver. (One difference in return values: chomp! returns nil when there is nothing to remove.)
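The difference between the two methods can be seen in a short sketch. The method name chomp_demo and the sample string are assumptions.

```ruby
def chomp_demo
  line = "98.6 F\n".dup            # dup guarantees a mutable string
  copy = line.chomp                # returns a new string; line is untouched
  unchanged = (line == "98.6 F\n")
  line.chomp!                      # modifies line in place
  [copy, unchanged, line]
end

p chomp_demo   # prints ["98.6 F", true, "98.6 F"]
```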

The next statement is an example of multiple assignment. The split method splits the string into an array of values, using the space as a delimiter. The two assignable entities on the left-hand side will be assigned the respective values resulting on the right-hand side.
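A sketch of how multiple assignment behaves when the two sides differ in length follows; parse_reading is a hypothetical method name.

```ruby
def parse_reading(str)
  temp, scale = str.split(" ")    # first element goes to temp, second to scale
  [temp, scale]
end

p parse_reading("98.6 F")      # prints ["98.6", "F"]
p parse_reading("100")         # prints ["100", nil]   (a missing value becomes nil)
p parse_reading("1 C extra")   # prints ["1", "C"]     (surplus values are dropped)
```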

The if statement that follows uses a simple regex to determine whether the number is valid; if the string fails to match a pattern consisting of an optional minus sign followed by one or more digits, it is an invalid number (for our purposes), and the program exits. Note that the if statement is terminated by the keyword end; though it was not needed here, we could have had an else clause before the end. The keyword then is optional; we tend not to use it in this book.

The to_f method is used to convert the string to a floating point number. We are actually assigning this floating point value back to temp, which originally held a string.

The case statement chooses between three alternatives—the cases in which the user specified a C, specified an F, or used an invalid scale. In the first two instances, a calculation is done; in the third, we print an error and exit. When printing, the puts method will automatically add a newline after the string that is given.

Ruby’s case statement, by the way, is far more general than the example shown here. There is no limitation on the data types, and the expressions used are all arbitrary and may even be ranges or regular expressions.

There is nothing mysterious about the computation. But consider the fact that the variables c and f are referenced first inside the branches of the case. There are no declarations as such in Ruby; because a variable only comes into existence when it is assigned, this means that when we fall through the case statement, only one of these variables actually has a valid value.

We use this fact to determine after the fact which branch was followed, so that we can do a slightly different output in each instance. Testing f for a nil is effectively a test of whether the variable has a meaningful value. We do this here only to show that it can be done; obviously, two different print statements could be used inside the case statement if we wanted.

The perceptive reader will notice that we used only “local” variables here. This might be confusing because their scope certainly appears to cover the entire program. What is happening here is that the variables are all local to the top level of the program (written toplevel by some). The variables appear global because there are no lower-level contexts in a program this simple; but if we declared classes and methods, these top-level variables would not be accessible within those.

1.2.6 Looping and Branching

Let’s spend some time looking at control structures. We have already seen the simple if statement and the if-modifier; there are also corresponding structures based on the keyword unless (which also has an optional else), as well as expression-oriented forms of if and unless. To summarize these forms, these two statements are equivalent:

if x < 5
statement1
end


unless x >= 5
statement1
end

And so are these:

if x < 5
statement1
else
statement2
end


unless x < 5
statement2
else
statement1
end

And these:

statement1 if y == 3

statement1 unless y != 3

And these are also equivalent:

x = if a > 0 then b else c end


x = unless a <= 0 then b else c end

Note that the keyword then may always be omitted except in the final (expression-oriented) cases. Note also that the modifier form cannot have an else clause.
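Because if is an expression, its value can be assigned or returned directly. The following sketch (with hypothetical method names) also shows what happens when there is no else branch and the test fails:

```ruby
def describe_temperature(degrees)
  label = if degrees < 0 then "freezing" else "above freezing" end
  label
end

def heat_warning(degrees)
  # No else branch: the whole expression is nil when the test fails.
  warning = if degrees > 40 then "Too hot!" end
  warning
end

puts describe_temperature(-5)
p heat_warning(25)
```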

The case statement in Ruby is more powerful than in most languages. This multiway branch can even test for conditions other than equality—for example, a matched pattern. The test used by the case statement is called the case equality operator (===), and its behavior varies from one object to another. Let’s look at this example:

case "This is a character string."
when "some value"
puts "Branch 1"
when "some other value"
puts "Branch 2"
when /char/
puts "Branch 3"
else
puts "Branch 4"
end

The preceding code prints Branch 3. Why? It first tries to check for equality between the tested expression and one of the strings "some value" or "some other value"; this fails, so it proceeds. The third test is for a pattern within the string; when /char/ is equivalent to if /char/ === "This is a character string.". The test succeeds, and the third print statement is performed. The else clause always handles the default case in which none of the preceding tests succeeds.

If the tested expression is an integer, the compared value can be an integer range (for example, 3..8). In this case, the expression is tested for membership in that range. In all instances, the first successful branch will be taken.
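To make the role of === more concrete, here is a minimal sketch that mixes classes, a range, and a regular expression in a single case statement. The method name classify is our own invention:

```ruby
def classify(value)
  case value
  when Integer     then "an integer"         # Integer === value
  when 0.0..1.0    then "a fraction"         # Range#=== tests membership
  when /\A\d+\z/   then "a digit string"     # Regexp#=== tests for a match
  when String      then "some other string"  # String === value
  else                  "something else"
  end
end

[5, 0.5, "42", "abc", nil].each do |v|
  puts "#{v.inspect} is #{classify(v)}"
end
```

Each when clause calls === on the value listed in the clause, not on the tested expression, and the first clause that succeeds wins.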

Although the case statement usually behaves predictably, there are a few subtleties you should appreciate. We will look at these later.

As for looping mechanisms, Ruby has a rich set. The while and until control structures are both pretest loops, and both work as expected: One specifies a continuation condition for the loop, and the other specifies a termination condition. They also occur in “modifier” form, as if and unless do. There is also the loop method of the Kernel module (by default an infinite loop), and there are iterators associated with various classes.

The examples here assume an array called list, defined something like this:

list = %w[alpha bravo charlie delta echo]

They all step through the array and write out each element.

i = 0 # Loop 1 (while)
while i < list.size do
print "#{list[i]} "
i += 1
end


i = 0 # Loop 2 (until)
until i == list.size do
print "#{list[i]} "
i += 1
end

i = 0 # Loop 3 (post-test while)
begin
print "#{list[i]} "
i += 1
end while i < list.size

i = 0 # Loop 4 (post-test until)
begin
print "#{list[i]} "
i += 1
end until i == list.size

for x in list do # Loop 5 (for)
print "#{x} "
end

list.each do |x| # Loop 6 ('each' iterator)
print "#{x} "
end

i = 0 # Loop 7 ('loop' method)
n=list.size-1
loop do
print "#{list[i]} "
i += 1
break if i > n
end

i = 0 # Loop 8 ('loop' method)
n=list.size-1
loop do
print "#{list[i]} "
i += 1
break unless i <= n
end

n=list.size # Loop 9 ('times' iterator)
n.times do |i|
print "#{list[i]} "
end

n = list.size-1 # Loop 10 ('upto' iterator)
0.upto(n) do |i|
print "#{list[i]} "
end

n = list.size-1 # Loop 11 (for)
for i in 0..n do
print "#{list[i]} "
end

list.each_index do |x| # Loop 12 ('each_index' iterator)
print "#{list[x]} "
end

Let’s examine these in detail. Loops 1 and 2 are the “standard” forms of the while and until loops; they behave essentially the same, but their conditions are negations of each other. Loops 3 and 4 are the same thing in “post-test” versions; the test is performed at the end of the loop rather than at the beginning. Note that the use of begin and end in this context is strictly a kludge or hack; what is really happening is that a begin/end block (used for exception handling) is followed by a while or until modifier. In other words, this is only an illustration. Don’t code this way.

Loop 6 is arguably the “proper” way to write this loop. Note the simplicity of 5 and 6 compared with the others; there is no explicit initialization and no explicit test or increment. This is because an array “knows” its own size, and the standard iterator each (loop 6) handles such details automatically. Indeed, loop 5 is merely an indirect reference to this same iterator because the for loop works for any object having the iterator each defined. The for loop is only another way to call each.

Loops 7 and 8 both use the loop construct; as stated previously, loop looks like a keyword introducing a control structure, but it is really a method of the module Kernel, not a control structure at all.

Loops 9 and 10 take advantage of the fact that the array has a numeric index; the times iterator executes a specified number of times, and the upto iterator carries its parameter up to the specified value. Neither of these is truly suitable for this instance.

Loop 11 is a for loop that operates specifically on the index values, using a range, and loop 12 likewise uses the each_index iterator to run through the list of array indices.

In the preceding examples, we have not laid enough emphasis on the “modifier” form of the while and until loops. These are frequently useful, and they have the virtue of being concise. These two additional fragments both mean the same:

perform_task() until finished


perform_task() while not finished

Another fact is largely ignored in these examples: Loops do not always run smoothly from beginning to end, in a predictable number of iterations, or ending in a single predictable way. We need ways to control these loops further.

The first way is the break keyword, shown in loops 7 and 8. This is used to “break out” of a loop; in the case of nested loops, only the innermost one is halted. This will be intuitive for C programmers.

The redo keyword jumps to the start of the loop body in while and until loops.

The next keyword effectively jumps to the end of the innermost loop and resumes execution from that point. It works for any loop or iterator.
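As a small illustration of break and next working together inside an iterator, consider this sketch (the method name odds_until_negative is hypothetical):

```ruby
# Collect the odd numbers from a list, stopping at the first negative value.
def odds_until_negative(numbers)
  result = []
  numbers.each do |n|
    break if n < 0       # leave the loop entirely
    next if n.even?      # skip the rest of this iteration
    result << n
  end
  result
end

p odds_until_negative([1, 2, 3, 4, 5, -1, 7])   # [1, 3, 5]
```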

The iterator is an important concept in Ruby, as we have already seen. What we have not seen is that the language allows user-defined iterators in addition to the predefined ones.

The default iterator for any object is called each. This is significant partly because it allows the for loop to be used. But iterators may be given different names and used for varying purposes.

It is also possible to pass parameters via yield, which will be substituted into the block’s parameter list (between vertical bars). As a somewhat contrived example, the following iterator does nothing but generate integers from 1 to 10, and the call of the iterator generates the first ten cubes:

def my_sequence
(1..10).each do |i|
yield i
end
end

my_sequence {|x| puts x**3 }

Note that do and end may be substituted for the braces that delimit a block. There are differences, but they are fairly subtle.
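To tie these points together, here is a minimal sketch of a class that defines its own each and can therefore be used with a for loop. The names Countdown and values_via_for are made up for this example:

```ruby
class Countdown
  def initialize(start)
    @start = start
  end

  # The default iterator: yields start, start-1, ..., 1 to the block.
  def each
    @start.downto(1) { |n| yield n }
  end
end

# A for loop works on any object that responds to each.
def values_via_for(collection)
  seen = []
  for item in collection do
    seen << item
  end
  seen
end

p values_via_for(Countdown.new(3))   # [3, 2, 1]
```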

1.2.7 Exceptions

Ruby supports exceptions, which are standard means of handling unexpected errors in modern programming languages.

By using exceptions, special return codes can be avoided, as well as the nested if else “spaghetti logic” that results from checking them. Even better, the code that detects the error can be distinguished from the code that knows how to handle the error (because these are often separate anyway).

The raise statement raises an exception. Note that raise is not a reserved word but a method of the module Kernel. (There is an alias named fail.)

raise # Example 1
raise "Some error message" # Example 2
raise ArgumentError # Example 3
raise ArgumentError, "Bad data" # Example 4
raise ArgumentError.new("Bad data") # Example 5
raise ArgumentError, "Bad data", caller[0] # Example 6

In the first example in the preceding code, the last exception encountered is re-raised. In example 2, a RuntimeError (the default error) is created using the string Some error message.

In example 3, an ArgumentError is raised; in example 4, this same error is raised with the message “Bad data” associated with it. Example 5 behaves exactly the same as example 4. Finally, example 6 adds traceback information of the form "filename:line" or "filename:line:in 'method'" (as stored in the caller array).

Now, how do we handle exceptions in Ruby? The begin-end block is used for this purpose. The simplest form is a begin-end block with nothing but our code inside:

begin
# Just runs our code.
# ...
end

This is of no value in catching errors. The block, however, may have one or more rescue clauses in it. If an error occurs at any point in the code, between begin and rescue, control will be passed immediately to the appropriate rescue clause:

begin
x = Math.sqrt(y/z)
# ...
rescue ArgumentError
puts "Error taking square root."
rescue ZeroDivisionError
puts "Attempted division by zero."
end

Essentially the same thing can be accomplished by this fragment:

begin
x = Math.sqrt(y/z)
# ...
rescue => err
puts err
end

Here, the variable err is used to store the value of the exception; printing it causes it to be translated to some meaningful character string. Note that because the error type is not specified, the rescue clause will catch any descendant of StandardError. The notation rescue => variable can be used with or without an error type before the => symbol.
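The following sketch shows what the rescued exception object carries for several forms of raise. The helper describe_failure is a hypothetical name used only here:

```ruby
# Runs the given block and reports the class and message of any
# StandardError descendant that it raises.
def describe_failure
  begin
    yield
    "no exception"
  rescue => err            # no type given: any StandardError descendant
    "#{err.class}: #{err.message}"
  end
end

puts describe_failure { raise "Some error message" }
puts describe_failure { raise ArgumentError }
puts describe_failure { raise ArgumentError, "Bad data" }
puts describe_failure { 1 + 1 }
```

When no message is supplied, as in the second call, the message defaults to the name of the exception class.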

When a begin-end block has rescue clauses, it may also have an else clause after all of them. The else clause runs only when the protected code completes without raising any exception. An exception that matches none of the listed types is not handled by else; it simply propagates outward to the enclosing context:

begin
# Error-prone code...
rescue Type1
# ...
rescue Type2
# ...
else
# Runs only if no exception was raised...
end

In many cases, we want to do some kind of recovery. In that event, the keyword retry (within the body of a rescue clause) restarts the begin block and tries those operations again:

begin
# Error-prone code...
rescue
# Attempt recovery...
retry # Now try again
end
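A bare retry with no limit can loop forever if the failure never clears, so in practice some kind of counter is usually wanted. Here is a minimal sketch under that assumption; fetch_with_retry and its simulated failure are hypothetical:

```ruby
def fetch_with_retry(max_attempts)
  attempts = 0             # defined outside begin so retry does not reset it
  begin
    attempts += 1
    # Stand-in for an unreliable operation: fail on the first two tries.
    raise "temporary failure" if attempts < 3
    "succeeded after #{attempts} attempts"
  rescue
    retry if attempts < max_attempts   # restart the begin block
    "gave up after #{attempts} attempts"
  end
end

puts fetch_with_retry(5)
puts fetch_with_retry(2)
```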

Finally, it is sometimes necessary to write code that “cleans up” after a begin-end block. In the event this is necessary, an ensure clause can be specified:

begin
# Error-prone code...
rescue
# Handle exceptions
ensure
# This code is always executed
end

The code in an ensure clause is always executed before the begin-end block exits. This happens regardless of whether an exception occurred.
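The order of events can be traced with a short sketch that records each step in an array. The method name run_steps is our own:

```ruby
def run_steps(fail_midway)
  log = []
  begin
    log << "start"
    raise "boom" if fail_midway
    log << "finish"
  rescue
    log << "rescued"
  ensure
    log << "cleanup"     # runs on both paths
  end
  log
end

p run_steps(false)   # ["start", "finish", "cleanup"]
p run_steps(true)    # ["start", "rescued", "cleanup"]
```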

Exceptions may be caught in two other ways. First, there is a modifier form of the rescue clause:

x = a/b rescue puts("Division by zero!")

In addition, the body of a method definition is an implicit begin-end block; the begin is omitted, and the entire body of the method is subject to exception handling, ending with the end of the method:

def some_method
# Code...
rescue
# Recovery...
end
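Both of these forms can be sketched with a simple division example. The method names are hypothetical:

```ruby
# Method-level rescue: the method body acts as the begin block.
def safe_divide(a, b)
  a / b
rescue ZeroDivisionError
  nil                      # recovery value when b is zero
end

# Modifier form: the value after rescue replaces the failed expression.
def safe_divide_inline(a, b)
  a / b rescue nil
end

p safe_divide(6, 3)
p safe_divide(1, 0)
p safe_divide_inline(1, 0)
```

Keep in mind that the modifier form cannot name an error type; it rescues any StandardError, which may hide problems other than the one you expected.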

This sums up the basics of exception handling as well as the discussion of fundamental syntax and semantics.

There are numerous aspects of Ruby we have not discussed here. The rest of this chapter is devoted to the more advanced features of the language, including a collection of Ruby lore that will help the intermediate programmer learn to “think in Ruby.”

1.3 OOP in Ruby

Ruby has all the elements more generally associated with OOP languages, such as objects with encapsulation and data hiding, methods with polymorphism and overriding, and classes with hierarchy and inheritance. It goes further and adds limited metaclass features, singleton methods, modules, and mixins.

Similar concepts are known by other names in other OOP languages, but concepts of the same name may have subtle differences from one language to another. This section elaborates on the Ruby understanding and usage of these elements of OOP.

1.3.1 Objects

In Ruby, all numbers, strings, arrays, regular expressions, and many other entities are actually objects. Work is done by executing the methods belonging to the object:

3.succ # 4
"abc".upcase # "ABC"
[2,1,5,3,4].sort # [1,2,3,4,5]
some_object.some_method # some result

In Ruby, every object is an instance of some class; the class contains the implementation of the methods:

"abc".class # String
"abc".class.class # Class

In addition to encapsulating its own attributes and operations, an object in Ruby has an identity:

"abc".object_id # 53744407

This object ID is usually of limited usefulness to the programmer.

1.3.2 Built-in Classes

More than 30 built-in classes are predefined in the Ruby class hierarchy. Like many other OOP languages, Ruby does not allow multiple inheritance, but that does not necessarily make it any less powerful. Modern OO languages frequently follow the single inheritance model. Ruby does support modules and mixins, which are discussed in the next section. It also implements object IDs, as we just saw, which support the implementation of persistent, distributed, and relocatable objects.

To create an object from an existing class, the new method is typically used:

myFile = File.new("textfile.txt","w")
myString = String.new("This is a string object")

This is not always explicitly required, however. When using object literals, you do not need to bother with calling new, as we did in the previous example:

your_string = "This is also a string object"
number = 5 # new not needed here, either

Variables are used to hold references to objects. As previously mentioned, variables themselves have no type, nor are they objects themselves; they are simply references to objects:

x = "abc"

An exception to this is that small immutable objects of some built-in classes, such as Fixnum, are copied directly into the variables that refer to them. (These objects are no bigger than pointers, and it is more efficient to deal with them in this way.) In this case, assignment makes a copy of the object, and the heap is not used.

Variable assignment causes object references to be shared:

y = "abc"
x = y
x # "abc"

After x = y is executed, variables x and y both refer to the same object:

x.object_id # 53732208
y.object_id # 53732208

If the object is mutable, a modification done to one variable will be reflected in the other:

x.gsub!(/a/,"x")
y # "xbc"

Reassigning one of these variables has no effect on the other, however:

# Continuing previous example...
x = "abc"
y # still has value "xbc"

A mutable object can be made immutable using the freeze method:

x.freeze
x.gsub!(/b/,"y") # Error!
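The sharing and freezing behavior can be verified with a sketch like the following. The method names are ours; we rescue RuntimeError because the error raised for a frozen object is a RuntimeError or, in newer versions of Ruby, its subclass FrozenError:

```ruby
# Two variables, one object: a change made through x is visible through y.
def shared_reference_demo
  y = "abc".dup          # dup guarantees a fresh, unfrozen string
  x = y                  # x and y now refer to the same object
  x.gsub!(/a/, "x")
  [y, x.equal?(y)]
end

# Returns true if modifying a frozen copy of text raises an error.
def frozen_rejects_change?(text)
  copy = text.dup.freeze
  copy.gsub!(/b/, "y")
  false
rescue RuntimeError      # FrozenError is a subclass of RuntimeError
  true
end

p shared_reference_demo            # ["xbc", true]
p frozen_rejects_change?("abc")    # true
```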

A symbol is a little unusual; it’s like an atom in Lisp. It acts like a kind of immutable string, and multiple uses of a symbol all reference the same value. A symbol can be converted to a string with the to_s method:

suits = [:hearts, :clubs, :diamonds, :spades]
lead = suits[1].to_s # "clubs"

Similar to arrays of strings, arrays of symbols can be created using the syntax shortcut %i:

suits = %i[hearts clubs diamonds spades] # an array of symbols

1.3.3 Modules and Mixins

Many built-in methods are available from class ancestors. Of special note are the Kernel methods mixed-in to the Object class; because Object is the universal parent class, the methods added to it from Kernel are also universally available. These methods form an important part of Ruby.

The terms module and mixin are nearly synonymous. A module is a collection of methods and constants that is external to the Ruby program. It can be used simply for namespace management, but the most common use of a module is to have its features “mixed” into a class (by using include). In this case, it is used as a mixin.

This term was apparently borrowed most directly from Python. (It is sometimes written as mix-in, but we write it as a single word.) It is worth noting that some Lisp variants have had this feature for more than two decades.

Do not confuse this usage of the term module with another usage common in computing. A Ruby module is not an external source or binary file (though it may be stored in one of these). A Ruby module instead is an OOP abstraction similar to a class.

An example of using a module for namespace management is the frequent use of the Math module. To use the definition of pi, for example, it is not necessary to include the Math module; you can simply use Math::PI as the constant.

A mixin is a way of getting some of the benefits of multiple inheritance without dealing with all the difficulties. It can be considered a restricted form of multiple inheritance, but the language creator Matz has called it “single inheritance with implementation sharing.”

Note that include adds features of a module to the current space; the extend method adds features of a module to an object. With include, the module’s methods become available as instance methods; with extend, they become available as class methods.
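The distinction is easiest to see side by side. In this sketch the module Greeter and the classes Person and Robot are invented for illustration:

```ruby
module Greeter
  def greet
    "Hello from #{self}"
  end
end

class Person
  include Greeter          # greet becomes an instance method
  def to_s
    "a person"
  end
end

class Robot
  extend Greeter           # greet becomes a method of the class object itself
end

puts Person.new.greet      # Hello from a person
puts Robot.greet           # Hello from Robot

gadget = Object.new
gadget.extend(Greeter)     # extend also works on a single ordinary object
puts gadget.respond_to?(:greet)
```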

We should mention that load and require do not relate to modules but rather to Ruby source and binary files (statically or dynamically loadable). A load operation reads a file and runs it in the current context so that its definitions become available at that point. A require operation is similar to a load, but it will not load a file if it has already been loaded.

The Ruby novice, especially from a C background, may be tripped up by require and include, which are basically unrelated to each other. You may easily find yourself doing a require followed by an include to use some externally stored module.

1.3.4 Creating Classes

Ruby has numerous built-in classes, and additional classes may be defined in a Ruby program. To define a new class, the following construct is used:

class ClassName
# ...
end

The name of the class is itself a global constant and therefore must begin with an uppercase letter. The class definition can contain class constants, class variables, class methods, instance variables, and instance methods. Class-level information is available to all objects of the class, whereas instance-level information is available only to the one object.

By the way, classes in Ruby do not, strictly speaking, have names. The “name” of a class is just a constant that is a reference to an object of type Class (because, in Ruby, Class is a class). There can certainly be more than one constant referring to a class, and these can be assigned to variables just as we can with any other object (because, in Ruby, Class is an object). If all this confuses you, don’t worry about it. For the sake of convenience, the novice can think of a Ruby class name as being like a C++ class name.

Here we define a simple class:

class Friend
@@myname = "Fred" # a class variable

def initialize(name, gender, phone)
@name, @sex, @phone = name, gender, phone
# These are instance variables
end

def hello # an instance method
puts "Hi, I'm #{@name}."
end

def Friend.our_common_friend # a class method
puts "We are all friends of #{@@myname}."
end

end

f1 = Friend.new("Susan", "female", "555-0123")
f2 = Friend.new("Tom", "male", "555-4567")

f1.hello # Hi, I'm Susan.
f2.hello # Hi, I'm Tom.
Friend.our_common_friend # We are all friends of Fred.

Because class-level data is accessible throughout the class, it can be initialized at the time the class is defined. If an instance method named initialize is defined, it is guaranteed to be executed right after an instance is allocated. The initialize method is similar to the traditional concept of a constructor, but it does not have to handle memory allocation. Allocation is handled internally by new, and deallocation is handled transparently by the garbage collector.

Now consider this fragment, and pay attention to the getmyvar, setmyvar, and myvar= methods:

class MyClass

NAME = "Class Name" # class constant
@@count = 0 # initialize a class variable

def initialize # called when object is allocated
@@count += 1
@myvar = 10
end

def self.getcount # class method
@@count # class variable
end

def getcount # instance returns class variable!
@@count # class variable
end

def getmyvar # instance method
@myvar # instance variable
end

def setmyvar(val) # instance method sets @myvar
@myvar = val
end

def myvar=(val) # Another way to set @myvar
@myvar = val
end
end

foo = MyClass.new # @myvar is 10
foo.setmyvar 20 # @myvar is 20
foo.myvar = 30 # @myvar is 30

Instance variables are different for each object that is an instance of the class. Class variables are shared between the class itself and every instance of the class. To create a variable that belongs only to the class, use an instance variable inside a class method. This class instance variable will not be shared with instances and is therefore often preferred over class variables.
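Here is a minimal sketch contrasting the two kinds of class-level data. The class Widget and its methods are hypothetical; note that we set the class instance variable in the class body, where self is the class, which has the same effect as setting it inside a class method:

```ruby
class Widget
  @@total = 0              # class variable: visible to the class and its instances
  @registry = []           # class instance variable: belongs to Widget alone

  def self.registry
    @registry
  end

  def self.total
    @@total
  end

  def initialize(name)
    @@total += 1                    # instances can touch @@total directly
    self.class.registry << name     # but must go through the class for @registry
  end

  def registry_seen_from_instance
    @registry            # the instance's own @registry, which was never set
  end
end

w = Widget.new("knob")
p Widget.total
p Widget.registry
p w.registry_seen_from_instance     # nil
```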

In the preceding code, we see that getmyvar returns the value of @myvar and that setmyvar sets it. (In the terminology of many programmers, these would be referred to as a getter and a setter, respectively.) These work fine, but they do not exemplify the “Ruby way” of doing things. The method myvar= looks like assignment overloading (though strictly speaking, it isn’t); it is a better replacement for setmyvar, but there is a better way yet.

The class Module contains methods called attr, attr_accessor, attr_reader, and attr_writer. These can be used (with symbols as parameters) to automatically handle controlled access to the instance data. For example, the three methods getmyvar, setmyvar, and myvar= can be replaced by a single line in the class definition:

attr_accessor :myvar

This creates a method myvar that returns the value of @myvar and a method myvar= that enables the setting of the same variable. The methods attr_reader and attr_writer create read-only and write-only versions of an attribute, respectively.
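As a brief sketch of all three in use (the class Thermostat and its attributes are made up for this example):

```ruby
class Thermostat
  attr_accessor :target      # defines target and target=
  attr_reader   :serial      # defines serial only
  attr_writer   :pin         # defines pin= only

  def initialize(serial)
    @serial = serial
    @target = 20
  end
end

t = Thermostat.new("A-100")
t.target = 22
puts t.target                       # 22
puts t.serial                       # A-100
puts t.respond_to?(:serial=)        # false
puts t.respond_to?(:pin)            # false
```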

Within the instance methods of a class, the pseudovariable self can be used as needed. This is only a reference to the current receiver, the object on which the instance method is invoked.

The modifying methods private, protected, and public can be used to control the visibility of methods in a class. (Instance variables are always private and inaccessible from outside the class, except by means of accessors.) Each of these modifiers takes a symbol like :foo as a parameter; if this is omitted, the modifier applies to all subsequent definitions in the class. Here is an example:

class MyClass

def method1
# ...
end

def method2
# ...
end

def method3
# ...
end

private :method1
public :method2
protected :method3

private

def my_method
# ...
end

def another_method
# ...
end

end

In the preceding class, method1 will be private, method2 will be public, and method3 will be protected. Because of the private method with no parameters, both my_method and another_method will be private.

The public access level is self-explanatory; there are no restrictions on access or visibility. The private level means that the method is accessible only within the class or its subclasses, and it is callable only in “function form,” with self, implicit or explicit, as a receiver. (Before Ruby 2.7, an explicit self receiver was accepted only for setter methods.) The protected level means that a method can be called by other objects of the class or its subclasses, unlike a private method (which can only be called on self).

The default visibility for the methods defined in a class is public. The exception is the instance-initializing method initialize, which is always private. Methods defined at the top level are a second exception: They become private methods of Object, so they can be called only in function form (as, for example, the methods defined in Object).
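The practical difference between protected and private is easiest to see when one object needs to look at another object of the same class. The following is only a sketch; the class Account and its methods are hypothetical:

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end

  def richer_than?(other)
    balance > other.balance     # allowed: other is also an Account
  end

  def report
    "Balance: #{formatted}"     # private method called in function form
  end

  protected

  def balance
    @balance
  end

  private

  def formatted
    format("%.2f", @balance)
  end
end

a = Account.new(10)
b = Account.new(5)
puts a.richer_than?(b)          # true
puts a.report                   # Balance: 10.00

begin
  a.balance                     # called from outside the class
rescue NoMethodError => err
  puts "NoMethodError: #{err.message}"
end
```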

Ruby classes are themselves objects, being instances of the class Class. Ruby classes are always concrete; there are no abstract classes. However, it is theoretically possible to implement abstract classes in Ruby if you really want to do so.
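One common convention for approximating an abstract class is to have the base class raise NotImplementedError from methods that subclasses are expected to supply. This is a sketch of that convention, not a language feature, and the classes Shape and Square are invented for the example:

```ruby
class Shape
  def area
    raise NotImplementedError, "#{self.class} must implement area"
  end

  def describe
    "#{self.class.name} with area #{area}"
  end
end

class Square < Shape
  def initialize(side)
    @side = side
  end

  def area
    @side * @side
  end
end

puts Square.new(3).describe     # Square with area 9

begin
  Shape.new.area
rescue NotImplementedError => err
  puts err.message              # Shape must implement area
end
```

Note that NotImplementedError is not a descendant of StandardError, so a bare rescue clause will not catch it; it has to be named explicitly as shown.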

The class Object is at the root of the hierarchy. It provides all the methods defined in the built-in Kernel module. (Technically, BasicObject is the parent of Object. It acts as a kind of “blank slate” object that does not have all the baggage of a normal object.)

To create a class that inherits from another class, define it in this way:

class MyClass < OtherClass
# ...
end

In addition to using built-in methods, it is only natural to define your own and also to redefine and override existing ones. When you define a method with the same name as an existing one, the previous method is overridden. If a method needs to call the “parent” method that it overrides (a frequent occurrence), the keyword super can be used for this purpose.
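A minimal sketch of overriding with super follows; the classes Animal and Dog are hypothetical:

```ruby
class Animal
  def initialize(name)
    @name = name
  end

  def describe
    "#{@name} is an animal"
  end
end

class Dog < Animal
  def describe
    super + " that barks"     # call Animal#describe, then extend its result
  end
end

puts Dog.new("Rex").describe  # Rex is an animal that barks
```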

Operator overloading is not strictly an OOP feature, but it is familiar to C++ programmers and certain others. Because most operators in Ruby are simply methods anyway, it should come as no surprise that these operators can be overridden or defined for user-defined classes. Overriding the meaning of an operator for an existing class may be rare, but it is common to want to define operators for new classes.
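For instance, a small two-dimensional vector class might define + and == as ordinary methods. The class Vec2 below is a sketch written for this illustration:

```ruby
class Vec2
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  # a + b is simply a.+(b)
  def +(other)
    Vec2.new(x + other.x, y + other.y)
  end

  def ==(other)
    other.is_a?(Vec2) && x == other.x && y == other.y
  end

  def to_s
    "(#{x}, #{y})"
  end
end

puts Vec2.new(1, 2) + Vec2.new(3, 4)    # (4, 6)
```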

It is possible to create aliases or synonyms for methods. The syntax (used inside a class definition) is as follows:

alias_method :newname, :oldname

The number of parameters will be the same as for the old name, and it will be called in the same way. An alias creates a copy of the method, so later changes to the original method will not be reflected in aliases created beforehand.

There is also a Ruby keyword called alias, which is similar; unlike the method, it can alias global variables as well as methods, and its arguments are not separated by a comma.
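The two forms, and the fact that an alias keeps the old definition, can be seen in this sketch (the class Speaker is hypothetical):

```ruby
class Speaker
  def hello
    "hello"
  end

  alias_method :greet, :hello   # method form: symbols, separated by a comma
  alias salute hello            # keyword form: bare names, no comma

  def hello                     # redefine the original afterwards
    "HELLO"
  end
end

s = Speaker.new
puts s.hello     # HELLO
puts s.greet     # hello
puts s.salute    # hello
```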

1.3.5 Methods and Attributes

As we’ve seen, methods are typically used with simple class instances and variables by separating the receiver from the method with a period (receiver.method). In the case of method names that are punctuation, the period is omitted. Methods can take arguments:

Time.mktime(2014, "Aug", 24, 16, 0)

Because every expression returns a value, method calls may typically be chained or stacked:

3.succ.to_s
/(x.z).*?(x.z).*?/.match("x1z_1a3_x2z_1b3_").to_a[1..3]
3+2.succ

Note that there can be problems if the cumulative expression is of a type that does not support that particular method. Specifically, some methods return nil under certain conditions, and this usually causes any methods tacked onto that result to fail. (Of course, nil is an object in its own right, but it will not have all the same methods that, for example, an array would have.)

Certain methods may have blocks passed to them. This is true of all iterators, whether built in or user defined. A block is usually passed as a do-end block or a brace-delimited block; it is not treated like the other parameters preceding it, if any. See especially the File.open example:

my_array.each do |x|
x.some_action
end

File.open(filename) { |f| f.some_action }

Methods may take a variable number of arguments:

receiver.method(arg1, *more_args)

In this case, the method called treats more_args as an array that it deals with as it would any other array. In fact, an asterisk in the list of formal parameters (on the last or only parameter) can likewise “collapse” a sequence of actual parameters into an array:

def mymethod(a, b, *c)
print a, b
c.each do |x| print x end
end

mymethod(1,2,3,4,5,6,7)

# a=1, b=2, c=[3,4,5,6,7]

Ruby also supports named parameters, which are called keyword arguments in the Python realm; the concept dates back at least as far as the Ada language, designed in the late 1970s and early 1980s. Named parameters simultaneously set default values and allow arguments to be given in any order because they are explicitly labeled:

def mymethod(name: "default", options: {})
options.merge!(name: name)
some_action_with(options)
end

When a named parameter has its default omitted in the method definition, it is a required named parameter:

def other_method(name:, age:)
puts "Person #{name} is aged #{age}."
# It's an error to call this method without specifying
# values for name and age.
end
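On the calling side, this looks as follows. The method describe_person is a hypothetical example combining two required named parameters with one optional one:

```ruby
def describe_person(name:, age:, title: "Dr.")
  "#{title} #{name} is aged #{age}."
end

puts describe_person(name: "Ada", age: 36)
puts describe_person(age: 41, title: "Prof.", name: "Grace")

begin
  describe_person(name: "Nobody")       # age is missing
rescue ArgumentError => err
  puts "ArgumentError: #{err.message}"
end
```

The exact wording of the error message varies between Ruby versions, so it is best not to depend on it.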

Ruby has the capability to define methods on a per-object basis (rather than per class). Such methods are called singletons, and they belong solely to that object and have no effect on its class or superclasses. As an example, this might be useful in programming a GUI; you can define a button action for a widget by defining a singleton method for the button object.

Here is an example of defining a singleton method on a string object:

str = "Hello, world!"
str2 = "Goodbye!"

def str.spell
self.split(//).join("-")
end

str.spell # "H-e-l-l-o-,- -w-o-r-l-d-!"
str2.spell # error!

Be aware that the method is defined for the object itself, and not for the variable.

It is theoretically possible to create a prototype-based object system using singleton methods. This is a less traditional form of OOP without classes. The basic structuring mechanism is to construct a new object using an existing object as a delegate; the new object is exactly like the old object except for things that are overridden. This enables you to build prototype/delegation-based systems rather than inheritance based, and, although we do not have experience in this area, we do feel that this demonstrates the power of Ruby.

1.4 Dynamic Aspects of Ruby

Ruby is a dynamic language in the sense that objects and classes may be altered at runtime. Ruby has the capability to construct and evaluate pieces of code in the course of executing the existing statically coded program. It has a sophisticated reflection API that makes it more “self-aware”; this enables the easy creation of debuggers, profilers, and similar tools and also makes certain advanced coding techniques possible.

This is perhaps the most difficult area a programmer will encounter in learning Ruby. In this section, we briefly examine some of the implications of Ruby’s dynamic nature.

1.4.1 Coding at Runtime

We have already discussed load and require, but it is important to realize that these are not built-in statements or control structures or anything of that nature; they are actual methods. Therefore, it is possible to call them with variables or expressions as parameters or to call them conditionally. Contrast with this the #include directive in C or C++, which is evaluated and acted on at compile time.
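Because require is an ordinary method, its argument can be computed and the call can be made conditional. The sketch below assumes only libraries that ship with Ruby; the VERBOSE_TOOLS environment variable is a made-up flag used purely for illustration.

```ruby
# Library names chosen at runtime; "json" and "set" ship with Ruby.
wanted = ["json", "set"]
wanted.each { |lib| require lib }

# require can also sit behind a condition.
require "pp" if ENV["VERBOSE_TOOLS"]

puts JSON.generate("loaded" => wanted)  # {"loaded":["json","set"]}
puts respond_to?(:require, true)        # true: require is a (private) method
```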

Code can be constructed piecemeal and evaluated. As another contrived example, consider this calculate method and the code calling it:

def calculate(op1, operator, op2)
  # operator is assumed to be a string; make one big
  # string of it and the two operands
  string = op1.to_s + operator + op2.to_s
  eval(string) # Evaluate and return a value
end

@alpha = 25
@beta = 12

puts calculate(2, "+", 2) # Prints 4
puts calculate(5, "*", "@alpha") # Prints 125
puts calculate("@beta", "**", 3) # Prints 1728
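Evaluating strings is powerful, but when the operator is the only dynamic part, the same effect can often be had without eval. The following alternative is a sketch of a different technique (dynamic dispatch with public_send), not the method shown above; the name calculate_without_eval is our own.

```ruby
# Dispatch the operator as a method call on the first operand.
def calculate_without_eval(op1, operator, op2)
  op1.public_send(operator, op2)
end

alpha = 25
beta  = 12

puts calculate_without_eval(2, "+", 2)      # 4
puts calculate_without_eval(5, "*", alpha)  # 125
puts calculate_without_eval(beta, "**", 3)  # 1728
```

The trade-off is that operands must now be real values rather than strings naming variables, which is usually what we want anyway.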

As an even more extreme example, the following code prompts the user for a method name and a single line of code; then it actually defines the method and calls it:

puts "Method name: "
meth_name = gets
puts "Line of code: "
code = gets

string = %[def #{meth_name}\n #{code}\n end] # Build a string
eval(string) # Define the method
eval(meth_name) # Call the method
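When only the method name is dynamic and the body is known in advance, a method can be defined at runtime without building a string. This is a minimal sketch using define_singleton_method; the greeter object and the method name are hypothetical stand-ins for user input.

```ruby
meth_name = "greet"   # imagine this came from user input

greeter = Object.new
greeter.define_singleton_method(meth_name.to_sym) do |who|
  "Hello, #{who}!"
end

puts greeter.public_send(meth_name, "world")  # Hello, world!
```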

Frequently, programmers want to code for different platforms or circumstances and still maintain only a single code base. In such a case, a C programmer would use #ifdef directives, but in Ruby, definitions are executed. There is no “compile time,” and everything is dynamic rather than static. So if we want to make some kind of decision like this, we can simply evaluate a flag at runtime:

if platform == Windows
  action1
elsif platform == Linux
  action2
else
  default_action
end

Of course, there is a small runtime penalty for coding in this way because the flag may be tested many times in the course of execution. But this example does essentially the same thing, enclosing the platform-dependent code in a method whose name is the same across all platforms:

if platform == Windows
  def my_action
    action1
  end
elsif platform == Linux
  def my_action
    action2
  end
else
  def my_action
    default_action
  end
end

In this way, the same result is achieved, but the flag is only evaluated once; when the user’s code calls my_action, it will already have been defined appropriately.
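A runnable version of this idea might look like the following sketch. RUBY_PLATFORM is a built-in constant; the regular expression used to detect Windows and the method name null_device are assumptions made for illustration.

```ruby
# RUBY_PLATFORM is a string such as "x86_64-linux".
if RUBY_PLATFORM =~ /mswin|mingw|cygwin/
  def null_device
    "NUL"
  end
else
  def null_device
    "/dev/null"
  end
end

# The platform test has already happened; callers just use the method.
puts null_device
```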

1.4.2 Reflection

Languages such as Smalltalk, LISP, and Java implement (to varying degrees) the notion of a reflective programming language—one in which the active environment can query the objects that define it and extend or modify them at runtime.

Ruby allows reflection quite extensively but does not go as far as Smalltalk, which even represents control structures as objects. Ruby control structures and blocks are not objects. (A Proc object can be used to “objectify” a block, but control structures are never objects.)

The keyword defined? (with the question mark) may be used to determine whether an identifier name is in use:

if defined? some_var
  puts "some_var = #{some_var}"
else
  puts "The variable some_var is not known."
end

Similarly, the method respond_to? determines whether an object can respond to the specified method call (that is, whether that method is defined for that object). The respond_to? method is defined in class Object.
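The following sketch shows what these two facilities return in a few simple cases. The identifiers no_such_name and no_such_meth are deliberately undefined.

```ruby
some_var = 42

puts defined?(some_var).inspect        # "local-variable"
puts defined?(no_such_name).inspect    # nil
puts defined?(String).inspect          # "constant"
puts defined?(puts).inspect            # "method"

puts "abc".respond_to?(:upcase)        # true
puts "abc".respond_to?(:no_such_meth)  # false
```

Note that defined? returns a descriptive string (or nil) rather than true or false, which is why it works naturally in a condition.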

Ruby supports runtime-type information in a radical way. The type (or class) of an object can be determined at runtime using the method class (defined in Object). Similarly, is_a? tells whether an object is of a certain class (including the superclasses); kind_of? is an alias. Here is an example:

puts "abc".class # Prints String
puts 345.class # Prints Fixnum (Integer in Ruby 2.4 and later)
rover = Dog.new

print rover.class # Prints Dog

if rover.is_a? Dog
  puts "Of course he is."
end

if rover.kind_of? Dog
  puts "Yes, still a dog."
end

if rover.is_a? Animal
  puts "Yes, he's an animal, too."
end

It is possible to retrieve an exhaustive list of all the methods that can be invoked for a given object; this is done by using the methods method, defined in Object. There are also variations such as instance_methods, private_instance_methods, and so on.
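To try these queries, we need some classes to ask about. The sketch below supplies minimal Animal and Dog classes (with made-up methods breathe and bark) and then inspects an instance.

```ruby
class Animal
  def breathe; end
end

class Dog < Animal
  def bark; end
end

rover = Dog.new

puts rover.class                              # Dog
puts rover.is_a?(Animal)                      # true
puts rover.instance_of?(Animal)               # false
puts rover.methods.include?(:breathe)         # true
puts Dog.instance_methods(false).inspect      # [:bark]
puts Dog.instance_methods.include?(:breathe)  # true
```

Passing false to instance_methods restricts the list to methods defined in that class itself, leaving out inherited ones.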

Similarly, you can determine the class variables and instance variables associated with an object. By the nature of OOP, the lists of methods and variables include the entities defined not only in the object’s class but also in its superclasses. The Module class has a method called constants that is used to list the constants defined within a module.

The class Module has a method called ancestors that returns a list of modules included in the given module. This list is self-inclusive; Mod.ancestors will always have at least Mod in the list. This list comprises not only parent classes (through inheritance) but “parent” modules (through module inclusion).

The class Class has a method called superclass that returns the superclass of the class or returns nil. Because BasicObject is the only class without a superclass, it is the only case in which nil will be returned.
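The relationships among ancestors, superclass, and constants can be seen in a small experiment. The module and class names below (Walkable, Creature, Hound) are invented for this sketch.

```ruby
module Walkable; end

class Creature
  LEGS = 4
end

class Hound < Creature
  include Walkable
end

puts Hound.ancestors.first(3).inspect   # [Hound, Walkable, Creature]
puts Hound.superclass                   # Creature
puts BasicObject.superclass.inspect     # nil
puts Creature.constants(false).inspect  # [:LEGS]
```

Notice that the included module appears in the ancestors list between the class and its superclass, whereas superclass skips over it.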

The ObjectSpace module is used to access any and all “living” objects. The method _id2ref can be used to convert an object ID to an object reference; it can be considered the inverse of the object_id method. ObjectSpace also has an iterator called each_object that iterates over all the objects currently in existence, including many that you will not otherwise explicitly know about. (Remember that certain small immutable objects, such as objects of class Fixnum, NilClass, TrueClass, and FalseClass, are stored as immediate values for optimization reasons and are therefore not visited by each_object.)
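Here is a minimal sketch of each_object restricted to a single class. It assumes the standard C implementation of Ruby (other implementations may disable ObjectSpace iteration by default), and Widget is a made-up class.

```ruby
class Widget; end

first  = Widget.new
second = Widget.new

# Without a block, each_object returns an enumerator.
live_widgets = ObjectSpace.each_object(Widget).to_a

puts live_widgets.size >= 2          # true
puts live_widgets.include?(first)    # true
puts live_widgets.include?(second)   # true
```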

1.4.3 Missing Methods

When a method is invoked (my_object.my_method), Ruby first searches for the named method according to this search order:

1. Singleton methods in the receiver my_object

2. Methods defined in my_object’s class

3. Methods defined among my_object’s ancestors

If the method my_method is not found, Ruby searches for a method called method_missing. If this method is defined, it is passed the name of the missing method (as a symbol) and all the parameters that were passed to the nonexistent my_method. This facility can be used for the dynamic handling of unknown messages sent at runtime.
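The lookup described above ends in method_missing, which can be sketched as follows. The Recorder class and its log_ prefix convention are hypothetical; defining respond_to_missing? alongside method_missing is a common courtesy so that respond_to? stays truthful.

```ruby
class Recorder
  # Handle any call whose name starts with "log_"; defer everything else.
  def method_missing(name, *args)
    if name.to_s.start_with?("log_")
      "#{name.to_s.sub('log_', '').upcase}: #{args.join(' ')}"
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("log_") || super
  end
end

recorder = Recorder.new
puts recorder.log_info("disk", "ok")  # INFO: disk ok
puts recorder.respond_to?(:log_warn)  # true

begin
  recorder.unknown_call
rescue NoMethodError => e
  puts e.class                        # NoMethodError
end
```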

1.4.4 Garbage Collection

Managing memory on a low level is hard and error prone, especially in a dynamic environment such as Ruby; having a garbage collection (GC) facility is a significant advantage. In languages such as C++, memory allocation and deallocation are handled by the programmer; in other languages such as Java, memory is reclaimed (when objects go out of scope) by a garbage collector.

Memory management done by the programmer is the source of two of the most common kinds of bugs. If an object is freed while still being referenced, a later access may find the memory in an inconsistent state. These so-called dangling pointers are difficult to track down because they often cause errors in code that is far removed from the offending statement. Memory leaks are caused when an object is not freed even though there are no references to it. Programs with this bug typically use up more and more memory until they crash; this kind of error is also difficult to find.

Ruby has a GC facility that periodically tracks down unused objects and reclaims the storage that was allocated to them. For those who care about such things, Ruby’s GC is done using a generational mark-and-sweep algorithm rather than reference counting (which can have difficulties with recursive structures).

Certain performance penalties may be associated with garbage collection. Some environment variables and methods on the GC module allow a programmer to tailor garbage collection to the needs of the individual program. We can also define an object finalizer, but this is an advanced topic (see Section 11.3.10, “Defining Finalizers for Objects”).
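As a small taste of the GC module, the following sketch forces a collection and looks at the collector's run counter. It assumes the standard C implementation of Ruby; the loop merely creates garbage to collect.

```ruby
runs_before = GC.count          # number of collections so far

100_000.times { Object.new }    # create some short-lived objects
GC.start                        # request a collection now

runs_after = GC.count

puts runs_after >= runs_before  # true
puts GC.stat.key?(:count)       # true: GC.stat returns a hash of statistics
```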

1.5 Training Your Intuition: Things to Remember

It may truly be said that “everything is intuitive once you understand it.” This verity is the heart of this section because Ruby has many features and personality quirks that may be different from what the traditional programmer is used to.

Some readers may feel their time is wasted by a reiteration of some of these points; if that is the case for you, you are free to skip the paragraphs that seem obvious to you. Programmers’ backgrounds vary widely; an old-time C hacker and a Smalltalk guru will each approach Ruby from a different viewpoint. We hope, however, that a perusal of these following paragraphs will assist many readers in following what some call the Ruby Way.

1.5.1 Syntax Issues

The Ruby parser is complex and relatively forgiving. It tries to make sense out of what it finds instead of forcing the programmer into slavishly following a set of rules. However, this behavior may take some getting used to. Here is a list of things to know about Ruby syntax:

• Parentheses are usually optional with a method call. These calls are all valid:

foobar
foobar()
foobar(a,b,c)
foobar a, b, c

• Let’s try to pass a hash to a method:

my_method {a: 1, b: 2, 5 => 25}

This results in a syntax error, because the left brace is seen as the start of a block. In this instance, parentheses are necessary:

my_method({a: 1, b: 2, 5 => 25})

• Now let’s suppose that the hash is the only parameter (or the last parameter) to a method. Ruby forgivingly lets us omit the braces:

my_method(a: 1, b: 2, 5 => 25)

Some people might think that this looks like a method invocation with named parameters. Really it isn’t, though it could be if the method were defined that way.

• There are other cases in which blank spaces are semi-significant. For example, these four expressions may all seem to mean the same thing:

x = y + z
x = y+z
x = y+ z
x = y +z

And in fact, the first three do mean the same thing. However, in the fourth case, the parser thinks that y is a method call and +z is a parameter passed to it! It will then give an error message for that line if there is no method named y. The moral is to use blank spaces in a reasonable way.

• Similarly, x = y * z is a multiplication of y and z, whereas x = y *z is an invocation of method y, passing an expansion of array z as a parameter.

• When parsing identifiers, the underscore is considered to be lowercase. Thus, an identifier may start with an underscore, but it will not be a constant even if the next letter is uppercase.

• In linear nested-if statements, the keyword elsif is used rather than else if or elif, as in some languages.

• Keywords in Ruby are not really reserved words. When a method is called on a receiver (or in other cases where there is no ambiguity), a keyword may be used as a method name. Do this with caution, remembering that programs should be readable by humans.

• The keyword then is optional (in if and case statements). Those who want to use it for readability may do so. The same is true for do in while and until loops.

• The question mark and exclamation point are not really part of the identifier that they modify but rather should be considered suffixes. Thus, we see that although, for example, chomp and chomp! are considered different identifiers, it is not permissible to use these characters in any other position in the word. Likewise, we use defined? in Ruby, but defined is the keyword.

• Inside a string, the pound sign (#) is used to signal expressions to be evaluated. This means that in some circumstances, when a pound sign occurs in a string, it has to be escaped with a backslash, but this is only when the next character is a { (left brace), $ (dollar sign), or @ (at sign).

• Because of the fact that the question mark may be appended to an identifier, care should be taken with spacing around the ternary operator. For example, suppose we have a variable called my_flag, which stores either true or false. Then the first line of code shown here will be correct, but the second will give a syntax error:

x = my_flag ? 23 : 45 # OK
x = my_flag? 23 : 45 # Syntax error

• The ending marker =end for embedded documentation should not be considered a token. It marks the entire line and thus any characters on the rest of that line are not considered part of the program text but belong to the embedded document.

• There are no arbitrary blocks in Ruby; that is, you can’t start a block whenever you feel like it, as in C. Blocks are allowed only where they are needed—for example, attached to an iterator. The exception is the begin-end block, which can be used basically anywhere.

• Remember that the keywords BEGIN and END are completely different from the begin and end keywords.

• When strings bump together (static concatenation), the concatenation is of a lower precedence than a method call. Here is an example:

str = "First " 'second'.center(20) # Examples 1 and 2
str = "First " + 'second'.center(20) # are the same.
str = "First second".center(20) # Examples 3 and 4
str = ("First " + 'second').center(20) # are the same.

• Ruby has several pseudovariables, which look like local variables but really serve specialized purposes. These are self, nil, true, false, __FILE__, and __LINE__.
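Two of the points in the preceding list, the optional braces around a trailing hash and the semi-significant blank spaces, can be checked with a short experiment. The method show is a hypothetical helper that simply returns its arguments; running this with warnings enabled will produce the parser's ambiguity warnings, which is rather the point.

```ruby
def show(*args)
  args
end

# With or without braces, the method receives the same single hash.
with_braces    = show({a: 1, b: 2, 5 => 25})
without_braces = show(a: 1, b: 2, 5 => 25)
puts with_braces == without_braces  # true

n    = 3
list = [1, 2]

# A space before the operator but none after it turns the
# expression into a method call with an argument.
unary = show +n     # parsed as show(+n)
splat = show *list  # parsed as show(*list)

puts unary.inspect  # [3]
puts splat.inspect  # [1, 2]
```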

1.5.2 Perspectives in Programming

Presumably everyone who knows Ruby (at this point in time) has been a student or user of other languages in the past. This, of course, makes learning Ruby easy in the sense that numerous features in Ruby are just like the corresponding features in other languages. On the other hand, the programmer may be lulled into a false sense of security by some of the familiar constructs in Ruby and may draw unwarranted conclusions based on past experience—which we might term “geek baggage.”

Many people have come to Ruby from Python, Java, Perl, Smalltalk, C/C++, and various other languages. Their presuppositions and expectations may all vary somewhat, but they will always be present. For this reason, we discuss here a few of the things that some programmers may “trip over” in using Ruby:

• There is no Boolean type such as many languages have. TrueClass and FalseClass are distinct classes, and their only instantiations are true and false.

• Many of Ruby’s operators are similar or identical to those in C. Two notable exceptions are the increment and decrement operators (++ and --). These are not available in Ruby, neither in “pre” nor “post” forms.

• The modulus operator is known to work somewhat differently in different languages with respect to negative numbers. The two sides of this argument are beyond the scope of this book; Ruby’s behavior is as follows:

puts (5 % 3) # Prints 2
puts (-5 % 3) # Prints 1
puts (5 % -3) # Prints -1
puts (-5 % -3) # Prints -2

• Some may be used to thinking that a false value may be represented as a zero, a null string, a null character, or various other things. But in Ruby, all of these are true; in fact, everything is true except false and nil.

• In Ruby, variables don’t have classes; only values have classes.

• There are no declarations of variables in Ruby. It is good practice, however, to assign nil to a variable initially. This certainly does not assign a type to the variable and does not truly initialize it, but it does inform the parser that this is a variable name rather than a method name.

• ARGV[0] is truly the first of the command-line parameters, numbering naturally from zero; it is not the file or script name preceding the parameters, like argv[0] in C.

• Most of Ruby’s operators are really methods; the “punctuation” form of these methods is provided for familiarity and convenience. The first exception is the set of reflexive assignment operators (+=, -=, *=, and so on); the second exception is the following set: = .. ... not && and || or. (In Ruby 1.9 and later, !, !=, and !~ are themselves methods and can be redefined.)

• As in most (though not all) modern languages, Boolean operations are always short-circuited; that is, the evaluation of a Boolean expression stops as soon as its truth value is known. In a sequence of or operations, the first true will stop evaluation; in a string of and operations, the first false will stop evaluation.

• The prefix @@ is used for class variables (which are associated with the class rather than the instance).

• loop is not a keyword; it is a Kernel method, not a control structure.

• Some may find the syntax of unless-else to be slightly unintuitive. Because unless is the opposite of if, the else clause will be executed if the condition is true.

• The simpler Fixnum type is passed as an immediate value and therefore may not be changed from within methods. The same is true for true, false, and nil.

• Do not confuse the && and || operators with the & and | operators. These are used as in C; the former are for Boolean operations, and the latter are for arithmetic or bitwise operations.

• The and-or operators have lower precedence than the &&-|| operators. See the following code fragment:

a = true
b = false
c = true
d = true
a1 = a && b or c && d # &&'s are done first
a2 = a && (b or c) && d # or is done first
puts a1 # Prints false
puts a2 # Prints true

• Additionally, be aware that the assignment “operator” has a higher precedence than the and and or operators! (This is also true for the reflexive assignment operators +=, -=, and the others.) For example, in the following code, x = y or z looks like a normal assignment statement, but it is really a freestanding expression (equivalent to (x=y) or z, in fact). The third section shows a real assignment statement, x = (y or z), which may be what the programmer really intended.

y = false
z = true

x = y or z # = is done BEFORE or!
puts x # Prints false

(x = y) or z # Line 5: Same as previous
puts x # Prints false

x = (y or z) # or is done first
puts x # Prints true

• Don’t confuse object attributes and local variables. If you are accustomed to C++ or Java, you might forget this. The variable @my_var is an instance variable (or attribute) in the context of whatever class you are coding, but my_var, used in the same circumstance, is only a local variable within that context.

• Many languages have some kind of for loop, as does Ruby. The question sooner or later arises as to whether the index variable can be modified. Some languages do not allow the control variable to be modified at all (printing a warning or error either at compile time or runtime), and some will cheerfully allow the loop behavior to be altered in midstream by such a change. Ruby takes yet a third approach. When a variable is used as a for loop control variable, it is an ordinary variable and can be modified at will; however, such a modification does not affect the loop behavior! The for loop sequentially assigns the values to the variable on each iteration without regard for what may have happened to that variable inside the loop. For example, this loop will execute exactly ten times and print the values 1 through 10:

for var in 1..10
  puts "var = #{var}"
  if var > 5
    var = var + 2
  end
end

• Variable names and method names are not always distinguishable “by eye” in the immediate context. How does the parser decide whether an identifier is a variable or a method? The rule is that if the parser sees the identifier being assigned a value prior to its being used, it will be considered a variable; otherwise, it is considered to be a method name. (Note also that the assignment does not have to be executed but only seen by the interpreter.)
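Two of the points in this list lend themselves to a quick experiment: the rule that only false and nil are false, and the rule that the parser needs only to see an assignment to treat a name as a variable. The names candidates and phantom are our own.

```ruby
# Zero, empty strings, and empty arrays all count as true.
candidates = [0, "", [], 0.0, nil, false]
truthy = candidates.select { |value| value ? true : false }
puts truthy.inspect   # [0, "", [], 0.0]

# This assignment is never executed, but the parser has seen it,
# so phantom is a local variable (holding nil), not a method call.
if false
  phantom = 1
end

puts phantom.inspect  # nil
```

Without the unexecuted assignment, the last line would raise a NameError instead of printing nil.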

1.5.3 Ruby’s case Statement

Every modern language has some kind of multiway branch, such as the switch statement in C, C++, and Java or the case statement in Pascal. These serve basically the same purpose, and function much the same in most languages.

Ruby’s case statement is similar to these others, but on closer examination, it has some unique features. While it works somewhat intuitively in most cases, it has no precise analogue in other well-known languages. As a result, we cover a few edge cases here for the sake of completeness.

We have already seen the syntax of this statement. We will concentrate here on its actual semantics:

• To begin with, consider the trivial case statement shown here. The expression shown is compared with the value, not surprisingly, and if they correspond, some_action is performed:

case expression
when value
  some_action
end

Ruby uses the special operator === (called the relationship operator) for this. This operator is also referred to (somewhat inappropriately) as the case equality operator. We say “inappropriately” because it does not always denote equality.

Thus, the preceding simple statement is equivalent to this statement:

if value === expression
  some_action
end

• However, do not confuse the relationship operator with the equality operator (==). They are utterly different, although their behavior may be the same in many circumstances. The relationship operator is defined differently for different classes and, for a given class, may behave differently for different operand types passed to it.

• Do not fall into the trap of thinking that the tested expression is the receiver and the value is passed as a parameter to it. The opposite is true (as we saw previously).

• This points up the fact that x === y is not typically the same as y === x! There will be situations in which this is true, but overall the relationship operator is not commutative. (That is why we do not favor the term case equality operator, because equality is always commutative.) In other words, reversing our original example, the following code does not behave the same way:

case value
when expression
  some_action
end

• As an example, consider a string str and a pattern (regular expression) pat, which matches that string. The expression str =~ pat is true, just as in Perl. Because Ruby defines the opposite meaning for =~ in Regexp, you can also say that pat =~ str is true. Following this logic further, we find that (because of how Regexp defines ===) pat === str is also true. However, note that str === pat is not true. This means that the code fragment:

case "Hello"
when /Hell/
  puts "We matched."
else
  puts "We didn't match."
end

does not do the same thing as this fragment:

case /Hell/
when "Hello"
  puts "We matched."
else
  puts "We didn't match."
end

If this confuses you, just memorize the behavior. If it does not confuse you, so much the better.

• Programmers accustomed to C may be puzzled by the absence of break statements in the case statement; such a usage of break in Ruby is unnecessary (and illegal). This is due to the fact that “falling through” is rarely the desired behavior in a multiway branch. There is an implicit jump from each when-clause (or case limb, as it is sometimes called) to the end of the case statement. In this respect, Ruby’s case statement resembles the one in Pascal.

• The values in each case limb are essentially arbitrary. They are not limited to any certain type. They need not be constants but can be variables or complex expressions. Ranges or multiple values can be associated with each case limb.

• Case limbs may have empty actions (null statements) associated with them. The values in the limbs need not be unique but may overlap. Look at this example:

case x
when 0
when 1..5
  puts "Second branch"
when 5..10
  puts "Third branch"
else
  puts "Fourth branch"
end

Here, a value of 0 for x will do nothing; a value of 5 will print Second branch, even though 5 is also included in the next limb.

• The fact that case limbs may overlap is a consequence of the fact that they are evaluated in sequence and that short-circuiting is done. In other words, if evaluation of the expressions in one limb results in success, the limbs that follow are never evaluated. Therefore, it is a bad idea for case limb expressions to have method calls that have side effects. (Of course, such calls are questionable in most circumstances anyhow.) Also, be aware that this behavior may mask runtime errors that would occur if expressions were evaluated. Here is an example:

case x
when 1..10
  puts "First branch"
when foobar() # Possible side effects?
  puts "Second branch"
when 5/0 # Dividing by zero!
  puts "Third branch"
else
  puts "Fourth branch"
end

As long as x is between 1 and 10, foobar() will not be called, and the expression 5/0 (which would naturally result in a runtime error) will not be evaluated.
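The points above about the relationship operator can be tried directly. This sketch shows how === behaves for a class, a range, and a regular expression, and how a case statement relies on it; the classify method is a hypothetical example.

```ruby
# === is defined by the object on the left, so order matters.
puts(Integer === 5)       # true
puts(5 === Integer)       # false
puts((1..10) === 5)       # true
puts(/Hell/ === "Hello")  # true
puts("Hello" === /Hell/)  # false

# Each when-value is the receiver of ===; the limbs are tried in order.
def classify(x)
  case x
  when Integer   then "integer"
  when /\A\d+\z/ then "digits in a string"
  when 1.0..2.0  then "small float"
  else "something else"
  end
end

puts classify(42)    # integer
puts classify("42")  # digits in a string
puts classify(1.5)   # small float
puts classify(nil)   # something else
```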

1.5.4 Rubyisms and Idioms

Much of this material overlaps conceptually with the preceding pages. Don’t worry too much about why we divided it as we did; many of these tidbits were difficult to classify or organize. Our most important motivation was simply to break the information into digestible chunks.

Ruby was designed to be consistent and orthogonal. But it is also complex, and so, like every language, it has its own set of idioms and quirks. We discuss some of these in the following list:

• alias can be used to give alternate names for global variables and methods.

• The numbered global variables $1, $2, $3, and so on, cannot be aliased.

• We do not recommend the use of the “special variables,” such as $_, $$, and the rest. Though they can sometimes make code more compact, they rarely make it any clearer; we use them sparingly in this book and recommend the same practice. If needed, they can be aliased to longer, readable names such as $LAST_READ_LINE or $PROCESS_ID by using require 'English'.

• Do not confuse the .. and ... range operators. The former is inclusive of the upper bound, and the latter is exclusive. For example, 5..10 includes the number 10, but 5...10 does not.

• There is a small detail relating to ranges that may cause confusion. Given the range m..n, the method end will return the endpoint of the range; its alias last will do the same thing. However, these methods will return the same value (n) for the range m...n, even though n is not included in the latter range. The method exclude_end? is provided to distinguish between these two situations.

• Do not confuse ranges with arrays. These two assignments are entirely different:

x = 1..5
x = [1, 2, 3, 4, 5]

However, there is a convenient method (to_a) for converting ranges to arrays. (Many other classes also have such a method.)

• Often we want to assign a variable a value only if it does not already have a value. Because an unassigned variable has the value nil, we can, for example, shorten x = x || 5 to x ||= 5. Beware that the value false will be overwritten just as nil will.

• In most languages, swapping two variables takes an additional temporary variable. In Ruby, multiple assignment makes this unnecessary. For example, x, y = y, x will interchange the values of x and y.

• Keep a clear distinction in your mind between class and instance. For example, a class variable such as @@foobar has a classwide scope, but an instance variable such as @foobar has a separate existence in each object of the class.

• Similarly, a class method is associated with the class in which it is defined; it does not belong to any specific object and cannot be invoked as though it did. A class method is invoked with the name of a class, and an instance method is invoked with the name of an object.

• In writing about Ruby, the pound notation is sometimes used to indicate an instance method—for example, we use File.chmod to denote the class method chmod of class File, and we use File#chmod to denote the instance method that has the same name. This notation is not part of Ruby syntax but only Ruby folklore. We have tried to avoid it in this book.

• In Ruby, constants are not truly constant. They cannot be changed from within instance methods, but otherwise their values can be changed.

• In writing about Ruby, the word toplevel is common as both an adjective and a noun. We prefer to use top level as a noun and top-level as an adjective, but our meaning is the same as everyone else’s.

• The keyword yield comes from CLU and may be misleading to some programmers. It is used within an iterator to invoke the block with which the iterator is called. It does not mean “yield,” as in producing a result or returning a value, but is more like the concept of “yielding a timeslice.”

• The reflexive assignment operators +=, -=, and the rest are not methods (nor are they really operators); they are only “syntax sugar” or “shorthand” for their longer forms. Therefore, to say x += y is really identical to saying x = x + y, and if the + operator is overloaded, the += operator is defined “automagically” as a result of this predefined shorthand.

• Because of the way the reflexive assignment operators are defined, they cannot be used to initialize variables. If the first reference to x is x += 1, an error will result. This will be intuitive to most programmers unless they are accustomed to a language where variables are initialized to some sort of zero or null value.

• It is actually possible in some sense to get around this behavior. One can define operators for nil such that the initial nil value of the variable produces the result we want. Here is a nil.+ method that will allow += to initialize a String or a Fixnum value, basically just returning other and thus ensuring that nil + other is equal to other:

def nil.+(other)
  other
end

This illustrates the power of Ruby—but in general it’s not useful or appropriate to code this way.

• It is wise to recall that Class is an object and Object is a class. We will try to make this clear in a later chapter; for now, simply recite it every day as a mantra.

• Some operators can’t be overloaded because they are built into the language rather than implemented as methods. These are = .. ... and or not && ||. (In Ruby 1.9 and later, !, !=, and !~ are methods and can be redefined.)

Additionally, the reflexive assignment operators (+=, -=, and so on) cannot be overloaded. These are not methods, and it can be argued they are not true operators either.

• Be aware that although assignment is not overloadable, it is still possible to write an instance method with a name such as foo= (allowing statements such as x.foo = 5). Consider the equal sign to be like a suffix.

• Recall that a “bare” scope operator has an implied Object before it; therefore, ::Foo means Object::Foo.

• Recall that fail is an alias for raise.

• Recall that definitions in Ruby are executed. Because of the dynamic nature of the language, it is possible (for example) to define two methods completely differently based on a flag that is tested at runtime.

• Remember that the for construct (for x in a) is really calling the default iterator each. Any class having this iterator can be walked through with a for loop.

• Be aware that a method defined at the top level is added to Object as a private method and is therefore available in every object.

• A setter method (such as foo=) must be called with a receiver; otherwise, it will look like a simple assignment to a local variable of that name.

• The keyword retry is used only in exception handling. (In older versions of Ruby, it was used in iterators as well.)

• An object’s initialize method is always private.

• Where a block ends in a right brace (or in end) and results in a value, that value can be used as the receiver for further method calls. Here is an example:

squares = [1,2,3,4,5].collect {|x| x**2 }.reverse
# squares is now [25,16,9,4,1]

• The idiom if $0 == __FILE__ is sometimes seen near the bottom of a Ruby program. This is a check to see whether the file is being run as a standalone piece of code (true) or is being used as some kind of auxiliary piece of code such as a library (false). A common use of this is to put a sort of “main program” (usually with test code in it) at the end of a library.

• Normal subclassing or inheritance is done with the < symbol:

class Dog < Animal
  # ...
end

But creation of a singleton class (an anonymous class that extends a single instance) is done with the << symbol:

class << platypus
  # ...
end

• When passing a block to an iterator, there is a slight difference between braces ({}) and a do-end pair. This is a precedence issue:

mymethod param1, foobar do ... end
# Here, do-end binds with mymethod

mymethod param1, foobar { ... }
# Here, {} binds with foobar, assumed to be a method

• It is somewhat traditional in Ruby to put single-line blocks in braces and multiline blocks in do-end pairs. Here are some examples:

my_array.each { |x| puts x }

my_array.each do |x|
  print x
  if x % 2 == 0
    puts " is even."
  else
    puts " is odd."
  end
end

This habit is not required, and there may be occasions where it is inappropriate to follow this rule.

• A closure remembers the context in which it was created. One way to create a closure is by using a Proc object. As a crude example, consider the following:

def power(exponent)
  proc {|base| base**exponent}
end

square = power(2)
cube = power(3)

a = square.call(11) # Result is 121
b = square.call(5) # Result is 25
c = cube.call(6) # Result is 216
d = cube.call(8) # Result is 512

Observe that the closure “knows” the value of exponent that it was given at the time it was created.

• However, let’s assume that a closure uses a variable defined in an outer scope (which is perfectly legal). This property can be useful, but here we show a misuse of it:

$exponent = 0

def power
  proc {|base| base**$exponent}
end

$exponent = 2
square = power

$exponent = 3
cube = power

a = square.call(11) # Wrong! Result is 1331

b = square.call(5) # Wrong! Result is 125

# The above two results are wrong because the CURRENT
# value of $exponent is being used. This would be true
# even if it had been a local variable that had gone
# out of scope (e.g., using define_method).

c = cube.call(6) # Result is 216
d = cube.call(8) # Result is 512

• Finally, consider this somewhat contrived example. Inside the block of the times iterator, a new context is started so that x is a local variable. The variable closure is already defined at the top level, so it will not be defined as local to the block.

closure = nil   # Define closure so the name will be known
1.times do      # Start a new context
  x = 5         # x is local to this block
  closure = Proc.new { puts "In closure, x = #{x}" }
end

x = 1           # Define x at top level

closure.call    # Prints: In closure, x = 5

Now note that the variable x that is set to 1 is a new variable, defined at the top level. It is not the same as the other variable of the same name. The closure therefore prints 5 because it remembers its creation context with the previous variable x and its previous value.

• Variables starting with a single @, defined inside a class, are generally instance variables. However, if they are defined outside any method, they are really class instance variables. (This usage is somewhat contrary to most OOP terminology in which a class instance is regarded to be the same as an instance or an object.) Here is an example:

class MyClass

  @x = 1 # A class instance variable
  @y = 2 # Another one

  def my_method
    @x = 3 # An instance variable
    # Note that @y is not accessible here.
  end

end

The class instance variable @y in the preceding code example is really an attribute of the class object MyClass, which is an instance of the class Class. (Remember, Class is an object, and Object is a class.) Class instance variables cannot be referenced from within instance methods and, in general, are not very useful.

• attr, attr_reader, attr_writer, and attr_accessor are shorthand for the actions of defining “setters” and “getters”; they take symbols as arguments (that is, instances of class Symbol).

• There is never any assignment with the scope operator; for example, the assignment Math::PI = 3.2 is illegal.
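Three of the points in the preceding list (definitions are executed, for relies on each, and a writer method needs a receiver) can be seen together in one small sketch. The class Countdown, the flag USE_LONG_FORM, and the method names are hypothetical, invented only for illustration:

```ruby
# Hypothetical flag; which version of describe is defined depends on it.
USE_LONG_FORM = false

class Countdown
  attr_reader :start

  def initialize(start)
    @start = start
  end

  # Definitions are executed, so this test runs when the class body runs.
  if USE_LONG_FORM
    def describe
      "Countdown starting at #{@start}"
    end
  else
    def describe
      "Countdown(#{@start})"
    end
  end

  # Defining each is enough to make the object usable in a for loop.
  def each
    @start.downto(1) { |n| yield n }
  end

  # A writer method; the equal sign acts like a suffix on the name.
  def start=(value)
    @start = value
  end

  def reset_wrong
    start = 10        # Assigns a local variable; the writer is NOT called
  end

  def reset_right
    self.start = 10   # Explicit receiver, so the writer is called
  end
end

c = Countdown.new(3)
seen = []
for n in c            # Calls c.each behind the scenes
  seen << n
end
# seen is now [3, 2, 1]

c.reset_wrong         # c.start is still 3
c.reset_right         # c.start is now 10
```

Flipping the flag to true and reloading the file would define the longer describe instead; nothing else in the class would need to change.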

1.5.5 Expression Orientation and Other Miscellaneous Issues

In Ruby, expressions are nearly as significant as statements. If you are a C programmer, this will be somewhat familiar to you; if your background is in Pascal, it may seem utterly foreign. But Ruby carries expression orientation even further than C.

In addition, we use this section to remind you of a couple of minor issues regarding regular expressions. Consider them to be a small bonus:

• In Ruby, any kind of assignment returns the same value that was assigned. Therefore, we can sometimes take little shortcuts like the ones shown here, but be careful when you are dealing with objects! Remember that these are nearly always references to objects.

x = y = z = 0 # All are now zero.

a = b = c = [] # Danger! a, b, and c now all refer
# to the SAME empty array.
x = 5
y = x += 2 # Now x and y are both 7

However, remember that values such as Fixnums are actually stored as immediate values, not as object references.

• Many control structures return values—if, unless, and case. The following code is all valid; it demonstrates that the branches of a decision need not be statements but can simply be expressions:

a = 5
x = if a < 8 then 6 else 7 end # x is now 6

y = if a < 8     # y is 6 also; the
  6              # if-statement can be
else             # on a single line
  7              # or on multiple lines.
end

# unless also works; z will be assigned 4
z = unless x == y then 3 else 4 end

t = case a       # t gets assigned
  when 0..3      # the value
    "low"        # "medium"
  when 4..6
    "medium"
  else
    "high"
end

Here, we indent as though the case started with the assignment. This looks proper to our eyes, but you may disagree.

• Note by way of contrast that the while and until loops do not return useful values but typically return nil:

i = 0
x = while (i < 5) # x is nil
  puts i+=1
end

• The ternary decision operator can be used with statements or expressions. For syntactic reasons (or parser limitations), the parentheses around the arguments to puts here are necessary:

x = 6
y = x == 5 ? 0 : 1 # y is now 1
x == 5 ? puts("Hi") : puts("Bye") # Prints Bye

• The return at the end of a method can be omitted. A method always returns the last expression evaluated in its body, regardless of where that happens.

• When an iterator is called with a block, the last expression evaluated in the block is returned as the value of the block. Therefore, if the body of an iterator has a statement such as x = yield, that value can be captured.

• Recall that the multiline modifier /m can be appended to a regex, in which case . (dot) will match a newline character.

• Beware of zero-length matches in regular expressions. If all elements of a regex are optional, then “nothingness” will match that pattern, and a match will always be found at the beginning of a string. This is a common error for regex users, particularly novices.
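The last three points in the list above (capturing a block's value with yield, the /m modifier, and zero-length matches) might be sketched as follows. The method name doubled is hypothetical; the strings and patterns are arbitrary examples:

```ruby
# The value of the block is the value returned by yield.
def doubled
  x = yield            # x captures the last expression in the block
  x * 2
end

answer = doubled { 20 + 1 }          # answer is 42

# The /m modifier lets the dot match a newline.
text = "first\nsecond"
without_m = (text =~ /first.second/)   # nil: dot does not match "\n"
with_m    = (text =~ /first.second/m)  # 0: dot now matches "\n"

# Every element of this pattern is optional, so "nothingness" matches
# at the very beginning of the string.
empty = "abc123".match(/\d*/)
empty[0]               # "" (a zero-length match)
empty.begin(0)         # 0

# Requiring at least one digit finds the digits we probably wanted.
digits = "abc123".match(/\d+/)
digits[0]              # "123"
digits.begin(0)        # 3
```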

1.6 Ruby Jargon and Slang

You don’t have to relearn English when you learn Ruby. But certain pieces of jargon and slang are commonly used in the community. Some of these may be used in slightly different ways from the rest of the computer science world. Most of these are discussed in this section.

In Ruby, the term attribute is used somewhat unofficially. We can think of an attribute as being an instance variable that is exposed to the outside world via one of the attr family of methods. This is a gray area because we could have methods such as foo and foo= that don’t correspond to @foo as we would expect. And certainly not all instance variables are considered attributes. As always, common sense should guide your usage.

Attributes in Ruby can be broken down into readers and writers (called getters and setters in some languages—terms we don’t commonly use). An accessor is both a reader and a writer; this is consistent with the name of the attr_accessor method but disagrees with common usage in other communities where an accessor is read-only.
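As a small illustration of readers, writers, and accessors, consider the following sketch. The class Account and its attribute names are hypothetical:

```ruby
class Account
  attr_reader   :owner      # Reader only:  defines owner
  attr_writer   :password   # Writer only:  defines password=
  attr_accessor :email      # Both:         defines email and email=

  def initialize(owner)
    @owner = owner
  end
end

acct = Account.new("matz")
acct.owner                      # "matz"
acct.email = "matz@example.com"
acct.email                      # "matz@example.com"
acct.password = "secret"        # Can be written...
acct.respond_to?(:password)     # false (...but not read)
acct.respond_to?(:owner=)       # false (owner is read-only)
```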

The operator === is unique to Ruby (as far as I am aware). The common name for it is the case comparison operator because it is used implicitly by case statements. In this book, I often use the term relationship operator. I did not invent this term, but I can’t find its origin, and it is not in common use today. It is sometimes also called the threequal operator (“three equal signs”).
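To make the behavior concrete, here is a brief sketch of === as it is defined for ranges, regular expressions, and classes, followed by a case statement that relies on it implicitly. The method name classify is hypothetical:

```ruby
in_range  = ((1..10) === 5)        # true:  Range#=== tests inclusion
matches   = (/ab+c/ === "xabbcx")  # true:  Regexp#=== tests for a match
is_string = (String === "hello")   # true:  tests class membership
reversed  = (5 === (1..10))        # false: the operator is not symmetric

# Each when clause calls === on its own value, passing the case value.
def classify(value)
  case value
  when Integer    then "integer"
  when /\A\d+\z/  then "numeric string"
  when nil        then "nothing"
  else                 "something else"
  end
end

classify(42)      # "integer"
classify("42")    # "numeric string"
classify(nil)     # "nothing"
classify(3.7)     # "something else"
```

Note that the receiver of === is the value in the when clause, not the value being tested, which is why the order of operands matters.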

The <=> operator is probably best called the comparison operator. It is commonly called the spaceship operator because it looks like a side view of a flying saucer in an old-fashioned video game or text-based computer game.
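The following sketch shows the values <=> returns and how defining it, together with the Comparable module, supplies the other comparison operators. The class Version is hypothetical:

```ruby
less    = (1 <=> 2)     # -1
equal   = (2 <=> 2)     #  0
greater = (3 <=> 2)     #  1
unknown = (1 <=> "a")   # nil: the operands cannot be compared

class Version
  include Comparable    # Supplies <, <=, ==, >=, >, and between?
  attr_reader :major, :minor

  def initialize(major, minor)
    @major, @minor = major, minor
  end

  # Arrays compare element by element, so we delegate to them.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

old = Version.new(1, 9)
new = Version.new(2, 0)

old < new                    # true
[new, old].sort.first        # old (sort uses <=> as well)
```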

The term poetry mode is used by some to indicate the omission of needless punctuation and tokens (a tongue-in-cheek reference to the punctuation habits of poets in the last six decades or so). Poetry mode is often taken to mean “omission of parentheses around method calls.” Here is an example:

some_method(1, 2, 3) # unneeded parentheses
some_method 1, 2, 3 # "poetry mode"

But I think the principle is more general than that. For example, when a hash is passed as the last or lone parameter, the braces may be omitted. At the end of a line, the semicolon may be omitted (and really always is). In most cases, the keyword then may be omitted, whether in if statements or case statements.
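These other omissions can be sketched briefly. The method connect and its parameters are hypothetical, made up only to have something to call:

```ruby
def connect(name, options)
  [name, options]
end

# Fully punctuated call:
a = connect("db", { :host => "localhost", :port => 5432 })

# Braces around the trailing hash omitted, and parentheses too:
b = connect "db", :host => "localhost", :port => 5432

same = (a == b)       # true

x = 7

# On a single line, then is needed to separate condition from body:
size1 = if x > 5 then "big" else "small" end

# When a newline follows the condition, then may be omitted:
size2 = if x > 5
  "big"
else
  "small"
end
```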

Some coders even go so far as to omit parentheses in a method definition, though most do not:

def my_method(a, b, c) # Also legal: def my_method a, b, c
  # ...
end

It is worth noting that in some cases, the complexity of the Ruby grammar causes the parser to be confused easily. When method calls are nested, it is better to use parentheses for clarity:

def alpha(x)
  x*2
end

def beta(y)
  y*3
end

gamma = 5
delta = alpha(beta(gamma))
delta = alpha beta gamma # same, but less clear

The term duck typing, as far as I can tell, originated with Dave Thomas. It refers to the old saying that if something looks like a duck, walks like a duck, and quacks like a duck, it might as well be a duck. Exactly what this term means may be open to discussion; I would say that it refers to the tendency of Ruby to be less concerned with the class of an object and more concerned with what methods can be called on it and what operations can be performed on it. Therefore, in Ruby we rarely use is_a? or kind_of?, but we more often use the respond_to? method. Most often of all, we simply pass an object to a method and expect that an exception will be raised if it is used inappropriately. That usually happens sooner rather than later, but the exceptions that are raised may be hard to understand and debug quickly.
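As a rough sketch of this style, the following method accepts any object that responds to <<, whatever its class. The method name log_to is hypothetical, and the explicit respond_to? check is optional; many Rubyists would simply call << and let an exception be raised:

```ruby
require 'stringio'

def log_to(sink, message)
  unless sink.respond_to?(:<<)
    raise ArgumentError, "sink must respond to <<"
  end
  sink << message     # Works for anything that "quacks" like a sink
  sink
end

array  = log_to([], "started")             # An Array
string = log_to(String.new, "started")     # A String
io     = log_to(StringIO.new, "started")   # An IO-like object

begin
  log_to(Object.new, "started")            # A plain Object has no <<
rescue ArgumentError => e
  problem = e.message
end
```

Be aware that respond_to? checks only the method name, not its meaning. An Integer also responds to << (as a bit shift), so the check alone would not reject it.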

The unary asterisk that is used to expand an array could be called an array expansion operator, but I don’t think I have ever heard that. Terms such as star and splat are inevitable in casual conversation, along with derivatives such as splatted and unsplatted. David Black, author of The Well-Grounded Rubyist, cleverly calls this the unary unarray operator.
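Whatever we call it, the operator behaves as in this short sketch. The method names volume and total are hypothetical:

```ruby
def volume(length, width, height)
  length * width * height
end

dims = [2, 3, 4]
v = volume(*dims)         # The array is expanded into three arguments; v is 24

# The star also appears on the left side of an assignment:
first, *rest = dims       # first is 2, rest is [3, 4]

# In a parameter list it does the reverse, collecting arguments into an array:
def total(*numbers)
  numbers.inject(0) { |sum, n| sum + n }
end

t1 = total(1, 2, 3)       # 6
t2 = total(*dims)         # 9
```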

The term singleton is sometimes regarded as overused. It is useful to remember that this is a perfectly good English word in its own right—referring to a thing that is solitary or unique. As long as we use it as a modifier, there should be no confusion.

The Singleton Pattern is a well-known design pattern in which a class allows itself to be instantiated only once; the singleton library in Ruby facilitates the use of this pattern.
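A minimal sketch of the library in use follows. The class Registry is hypothetical; the Singleton module, the instance method, and the private new are what the standard library provides:

```ruby
require 'singleton'

class Registry
  include Singleton       # Makes new private and adds Registry.instance
  attr_reader :entries

  def initialize
    @entries = []
  end
end

a = Registry.instance
b = Registry.instance
a.equal?(b)               # true: both names refer to the one instance

a.entries << "first"
b.entries                 # ["first"]

begin
  Registry.new            # Not allowed; new is private
rescue NoMethodError
  blocked = true
end
```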

A singleton class (called an eigenclass by some people) is the instance of Class where methods are stored that are “per object” rather than “per class.” It is arguably not a “true” class because it cannot be instantiated. The following is an example of opening up the singleton class for a string object and adding a method:

str = "hello"
class << str                  # Alternatively:
  def hyphenated              # def str.hyphenated
    self.split("").join("-")
  end
end

str.hyphenated # "h-e-l-l-o"

Let’s go back to our previous example. Because the method hyphenated exists in no other object or class, it is a singleton method on that object. This also is unambiguous. Sometimes the object itself will be called a singleton because it is one of a kind—it is the only instance of that class.

But remember that in Ruby, a class is itself an object. Therefore, we can add a method to the singleton class of a class, and that method will be unique to that object, which happens to be a class. Here is an example:

class MyClass
  class << self                   # Alternatively: def self.hello
    def hello                     # or: def MyClass.hello
      puts "Hello from #{self}!"
    end
  end
end

So we don’t have to instantiate MyClass to call this method:

MyClass.hello # Hello from MyClass!

However, you will notice that this is simply what we call a class method in Ruby. In other words, a class method is a singleton method on a class. We could also say it’s a singleton method on an object that happens to be a class.
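One way to convince ourselves of this is to ask the objects themselves. In this sketch the class Greeter is hypothetical; singleton_methods and singleton_class are standard methods of Object:

```ruby
class Greeter
  def self.hello
    "Hello from #{self}!"
  end
end

str = String.new("hello")
def str.hyphenated
  split("").join("-")
end

# Both kinds of method are reported the same way:
class_level  = Greeter.singleton_methods    # [:hello]
object_level = str.singleton_methods        # [:hyphenated]

# The class method lives in the singleton class of Greeter:
stored = Greeter.singleton_class.instance_methods(false)   # [:hello]

# Other strings are unaffected by the method added to str:
other = String.new("world").respond_to?(:hyphenated)       # false
```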

There are a few more terms to cover. A class variable is one that starts with a double-@, of course; it is perhaps a slight misnomer because of its nontrivial behavior with regard to inheritance. A class instance variable is a somewhat different animal. It is an ordinary instance variable where the object it belongs to happens to be a class. For more information, see Chapter 11, “OOP and Dynamic Features in Ruby.”
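The difference shows up as soon as a subclass is involved. In this sketch the classes Parent and Child and the variable names are hypothetical:

```ruby
class Parent
  @@shared = "parent"     # Class variable: shared with all subclasses
  @own     = "parent"     # Class instance variable: belongs to Parent alone

  class << self
    attr_reader :own      # A reader for the class instance variable
  end

  def self.shared
    @@shared
  end
end

class Child < Parent
  @@shared = "child"      # Overwrites the SAME variable that Parent sees
  @own     = "child"      # A separate variable belonging to Child
end

Parent.shared             # "child" (perhaps surprisingly)
Child.shared              # "child"
Parent.own                # "parent"
Child.own                 # "child"
```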

Especially since the advent of Ruby on Rails, many have used the term “monkey-patching”; this refers to the reopening of a class (especially a system class) in order to add (or override) methods or other features. I don’t use this term in this book or elsewhere because it is a disparaging term which originated outside our community. Open classes in Ruby are a feature, not a bug. They can be used properly and safely, or they can be used improperly, just like any other language feature.
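For readers who have not seen it, reopening a class looks like the following sketch. The method name shout is hypothetical and chosen so that it does not collide with anything String already defines:

```ruby
class String              # Reopens the existing class; nothing is replaced
  def shout
    upcase + "!"
  end
end

loud = "hello".shout      # "HELLO!"

# The existing methods of String are still there:
size = "hello".length     # 5
```

Adding a new, clearly named method is the mild end of the spectrum; overriding an existing method of a core class affects every piece of code that uses it and calls for more care.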

1.7 Conclusion

That ends our review of object-oriented programming and our whirlwind tour of the Ruby language. Later chapters will expand on this material greatly.

Although it was not my intention to “teach Ruby” to the beginner in this chapter, it is possible that the beginner might pick it up here anyhow. (Several people have reported to me that they learned Ruby from the first or second edition of this book.) However, the later material in the book should be useful to the beginning and intermediate Rubyist alike. It is my hope that even the advanced Ruby programmer may still gain some new knowledge here and there.