Programming Ruby 1.9 & 2.0: The Pragmatic Programmers’ Guide (2013)

Part 3. Ruby Crystallized

Chapter 25. Reflection, ObjectSpace, and Distributed Ruby

One of the advantages of dynamic languages such as Ruby is the ability to introspect: to examine aspects of a program from within the program itself. This is also called reflection.

When people introspect, we think about our thoughts and feelings. This is interesting, because we’re using thought to analyze thought. It’s the same when programs use introspection—a program can discover the following information about itself:

  • What objects it contains
  • Its class hierarchy
  • The attributes and methods of objects
  • Information on methods

Armed with this information, we can look at particular objects and decide which of their methods to call at runtime—even if the class of the object didn’t exist when we first wrote the code. We can also start doing clever things, perhaps modifying the program while it’s running. Later in this chapter we’ll look at distributed Ruby and marshaling, two reflection-based technologies that let us send objects around the world and through time.

25.1 Looking at Objects

Have you ever craved the ability to traverse all the living objects in your program? We have! Ruby lets you perform this trick with ObjectSpace.each_object. We can use it to do all sorts of neat tricks.

For example, to iterate over all objects of type Complex, you’d write the following:

a = Complex(1, 2)

b = Complex(99, -100)

ObjectSpace.each_object(Complex) {|x| puts x }

Produces:

0+1i

99-100i

1+2i

Where did that extra number, (0+1i), come from? We didn’t define it in our program. Well, the Complex class defines a constant for I, the square root of -1. Since we are examining all living objects in the system, these turn up as well.

Let’s try the same example with different values. This time, they’re objects of type Fixnum:

a = 102

b = 95

ObjectSpace.each_object(Fixnum) {|x| p x }

(Produces no output.)

Neither of the Fixnum objects we created showed up. That’s because ObjectSpace doesn’t know about objects with immediate values: Fixnum, Symbol, true, false, nil, and (on 64-bit platforms, as of Ruby 2.0) most Floats.
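As a small sketch of the same idea applied to your own classes, the following collects the live instances of a class. The Widget class, the WIDGETS constant, and the live_instances helper are hypothetical names introduced only for illustration. The exact count can vary with garbage collection, so the example keeps references to the objects it creates.

```ruby
# Hypothetical class used only for this illustration.
class Widget
end

# Returns an array of the live instances of klass that ObjectSpace can see.
# Without a block, each_object returns an enumerator.
def live_instances(klass)
  ObjectSpace.each_object(klass).to_a
end

# Keep references in a constant so the objects cannot be collected.
WIDGETS = [Widget.new, Widget.new]

puts live_instances(Widget).size    # at least 2
```

Because the result depends on what the garbage collector has reclaimed so far, treat the count as a lower bound rather than an exact figure.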

Looking Inside Objects

Once you’ve found an interesting object, you may be tempted to find out just what it can do. Unlike static languages, where a variable’s type determines its class, and hence the methods it supports, Ruby supports liberated objects. You really cannot tell exactly what an object can do until you look under its hood.[114] We talk about this in Chapter 23, Duck Typing.

For instance, we can get a list of all the methods to which an object will respond (these include methods in an object’s class and that class’s ancestors):

r = 1..10 # Create a Range object

list = r.methods

list.length # => 111

list[0..3] # => [:==, :===, :eql?, :hash]

We can check to see whether an object responds to a particular method:

r = 1..10

r.respond_to?("frozen?") # => true

r.respond_to?(:has_key?) # => false

"me".respond_to?("==") # => true

We can ask for an object’s class and unique object ID and test its relationship to other classes:

num = 1

num.object_id # => 3

num.class # => Fixnum

num.kind_of? Fixnum # => true

num.kind_of? Numeric # => true

num.instance_of? Fixnum # => true

num.instance_of? Numeric # => false
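These checks are often combined to decide at runtime how to treat an object. The following is a minimal sketch; describe_object is a hypothetical helper name, and the particular questions it asks are chosen only for illustration.

```ruby
# Collects a few reflective facts about any object.
def describe_object(obj)
  {
    class:       obj.class,
    numeric:     obj.kind_of?(Numeric),    # true for the class or any subclass
    exact_range: obj.instance_of?(Range),  # true only for Range itself
    iterable:    obj.respond_to?(:each)
  }
end

p describe_object(1.5)
p describe_object(1..10)
```

Note that kind_of? follows the ancestor chain, while instance_of? matches only the object's own class.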

25.2 Looking at Classes

Knowing about objects is one part of reflection, but to get the whole picture, you also need to be able to look at classes—the methods and constants that they contain.

Looking at the class hierarchy is easy. You can get the parent of any particular class using Class#superclass. For classes and modules, the Module#ancestors method lists both superclasses and mixed-in modules:

klass = Fixnum

begin

print klass

klass = klass.superclass

print " < " if klass

end while klass

puts

p Fixnum.ancestors

Produces:

Fixnum < Integer < Numeric < Object < BasicObject

[Fixnum, Integer, Numeric, Comparable, Object, Kernel, BasicObject]

If you want to build a complete class hierarchy, just run that code for every class in the system. We can use ObjectSpace to iterate over all Class objects:

ObjectSpace.each_object(Class) do |klass|

# ...

end
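One way the body of that loop might be filled in is sketched below. The helper names hierarchy_line and direct_subclasses, and the Instrument classes, are hypothetical. Skipping singleton classes with Module#singleton_class? assumes Ruby 2.1 or later.

```ruby
# Builds the "A < B < C" line for one class by walking superclass links.
def hierarchy_line(klass)
  chain = []
  while klass
    chain << klass
    klass = klass.superclass
  end
  chain.join(" < ")
end

# Finds the direct subclasses of parent among all live Class objects.
# Singleton classes are skipped so that only ordinary classes are reported.
def direct_subclasses(parent)
  ObjectSpace.each_object(Class).select do |klass|
    !klass.singleton_class? && klass.superclass == parent
  end
end

# Hypothetical classes for the demonstration.
class Instrument; end
class Saxophone < Instrument; end
class Trumpet < Instrument; end

puts hierarchy_line(Saxophone)
p direct_subclasses(Instrument).sort_by(&:name)
```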

Looking Inside Classes

We can find out a bit more about the methods and constants in a particular class. We can ask for methods by access level, and we can ask for just singleton methods. We can also take a look at constants, local variables, and instance variables:

class Demo

@@var = 99

CONST = 1.23

private

def private_method

end

protected

def protected_method

end

public

def public_method

@inst = 1

i = 1

j = 2

local_variables

end

def Demo.class_method

end

end

Demo.private_instance_methods(false) # => [:private_method]

Demo.protected_instance_methods(false) # => [:protected_method]

Demo.public_instance_methods(false) # => [:public_method]

Demo.singleton_methods(false) # => [:class_method]

Demo.class_variables # => [:@@var]

Demo.constants(false) # => [:CONST]

demo = Demo.new

demo.instance_variables # => []

# Get 'public_method' to return its local variables

# and set an instance variable

demo.public_method # => [:i, :j]

demo.instance_variables # => [:@inst]

You may be wondering what all the false parameters were in the previous code. As of Ruby 1.8, these reflection methods will by default recurse into parent classes, their parents, and so on, up the ancestor chain. Passing in false stops this kind of prying.
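To make the effect of that parameter concrete, here is a small sketch that compares the two forms. Vehicle and Bicycle are hypothetical classes used only for illustration.

```ruby
# Hypothetical classes for illustration.
class Vehicle
  def start; end
end

class Bicycle < Vehicle
  def ring_bell; end
end

own       = Bicycle.public_instance_methods(false)  # this class only
inherited = Bicycle.public_instance_methods         # default: walk the ancestors

p own                              # => [:ring_bell]
p inherited.include?(:start)       # => true
p inherited.include?(:ring_bell)   # => true
```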

Given a list of method names, we may now be tempted to try calling them. Fortunately, that’s easy with Ruby.

25.3 Calling Methods Dynamically

The Object#send method lets you tell any object to invoke a method by name.

"John Coltrane".send(:length) # => 13

"Miles Davis".send("sub", /iles/, '.') # => "M. Davis"

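One detail worth knowing when calling methods by name: send ignores method visibility, whereas public_send (available since Ruby 1.9) respects it. The sketch below uses a hypothetical Vault class to show the difference.

```ruby
# Hypothetical class for illustration.
class Vault
  def label
    "vault"
  end

  private

  def combination
    "12-34-56"
  end
end

vault = Vault.new
p vault.send(:label)           # => "vault"
p vault.send(:combination)     # => "12-34-56" (send bypasses private)

begin
  vault.public_send(:combination)
rescue NoMethodError => e
  puts "public_send refused: #{e.class}"
end
```

When the method name comes from outside your program, public_send is usually the more cautious choice.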
Another way of invoking methods dynamically uses Method objects. A Method object is like a Proc object: it represents a chunk of code and a context in which it executes. In this case, the code is the body of the method, and the context is the object that created the method. Once we have our Method object, we can execute it sometime later by sending it the message call:

trane = "John Coltrane".method(:length)

miles = "Miles Davis".method("sub")

trane.call # => 13

miles.call(/iles/, '.') # => "M. Davis"

You can pass the Method object around as you would any other object, and when you invoke Method#call, the method is run just as if you had invoked it on the original object. It’s like having a C-style function pointer but in a fully object-oriented style.

You can use Method objects where you could use proc objects. For example, they work with iterators:

def double(a)

2*a

end

method_object = method(:double)

[ 1, 3, 5, 7 ].map(&method_object) # => [2, 6, 10, 14]
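Method objects can also be asked about themselves. The following sketch gathers a few of those answers into a hash; method_info is a hypothetical helper name.

```ruby
# Summarizes what a Method object knows about itself.
def method_info(meth)
  {
    name:     meth.name,
    owner:    meth.owner,     # class or module that defines the method
    receiver: meth.receiver,  # object the method is bound to
    arity:    meth.arity      # required arguments (negative if some are optional)
  }
end

p method_info("John Coltrane".method(:length))
p method_info([3, 1, 2].method(:push))
```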

Method objects are bound to one particular object. You can create unbound methods (of class UnboundMethod) and then subsequently bind them to one or more objects. The binding creates a new Method object. As with aliases, unbound methods are references to the definition of the method at the time they are created:

unbound_length = String.instance_method(:length)

class String

def length

99

end

end

str = "cat"

str.length # => 99

bound_length = unbound_length.bind(str)

bound_length.call # => 3
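A single UnboundMethod can be bound to several receivers in turn, provided each receiver is an instance of the class the method came from. The sketch below illustrates both points; shout_all is a hypothetical helper name.

```ruby
unbound_upcase = String.instance_method(:upcase)

# Binds the same UnboundMethod to each word and calls it.
def shout_all(unbound, words)
  words.map { |word| unbound.bind(word).call }
end

p shout_all(unbound_upcase, %w[cat dog])   # => ["CAT", "DOG"]

# Binding to an object of an unrelated class is rejected.
begin
  unbound_upcase.bind(42)
rescue TypeError => e
  puts "cannot bind: #{e.class}"
end
```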

Because good things come in threes, here’s yet another way to invoke methods dynamically. The eval method (and its variations such as class_eval, module_eval, and instance_eval) will parse and execute an arbitrary string of legal Ruby source code.

trane = %q{"John Coltrane".length}

miles = %q{"Miles Davis".sub(/iles/, '.')}

eval trane # => 13

eval miles # => "M. Davis"
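The variations differ mainly in what self is while the string runs. A brief sketch, using a hypothetical Tally class:

```ruby
# Hypothetical class for illustration.
class Tally
  def initialize
    @count = 41
  end
end

tally = Tally.new

# instance_eval runs the string with self set to tally,
# so instance variables are directly visible.
p tally.instance_eval("@count + 1")      # => 42

# class_eval runs the string as if it appeared inside the class body,
# so def adds an instance method.
Tally.class_eval("def count; @count; end")
p tally.count                            # => 41
```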

When using eval, it can be helpful to state explicitly the context in which the expression should be evaluated, rather than using the current context. You obtain a context using Object#binding at the desired point:

def get_a_binding

val = 123

binding

end

val = "cat"

the_binding = get_a_binding

eval("val", the_binding) # => 123

eval("val") # => "cat"

The first eval evaluates val in the context of the binding as it was when the method get_a_binding was executing. In this binding, the variable val had a value of 123. The second eval evaluates val in the top-level binding, where it has the value "cat".
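As one illustration of why an explicit binding is useful, the sketch below fills in a tiny template from the caller's local variables. The fill_in and greeting names and the {{name}} placeholder syntax are assumptions made for this example, not part of any library.

```ruby
# Replaces each {{name}} in template with the value of the local
# variable of that name, looked up in the supplied binding.
def fill_in(template, context)
  template.gsub(/\{\{(\w+)\}\}/) { eval($1, context).to_s }
end

def greeting
  name = "Trane"
  fill_in("Hello, {{name}}!", binding)
end

puts greeting   # => Hello, Trane!
```

Since eval executes whatever it is given, a helper like this is only appropriate for templates you control.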

Performance Considerations

As we’ve seen in this section, Ruby gives us several ways to invoke an arbitrary method of some object: Object#send, Method#call, and the various flavors of eval.

You may prefer to use any one of these techniques depending on your needs, but be aware that, as the following benchmark shows, eval is significantly slower than the others (or, for optimistic readers, send and call are significantly faster than eval).

require 'benchmark'

include Benchmark

test = "Stormy Weather"

m = test.method(:length)

n = 100000

bm(12) do |x|

x.report("call") { n.times { m.call } }

x.report("send") { n.times { test.send(:length) } }

x.report("eval") { n.times { eval "test.length" } }

end

Produces:

                  user     system      total        real

call          0.020000   0.000000   0.020000 (  0.022150)

send          0.020000   0.000000   0.020000 (  0.019678)

eval          1.230000   0.000000   1.230000 (  1.237393)

25.4 System Hooks

A hook is a technique that lets you trap some Ruby event, such as object creation. Let’s take a look at some common Ruby hook techniques.

Hooking Method Calls

The simplest hook technique in Ruby is to intercept calls to methods in system classes. Perhaps you want to log all the operating system commands your program executes. Simply rename the method Kernel.system, and substitute it with one of your own that both logs the command and calls the original Kernel method:

class Object

alias_method :old_system, :system

def system(*args)

old_system(*args).tap do |result|

puts "system(#{args.join(', ')}) returned #{result.inspect}"

end

end

end

system("date")

system("kangaroo", "-hop 10", "skippy")

Produces:

Mon May 27 12:31:42 CDT 2013

system(date) returned true

system(kangaroo, -hop 10, skippy) returned nil

The problem with this technique is that you’re relying on there not being an existing method called old_system. A better alternative is to make use of method objects, which are effectively anonymous:

class Object

old_system_method = instance_method(:system)

define_method(:system) do |*args|

old_system_method.bind(self).call(*args).tap do |result|

puts "system(#{args.join(', ')}) returned #{result.inspect}"

end

end

end

system("date")

system("kangaroo", "-hop 10", "skippy")

Produces:

Mon May 27 12:31:43 CDT 2013

system(date) returned true

system(kangaroo, -hop 10, skippy) returned nil

Ruby 2.0 gives us a new way of doing this. Modules can be used to include new instance methods in some other module or class. Until now, these methods were added behind the host module or class’s own methods: if the module defined a method with the same name as one in the host, the host method would be called. Ruby 2.0 adds the prepend method to modules. This lets you insert the module’s methods before the host’s. Within the module’s methods, calling super calls the host’s method of the same name. This gives us:

module SystemHook

private

def system(*args)

super.tap do |result|

puts "system(#{args.join(', ')}) returned #{result.inspect}"

end

end

end

class Object

prepend SystemHook

end

system("date")

system("kangaroo", "-hop 10", "skippy")

Produces:

Mon May 27 12:31:43 CDT 2013

system(date) returned true

system(kangaroo, -hop 10, skippy) returned nil
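The difference between include and prepend shows up directly in the ancestor chain. The following sketch compares the two; the Shout module and the two classes are hypothetical names used only for illustration.

```ruby
module Shout
  def greet
    super.upcase   # defer to the next greet in the ancestor chain
  end
end

class WithInclude
  include Shout    # Shout sits behind the class's own methods
  def greet
    "hello"
  end
end

class WithPrepend
  prepend Shout    # Shout sits in front of the class's own methods
  def greet
    "hello"
  end
end

p WithInclude.new.greet            # => "hello"
p WithPrepend.new.greet            # => "HELLO"
p WithInclude.ancestors.first(2)   # => [WithInclude, Shout]
p WithPrepend.ancestors.first(2)   # => [Shout, WithPrepend]
```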

Object Creation Hooks

Ruby lets you get involved when objects are created. If you can be present when every object is born, you can do all sorts of interesting things: you can wrap them, add methods to them, remove methods from them, and add them to containers to implement persistence—you name it. We’ll show a simple example here. We’ll add a timestamp to every object as it’s created. First, we’ll add a timestamp attribute to every object in the system. We can do this by hacking class Object itself:

class Object

attr_accessor :timestamp

end

Then, we need to hook object creation to add this timestamp. One way to do this is to do our method-renaming trick on Class#new, the method that’s called to allocate space for a new object. The technique isn’t perfect (some built-in objects, such as literal strings, are constructed without calling new), but it’ll work just fine for objects we write.

class Class

old_new = instance_method :new

define_method :new do |*args, &block|

result = old_new.bind(self).call(*args, &block)

result.timestamp = Time.now

result

end

end

Finally, we can run a test. We’ll create a couple of objects a few milliseconds apart and check their timestamps:

class Test

end

obj1 = Test.new

sleep(0.002)

obj2 = Test.new

obj1.timestamp.to_f # => 1369675903.251721

obj2.timestamp.to_f # => 1369675903.2541282
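If changing Class#new for the whole system feels too broad, a narrower variant is to prepend a module to just the classes that need timestamps. This is a sketch of that alternative, not the approach shown above; the Timestamped module and Reading class are hypothetical names.

```ruby
# Records the creation time of any object whose class prepends this module.
module Timestamped
  attr_reader :timestamp

  def initialize(*args, &block)
    @timestamp = Time.now
    super   # bare super forwards the same arguments and block
  end
end

# Hypothetical class for illustration.
class Reading
  prepend Timestamped

  def initialize(value)
    @value = value
  end
end

first = Reading.new(1)
sleep(0.002)
second = Reading.new(2)

p first.timestamp.to_f
p second.timestamp.to_f
```

Only classes that opt in are affected, which keeps the hook out of code that does not expect it.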

25.5 Tracing Your Program’s Execution

While we’re having fun reflecting on all the objects and classes in our programs, let’s not forget about the humble statements that make our code actually do things. It turns out that Ruby lets us look at these statements, too.

First, you can watch the interpreter as it executes code. In older Rubies, you use set_trace_func, while in Ruby 2.0 you use the TracePoint class. Both are used to execute a proc with all sorts of juicy debugging information whenever a new source line is executed, methods are called, objects are created, and so on.

The reference section contains full descriptions of set_trace_func and TracePoint, but here’s a taste:

class Test

def test

a = 1

end

end

TracePoint.trace do |tp|

p tp

end

t = Test.new

t.test

Produces:

#<TracePoint:c_return `trace'@prog.rb:7>

#<TracePoint:line@prog.rb:10>

#<TracePoint:c_call `new'@prog.rb:10>

#<TracePoint:c_call `initialize'@prog.rb:10>

#<TracePoint:c_return `initialize'@prog.rb:10>

#<TracePoint:c_return `new'@prog.rb:10>

#<TracePoint:line@prog.rb:11>

#<TracePoint:call `test'@prog.rb:2>

#<TracePoint:line@prog.rb:3 in `test'>

#<TracePoint:return `test'@prog.rb:4>
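Tracing everything is noisy, so it can help to restrict the events and the region being traced. The sketch below records only :call events (calls to methods written in Ruby) while a block runs. The traced_calls helper and Pipeline class are hypothetical names.

```ruby
# Runs the block and returns the names of the Ruby-level methods
# that were called while it ran.
def traced_calls
  calls = []
  trace = TracePoint.new(:call) { |tp| calls << tp.method_id }
  trace.enable { yield }   # tracing is active only inside this block
  calls
end

# Hypothetical class for illustration.
class Pipeline
  def run
    [step_one, step_two]
  end

  def step_one
    :one
  end

  def step_two
    :two
  end
end

pipeline = Pipeline.new
p traced_calls { pipeline.run }
```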

The method trace_var (described in the reference section) lets you add a hook to a global variable; whenever an assignment is made to the global, your proc is invoked.
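A minimal sketch of trace_var in use follows. The global names $level and $level_changes are made up for this example.

```ruby
$level_changes = []
$level = 0

# The block receives the new value each time $level is assigned.
trace_var(:$level) { |new_value| $level_changes << new_value }

$level = 1
$level = 2

untrace_var(:$level)   # remove the hook again
$level = 3             # no longer recorded

p $level_changes       # => [1, 2]
```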

How Did We Get Here?

That’s a fair question...one we ask ourselves regularly. Mental lapses aside, in Ruby you can find out “how you got there” using the method caller, which returns an array of strings representing the current call stack:

def cat_a

puts caller

end

def cat_b

cat_a

end

def cat_c

cat_b

end

cat_c

Produces:

prog.rb:5:in `cat_b'

prog.rb:8:in `cat_c'

prog.rb:10:in `<main>'

Ruby 1.9 also introduces __callee__, which returns the name of the current method.
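Two related facilities are sketched below, assuming Ruby 2.0 or later: caller_locations returns structured frame objects instead of strings, and __callee__ reports the name a method was actually called by, which differs from __method__ when an alias is used. All method names here are hypothetical.

```ruby
# Returns the bare name of the method that called this one.
# caller_locations(1, 1) asks for one frame, starting at our caller.
def calling_method_name
  caller_locations(1, 1).first.base_label
end

def cat_d
  calling_method_name
end

p cat_d            # => "cat_d"

def original_name
  [__method__, __callee__]
end
alias alias_name original_name

p original_name    # => [:original_name, :original_name]
p alias_name       # => [:original_name, :alias_name]
```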

Source Code

Ruby executes programs from plain old files. You can look at these files to examine the source code that makes up your program using one of a number of techniques.

The special variable __FILE__ contains the name of the current source file. This leads to a fairly short (if cheating) Quine—a program that outputs its own source code:

print File.read(__FILE__)

Produces:

print File.read(__FILE__)

As we saw in the previous section, the method Object#caller returns the call stack as a list. Each entry in this list starts off with a filename, a colon, and a line number in that file. You can parse this information to display source. In the following example, we have a main program, main.rb, that calls a method in a separate file, sub.rb. That method in turn calls back into the main program, where we traverse the call stack and write out the source lines involved. Notice the use of a hash of file contents, indexed by the filename.

Here’s the code that dumps out the call stack, including source information:

ospace/caller/stack_dumper.rb

def dump_call_stack

file_contents = {}

puts "File Line Source Line"

puts "---------------+----+------------"

caller.each do |position|

next unless position =~ /\A(.*?):(\d+)/

file = $1

line = Integer($2)

file_contents[file] ||= File.readlines(file)

printf("%-15s:%3d - %s", File.basename(file), line,

file_contents[file][line-1].lstrip)

end

end

The (trivial) file sub.rb contains a single method:

ospace/caller/sub.rb

def sub_method(v1, v2)

main_method(v1*3, v2*6)

end

The following is the main program, which invokes the stack dumper after being called back by the submethod.

require_relative 'sub'

require_relative 'stack_dumper'

def main_method(arg1, arg2)

dump_call_stack

end

sub_method(123, "cat")

Produces:

File Line Source Line

---------------+----+------------

prog.rb        :  5 - dump_call_stack

sub.rb         :  2 - main_method(v1*3, v2*6)

prog.rb        :  8 - sub_method(123, "cat")

The SCRIPT_LINES__ constant is closely related to this technique. If a program initializes a constant called SCRIPT_LINES__ with a hash, that hash will receive a new entry for every file subsequently loaded into the interpreter using require or load. The entry’s key is the name of the file, and the value is the source of the file as an array of strings.

25.6 Behind the Curtain: The Ruby VM

Ruby 1.9 comes with a new virtual machine, called YARV. As well as being faster than the old interpreter, YARV exposes some of its state via Ruby classes.

If you’d like to know what Ruby is doing with all that code you’re writing, you can ask YARV to show you the intermediate code that it is executing. You can ask it to compile the Ruby code in a string or in a file and then disassemble it and even run it.[115] Here’s a trivial example:

code = RubyVM::InstructionSequence.compile('a = 1; puts 1 + a')

puts code.disassemble

Produces:

== disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========

local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1] s1)

[ 2] a

0000 trace 1 ( 1)

0002 putobject_OP_INT2FIX_O_1_C_

0003 setlocal_OP__WC__0 2

0005 trace 1

0007 putself

0008 putobject_OP_INT2FIX_O_1_C_

0009 getlocal_OP__WC__0 2

0011 opt_plus <callinfo!mid:+, argc:1, ARGS_SKIP>

0013 opt_send_simple <callinfo!mid:puts, argc:1, FCALL|ARGS_SKIP>

0015 leave

Maybe you want to know how Ruby handles #{...} substitutions in strings. Ask the VM.

code = RubyVM::InstructionSequence.compile('a = 1; puts "a = #{a}."')

puts code.disassemble

Produces:

== disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========

local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1] s1)

[ 2] a

0000 trace 1 ( 1)

0002 putobject_OP_INT2FIX_O_1_C_

0003 setlocal_OP__WC__0 2

0005 trace 1

0007 putself

0008 putobject "a = "

0010 getlocal_OP__WC__0 2

0012 tostring

0013 putobject "."

0015 concatstrings 3

0017 opt_send_simple <callinfo!mid:puts, argc:1, FCALL|ARGS_SKIP>

0019 leave

For a full list of the opcodes, print out RubyVM::INSTRUCTION_NAMES.
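The "compile, then run" step mentioned earlier in this section can be sketched as follows. RubyVM is specific to the standard (MRI) interpreter, and compile_and_run is a hypothetical helper name.

```ruby
# Compiles source to YARV instructions, then executes them.
def compile_and_run(source)
  RubyVM::InstructionSequence.compile(source).eval
end

p compile_and_run('a = 1; 1 + a')   # => 2

# disassemble returns the listing as a String, so it can be
# searched or filtered like any other text.
listing = RubyVM::InstructionSequence.compile('1 + 2').disassemble
puts listing.lines.first
```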

25.7 Marshaling and Distributed Ruby

Ruby features the ability to serialize objects, letting you store them somewhere and reconstitute them when needed. You can use this facility, for instance, to save a tree of objects that represent some portion of application state—a document, a CAD drawing, a piece of music, and so on.

Ruby calls this kind of serialization marshaling (think of railroad marshaling yards where individual cars are assembled in sequence into a complete train, which is then dispatched somewhere). Saving an object and some or all of its components is done using the method Marshal.dump. Typically, you will dump an entire object tree starting with some given object. Later, you can reconstitute the object using Marshal.load.

Here’s a short example. We have a class Chord that holds a collection of musical notes. We’d like to save away a particularly wonderful chord so we can e-mail it to a couple hundred of our closest friends so they can load it into their copy of Ruby and savor it too. Let’s start with the classes for Note and Chord:

ospace/chord.rb

Note = Struct.new(:value) do

def to_s

value.to_s

end

end

class Chord

def initialize(arr)

@arr = arr

end

def play

@arr.join('-')

end

end

Now we’ll create our masterpiece and use Marshal.dump to save a serialized version to disk:

ospace/chord.rb

c = Chord.new( [ Note.new("G"),

Note.new("Bb"),

Note.new("Db"),

Note.new("E") ] )

File.open("posterity", "w+") do |f|

Marshal.dump(c, f)

end

Finally, our grandchildren read it in and are transported by our creation’s beauty:

chord = Marshal.load(File.open("posterity"))

chord.play # => "G-Bb-Db-E"
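Marshal also works entirely in memory: Marshal.dump with a single argument returns the serialized bytes as a string. A common use of this is a deep copy, sketched below with a hypothetical deep_copy helper.

```ruby
# Round-trips obj through Marshal to get an independent copy of the
# whole object tree.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

original = { "notes" => ["G", "Bb", "Db", "E"] }
copy     = deep_copy(original)
copy["notes"] << "F#"

p original["notes"]   # => ["G", "Bb", "Db", "E"]
p copy["notes"]       # => ["G", "Bb", "Db", "E", "F#"]
```

Two cautions apply when the data does go to disk or over a network: open files in binary mode ("wb" and "rb"), and only call Marshal.load on data you trust, because loading can instantiate arbitrary objects.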

Custom Serialization Strategy

Not all objects can be dumped: bindings, procedure objects, instances of class IO, and singleton objects cannot be saved outside the running Ruby environment (a TypeError will be raised if you try). Even if your object doesn’t contain one of these problematic objects, you may want to take control of object serialization yourself.
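A quick way to see which objects fall into that category is to try dumping them and catch the TypeError. The dumpable? helper below is a hypothetical name for such a check.

```ruby
# Returns true if Marshal can serialize obj, false if it raises TypeError.
def dumpable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

singleton = Object.new
def singleton.hello; end   # gives the object a singleton method

p dumpable?([1, "two", :three])   # => true
p dumpable?(proc { 1 })           # => false
p dumpable?($stdout)              # => false
p dumpable?(singleton)            # => false
```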

Marshal provides the hooks you need. In the objects that require custom serialization, simply implement two instance methods: one called marshal_dump, which returns an object representing the state to be dumped, and one called marshal_load, which receives that object back and uses it to initialize a newly allocated object. (In earlier Ruby versions you’d use methods called _dump and _load, which work with strings, but the new versions play better with Ruby’s object allocation scheme.) When the object is subsequently reconstituted using Marshal.load, the method marshal_load will be called with the dumped state and will use it to set the state of its receiver; it will be run in the context of an allocated but not initialized object of the class being loaded.

For instance, here is a sample class that defines its own serialization. For whatever reasons, Special doesn’t want to save one of its internal data members, @volatile. The author has decided to serialize the two other instance variables in an array.

class Special

def initialize(valuable, volatile, precious)

@valuable = valuable

@volatile = volatile

@precious = precious

end

def marshal_dump

[ @valuable, @precious ]

end

def marshal_load(variables)

@valuable = variables[0]

@precious = variables[1]

@volatile = "unknown"

end

def to_s

"#@valuable #@volatile #@precious"

end

end

obj = Special.new("Hello", "there", "World")

puts "Before: obj = #{obj}"

data = Marshal.dump(obj)

obj = Marshal.load(data)

puts "After: obj = #{obj}"

Produces:

Before: obj = Hello there World

After: obj = Hello unknown World

For more details, see the reference section.

YAML for Marshaling

The Marshal module is built into the interpreter and uses a binary format to store objects externally. Although fast, this binary format has one major disadvantage: if the interpreter changes significantly, the marshal binary format may also change, and old dumped files may no longer be loadable.

An alternative is to use a less fussy external format, preferably one using text rather than binary files. One option, supplied as a standard library, is YAML.[116]

We can adapt our previous marshal example to use YAML. Rather than implement specific loading and dumping methods to control the marshal process, we simply define the method to_yaml_properties, which returns a list of instance variables to be saved:

ospace/yaml.rb

require 'yaml'

class Special

def initialize(valuable, volatile, precious)

@valuable = valuable

@volatile = volatile

@precious = precious

end

def to_yaml_properties

%w{ @precious @valuable }

end

def to_s

"#@valuable #@volatile #@precious"

end

end

obj = Special.new("Hello", "there", "World")

puts "Before: obj = #{obj}"

data = YAML.dump(obj)

obj = YAML.load(data)

puts "After: obj = #{obj}"

Produces:

Before: obj = Hello there World

After: obj = Hello  World

We can take a look at what YAML creates as the serialized form of the object—it’s pretty simple:

obj = Special.new("Hello", "there", "World")

puts YAML.dump(obj)

Produces:

--- !ruby/object:Special

precious: World

valuable: Hello
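On newer Ruby versions the YAML library (Psych) may no longer honor to_yaml_properties, and YAML.load may refuse to build arbitrary classes. The sketch below shows one alternative under those assumptions, using Psych's encode_with and init_with hooks. SpecialRecord and load_trusted_yaml are hypothetical names, and the choice to restore @volatile as "unknown" simply mirrors the earlier Marshal example.

```ruby
require 'yaml'

# Hypothetical variant of Special that chooses what is written
# and how it is restored.
class SpecialRecord
  def initialize(valuable, volatile, precious)
    @valuable = valuable
    @volatile = volatile
    @precious = precious
  end

  # Called when dumping: store only the fields worth keeping.
  def encode_with(coder)
    coder['valuable'] = @valuable
    coder['precious'] = @precious
  end

  # Called when loading, on an allocated but uninitialized object.
  def init_with(coder)
    @valuable = coder['valuable']
    @precious = coder['precious']
    @volatile = "unknown"
  end

  def to_s
    "#@valuable #@volatile #@precious"
  end
end

# Use unsafe_load where it exists, since newer Psych versions make
# plain load reject custom classes. Only do this with trusted data.
def load_trusted_yaml(text)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

data = YAML.dump(SpecialRecord.new("Hello", "there", "World"))
puts data
puts load_trusted_yaml(data)
```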

Distributed Ruby

Since we can serialize an object or a set of objects into a form suitable for out-of-process storage, we can transmit objects from one process to another. Couple this capability with the power of networking, and voilà—you have a distributed object system. To save you the trouble of having to write the code, we suggest using Masatoshi Seki’s Distributed Ruby library (drb), which is now available as a standard Ruby library.

Using drb, a Ruby process may act as a server, as a client, or as both. A drb server acts as a source of objects, while a client is a user of those objects. To the client, it appears that the objects are local, but in reality the code is still being executed remotely.

A server starts a service by associating an object with a given port. Threads are created internally to handle incoming requests on that port, so remember to join the drb thread before exiting your program:

require 'drb'

class TestServer

def add(*args)

args.inject {|n,v| n + v}

end

end

server = TestServer.new

DRb.start_service('druby://localhost:9000', server)

DRb.thread.join # Don't exit just yet!

A simple drb client simply creates a local drb object and associates it with the object on the remote server; the local object is a proxy:

ospace/drb/drb_client.rb

require 'drb'

DRb.start_service()

obj = DRbObject.new(nil, 'druby://localhost:9000')

# Now use obj

puts "Sum is: #{obj.add(1, 2, 3)}"

The client connects to the server and calls the method add, which uses the magic of inject to sum its arguments. It returns the result, which the client prints out:

Sum is: 6

The initial nil argument to DRbObject indicates that we want to attach to a new distributed object. We could also use an existing object.

Ho hum, you say. This sounds like Java’s RMI or CORBA or whatever. Yes, it is a functional distributed object mechanism—but it is written in just a few hundred lines of Ruby code. No C, nothing fancy, just plain old Ruby code. Of course, it has no naming service, trader service, or anything like you’d see in CORBA, but it is simple and reasonably fast. On a 2.5GHz Power Mac system, this sample code runs at about 1,300 remote message calls per second. And if you do need naming services, DRb has a ring server that might fit the bill.

And, if you like the look of Sun’s JavaSpaces, the basis of the JINI architecture, you’ll be interested to know that drb is distributed with a short module that does the same kind of thing. JavaSpaces is based on a technology called Linda. To prove that its Japanese author has a sense of humor, Ruby’s version of Linda is known as Rinda.

25.8 Compile Time? Runtime? Anytime!

The important thing to remember about Ruby is that there isn’t a big difference between “compile time” and “runtime.” It’s all the same. You can add code to a running process. You can redefine methods on the fly, change their scope from public to private, and so on. You can even alter basic types, such as Class and Object.
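A small sketch of what that looks like in practice: a method is redefined while an object already exists, and then its visibility is changed. The Robot class is a hypothetical example.

```ruby
# Hypothetical class for illustration.
class Robot
  def greet
    "beep"
  end
end

robot = Robot.new
p robot.greet                    # => "beep"

# Reopen the class while the program is running and redefine the method.
class Robot
  def greet
    "hello"
  end
end
p robot.greet                    # => "hello" (existing objects see the change)

# Change the method's visibility from public to private.
Robot.send(:private, :greet)
p robot.respond_to?(:greet)      # => false
p robot.send(:greet)             # => "hello"
```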

Once you get used to this flexibility, it is hard to go back to a static language such as C++ or even to a half-static language such as Java.

But then, why would you want to do that?

Footnotes

[114]

Or under its bonnet, for objects created to the east of the Atlantic.

[115]

People often ask whether they can dump the opcodes out and later reload them. The answer is no—the interpreter has the code to do this, but it is disabled because there is not yet an intermediate code verifier for YARV.

[116]

http://www.yaml.org. YAML stands for YAML Ain’t Markup Language, but that hardly seems important.