Effective Ruby: 48 Specific Ways to Write Better Ruby (Effective Software Development Series) (2015)
5. Metaprogramming
In Ruby, everything is open to modification at run time. Classes, modules, and even the behavior of individual objects can all be changed while a program is running. It’s trivial to write code which defines new classes or adds methods to an existing class or object. Virtually nothing is off-limits. This so-called “metaprogramming” is one of Ruby’s most powerful features. It’s also one of its most dangerous.
There are a lot of good uses for metaprogramming. Cleaning up redundant code, generalizing a feature to work with more than one class, and creating domain specific languages are just a few examples. But there are downsides too. Methods like eval tend to become a crutch since they can be used to solve so many programming problems, yet at the same time they can expose your application to serious security issues. Knowing which kinds of metaprogramming are safe and helpful, and which are problematic is the responsibility of every Ruby programmer.
Metaprogramming can be a slippery slope. This chapter will help you maintain sure footing.
Item 28: Familiarize Yourself with Module and Class Hooks
Love them or hate them, callbacks are a recurring pattern in software design. From user interface event handling to asynchronous APIs, callbacks are practically everywhere. Working with a library or framework that requires callbacks isn’t always the most enjoyable experience. That said, some languages make it more natural to pass functions or anonymous chunks of code around as callbacks. You could even consider a block in Ruby to be a type of callback. In that light, callbacks don’t seem so bad after all.
At the risk of outing myself, my favorite text editor uses callbacks as a way to hook into nearly every feature it offers. I love them. I can write a so-called hook function to be notified before or after a file has been opened, saved, or closed. Almost every action performed in my editor triggers an event which runs registered hook functions. I’ve done some pretty weird stuff using hooks, always for the greater good of course.
Ruby also uses this idea of events and hook functions, albeit a much simpler model. Registering to receive event notifications is as simple as writing a method with the correct name. No doubt you’ve already done something similar by defining methods such as initialize and method_missing. While these methods do seem to be callbacks with respect to object lifetimes and method dispatching, Ruby doesn’t consider them to be hooks. Technically, hooks facilitate metaprogramming at the class and module level and are therefore written as class and module methods (also known as singleton methods). There are about a dozen such methods which you can define, but let’s start with the most frequently used hooks first.
As you know, it’s quite common to mix modules into objects, classes, and other modules using the include and extend methods. Each time you mix in a module Ruby calls the included or extended hook depending on how the module was mixed in. This is basically a notification to the module that it’s being inserted into a class hierarchy somewhere. Both hooks are given a single argument, the receiver of the include or extend method. In other words, the argument is the object doing the including or extending. What can you do in one of these hooks? Let’s explore the extended hook by revisiting the example from Item 21.
Recall that the RaisingHash class used delegation instead of inheritance in order to reuse the functionality of the Hash class. The major difference between Hash and RaisingHash is that the latter raises an exception if a nonexistent key is accessed. The majority of the Hash instance methods were made available as instance methods in RaisingHash using the def_delegators method from the Forwardable module. These delegated methods in RaisingHash simply forward the method call on to the @hash instance variable.
There were a few instance methods which weren’t so simple, however. The freeze, taint, and untaint methods needed to invoke the appropriate method on @hash, followed by a call to super, so the RaisingHash object itself was updated accordingly. The implementations of these methods were almost identical. Let’s fix that by writing a new delegation helper method which calls super after delegating the method to @hash. Now, we could just add this new method to the existing Forwardable module but Item 32 advises against this. Instead, let’s write a new module called SuperForwardable. To make life a bit easier for users of SuperForwardable, we’ll use the extended module hook to make sure that the Forwardable library file is loaded and any class which extends SuperForwardable also extends Forwardable. Consider this:
module SuperForwardable
  # Module hook.
  def self.extended (klass)
    require('forwardable')
    klass.extend(Forwardable)
  end

  # Creates delegator which calls super.
  def def_delegators_with_super (target, *methods)
    methods.each do |method|
      target_method = "#{method}_without_super".to_sym
      def_delegator(target, method, target_method)

      define_method(method) do |*args, &block|
        send(target_method, *args, &block)
        super(*args, &block)
      end
    end
  end
end
Extending a class with the SuperForwardable module triggers the SuperForwardable::extended hook method. This method is given the object doing the extending as its only argument. It’s common practice to name this argument klass or mod depending on how you assume the module is going to be used. The name “klass” is used because “class” is a keyword in Ruby and can’t be used as a variable name. The extended hook in SuperForwardable expects to be given a class which it then extends with the Forwardable module. Therefore, extending a class with SuperForwardable not only brings in all of the methods defined in that module but it also brings in all of the methods defined in the Forwardable module as well. Let’s see how we can use this new module from within the RaisingHash class:
class RaisingHash
  extend(SuperForwardable)
  def_delegators(:@hash, :[], :[]=) # etc.
  def_delegators_with_super(:@hash, :freeze, :taint, :untaint)

  def initialize
    # Create @hash...
  end
end
Okay, let’s take a moment and trace what’s going on in the RaisingHash class definition. The first line in the class uses the extend method with a single argument, the SuperForwardable module. The extend method can actually take any number of modules as arguments. For each module, the extend method adds all of the methods and constants defined in the module to the receiver. For the RaisingHash class this means that the single instance method defined in SuperForwardable becomes a class method in RaisingHash.
After all of the definitions from the module are loaded into the receiver, extend invokes the module’s extended hook. When the hook method is called it is passed the object which invoked extend. In our example SuperForwardable::extended is called with RaisingHash as its argument. This leads to SuperForwardable::extended calling RaisingHash::extend and passing in the Forwardable module. This whole process repeats so that the Forwardable module is loaded into RaisingHash.
With SuperForwardable and Forwardable loaded into RaisingHash we can use def_delegators and def_delegators_with_super to set up the method forwarding to @hash. We’ve seen def_delegators before in Item 21 and def_delegators_with_super does the exact same thing with one exception. After forwarding the original method call to the target object, the generated delegation method also calls super. Using this helper method gives RaisingHash three new instance methods: freeze, taint, and untaint, each of which calls the corresponding method on the @hash object followed by super.
This trick of using the extended hook in a module to further extend a class is an interesting way to emulate module inheritance. It’s as if the SuperForwardable module inherited from the Forwardable module. A similar sort of thing can be done with the include method and the included hook. Of course, the include method brings in a module’s instance methods and sets them up as instance methods on the receiver. Mixing in a module with the include method triggers the included hook which is defined just like the extended hook was in SuperForwardable.
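To make the included variant concrete, here is a minimal sketch. The Auditable module, its nested ClassMethods module, and the Account class are hypothetical names invented for this illustration; only include, extend, and the included hook come from Ruby itself.

```ruby
# Hypothetical names throughout; a sketch, not code from Item 21.
module Auditable
  # Helpers that should end up as class methods on the includer.
  module ClassMethods
    def audited_attributes
      @audited_attributes ||= []
    end

    def audit(*names)
      audited_attributes.concat(names)
    end
  end

  # Module hook: receives the class (or module) doing the including.
  def self.included(base)
    base.extend(ClassMethods)
  end

  # Ordinary instance method brought in by include.
  def audited?
    !self.class.audited_attributes.empty?
  end
end

class Account
  include(Auditable)
  audit(:balance, :owner)
end

p Account.audited_attributes  # [:balance, :owner]
p Account.new.audited?        # true
```

A single include gives Account both the instance method and the class-level helpers, which is the same "module inheritance" effect that SuperForwardable gets from its extended hook.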
The extended and included hooks are unique to modules. There’s a third hook which was introduced in Ruby 2.0, prepended. It’s triggered when you use the prepend method to mix in a module. The prepended hook and the prepend method are discussed further in Item 35.
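As a quick preview, the prepended hook is defined the same way as the other two. The Loud module and Greeter class below are hypothetical, and this sketch only shows that the hook fires and that the prepended method sits in front of the class's own.

```ruby
# Hypothetical names; a small sketch of the prepended hook.
module Loud
  # Module hook: receives the class doing the prepending.
  def self.prepended(base)
    super
    (@hosts ||= []) << base
  end

  def self.hosts
    @hosts || []
  end

  # Runs before Greeter#greet and reaches it with super.
  def greet
    super.upcase
  end
end

class Greeter
  prepend(Loud)

  def greet
    "hello"
  end
end

p Loud.hosts         # [Greeter]
p Greeter.new.greet  # "HELLO"
```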
Almost all of the remaining hooks are available on modules and classes. The one exception is unique because it only works with classes. Each time a class is defined, Ruby triggers the inherited hook on its parent class to notify it about the new subclass. Let’s do something interesting with this hook. Item 21 makes the case that you should never inherit from the core collection classes. Let’s enforce that with some code.
We can use the inherited hook to intercept a class definition and raise an exception, preventing the inheritance. Since we want this logic to apply to all of the core collection classes, it makes sense to write it into a module. We can then use extend to insert the module’s instance methods as class methods on each of the core collection classes. Consider the PreventInheritance module:
module PreventInheritance
  class InheritanceError < StandardError; end

  def inherited (child)
    raise(InheritanceError,
          "#{child} cannot inherit from #{self}")
  end
end
As an instance method in a module, the inherited method does nothing special. But turn it into a class method using extend and it becomes a proper hook method:
irb> Array.extend(PreventInheritance)
irb> class BetterArray < Array; end
PreventInheritance::InheritanceError:
BetterArray cannot inherit from Array
Defining a hook in a module and then mixing it into a class is rather indirect, but useful in this case. If you want to define an inherited hook directly in a class you need to make sure it’s a class method. For example:
class Parent
  def self.inherited (child)
    # ..
  end
end
The Parent::inherited method will be called anytime a class is defined which inherits from the Parent class. It’s worth mentioning that when the inherited hook is called, the child class isn’t fully defined. That is, the body of the child class hasn’t yet executed and therefore hasn’t had a chance to define any methods. This may limit what you can do in an inherited hook, something you’ll want to keep in mind.
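You can see this timing for yourself with a small experiment. The seen helper below is a hypothetical addition for the demonstration; it records which instance methods the child has at the moment the hook runs.

```ruby
class Parent
  def self.inherited(child)
    super
    # The child's body hasn't executed yet, so it has no methods of its own.
    seen[child] = child.instance_methods(false)
  end

  # Hypothetical helper: stores what the hook observed.
  def self.seen
    @seen ||= {}
  end
end

class Child < Parent
  def greet; "hi"; end
end

p Parent.seen[Child]             # []
p Child.instance_methods(false)  # [:greet]
```

If a hook needs to know about the child's methods, it has to defer that work until after the class body has run.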
That leaves us with the final six hooks which apply to both modules and classes alike. All of them have to do with methods. The method_added, method_removed, and method_undefined hooks are for instance methods while the singleton_method_added, singleton_method_removed, and singleton_method_undefined hooks are for class and module methods.
Defining these hooks for modules or classes is similar to the previous hooks we’ve seen. All of them should be defined in modules as module methods and in classes as class methods. Here’s an example of a class which monitors instance methods:
class InstanceMethodWatcher
  def self.method_added (m); end
  def self.method_removed (m); end
  def self.method_undefined (m); end

  # Triggers method_added(:hello)
  def hello; end

  # Triggers method_removed(:hello)
  remove_method(:hello)

  # Triggers method_added(:hello), again.
  def hello; end

  # Triggers method_undefined(:hello)
  undef_method(:hello)
end
There are a couple of things to watch out for when using these hooks. The only argument which is given to each of the hooks is a symbol representing the name of the method which was added, removed, or undefined. You’re not given the class on which the method status changed. If a method is added to a subclass, you’ll need to rely on the value of self to know that. Speaking of subclasses, since classes can participate in inheritance, you should probably call super from within these hooks. See Item 29 for more information on hooks and super (including the inherited hook).
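Here is a short sketch of both points: using self to learn where the method landed, and calling super from the hook. The Tracked and SubTracked classes and the log helper are hypothetical names made up for this example.

```ruby
class Tracked
  def self.method_added(name)
    super
    # self is the class receiving the method, which may be a subclass.
    Tracked.log << "#{self}##{name}"
  end

  # Hypothetical helper: a class instance variable holding the history.
  def self.log
    @log ||= []
  end

  def base_method; end
end

class SubTracked < Tracked
  def sub_method; end
end

p Tracked.log  # ["Tracked#base_method", "SubTracked#sub_method"]
```

Notice that defining log doesn't show up in the history. It's a singleton method, so it triggers singleton_method_added rather than method_added.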
Hooks relating to singleton methods are very similar to their instance method counterparts, except for one strange side effect. Defining a singleton_method_added hook will trigger itself. That is, defining the hook—which is a singleton method—causes Ruby to trigger the singleton_method_added hook which now exists. You’ll want to watch out for that. Otherwise, just remember that class methods are implemented as singleton methods, which is why the following code uses the “class << self” trick to enter into the singleton class before invoking remove_method or undef_method:
class SingletonMethodWatcher
  def self.singleton_method_added (m); end
  def self.singleton_method_removed (m); end
  def self.singleton_method_undefined (m); end

  # Triggers singleton_method_added(:hello)
  def self.hello; end

  # Triggers singleton_method_removed(:hello)
  class << self; remove_method(:hello); end

  # Triggers singleton_method_added(:hello), again.
  def self.hello; end

  # Triggers singleton_method_undefined(:hello)
  class << self; undef_method(:hello); end
end
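The self-triggering behavior is easy to observe if the hook records the names it receives. The SelfTriggering class and its EVENTS constant are hypothetical and exist only for this demonstration.

```ruby
class SelfTriggering
  EVENTS = []

  # Defining this hook is itself a singleton method definition,
  # so the hook's first notification is about its own name.
  def self.singleton_method_added(name)
    super
    EVENTS << name
  end

  def self.hello; end
end

p SelfTriggering::EVENTS  # [:singleton_method_added, :hello]
```

If your hook does real work, you'll probably want it to ignore its own name.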
And there you have it, all ten hook methods. Let’s wrap up with a few notes about hooks. First, all of the hook methods are automatically marked as private. They’re meant to be called by the Ruby interpreter and not from user space for obvious reasons. Second, there are three methods which are related to the hook methods but are not hooks themselves: extend_object, append_features, and prepend_features.
None of these methods should be overridden, that’s what the hooks are for after all. As an example, when you use the include method to mix a module into a class, the module’s append_features method is invoked to do the actual work before the included hook is triggered. While you could certainly override the append_features method and call super, the preferred way of intercepting this type of module mixing is by defining the included hook. The same thing goes for extend_object and the extended hook, and prepend_features and the prepended hook. Prefer defining hook methods to overriding these internal methods.
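Purely to observe the ordering described above, and not as a pattern to copy, the following sketch overrides append_features alongside the included hook. The Traced module, its ORDER constant, and the Host class are hypothetical names.

```ruby
# For observation only; real code should stick to the included hook.
module Traced
  ORDER = []

  def self.append_features(base)
    ORDER << :append_features
    super  # performs the actual mixing in
  end

  def self.included(base)
    ORDER << :included
    super
  end
end

class Host
  include(Traced)
end

p Traced::ORDER          # [:append_features, :included]
p Host.include?(Traced)  # true
```

Forgetting super in append_features would silently stop the module from being mixed in at all, which is a good reason to leave that method alone and use the hook instead.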
Things to Remember
• All of the hook methods should be defined as singleton methods.
• The hooks which are called when a method is added, removed, or undefined only receive the name of the method, not the class where the change occurred. Use the value of self if you need to know this.
• Defining a singleton_method_added hook will trigger itself.
• Don’t override the extend_object, append_features, or prepend_features methods. Use the extended, included, or prepended hooks instead.
Item 29: Invoke super from within Class Hooks
Let’s say that after reading Item 28 you became really excited about using hook functions, specifically the inherited class hook. As a matter of fact, it solves a problem that you’ve been having with one of your class hierarchies. Suppose that the base class of the troublesome hierarchy uses the factory method pattern. It’s an abstract class representing an interface for downloading files when given URLs. Each subclass knows how to work with a single protocol such as HTTP or FTP. The application you’re writing only has to pass a URL to the base class, and in response it receives back an instance of the appropriate subclass ready to download the specified file.
The problem you’ve been having has to do with the base class knowing about each of its subclasses. Up to this point you’ve had to manually connect everything together. But now, with the inherited hook, things are much easier. Consider this:
class DownloaderBase
  def self.inherited (subclass)
    handlers << subclass
  end

  def self.handlers
    @handlers ||= []
  end

  private_class_method(:handlers)
end
Thanks to the inherited hook, the subclasses will now automatically register themselves with the base class when they are defined. As recommended in Item 15, the base class uses a class instance variable to track all of its subclasses as opposed to a regular class variable. This keeps it from appearing in the subclasses and avoids any accidental mutation.
With just a few more methods the DownloaderBase class will be able to accept a URL and return the appropriate subclass. But there’s something missing in the inherited hook that’s more important for our discussion here: it needs to invoke super. Technically, the inherited hook works fine just the way it is. The DownloaderBase class implicitly inherits from Object and so there’s no real need to use super to call an inherited hook higher up in the hierarchy. As we’ve seen before though, inheritance isn’t the only way for a class to be associated with a superclass.
Including or extending a module might insert an inherited hook higher in the hierarchy. Popular frameworks such as Ruby on Rails do this from time to time. Take the ActiveModel::Validations module for example. When you include that module into a class it sets up an inherited hook which copies any attribute validation callbacks into subclasses. If you tried to use the ActiveModel::Validations module with the current implementation of DownloaderBase, this copying wouldn’t happen. Other modules with more important inherited hooks might break entirely. Of course, the solution is simple, make sure you call super:
def self.inherited (subclass)
  super
  handlers << subclass
end
As the title of this item suggests, this advice goes beyond the inherited hook. All of the class hooks should invoke versions of themselves higher up in the hierarchy using super. (For a full list of the class hooks go back and take a look at Item 28.)
It might seem redundant to use super from hooks in classes like DownloaderBase, and perhaps it is. Keep in mind that since modules can insert class hooks, it’s not always obvious when the hook you’re writing might override another one higher up in the inheritance hierarchy. Using super is a good way to future-proof your code, but ultimately, you’ll have to use your best judgment.
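To see a module-supplied hook in action, consider this sketch. The Trackable module is a hypothetical stand-in for a framework module and is not how ActiveModel::Validations is actually written. The handlers method is also left public here so the result can be inspected.

```ruby
# Hypothetical stand-in for a framework module that installs a class hook.
module Trackable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def inherited(subclass)
      super
      Trackable.subclasses_seen << subclass
    end
  end

  def self.subclasses_seen
    @subclasses_seen ||= []
  end
end

class DownloaderBase
  include(Trackable)

  def self.inherited(subclass)
    super  # without this line, Trackable's hook never runs
    handlers << subclass
  end

  def self.handlers
    @handlers ||= []
  end
end

class HTTPDownloader < DownloaderBase; end

p DownloaderBase.handlers    # [HTTPDownloader]
p Trackable.subclasses_seen  # [HTTPDownloader]
```

Because Trackable::ClassMethods sits behind DownloaderBase's singleton class in the method lookup path, the hook defined directly on DownloaderBase shadows it unless super is called.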
Things to Remember
• Invoke super from within class hooks.
Item 30: Prefer define_method to method_missing
When newcomers to Ruby discover method_missing, it’s as if they’ve just found a multipurpose tool which is begging to be used. It calls to them while in the shower and professes itself to be the perfect solution to a difficult problem from the previous day. One way or another, method_missing is going to end up in their code, even if they have to use a crowbar to get it to fit. What is it about method_missing that makes it so attractive?
Clearly, method_missing is one of the most powerful tools in the Ruby toolbox. Unfortunately, it has a lot of dubious uses. Want an object to respond to any possible message? No problem. Ever needed a Hash to act more like an OpenStruct? Piece of cake. Do you like the idea of method names automatically being turned into SQL? Take a look at Rails 2. It only goes downhill from there.
You can do all these things with method_missing because it’s a catchall, your last ditch effort to respond to a message when a matching method can’t be found. But it comes with a cost. We’ve already seen in Item 7 how defining method_missing can lead to confusing error messages when using super. Then you have introspection methods like respond_to? which won’t agree with reality. There’s also a small performance difference when you use method_missing due to the extra traversal of the inheritance hierarchy, but it’s fairly negligible so we won’t consider it further.
The good news is that there’s nearly always a way to implement the same features without resorting to method_missing. In order to demonstrate this I’ll tackle two of the most common uses of method_missing and show how define_method can be used without incurring the drawbacks listed above. Let’s start with the biggest use of method_missing, proxies.
We’ve already seen how to use the Forwardable module to delegate methods to an instance variable. Back in Item 21 we looked at the RaisingHash class which forwarded many of its instance methods without ever exposing the internal hash kept in @hash. This is a great use of the Forwardable module and the way I recommend you implement delegators or proxies. Unfortunately, the Forwardable module isn’t well known and method_missing seems to be the next best thing. Consider the HashProxy class:
class HashProxy
  def initialize
    @hash = {}
  end

  private

  def method_missing (name, *args, &block)
    if @hash.respond_to?(name)
      @hash.send(name, *args, &block)
    else
      super
    end
  end
end
This very simple class uses method_missing to forward all undefined methods to its @hash instance variable. That is, as long as the hash object responds to the current message. If it doesn’t, method_missing calls super so that the version of method_missing in the BasicObject class can raise a NoMethodError exception. My biggest complaint about this class is that while it’s pretending to be a Hash, it doesn’t do a very good job. Consider this:
irb> h = HashProxy.new
irb> h.respond_to?(:size)
---> false
irb> h.size
---> 0
irb> h.public_methods(false)
---> []
Ruby programmers espouse that duck typing is the correct way to work with dynamic types. The type of an object isn’t what’s important, it’s the interface that you should be concerned with. But using method_missing this way exposes no interface at all. Using respond_to? and other introspective methods to confirm that an object supports the needed interface isn’t possible. Even if you prefer run time NoMethodError exceptions to using respond_to?, there’s no good reason why delegation can’t be implemented properly. (One way to fix respond_to? is with respond_to_missing?. We’ll look at why it’s not a great solution in just a bit.)
If you’ve decided that the Forwardable module won’t work for you, the next best thing is to use define_method. That’s essentially what the Forwardable module is doing behind the scenes anyway. Given a method name and a block, define_method will create an instance method whose body and arguments are specified by the block. It’s a private class method so you can’t call it with a receiver. That’s usually not a problem though. Consider this implementation of the HashProxy class:
class HashProxy
  Hash.public_instance_methods(false).each do |name|
    define_method(name) do |*args, &block|
      @hash.send(name, *args, &block)
    end
  end

  def initialize
    @hash = {}
  end
end
This version uses a little metaprogramming to iterate over the public instance methods from the Hash class. For each of them an instance method is created in HashProxy using define_method. The generated methods simply forward their messages on to the @hash object. The effect is the same as the version which used method_missing, but this implementation is more explicit and correctly exposes a Hash-like interface. See for yourself:
irb> h = HashProxy.new
irb> h.respond_to?(:size)
---> true
irb> h.public_methods(false).sort.take(5)
---> [:==, :[], :[]=, :assoc, :clear]
Ah, that looks much better. I would argue that using define_method in this case isn’t any more complicated than using method_missing. In fact, I’d say that it’s quite a bit clearer and you don’t have to guess at what’s going on. It’s too easy for method_missing to become a black hole of confusion. Hopefully you can now see that it’s also easy to replace with define_method. Let’s solidify this by looking at a more complicated example.
The next major thing that Ruby programmers use method_missing for is to implement the decorator pattern. This pattern is very similar to the delegation pattern we just explored, but with a twist. Classes implementing the decorator pattern wrap an arbitrary object and extend its capabilities in some way. In the HashProxy class we knew ahead of time that we’d be forwarding messages to a hash object. The decorator class, on the other hand, accepts objects of any class and needs to delegate to them appropriately. Let’s write a decorator class which records log entries before delegating to the target object. It’s pretty easy to implement using method_missing:
require('logger')

class AuditDecorator
  def initialize (object)
    @object = object
    @logger = Logger.new($stdout)
  end

  private

  def method_missing (name, *args, &block)
    @logger.info("calling `#{name}' on #{@object.inspect}")
    @object.send(name, *args, &block)
  end
end
The AuditDecorator class can add a method logging feature to any object. Calling a method on an instance of AuditDecorator will log the message and then forward it to the wrapped object, with an exception of course. Methods already defined in AuditDecorator (or its superclasses) won’t trigger method_missing. Therefore, implementing the decorator pattern this way isn’t as transparent as we’d like. Consider this:
irb> fake = AuditDecorator.new("Am I a String?")
irb> fake.downcase
INFO: calling `downcase' on "Am I a String?"
---> "am i a string?"
irb> fake.class
---> AuditDecorator
As before, using method_missing means that AuditDecorator instances don’t respond correctly to introspective methods such as respond_to? and public_methods. But now we have an additional problem. The AuditDecorator instance methods get in the way and keep us from logging and forwarding methods like class. Ideally, the decorator class would be completely transparent and would forward all methods. That’s where define_method comes in. However, since the AuditDecorator class can wrap any object we’re going to have to rely on a little more metaprogramming to make this work. The initialize method needs to inspect the object which it’s wrapping and then create the appropriate forwarding methods. But those methods can’t be instance methods for the AuditDecorator class since each instance should be able to wrap different objects with different classes. Therefore, the generated methods will have to exist in a single instance of AuditDecorator and not for all AuditDecorator instances. Thankfully, we have anonymous modules to work with:
class AuditDecorator
  def initialize (object)
    @object = object
    @logger = Logger.new($stdout)

    mod = Module.new do
      object.public_methods.each do |name|
        define_method(name) do |*args, &block|
          @logger.info("calling `#{name}' on #{@object.inspect}")
          @object.send(name, *args, &block)
        end
      end
    end

    extend(mod)
  end
end
There’s a bit more going on in this version compared to its predecessor. In order to generate methods on the current AuditDecorator instance (as opposed to all AuditDecorator instances) we need to create an anonymous module and define the methods we want inside the module. Then, all we have to do is extend the AuditDecorator instance with the anonymous module, thus making all of those generated methods available on the instance. Pretty slick, huh? In exchange for a few extra lines of code we now have full transparency:
irb> fake = AuditDecorator.new("I'm a String!")
irb> fake.downcase
INFO: calling `downcase' on "I'm a String!"
---> "i'm a string!"
irb> fake.class
INFO: calling `class' on "I'm a String!"
---> String
There’s one very subtle but important thing I need to point out. Did you notice that inside the block given to Module::new the public_methods message was sent to the object variable and not @object? That’s because inside the module (but outside a method definition) self is the module, so @object refers to an instance variable of the module itself and not the instance variable defined in AuditDecorator#initialize. The block does, however, form a closure which allows us to access the local variables defined in the initialize method. That’s why using the object variable works from inside the module. Knowing about these scoping rules will keep you from pulling your hair out while you’re exercising Ruby’s metaprogramming features.
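A stripped-down experiment makes the scoping rule visible. The ScopeDemo class and its two readers are hypothetical; they simply capture what the instance variable and the local variable look like from inside the Module::new block.

```ruby
class ScopeDemo
  attr_reader(:seen_ivar, :seen_local)

  def initialize(object)
    @object = object
    seen_ivar = seen_local = nil

    Module.new do
      # self is the new module here, so @object is the module's own
      # (unset) instance variable, not the one assigned above.
      seen_ivar = @object
      # Local variables are captured by the block's closure.
      seen_local = object
    end

    @seen_ivar = seen_ivar
    @seen_local = seen_local
  end
end

demo = ScopeDemo.new("wrapped")
p demo.seen_ivar   # nil
p demo.seen_local  # "wrapped"
```

Inside the blocks given to define_method the situation flips back, because those blocks run with self set to the decorated instance when the generated method is called.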
I would be remiss if I didn’t mention a method related to define_method. While only modules and classes respond to define_method, objects have their own version called define_singleton_method. Thanks to this metaprogramming gem we can remove the need for Module::new in the previous example. Using define_singleton_method has the same effect as defining a method in an anonymous module which is then extended:
class AuditDecorator
  def initialize (object)
    @object = object
    @logger = Logger.new($stdout)

    @object.public_methods.each do |name|
      define_singleton_method(name) do |*args, &block|
        @logger.info("calling `#{name}' on #{@object.inspect}")
        @object.send(name, *args, &block)
      end
    end
  end
end
Replacing method_missing with define_method doesn’t just make your code more explicit, it also restores proper introspection capabilities, and in the case of the decorator pattern, allows for complete transparency. Before reaching for method_missing you should consider whether using define_method is possible. I’ve yet to find a situation where method_missing was the only possible solution. If you do find yourself unable to use define_method (or define_singleton_method) then there’s one last trick you should know about.
Just as method_missing is a catchall for method dispatch, introspection in Ruby uses the respond_to_missing? method as a half-baked callback. If you define this method it will be called for two different reasons. First, if you use respond_to? to see if an object responds to a specific message and a matching method isn’t defined, respond_to? will invoke respond_to_missing?, giving you the opportunity to make respond_to? return true. For the HashProxy above you’d want to implement the respond_to_missing? method like this:
def respond_to_missing? (name, include_private)
  @hash.respond_to?(name, include_private) || super
end
Adding this code to the HashProxy class would allow the respond_to? method to return true for all of the Hash instance methods. The reason I said that it’s half-baked is because it doesn’t have any effect on methods like public_methods. This leads to some of the introspective methods reporting that a method exists while others say it doesn’t. That’s even more confusing in my opinion. If you’re using method_missing it’s the best you can do.
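Putting the two methods together shows the mismatch. This sketch repeats the method_missing version of HashProxy so that it runs on its own. The default value for include_private is my addition.

```ruby
class HashProxy
  def initialize
    @hash = {}
  end

  private

  def method_missing(name, *args, &block)
    if @hash.respond_to?(name)
      @hash.send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @hash.respond_to?(name, include_private) || super
  end
end

h = HashProxy.new
p h.respond_to?(:size)              # true
p h.public_methods.include?(:size)  # false
p h.size                            # 0
```

The object now answers respond_to? truthfully, yet still hides the same method from public_methods.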
The other reason respond_to_missing? is used has to do with an interesting method which goes by the name “method”. It takes the name of a method and returns an object which can be used to invoke the method at a later time. If you call method for a method which isn’t defined but for which respond_to_missing? returns true, Ruby will return a Method object that encapsulates a call to method_missing. Again, this is another example where some methods report one interface and other methods report a different interface. That isn’t a problem with define_method.
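Here is a small sketch of that second use, again with a self-contained copy of the method_missing based HashProxy. As before, the default value for include_private is my addition.

```ruby
class HashProxy
  def initialize
    @hash = {}
  end

  private

  def method_missing(name, *args, &block)
    if @hash.respond_to?(name)
      @hash.send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @hash.respond_to?(name, include_private) || super
  end
end

h = HashProxy.new

# No size method is defined on HashProxy, but respond_to_missing?
# vouches for it, so Ruby hands back a Method object anyway.
m = h.method(:size)
p m.class  # Method
p m.call   # 0
```

Without respond_to_missing?, the call to h.method(:size) would raise a NameError instead.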
Things to Remember
• Prefer define_method to method_missing.
• If you absolutely must use method_missing consider defining respond_to_missing?.
Item 31: Know the Difference Between the Variants of eval
We’ve seen that Ruby has a rich set of features for run time metaprogramming, and without a doubt the most powerful of these are the family of eval methods. The vanilla eval method is similar to those found in other interpreted languages: you build up a string of valid Ruby code and have it evaluated at run time. This, of course, can be very dangerous, especially in an application which is processing untrustworthy data. Fortunately, there’s rarely ever a valid reason to evaluate a string these days. That’s because the majority of the evaluation methods in Ruby all accept blocks of code. Combined with metaprogramming tricks like define_method, you can replace nearly every use of string evaluation with block evaluation.
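As a small illustration of swapping a string for a block, consider generating simple reader methods. The Settings class and its attribute names are hypothetical. The string-based alternative appears only in a comment.

```ruby
class Settings
  def initialize(values)
    @values = values
  end

  # Hypothetical attribute list, used only for illustration.
  [:host, :port].each do |name|
    # String version (avoid): eval("def #{name}; @values[:#{name}]; end")
    # Block version: no code is ever assembled from strings.
    define_method(name) { @values[name] }
  end
end

s = Settings.new(host: "example.com", port: 8080)
p s.host  # "example.com"
p s.port  # 8080
```

With the block version, a strange or hostile value in the attribute list can at worst produce an oddly named method. It can never be executed as code.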
Knowing which of the evaluation methods to use can be the confusing part. Most of them have names which hint at how they work. Or more accurately, the context in which they evaluate their input. That’s the major differentiating feature between them, that and what they’re willing to evaluate: strings, blocks, or both. Most of the evaluation methods use their receiver as the evaluation context. A notable exception is the eval method from the Kernel module.
While eval only accepts strings as input, you can have that string evaluated in any context you want. If you don’t specify a context then the string is evaluated as if it were written into the code at the point where eval is used. That’s not always desirable. Perhaps you don’t want to expose the variables which are currently in scope. In this case you can explicitly provide a Binding object which represents the context in which the string should be evaluated. The Kernel module defines a private method called binding which captures the local scope and returns it inside a Binding object. This context can then be given to eval as its second argument:
irb> def glass_case_of_emotion
       x = "I'm in a " + __method__.to_s.tr('_', ' ')
       binding
     end
irb> x = "I'm in scope"
irb> eval("x")
---> "I'm in scope"
irb> eval("x", glass_case_of_emotion)
---> "I'm in a glass case of emotion"
Being able to specify the exact context to use for evaluation is pretty neat. But eval only accepts strings as input, so you need to be very careful with what goes into them. Allowing any untrustworthy data (such as user input) to make their way into eval exposes your application to code injection attacks. That’s why we’ll turn our attention away from strings and towards blocks. All of the remaining evaluation methods support blocks as input. The only difference between them is the context they use when evaluating those blocks.
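To make the injection risk concrete, here is a minimal sketch. The unsafe_calc and safe_calc helpers, the ALLOWED list, and the $hijacked global are hypothetical names invented for this illustration. The first helper splices its operator into a string for eval, while the second only ever uses the operator as a method name taken from a fixed list.

```ruby
# Hypothetical helpers, for illustration only.
ALLOWED = %w(+ - * /).freeze

# Dangerous: `op` becomes part of the source code that eval runs.
def unsafe_calc (a, op, b)
  eval("#{a} #{op} #{b}")
end

# Safer: `op` is checked against a fixed list and then used only as a
# method name, so it is never parsed as Ruby code.
def safe_calc (a, op, b)
  unless ALLOWED.include?(op)
    raise(ArgumentError, "bad operator: #{op.inspect}")
  end
  a.public_send(op, b)
end

unsafe_calc(2, "+", 3)                           # 5
unsafe_calc(2, "+ 3; $hijacked = true; 0 +", 3)  # also runs the assignment
safe_calc(2, "+", 3)                             # 5
```

The second call to unsafe_calc evaluates the string "2 + 3; $hijacked = true; 0 + 3", so whoever controls op controls what the program executes. Handing the same string to safe_calc raises an ArgumentError instead.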
Thanks to the BasicObject class, every object in Ruby responds to the instance_eval method. Its name provides a clue about the context it uses when evaluating its input. Unlike with eval, you can’t provide a Binding object directly to instance_eval. Instead, the object you invoke instance_eval on becomes the context for the evaluation. This allows you to reach into an object and access its private methods and instance variables. Things get a little confusing when you start defining methods with instance_eval. In order to play around with the evaluation methods let’s look at a simple Widget class:
class Widget
  def initialize (name)
    @name = name
  end
end
Now we can see how instance_eval can be used to access instance variables and define methods:
irb> w = Widget.new("Muffler Bearing")
irb> w.instance_eval {@name}
---> "Muffler Bearing"
irb> w.instance_eval do
       def in_stock?; false; end
     end
irb> w.singleton_methods(false)
---> [:in_stock?]
If you use instance_eval to define a method then that method will only exist for a single object. In other words, instance_eval creates singleton methods. What happens if you use instance_eval with a class object? What are singleton methods in the context of a class? Yep, class methods. Observe:
irb> Widget.instance_eval do
       def table_name; "widgets"; end
     end
irb> Widget.table_name
---> "widgets"
irb> Widget.singleton_methods(false)
---> [:table_name]
It might be a bit confusing at first but if you remember that methods defined using instance_eval are singleton methods, you’ll be in good shape. What if, on the other hand, we wanted to define an instance method in the Widget class so that it’s available to all instances? That’s where our next evaluation method comes in: class_eval. Just as its name suggests, class_eval evaluates a string or a block in the context of a class. It’s exactly like opening the class back up and inserting new code. Anything you can do between the class and end keywords in a normal class definition can be done using class_eval. For example:
irb> Widget.class_eval do
       attr_accessor(:name)
       def sold?; false; end
     end
irb> w = Widget.new("Blinker Fluid")
irb> w.public_methods(false)
---> [:name, :name=, :sold?]
You can’t use class_eval on just any object though. As a matter of fact, it’s defined as an instance method of the Module class, which means it can only be used on modules and classes. There’s even an alias for it so you can make your code look better when you’re manipulating modules: module_eval. It’s purely aesthetic though; there’s no difference between class_eval and module_eval.
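A short sketch of that restriction follows. The Taggable module and Note class are hypothetical names used only for this example; it shows module_eval adding an instance method to a module, and that an ordinary object has instance_eval but not class_eval.

```ruby
# Hypothetical module and class, for illustration only.
module Taggable; end

# module_eval reads better on a module but behaves like class_eval.
Taggable.module_eval do
  def tags; @tags ||= []; end
end

class Note
  include(Taggable)
end

note = Note.new
note.tags << "ruby"

Note.respond_to?(:class_eval)     # true, Note is a class
note.respond_to?(:class_eval)     # false, note is an ordinary object
note.respond_to?(:instance_eval)  # true, every object has instance_eval
```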
An easy way to remember the context for these evaluation methods is to think about the receiver. While evaluating their input the instance_eval and class_eval methods set the self variable to their receiver. That’s why you can access instance variables with instance_eval and define instance methods with class_eval. They also yield their receiver to the input block. This can sometimes be useful if there’s some indirection between the receiver and the block. For example, consider this variant of the Widget class:
class Widget
  attr_accessor(:name, :quantity)

  def initialize (&block)
    instance_eval(&block) if block
  end
end
irb> w = Widget.new do |widget|
       widget.name = "Elbow Grease"
       @quantity = 0
     end
irb> [w.name, w.quantity]
---> ["Elbow Grease", 0]
Because the block given to initialize is passed to instance_eval, it is evaluated in the context of the new Widget object. When instance_eval invokes the block it sets self to its receiver (the Widget object) and yields the same object to the block. Since the self variable is set to the Widget object the block can manipulate internal instance variables directly as if it were an instance method. This might sometimes be useful, but it does break encapsulation.
So far we’ve focused on some simple uses of run time evaluation. It might seem that evaluating a block isn’t as flexible as evaluating a string since you’re stuck with static code. It’s actually quite common to see instance_eval or class_eval used in Ruby libraries out in the wild with strings instead of blocks. Let’s put this myth to bed using the final set of evaluation methods: instance_exec, class_exec, and module_exec.
These methods are very similar to their eval counterparts. Where the eval versions accept strings or blocks the exec variants only accept blocks. They also differ in what they yield to their blocks. The exec methods don’t yield anything to their blocks by default. Instead, any arguments given to them are passed on to the block. This gives us enough power to do just about everything we can do when evaluating strings.
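The difference in what reaches the block can be sketched as follows. The Thermostat class and the ADJUST constant are hypothetical names chosen for this example. The stored block is created far from the object it later runs against, which is the situation where passing arguments through instance_exec helps most.

```ruby
# Hypothetical class, for illustration only.
class Thermostat
  def initialize (target)
    @target = target
  end
end

# A block created independently of any Thermostat object. The @target
# it mentions is resolved against whatever self is when it runs.
ADJUST = proc {|amount| @target + amount}

t = Thermostat.new(68)

# instance_eval yields its receiver to the block.
t.instance_eval {|obj| obj.equal?(self)}  # true

# instance_exec forwards its own arguments to the block instead.
t.instance_exec(4, &ADJUST)               # 72
```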
Suppose you have a class that represents a counter which can be used to increment an instance variable, but not reset it back to its starting value. Why can’t you reset it? Don’t ask me, this is supposed to be your code, not mine. Anyways, also suppose that you don’t want to add a reset feature directly to the class but instead you want to reset counter objects externally. Consider this:
class Counter
  DEFAULT = 0
  attr_reader(:counter)

  def initialize (start=DEFAULT)
    @counter = start
  end

  def inc
    @counter += 1
  end
end
You’d like to reset the @counter instance variable back to the value set in the DEFAULT constant. You also want to make this code generic so you can use it with other classes in the future. As a first stab in the dark you resort to evaluating a string:
module Reset
  def self.reset_var (object, name)
    object.instance_eval("@#{name} = DEFAULT")
  end
end
Using this helper method is pretty straightforward. If you give the reset_var module function an object and a variable name it will set an instance variable with that name to the value in DEFAULT. But notice what happens if you give it an invalid variable name:
irb> c = Counter.new(10)
---> #<Counter @counter=10>
irb> Reset.reset_var(c, "counter")
---> 0
irb> Reset.reset_var(c, "x;")
SyntaxError: (eval):1:
syntax error, unexpected '=', expecting end-of-input
This example might be slightly contrived but it does demonstrate how to inject code into an evaluation method. Let’s look at how we can use instance_exec to write reset_var without having to evaluate a string. Since instance_exec will pass its arguments on to the block which it evaluates, we can use that as a way to pass constructed names into the block for use in methods like instance_variable_set:
module Reset
  def self.reset_var (object, name)
    object.instance_exec("@#{name}".to_sym) do |var|
      const = self.class.const_get(:DEFAULT)
      instance_variable_set(var, const)
    end
  end
end
Ruby’s metaprogramming API is rich enough that we rarely need to even use the evaluation methods, especially those which evaluate strings. Methods like define_method and instance_variable_set are also much easier to read than a mess of strings with interpolated variables. Back to our use of instance_exec, look what happens now if you give reset_var a bad variable name:
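As a further illustration of replacing string evaluation, here is a sketch that combines class_exec with define_method. The Predicates module and the Order class are hypothetical and exist only for this example. Method names are built from symbols, but nothing is ever parsed as Ruby source.

```ruby
# Hypothetical helper module, for illustration only.
module Predicates
  # Adds a `<name>?` method for each name, without evaluating strings.
  def self.define (klass, *names)
    klass.class_exec(names) do |list|
      list.each do |name|
        define_method("#{name}?") do
          !!instance_variable_get("@#{name}")
        end
      end
    end
  end
end

class Order
  def initialize
    @paid    = true
    @shipped = false
  end
end

Predicates.define(Order, :paid, :shipped)

order = Order.new
order.paid?     # true
order.shipped?  # false
```

Because the names are passed into the block as an argument to class_exec, the block itself stays static code and still produces differently named methods on each call.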
irb> Reset.reset_var(c, "x;")
NameError: `@x;' is not allowed as an instance variable name
This time the code raises a NameError unlike the previous version which raised a SyntaxError. The difference of course is that the string which was passed to reset_var wasn’t evaluated as Ruby code. Instead, it was only used to look up an instance variable in an object’s variable table. But before that even happened it was validated to ensure that it could be used as a valid variable name, which failed and raised an exception. This is another important difference between the variants of eval, one that you’ll want to keep in mind.
Things to Remember
• Methods defined using instance_eval or instance_exec are singleton methods.
• The class_eval, module_eval, class_exec, and module_exec methods can only be used with classes and modules. Methods defined with one of these become instance methods.
Item 32: Consider Alternatives to Monkey Patching
Unless you’ve been living on an island where this book happened to wash up you’ve no doubt heard of Ruby on Rails. It made a pretty big splash in the web application development community and helped put Ruby in the spotlight. But it hasn’t been all rainbows and roses. Rails includes a library called Active Support which modifies nearly every Ruby core class, something referred to as “monkey patching”.
While it’s not a new concept, Active Support’s heavy use of monkey patching kicked up quite a bit of dust in the Ruby community. The lines were drawn, you were either for or against monkey patching. Okay, maybe it wasn’t so dramatic, but there are definitely some outspoken Ruby experts who strongly advise against modifying the core classes. So, what’s the harm; what’s so bad about monkey patching?
As you know, classes, objects, and modules are always open in Ruby. You can modify them at any point while a program is running. Nothing is really off limits, maybe you’ll get a run time warning, maybe not. Have you ever wished that the String class had a to_french method? No problem, open the class or use something like class_exec to add it. But what if one of the libraries you’re using in a project also adds a String#to_french method? This is a pretty big problem and is often referred to as patch collision. Ruby will definitely give you a warning if you redefine an existing method, which is why Item 5 urges you to pay attention to run time warnings. But as we’ll see shortly, there are ways to run into patch collision without triggering a warning.
Now, if both of the colliding methods do the exact same thing, maybe you won’t care so much. If they happen to do completely different things but collide because they have the same name, well, you’ll probably care a lot more. Actually, both situations seem pretty serious to me. I want to know for sure which implementation of a method I’m using and that it’s been tested appropriately. Imagine tearing your hair out because your code looks totally fine, except thanks to monkey patching, it’s not your code which is actually running and breaking things. Trust me, this has happened to me more than once. Clearly monkey patching can be dangerous. That’s why we’re going to explore alternatives, ways to do some of the same things you can do with monkey patching but with different trade-offs.
There are a handful of Ruby Gems (including Active Support) which monkey patch the String class to add a method which tests if a string is empty or only includes space characters. It’s a surprisingly useful feature which has somehow avoided getting included into the official String class. Let’s write our own version of this method and experiment with a few ways to use it without resorting to altering the String class. One of the safest ways to write this method is to make it a module function. Consider this:
module OnlySpace
  ONLY_SPACE_UNICODE_RE = %r/\A[[:space:]]*\z/

  def self.only_space? (str)
    if str.ascii_only?
      !str.bytes.any? {|b| b != 32 && !b.between?(9, 13)}
    else
      ONLY_SPACE_UNICODE_RE === str
    end
  end
end
The only_space? method is callable directly through the OnlySpace module. The biggest downsides to this technique are rather obvious. It’s not very object-oriented and it’s a bit verbose. Using the module function is simple, but just doesn’t feel right:
irb> OnlySpace.only_space?("\r\n")
---> true
One way to improve upon this is to define an instance method version of only_space? in the OnlySpace module. You can then extend individual string objects as necessary.
module OnlySpace
  def only_space?
    # Forward to module function.
    OnlySpace.only_space?(self)
  end
end
irb> str = "Yo Ho!"
irb> str.extend(OnlySpace)
irb> str.only_space?
---> false
While this restores some object-oriented flavoring, it’s still a bit long-winded. The upside is that we’ve managed to avoid monkey patching the String class. Strings which haven’t been extended by our module won’t be affected. On the other hand, this technique introduces inconsistency because some string objects will respond to only_space? while others won’t. Extending individual objects with a module tends to work best when very little of your code needs to use the methods defined in that module. As you use those methods more and more, you’ll probably want to consider another alternative.
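The inconsistency is easy to see in a small sketch. The Shout module here is a hypothetical stand-in for OnlySpace, used so the example is self-contained. Only the extended object gains the method, and the String class itself is untouched.

```ruby
# Hypothetical module, for illustration only.
module Shout
  def shout; upcase + "!"; end
end

a = String.new("ahoy")
b = String.new("ahoy")

a.extend(Shout)

a.shout                            # "AHOY!"
a.singleton_class.include?(Shout)  # true, the module lives on a alone
b.respond_to?(:shout)              # false
String.include?(Shout)             # false
```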
Even though extending individual string objects with the OnlySpace module doesn’t alter the String class in any way, it is still a form of monkey patching, albeit on a smaller scale and a bit more controlled. To completely avoid monkey patching altogether, let’s turn to our next technique, creating a new String class.
Back in Item 21 we looked at changing the behavior of the Hash class by writing a new class, RaisingHash. We avoided inheritance in order to maintain total control over which methods were exposed by the RaisingHash class. Instead, RaisingHash stored a hash inside an instance variable. It then used method delegation to forward methods to the hash using the Forwardable module. We can use this technique to create a new string class:
require('forwardable')

class StringExtra
  extend(Forwardable)
  def_delegators(:@string,
                 *String.public_instance_methods(false))

  def initialize (str="")
    @string = str
  end

  def only_space?
    ...
  end
end
The StringExtra class works just like the core String class thanks to the Forwardable module and the def_delegators method. Unlike with the RaisingHash class, I haven’t overridden any methods which might return a new String object instead of a StringExtra object. I also haven’t implemented some important methods like freeze and taint. If you define a delegating class like StringExtra, make sure you go back to Item 21 and add in these missing features.
Using StringExtra::new to wrap an existing string object can be easier to stomach than using the extend method. Both techniques suffer from the fact that you need to take an extra step to get an object which responds to the only_space? message. The String class has a monopoly on automatic string creation from syntax literals. There’s just no getting away from needing to take this extra step if you want to avoid monkey patching. Then again, if you tuck the call to StringExtra::new away in your initialize method, it’s not such a big deal. And it’s a lot less painful than having to debug the mess caused by adding methods to an existing class.
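The point about delegated methods returning plain String objects can be sketched with a smaller wrapper. The Label class below is hypothetical and deliberately delegates only a short, explicit list of methods; it is not the StringExtra class from the text.

```ruby
require('forwardable')

# Hypothetical wrapper, for illustration only.
class Label
  extend(Forwardable)
  def_delegators(:@string, :length, :empty?, :to_s)

  def initialize (str="")
    @string = str
  end

  # A delegated upcase would return a String. Rewrapping the result
  # by hand keeps callers inside the wrapper class.
  def upcase
    Label.new(@string.upcase)
  end
end

label = Label.new("fragile")
label.length        # 7
label.upcase.class  # Label
label.upcase.to_s   # "FRAGILE"
label.to_s.class    # String
```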
Sometimes though, even after considering the alternatives, you really do want to modify one of the core classes. If you’re using at least Ruby 2.0, there’s a feature specifically designed to rein in monkey patching, refinements. You can think of refinements as being somewhat similar to the StringExtra class, except that Ruby will automatically wrap and unwrap the string which we want to add extra features to. There are two parts to refinements, modifying a class in some way (usually by adding instance methods) and activating those changes for a limited scope.
Refinements are a very interesting way to deal with monkey patching, but they do come with some limitations. The biggest limitation may be that refinements are relatively new to Ruby and they’re still in flux. As I mentioned earlier, they were introduced in Ruby 2.0 as an experimental feature. Defining and activating refinements will produce run time warnings reminding you that these features are subject to change. Starting in Ruby 2.1 refinements are no longer an experimental feature and won’t produce any warnings. But the feature still isn’t considered stable and the next version of Ruby is free to change refinements as necessary.
Another limitation is that you can only refine classes. Attempting to refine anything else—like a module—will raise a TypeError exception. This probably isn’t too restrictive, just something to keep in mind.
Defining a refinement is done inside a module using the refine method. You pass in the class you plan to modify as the argument to refine and do any necessary patching inside a block. Take a look at a refinement which adds the only_space? method to String:
module OnlySpace
  refine(String) do
    def only_space?
      ...
    end
  end
end
Using the refine method to define a refinement isn’t enough to add the only_space? method to String, it’s just the first step. The next thing you need to do is to activate the refinement with the using method. This is where things get a little tricky. Ruby 2.0 only allows you to activate a refinement at the top-level of a file, outside of any module or class definition. After activating the refinement it will be available from that point until the end of the file. Ruby 2.1 is more flexible, you can activate a refinement at the top-level of a file, inside a module, or inside a class. Consider this:
class Person
  using(OnlySpace)

  def initialize (name)
    @name = name
  end

  def valid?
    !@name.only_space?
  end

  def display (io=$stdout)
    io.puts(@name)
  end
end
The using method expects a single argument, a module which contains refinements. The refinements in the module are activated, but only for the current lexical scope. This is an important feature and the reason why refinements are safer than monkey patching. Instead of patching a class and making those changes globally visible, refinements automatically deactivate outside of the lexical scope in which they were activated. Clearly, the only_space? method is available on strings inside the Person class. But what about the display method and the string it passes to puts? Here’s the cool part. Once control leaves display and enters puts, the refinements defined in OnlySpace are deactivated. The puts method can’t call only_space? on the string, that method is no longer available.
It makes sense that puts can’t use the refinements activated in the Person class. But you may be wondering why I made it a point to say “lexical scoping” over and over again in the previous paragraph. Obviously it’s important. The lexical scoping rules are stricter than you might first think. For example, if you defined a Customer class which inherits from Person, you would not be able to use only_space? from within Customer just because its parent class can. Refinements aren’t scoped that way. In this example, if you invoked the valid? method on a Customer object it would indeed work correctly since that method is defined in Person. But any methods defined directly in the Customer class cannot call only_space? without the refinement being activated in Customer first. (For a refresher on the lexical scoping rules take a look at Item 11.)
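The inheritance case can be sketched as a single file, assuming Ruby 2.1 or later so that using is allowed inside a class body. The OnlySpaceRefinement module name, the regular expression used for the method body, and the blank_name? method are assumptions made for this example.

```ruby
# Hypothetical refinement module, for illustration only.
module OnlySpaceRefinement
  refine(String) do
    def only_space?
      /\A[[:space:]]*\z/ === self
    end
  end
end

class Person
  using(OnlySpaceRefinement)

  def initialize (name)
    @name = name
  end

  def valid?
    !@name.only_space?
  end
end

class Customer < Person
  # No `using` in this class body, so the refinement is inactive here.
  def blank_name?
    @name.only_space?
  end
end

customer = Customer.new("Ron")
customer.valid?  # true, valid? was written inside Person

begin
  customer.blank_name?
rescue NoMethodError
  # Raised because only_space? is not visible from Customer's body.
end
```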
Just like anything in software development, you should use the simplest technique which will get the job done. If you can get away with something like the StringExtra class, prefer that over refinements. If you can’t resist the temptation to monkey patch one of the core classes, then at least protect those around you with one of these techniques.
Things to Remember
• While refinements might not be experimental anymore, they’re still subject to change as the feature matures.
• A refinement must be activated in each lexical scope in which you want to use it.
Item 33: Invoke Modified Methods with Alias Chaining
Back when I programmed on Motorola 68k processors running Macintosh System 7, I would use a pretty neat hack to replace parts of the operating system with my own code. System 7 had a dispatch table in RAM where it would look up the RAM or ROM locations of system code. I remember working on an application that needed to monitor key press events, even when it wasn’t the active application. You couldn’t do this directly so a really common workaround back then was to patch the system dispatch table.
The technique was simple enough. You start by searching through the dispatch table and find the entry for the system function which handles keyboard events, then you stash away the function’s address somewhere in your running application, and finally you install a new entry with an address that points to your code. (System 7 didn’t have protected memory so an application could write into any RAM location, including the system heap.) When a key was pressed, the operating system would look in the dispatch table, find your address, and then invoke your code instead of the system function. Since you had the location of the original system function you could resume normal keyboard processing by invoking the function at the stored location.
As a matter of fact, this technique was so common that when you fetched the address of the system function from the dispatch table, it might not refer to the actual system function. Another application might have already patched the dispatch table and inserted its address in place of the original. Each application that modified the dispatch table formed a call chain where one function would invoke the next function until the original operating system function was finally called and the chain ended. (Apple itself used this technique to patch bugs that were burned into the system ROM.)
If you think about method names as being addresses to some chunk of code you want to run, it becomes clear that this sort of thing is possible in Ruby too. That’s because you can use alias_method to give an existing method a new name. You can then call the method by its old name and its new name. But if you then redefine that method and give it a new implementation, you’ll still be able to invoke the original implementation through its other name. You can therefore hijack a method like in my dispatch table example and eventually call the real version. This is referred to as alias chaining and an example is in order.
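Stripped to its essentials, the idea can be sketched by hand. The Greeter class and the greet_without_shouting name are hypothetical, picked only for this illustration.

```ruby
# Hypothetical class, for illustration only.
class Greeter
  def greet (name)
    "Hello #{name}"
  end
end

class Greeter
  # Stash the original implementation under a second name.
  alias_method(:greet_without_shouting, :greet)

  # Redefine the method and reach the original through the alias.
  def greet (name)
    greet_without_shouting(name).upcase + "!"
  end
end

g = Greeter.new
g.greet("World")                   # "HELLO WORLD!"
g.greet_without_shouting("World")  # "Hello World"
```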
Suppose you want to enhance a method in one of the core classes so it outputs logging information each time it’s called. You don’t want to change its behavior in any way, you just want to wrap around it so you can log when it’s called and when it’s finished. Sounds like a good use of alias chaining to me.
Even though it’s a form of monkey patching, what allows alias chaining to avoid the downsides discussed in Item 32 is that it can be undone, and usually doesn’t alter the behavior of the target class in a way that will affect other code. Let’s take a look at a module which can be used to add logging capabilities to any method, in any class:
module LogMethod
  def log_method (method)
    # Choose a new, unique name for the method.
    orig = "#{method}_without_logging".to_sym

    # Make sure name is unique.
    if instance_methods.include?(orig)
      raise(NameError, "#{orig} isn't a unique name")
    end

    # Create a new name for the original method.
    alias_method(orig, method)

    # Replace original method.
    define_method(method) do |*args, &block|
      $stdout.puts("calling method `#{method}'")
      result = send(orig, *args, &block)
      $stdout.puts("`#{method}' returned #{result.inspect}")
      result
    end
  end
end
When a class is extended with the LogMethod module, it will receive a new class method named log_method. You can use log_method to wrap any existing method so that it outputs messages before and after it invokes the original method. Before we dig into the details let’s see it in action:
irb> Array.extend(LogMethod)
irb> Array.log_method(:first)
irb> [1, 2, 3].first
calling method `first'
`first' returned 1
---> 1
irb> %w(a b c).first_without_logging
---> "a"
Before redefining the target method, log_method uses alias_method to create a new name for it. The first argument to alias_method is the new name you want to create and the second argument is the existing name. After alias_method is called the method will be available by both names. Then log_method redefines the method using define_method, giving it a new implementation which performs the logging and uses the aliased name to invoke the original method. It’s like my Macintosh hacking days all over again. Unfortunately, the LogMethod module isn’t as safe as patching the good old dispatch table in System 7.
One thing you’ll want to ensure is that the new name you create with alias_method is unique. If a method already exists with that name you’ll clobber it without so much as a warning. That’s why log_method raises an exception if the aliased name already exists. This version is also a bit simplistic and can’t be used with operators. If you passed “:*” as the method name to log_method, it would try to create an alias named “:*_without_logging”. Obviously that’s not going to work. If you’re looking for something more elaborate and robust you might consider continuously generating method names with a random component until you find one that isn’t already defined. The technique you choose for generating the aliased name will depend on your particular needs.
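One possible shape for the random-name approach is sketched below. The UniqueAlias module, its name_for function, the naming scheme, and the Money class are all hypothetical; the sketch only shows the retry loop, not a complete replacement for log_method.

```ruby
require('securerandom')

# Hypothetical helper, for illustration only.
module UniqueAlias
  # Keeps generating candidate names until one is not already
  # defined (publicly, protected, or privately) on klass.
  def self.name_for (klass, method)
    loop do
      candidate = "#{method}_orig_#{SecureRandom.hex(4)}".to_sym
      taken = klass.method_defined?(candidate) ||
              klass.private_method_defined?(candidate)
      return candidate unless taken
    end
  end
end

class Money
  def initialize (cents)
    @cents = cents
  end

  def to_s
    format("$%.2f", @cents / 100.0)
  end
end

orig = UniqueAlias.name_for(Money, :to_s)
Money.send(:alias_method, orig, :to_s)
Money.new(1250).send(orig)  # "$12.50"
```

Because the generated name is unpredictable, it has to be stored somewhere if the chain is ever going to be undone.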
A final feature to consider is adding a method which can put things back to the way they were originally. This usually involves a call to alias_method to restore the original implementation and a couple of calls to remove_method to delete the patched version and aliased name. Consider unlog_method:
module LogMethod
  def unlog_method (method)
    orig = "#{method}_without_logging".to_sym

    # Make sure log_method was called first.
    if !instance_methods.include?(orig)
      raise(NameError, "was #{orig} already removed?")
    end

    # Remove the logging version.
    remove_method(method)

    # Put the method back to its original name.
    alias_method(method, orig)

    # Remove the name created by log_method.
    remove_method(orig)
  end
end
Alias chaining is an interesting way to intercept method calls. As long as each link in the chain uses a unique name with alias_method, the original method can eventually be called through the chain.
Things to Remember
• When setting up an alias chain, make sure the aliased name is unique.
• Consider providing a method which can undo the alias chaining.
Item 34: Consider Supporting Differences in Proc Arity
Instances of the Proc class are ubiquitous in Ruby. Off the top of my head I can think of at least seven different ways to create Proc objects. And that’s saying something since I have the attention span of a block argument. (That’s right folks, enjoy the comedy while you can.) The most idiomatic way to create a Proc object is by passing a block to a method. While the block itself is just Ruby syntax, it eventually gets wrapped up in a Proc and passed to the method. We can see this directly if we write a method that accepts a block and then passes that block through as its return value:
irb> def pass (&block) block; end
irb> greeter = pass {|name| "Hello #{name}"}
---> #<Proc>
irb> greeter.call("World")
---> "Hello World"
The pass method takes a block and binds it to the variable block, then simply returns it. What is actually bound to the block variable is an instance of the Proc class. Like all good objects you can send it messages, call being one of them. Creating Proc objects this way is the most common, but not the only way. There are the proc and lambda methods, the Proc::new method, lambda syntax literals, and several other ways which I could continue to enumerate and bore us both to death. The reason I even bring this up is because all these various ways of creating a Proc object can be divided into two categories which I’ll call weak and strong. The major differences between weak and strong Proc objects are how they deal with invalid arguments. (They also differ in how they’re affected by control flow expressions. I won’t go into that here but any introductory book on Ruby should include this information.)
Weak Proc objects play fast and loose with their arguments. Calling a weak Proc object with the wrong number of arguments doesn’t raise an exception or produce a warning. If you give too few arguments the missing ones will be set to nil. If you give too many arguments the extras are ignored. This is much different than strong Proc objects. Calling a strong Proc object obeys all the rules of a normal method call. If the given number of arguments isn’t exactly correct, an ArgumentError exception will be raised. Blocks turn into weak Proc objects and lambdas into strong, it’s pretty easy to see this in action:
irb> def test
       # Yield one argument.
       yield("a")
     end
irb> test {|x, y, z| [x, y, z]} # Expect 3.
---> ["a", nil, nil]
irb> test {"b"} # Expect 0.
---> "b"
irb> func = ->(x) {"Hello #{x}"} # Expect 1.
---> #<Proc>
irb> func.call("a", "b") # Send 2.
ArgumentError: wrong number of arguments (2 for 1)
You can distinguish between weak and strong Proc objects using the lambda? method. It returns false for weak Proc objects and true for strong. This can be helpful in methods which accept blocks because they may receive weak or strong blocks, depending on how they’re called. For example:
irb> def test (&block)
       block.lambda?
     end
irb> test {|x| x}
---> false
irb> test(& ->(x){x})
---> true
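The same check can be applied to several of the ways of creating a Proc object mentioned earlier. The variable names in this sketch are chosen only for illustration.

```ruby
def pass (&block) block; end

from_block  = pass {|x| x}
from_proc   = proc {|x| x}
from_new    = Proc.new {|x| x}
from_lambda = lambda {|x| x}
from_arrow  = ->(x) {x}
from_method = 1.method(:+).to_proc

# Weak: created from blocks.
[from_block, from_proc, from_new].map(&:lambda?)
# [false, false, false]

# Strong: lambdas, and methods converted to Proc objects.
[from_lambda, from_arrow, from_method].map(&:lambda?)
# [true, true, true]
```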
Knowing whether a Proc is weak or strong isn’t in itself very useful. That is to say, it’s not likely that you’ll want to treat them any differently just by knowing what type of Proc they are. But knowing that strong Proc objects raise exceptions if they are called with the wrong number of arguments is good motivation for knowing how many arguments they expect. Let me illustrate this with an example. Suppose you’ve written a class for streaming data from an I/O object to a Proc. The class feeds the Proc data in chunks until the input has been exhausted. You also keep track of how many seconds it takes to read each chunk just in case the Proc wants to calculate throughput. Consider this:
class Stream
  def initialize (io=$stdin, chunk=64*1024)
    @io, @chunk = io, chunk
  end

  def stream (&block)
    loop do
      start = Time.now
      data = @io.read(@chunk)
      return if data.nil?

      time = (Time.now - start).to_f
      block.call(data, time)
    end
  end
end
The stream method will always give the Proc object two arguments, the data which was read and the timing information. If the Proc doesn’t want it—and it’s a weak Proc—it just ignores the argument. Consider this naive (and inefficient) method for calculating the size of a file:
def file_size (file)
  File.open(file) do |f|
    bytes = 0
    s = Stream.new(f)
    s.stream {|data| bytes += data.size}
    bytes
  end
end
By always yielding two arguments to the Proc object you’re limiting yourself to weak Proc objects or alternatively, strong Proc objects which declare an argument that goes unused. But you can’t always control how many arguments a Proc expects. What if you wanted to pass a method to stream instead of a block and that method only accepted a single argument? For example, here’s a method which uses the Stream class to generate a SHA256 cryptographic hash:
require('digest')
def digest (file)
  File.open(file) do |f|
    sha = Digest::SHA256.new
    s = Stream.new(f)
    s.stream(&sha.method(:update))
    sha.hexdigest
  end
end
The Digest::SHA256 class has a method called update which allows you to supply data in chunks instead of having to read an entire file into memory. It expects one argument, a string containing the next chunk to add to the hash. We can use the “&” operator within a method invocation to turn the update method into a strong Proc object. But with the Stream class the way it is now, passing the update method to stream will raise an exception because it’s passing two arguments to the Proc instead of one.
Wouldn’t it be nice if we knew how many arguments the Proc object expected? That’s where the Proc#arity method comes in. It returns an Integer (a Fixnum in versions of Ruby before 2.4) which contains the number of arguments which the Proc object expects to be given. Well, almost. If only it were that simple. Recall that methods can have default arguments, making them optional. What should arity return in that case? A method might also have a variadic argument by using “*” to collect all remaining arguments into a single array, basically making the method’s arity infinite. In these cases the arity method will return a negative number which tells you indirectly how many arguments are required.
I say that it’s indirect because the negative number is actually the one’s complement of the number of required arguments (that is, -n - 1 for n required arguments). If a method has one mandatory argument and one optional argument then arity will return -2. You can use the unary complement operator (“~”) to turn that result into the number of required arguments:
irb> func = ->(x, y=1) {x+y}
irb> func.arity
---> -2
irb> ~ func.arity
---> 1
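To see how the rule plays out for a few different signatures, here is a small sketch. The required_args helper is a hypothetical convenience introduced only for this illustration. All of the examples are lambdas, since weak Proc objects with optional arguments report their arity a little differently:

```ruby
# Hypothetical helper: how many arguments must a callable be given?
# A non-negative arity is the exact count; a negative arity is the
# one's complement of the number of required arguments.
def required_args (callable)
  arity = callable.arity
  arity >= 0 ? arity : ~arity
end

exact    = ->(x, y) {x + y}          # two required arguments
optional = ->(x, y=1) {x + y}        # one required, one optional
variadic = ->(x, *rest) {[x, *rest]} # one required, then any number
any      = ->(*args) {args}          # nothing required at all

arities  = [exact, optional, variadic, any].map(&:arity)
required = [exact, optional, variadic, any].map {|f| required_args(f)}
```

Here arities comes out as [2, -2, -2, -1] and required as [2, 1, 1, 0]. Notice that a negative arity can’t distinguish between an optional argument and a variadic one.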
Now we can rewrite the stream method so that it only gives the timing information to Proc objects that are expecting two arguments:
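def stream (&block)
  loop do
    start = Time.now
    data = @io.read(@chunk)
    return if data.nil?

    arg_count = block.arity
    arg_list = [data]

    # Pass the timing only when exactly two arguments are required
    # (a negative arity is the complement of the required count).
    if arg_count == 2 || ~arg_count == 2
      arg_list << (Time.now - start).to_f
    end

    block.call(*arg_list)
  end
end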
And with that change the digest method can now pass the Digest::SHA256#update method as a Proc object to the stream method. Being able to use strong Proc objects this way is a neat trick and something you should consider when writing methods which take blocks.
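To try this out end to end, here is a self-contained sketch. The Stream constructor shown is an assumption (only the @io and @chunk instance variables are visible in the stream method above), and a StringIO object stands in for a real file:

```ruby
require('digest')
require('stringio')

# Assumed shape of the Stream class; the constructor and its default
# chunk size are guesses made for the sake of a runnable example.
class Stream
  def initialize (io, chunk=64*1024)
    @io, @chunk = io, chunk
  end

  def stream (&block)
    loop do
      start = Time.now
      data = @io.read(@chunk)
      return if data.nil?

      arg_count = block.arity
      arg_list = [data]

      if arg_count == 2 || ~arg_count == 2
        arg_list << (Time.now - start).to_f
      end

      block.call(*arg_list)
    end
  end
end

content = "effective ruby" * 1000 # 14,000 bytes of sample data

# A strong Proc made from a method which accepts a single argument.
sha = Digest::SHA256.new
Stream.new(StringIO.new(content), 1024).stream(&sha.method(:update))

# A block which declares two arguments still gets the timing.
timings = []
Stream.new(StringIO.new(content), 1024).stream {|data, time| timings << time}
```

The hash built up chunk by chunk should match Digest::SHA256.hexdigest(content), and timings should contain one Float for each chunk read.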
Things to Remember
• Unlike weak Proc objects, their strong counterparts will raise an ArgumentError exception if called with the wrong number of arguments.
• You can use the Proc#arity method to find out how many arguments a Proc object expects. A positive number means it expects that exact number of arguments. A negative number, on the other hand, means there are optional arguments and is the one’s complement of the number of required arguments.
Item 35: Think Carefully Before Using Module Prepending
Back in Item 6 we looked into Ruby’s internals to see how including modules into a class altered the inheritance hierarchy. Recall that when you use the include method from within a class, Ruby creates a singleton class to hold the module’s methods and inserts it as an invisible superclass. When multiple modules are included into a class they are found by the method dispatching algorithm in reverse order. An example makes this easier to visualize:
module A
  def who_am_i?
    "A#who_am_i?"
  end
end

module B
  def who_am_i?
    "B#who_am_i?"
  end
end

class C
  include(A)
  include(B)

  def who_am_i?
    "C#who_am_i?"
  end
end
The two modules (A and B) are included into the C class. All three define a method named who_am_i? to help us see how the different implementations override one another based on the include order. We can also use the ancestors class method to get an idea of how the class hierarchy is constructed and which method would be invoked if we called super from within each of the who_am_i? methods. Consider this:
irb> C.ancestors
---> [C, B, A, Object, Kernel, BasicObject]
irb> C.new.who_am_i?
---> "C#who_am_i?"
As you’d expect, methods defined in C come before those included from the modules. And since the include method inserts modules between the C class and its superclass, the B module comes before A in the search order. Based on the output from the ancestors method we can see that if the C#who_am_i? method used super it would invoke the B#who_am_i? method. The ordering is really important because it allows you to emulate multiple inheritance while retaining the ability to override specific methods in the class which is doing the including, just like you can with a traditional parent class. In other words, methods defined in the class take priority over any methods higher up in the hierarchy. Pretty standard object-oriented behavior. But it’s not the only way a module can appear in the class hierarchy.
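To watch super walk up the hierarchy, consider this short sketch. The Greeting, Farewell, and Messenger names are made up for the illustration so that they don’t clash with the A, B, and C definitions above:

```ruby
module Greeting
  def describe
    ["Greeting"] # last in the search order, so no call to super
  end
end

module Farewell
  def describe
    ["Farewell"] + super # continues on to Greeting#describe
  end
end

class Messenger
  include(Greeting)
  include(Farewell)

  def describe
    ["Messenger"] + super # continues on to Farewell#describe
  end
end

order = Messenger.new.describe
```

The resulting array is ["Messenger", "Farewell", "Greeting"], which mirrors the first three entries of Messenger.ancestors.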
Starting in Ruby 2.0 you can use the prepend method as another way to insert a module into the inheritance hierarchy. It looks and feels just like the include method. It even has its own module hook called prepended. But prepend works much differently than include. Where include inserts a list of modules between the receiver and its superclass, prepend inserts them before the receiver. That’s right, before. This makes for some very surprising changes to method dispatching. Let’s change the C class to use prepend instead of include and see what happens:
class C
  prepend(A)
  prepend(B)

  def who_am_i?
    "C#who_am_i?"
  end
end
irb> C.ancestors
---> [B, A, C, Object, Kernel, BasicObject]
irb> C.new.who_am_i?
---> "B#who_am_i?"
After prepending the A and B modules into the C class you can see that they show up before C in the list of ancestors. Calling the who_am_i? method on an instance of C will therefore trigger the implementation in the B module first, before the definition in C is even seen. The biggest side effect of prepending is that you can no longer override a module’s method by simply defining a version in the class. The definition in the class is overridden by the module and will only be invoked if the module’s method calls super. This goes against the grain of most object-oriented languages because method dispatch will start below the object’s class in the hierarchy and only make its way upward to the class if the method isn’t found or super is used.
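Here is a minimal sketch of a prepended module which does call super, so that the definition in the class still gets a chance to run. The Shout and Person names are hypothetical and exist only for this example:

```ruby
module Shout
  def who_am_i?
    # Dispatch starts here; super reaches the definition in the class.
    super.upcase + "!"
  end
end

class Person
  prepend(Shout)

  def who_am_i?
    "person"
  end
end

answer = Person.new.who_am_i?
```

With Shout sitting in front of Person in the ancestors list, answer ends up as "PERSON!". The class’s own method only supplies the value which the module then decorates.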
You might be wondering if prepending a module is useful or not. For the most part it gives us a second way of doing things we could already do without it. Take method alias chaining from Item 33 for example. We used alias_method to create a new name for an existing method so we could redefine it with a new implementation but retain the ability to invoke the original implementation. This is analogous to prepending a module in order to redefine a method and then using super to access the original. I prefer using alias_method, however, because it’s easy to put things back the way they were by using it a second time to restore the original implementation. There’s no way to do the same thing with prepend, because removing a module once it’s been prepended isn’t possible.
Overall, using prepend to add a module to a class leaves the inheritance hierarchy in a non-intuitive state. If you’re going to use it, think very carefully about how it affects method dispatching before proceeding.
Things to Remember
• Using the prepend method inserts a module before the receiver in the class hierarchy. This is quite different from include, which inserts a module between the receiver and its superclass.
• Similar to the included and extended module hooks, prepending a module triggers the prepended hook.
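As a brief illustration of that last point, the following sketch defines a prepended hook which records every class the module is prepended into. The Audited and Account names, along with the receivers method, are assumptions made up for this example:

```ruby
module Audited
  # Ruby calls this hook each time the module is prepended,
  # passing in the class (or module) doing the prepending.
  def self.prepended (base)
    receivers << base
  end

  def self.receivers
    @receivers ||= []
  end
end

class Account
  prepend(Audited)
end

recorded = Audited.receivers
```

After the Account class is defined, recorded contains the Account class.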