Metaprogramming Ruby 2: Program Like the Ruby Pros (2014)

Part 2. Metaprogramming in Rails

Chapter 12. The Evolution of Attribute Methods

By this point in your reading, you’ve seen plenty of metaprogramming snippets and examples. However, you might still wonder what happens when you use metaprogramming in a real, large system. How do these sophisticated techniques fare in the messy world out there, where code often grows in complexity and evolves in unexpected directions?

To answer this question, we will close our tour with a look at attribute methods, one of Rails’ most popular features. Their source code contains a lot of metaprogramming, and it has been changing constantly since the first version of Rails. If we track the history of attribute methods, we’ll see what happened as their code became more complicated and nuanced.

One word of warning before we begin: there is plenty of complex code in this chapter, and it would be pointless to explain it in too much detail. Instead, I’ll just try to make a point by giving you a high-level idea of what’s going on. Don’t feel as if you have to understand each and every line of code as you read through the next few pages.

Let’s start with a quick example of attribute methods.

Attribute Methods in Action

Assume that you’ve created a database table that contains tasks.

part2/ar_attribute_methods.rb

    require 'active_record'

    ActiveRecord::Base.establish_connection :adapter => "sqlite3",
                                            :database => "dbfile"

    ActiveRecord::Base.connection.create_table :tasks do |t|
      t.string :description
      t.boolean :completed
    end

Now you can define an empty Task class that inherits from ActiveRecord::Base, and you can use objects of that class to interact with the database:

    class Task < ActiveRecord::Base; end

    task = Task.new
    task.description = 'Clean up garage'
    task.completed = true
    task.save

    task.description # => "Clean up garage"
    task.completed? # => true

The previous code calls four accessor methods to read and write the object’s attributes: two write accessors (description= and completed=), one read accessor (description), and one query accessor (completed?). None of these Mimic Methods (Mimic Method) comes from the definition of Task. Instead, Active Record generated them by looking at the columns of the tasks table. These automatically generated accessors are called attribute methods.

You probably expect that attribute methods such as description= are either Ghost Methods (Ghost Method) implemented through method_missing or Dynamic Methods (Dynamic Method) defined with define_method. Things are actually more complicated than that, as you’ll find out soon.

A History of Complexity

Instead of looking at the current implementation of attribute methods, let me go all the way back to 2004—the year that Rails 1.0.0 was unleashed on an unsuspecting world.

Rails 1: Simple Beginnings

In the very first version of Rails, the implementation of attribute methods was just a few lines of code:

gems/activerecord-1.0.0/lib/active_record/base.rb

    module ActiveRecord
      class Base
        def initialize(attributes = nil)
          @attributes = attributes_from_column_definition
          # ...
        end

        def attribute_names
          @attributes.keys.sort
        end

        alias_method :respond_to_without_attributes?, :respond_to?

        def respond_to?(method)
          @@dynamic_methods ||= attribute_names +
            attribute_names.collect { |attr| attr + "=" } +
            attribute_names.collect { |attr| attr + "?" }
          @@dynamic_methods.include?(method.to_s) ?
            true :
            respond_to_without_attributes?(method)
        end

        def method_missing(method_id, *arguments)
          method_name = method_id.id2name

          if method_name =~ read_method? && @attributes.include?($1)
            return read_attribute($1)
          elsif method_name =~ write_method?
            write_attribute($1, arguments[0])
          elsif method_name =~ query_method?
            return query_attribute($1)
          else
            super
          end
        end

        def read_method?() /^([a-zA-Z][-_\w]*)[^=?]*$/ end
        def write_method?() /^([a-zA-Z][-_\w]*)=.*$/ end
        def query_method?() /^([a-zA-Z][-_\w]*)\?$/ end

        def read_attribute(attr_name) # ...
        def write_attribute(attr_name, value) #...
        def query_attribute(attr_name) # ...

Take a look at the initialize method: when you create an ActiveRecord::Base object, its @attributes instance variable is populated with the names of the attributes from the database. For example, if the relevant table in the database has a column named description, then @attributes will contain the string "description", among others.

Now skip down to method_missing, where those attribute names become the names of Ghost Methods (Ghost Method). When you call a method such as description=, method_missing notices two things: first, description is the name of an attribute; and second, the name of description= matches the regular expression for write accessors. As a result, method_missing calls write_attribute with the string "description" and the argument of the original call, which stores the new value of the description in the object, ready to be written to the database when you save it. A similar process happens for query accessors (that end in a question mark) and read accessors (that are just the same as attribute names).
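The mechanism described above can be condensed into a small, self-contained sketch. GhostRecord is a hypothetical class, not part of Rails: it takes attribute names as arguments instead of reading them from a database, and it keeps values in a plain hash. Unlike the Rails 1 code, it checks every accessor against @attributes and uses respond_to_missing? rather than overriding respond_to?.

```ruby
# A toy record, not Active Record: attribute names are passed in by hand
# instead of being read from a database table.
class GhostRecord
  def initialize(*attribute_names)
    @attributes = {}
    attribute_names.each { |name| @attributes[name.to_s] = nil }
  end

  def method_missing(method_id, *arguments)
    method_name = method_id.to_s
    if method_name =~ /\A(\w+)=\z/ && @attributes.include?($1)
      @attributes[$1] = arguments[0]   # write accessor, e.g. description=
    elsif method_name =~ /\A(\w+)\?\z/ && @attributes.include?($1)
      !!@attributes[$1]                # query accessor, e.g. completed?
    elsif @attributes.include?(method_name)
      @attributes[method_name]         # read accessor, e.g. description
    else
      super                            # not an attribute: NoMethodError
    end
  end

  def respond_to_missing?(method_id, include_private = false)
    @attributes.include?(method_id.to_s.sub(/[=?]\z/, "")) || super
  end
end

task = GhostRecord.new(:description, :completed)
task.description = "Clean up garage"
task.completed = true
task.description   # => "Clean up garage"
task.completed?    # => true
```

None of the three accessors exists as a real method here; every call lands in method_missing, which is exactly why this style gets slow when accessors are called very often.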

In Chapter 3, Tuesday: Methods, you also learned that it’s generally a good idea to redefine respond_to? (or respond_to_missing?) together with method_missing. For example, if I can call my_task.description, then I expect that my_task.respond_to?(:description) returns true. The ActiveRecord::Base#respond_to? method is an Around Alias (Around Alias) of the original respond_to?, and it also checks whether a method name matches the rules for attribute readers, writers, or queries. The overridden respond_to? uses a Nil Guard (Nil Guard) to calculate those names only once, and it stores them in an @@dynamic_methods class variable.
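Here is a minimal sketch of those two spells working together, outside of Rails. The Settings class and its method names are invented for illustration, and the sketch caches the names in an instance variable rather than in a class variable like the Rails 1 code.

```ruby
class Settings
  def initialize(values)
    @values = values
  end

  # Around Alias: keep the original method reachable under a new name...
  alias_method :respond_to_without_keys?, :respond_to?

  # ...then redefine it, falling back to the original when needed.
  def respond_to?(method, include_private = false)
    # Nil Guard: the list of names is computed only on the first call.
    @dynamic_methods ||= @values.keys.map(&:to_s)
    @dynamic_methods.include?(method.to_s) ||
      respond_to_without_keys?(method, include_private)
  end
end

settings = Settings.new(theme: "dark")
settings.respond_to?(:theme)     # => true
settings.respond_to?(:to_s)      # => true
settings.respond_to?(:missing)   # => false
```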

I stopped short of showing you the code that accesses the database, such as read_attribute, write_attribute, and query_attribute. Apart from that, you’ve just looked at the entire implementation of attribute methods in Rails 1. By the time Rails 2 came out, however, this code had become more complex.

Rails 2: Focus on Performance

Do you remember the explanation of method_missing in Chapter 3, Tuesday: Methods? When you call a method that doesn’t exist, Ruby walks up the chain of ancestors looking for the method. If it reaches BasicObject without finding the method, then it starts back at the bottom and calls method_missing. This means that, in general, calling a Ghost Method (Ghost Method) is slower than calling a normal method, because Ruby has to walk up the entire chain of ancestors at least once.

In most concrete cases, this difference in performance between Ghost Methods and regular methods is negligible. In Rails, however, attribute methods are called very frequently. In Rails 1, each of those calls also had to walk up ActiveRecord::Base’s extremely long chain of ancestors. As a result, performance suffered.

The authors of Rails could solve this performance problem by replacing Ghost Methods with Dynamic Methods (Dynamic Method)—using define_method to create read, write, and query accessors for all attributes, and getting rid of method_missing altogether. Interestingly, however, they went for a mixed solution, including both Ghost Methods and Dynamic Methods. Let’s look at the result.

Ghosts Incarnated

If you check the source code of Rails 2, you’ll see that the code for attribute methods moved from ActiveRecord::Base itself to a separate ActiveRecord::AttributeMethods module, which is then included by Base. The original method_missing has also become complicated, so we will discuss it in two separate parts. Here is the first part:

gems/activerecord-2.3.2/lib/active_record/attribute_methods.rb

    module ActiveRecord
      module AttributeMethods
        def method_missing(method_id, *args, &block)
          method_name = method_id.to_s

          if self.class.private_method_defined?(method_name)
            raise NoMethodError.new("Attempt to call private method", method_name, args)
          end

          # If we haven't generated any methods yet, generate them, then
          # see if we've created the method we're looking for.
          if !self.class.generated_methods?
            self.class.define_attribute_methods
            if self.class.generated_methods.include?(method_name)
              return self.send(method_id, *args, &block)
            end
          end

          # ...
        end

        def read_attribute(attr_name) # ...
        def write_attribute(attr_name, value) # ...
        def query_attribute(attr_name) # ...

When you call a method such as Task#description= for the first time, the call is delivered to method_missing. Before it does its job, method_missing ensures that you’re not inadvertently bypassing encapsulation and calling a private method. Then it calls an intriguing-sounding define_attribute_methods method.

We’ll look at define_attribute_methods in a minute. For now, all you need to know is that it defines read, write, and query Dynamic Methods (Dynamic Method) for all the columns in the database. The next time you call description= or any other accessor that maps to a database column, your call isn’t handled by method_missing. Instead, you call a real, non-ghost method.

When you entered method_missing, description= was a Ghost Method (Ghost Method). Now description= is a regular flesh-and-blood method, and method_missing can call it with a Dynamic Dispatch (Dynamic Dispatch) and return the result. This process takes place only once for each class that inherits from ActiveRecord::Base. If you enter method_missing a second time for any reason, the class method generated_methods? returns true, and this code is skipped.
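The following sketch imitates that "define on first miss" flow in isolation. LazyRecord and its COLUMNS constant are hypothetical stand-ins for a model class and its table columns, and the accessors are defined with define_method for brevity, not with the Strings of Code that Rails 2 used.

```ruby
class LazyRecord
  COLUMNS = %w[description completed]   # stand-in for the table's columns

  def self.generated_methods?
    @generated_methods == true
  end

  def self.define_attribute_methods
    return if generated_methods?
    COLUMNS.each do |name|
      define_method(name)       { @attributes[name] }
      define_method("#{name}=") { |value| @attributes[name] = value }
      define_method("#{name}?") { !!@attributes[name] }
    end
    @generated_methods = true
  end

  def initialize
    @attributes = {}
  end

  def method_missing(method_id, *args, &block)
    unless self.class.generated_methods?
      self.class.define_attribute_methods
      # The ghost may have just become a real method: dispatch to it.
      return send(method_id, *args, &block) if self.class.method_defined?(method_id)
    end
    super
  end

  def respond_to_missing?(method_id, include_private = false)
    COLUMNS.include?(method_id.to_s.sub(/[=?]\z/, "")) || super
  end
end

task = LazyRecord.new
LazyRecord.method_defined?(:description=)   # => false
task.description = "Clean up garage"        # goes through method_missing once
LazyRecord.method_defined?(:description=)   # => true
task.description                            # => "Clean up garage"
```

After the first miss, all six accessors are ordinary methods of LazyRecord, so later calls never reach method_missing.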

The following code shows how define_attribute_methods defines non-ghostly accessors.

gems/activerecord-2.3.2/lib/active_record/attribute_methods.rb

    # Generates all the attribute related methods for columns in the database
    # accessors, mutators and query methods.
    def define_attribute_methods
      return if generated_methods?
      columns_hash.each do |name, column|
        unless instance_method_already_implemented?(name)
          if self.serialized_attributes[name]
            define_read_method_for_serialized_attribute(name)
          elsif create_time_zone_conversion_attribute?(name, column)
            define_read_method_for_time_zone_conversion(name)
          else
            define_read_method(name.to_sym, name, column)
          end
        end

        unless instance_method_already_implemented?("#{name}=")
          if create_time_zone_conversion_attribute?(name, column)
            define_write_method_for_time_zone_conversion(name)
          else
            define_write_method(name.to_sym)
          end
        end

        unless instance_method_already_implemented?("#{name}?")
          define_question_method(name)
        end
      end
    end

The instance_method_already_implemented? method is there to prevent involuntary Monkeypatches (Monkeypatch): if a method by the name of the attribute already exists, then this code skips to the next attribute. Apart from that, the previous code does little but delegate to other methods that do the real work, such as define_read_method or define_write_method.

As an example, take a look at define_write_method. The most important lines are define_write_method itself, the signature of evaluate_attribute_method, and the call to class_eval:

gems/activerecord-2.3.2/lib/active_record/attribute_methods.rb

    def define_write_method(attr_name)
      evaluate_attribute_method attr_name,
        "def #{attr_name}=(new_value);write_attribute('#{attr_name}', new_value);end",
        "#{attr_name}="
    end

    def evaluate_attribute_method(attr_name, method_definition, method_name=attr_name)
      unless method_name.to_s == primary_key.to_s
        generated_methods << method_name
      end

      begin
        class_eval(method_definition, __FILE__, __LINE__)
      rescue SyntaxError => err
        generated_methods.delete(attr_name)
        if logger
          logger.warn "Exception occurred during reader method compilation."
          logger.warn "Maybe #{attr_name} is not a valid Ruby identifier?"
          logger.warn err.message
        end
      end
    end

The define_write_method method builds a String of Code (String of Code) that is evaluated by class_eval. For example, if you call description=, then evaluate_attribute_method evaluates this String of Code:

def description=(new_value);write_attribute('description', new_value);end

Thus the description= method is born. A similar process happens for description, description?, and the accessors for all the other database columns.
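To see a String of Code at work outside Rails, consider this small sketch. The Note class and its define_write_method helper are hypothetical and only loosely modeled on the Rails 2 code: the helper returns true or false instead of logging, and attribute values live in a plain hash.

```ruby
class Note
  def initialize
    @attributes = {}
  end

  def write_attribute(attr_name, value)
    @attributes[attr_name] = value
  end

  def read_attribute(attr_name)
    @attributes[attr_name]
  end

  # Hypothetical helper: builds the source of a writer and evaluates it.
  def self.define_write_method(attr_name)
    source = "def #{attr_name}=(new_value); write_attribute('#{attr_name}', new_value); end"
    # Passing __FILE__ and __LINE__ makes stack traces point at this file.
    class_eval(source, __FILE__, __LINE__)
    true
  rescue SyntaxError
    false   # the attribute name is not a valid Ruby identifier
  end
end

Note.define_write_method("description")   # => true
note = Note.new
note.description = "Clean up garage"
note.read_attribute("description")        # => "Clean up garage"

Note.define_write_method("2nd_try")       # => false
Note.method_defined?("2nd_try=")          # => false
```

The second call shows why the Rails code rescues SyntaxError: a name that cannot be a Ruby identifier makes the generated source invalid, so no method is defined.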

Here’s a recap of what we’ve covered so far. When you access an attribute for the first time, that attribute is a Ghost Method (Ghost Method). ActiveRecord::Base#method_missing takes this chance to turn the Ghost Method into a real method. While it’s there, method_missing also dynamically defines read, write, and query accessors for all the other database columns. The next time you call that attribute or another database-backed attribute, you find a real accessor method waiting for you, and you don’t have to enter method_missing anymore.

However, this logic doesn’t apply to each and every attribute accessor, as you’ll discover by looking at the second half of method_missing.

Attributes That Stay Dynamic

As it turns out, there are cases where Active Record doesn’t want to define attribute accessors. For example, think of attributes that are not backed by a database column, such as calculated fields:

part2/ar_attribute_methods.rb

    my_query = "tasks.*, (description like '%garage%') as heavy_job"
    task = Task.find(:first, :select => my_query)
    task.heavy_job? # => true

Attributes like heavy_job can be different for each object, so there’s no point in generating Dynamic Methods (Dynamic Method) to access them. The second half of method_missing deals with these attributes:

gems/activerecord-2.3.2/lib/active_record/attribute_methods.rb

    module ActiveRecord
      module AttributeMethods
        def method_missing(method_id, *args, &block)
          # ...

          if self.class.primary_key.to_s == method_name
            id
          elsif md = self.class.match_attribute_method?(method_name)
            attribute_name, method_type = md.pre_match, md.to_s
            if @attributes.include?(attribute_name)
              __send__("attribute#{method_type}", attribute_name, *args, &block)
            else
              super
            end
          elsif @attributes.include?(method_name)
            read_attribute(method_name)
          else
            super
          end
        end

        private
          # Handle *? for method_missing.
          def attribute?(attribute_name)
            query_attribute(attribute_name)
          end

          # Handle *= for method_missing.
          def attribute=(attribute_name, value)
            write_attribute(attribute_name, value)
          end

Look at the code in method_missing above. If you’re accessing the object’s identifier, then it returns its value. If you’re calling an attribute accessor, then it calls the accessor with either a Dynamic Dispatch (Dynamic Dispatch) (for write or query accessors) or a direct call to read_attribute (for read accessors). Otherwise, method_missing sends the call up the chain of ancestors with super.
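The suffix-based dispatch is easier to follow in a stripped-down form. In this sketch, CalculatedRow and its SUFFIX pattern are assumptions made for illustration; the real match_attribute_method? is not shown in this chapter, so a plain regular expression stands in for it.

```ruby
class CalculatedRow
  SUFFIX = /(=|\?)\z/   # assumed suffix pattern for this sketch

  def initialize(attributes)
    @attributes = attributes
  end

  def method_missing(method_id, *args, &block)
    method_name = method_id.to_s
    if (md = SUFFIX.match(method_name))
      attribute_name, method_type = md.pre_match, md.to_s
      if @attributes.include?(attribute_name)
        # Dynamic Dispatch to the private attribute= or attribute? helper.
        __send__("attribute#{method_type}", attribute_name, *args, &block)
      else
        super
      end
    elsif @attributes.include?(method_name)
      @attributes[method_name]
    else
      super
    end
  end

  def respond_to_missing?(method_id, include_private = false)
    @attributes.include?(method_id.to_s.sub(SUFFIX, "")) || super
  end

  private

  # Handles *? for method_missing.
  def attribute?(attribute_name)
    !!@attributes[attribute_name]
  end

  # Handles *= for method_missing.
  def attribute=(attribute_name, value)
    @attributes[attribute_name] = value
  end
end

row = CalculatedRow.new("description" => "Clean up garage", "heavy_job" => true)
row.heavy_job?        # => true
row.heavy_job = false
row.heavy_job?        # => false
row.description       # => "Clean up garage"
```

Because the attributes come from each object's own hash, two rows produced by different queries can answer to different accessors, which is why these accessors stay Ghost Methods.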

I don’t want to waste your time with unnecessary details, so I only showed you part of the code for attribute methods in Rails 2. What you’ve seen, however, shows that both the feature and its code became more complicated in the second major version of Rails. Let’s see how this trend continued in the following versions.

Rails 3 and 4: More Special Cases

In Rails 1, attribute methods were implemented with a few dozen lines of code. In Rails 2, they had their own file and hundreds of lines of code. In Rails 3, they spanned nine files of source code, not including tests.

As Rails applications became larger and more sophisticated, the authors of the framework kept uncovering small twists, performance optimizations, and corner cases related to attribute methods. The code and the number of metaprogramming tricks it used grew with the number of corner cases. I’ll show you only one of those corner cases, but even this single example is too long to fit in this chapter, so I will just show you a few snippets of code as quickly as I can. Brace yourself.

The example I picked is one of the most extreme optimizations in modern Rails. We’ve seen that Rails 2 improved performance by turning Ghost Methods (Ghost Method) into Dynamic Methods (Dynamic Method). Rails 4 goes one step further: when it defines an attribute accessor, it also turns it into an UnboundMethod and stores it in a method cache. If a second class has an attribute by the same name, and hence needs the same accessor, Rails 4 just retrieves the previously defined accessor from the cache and binds it to the second class. This way, if different attributes in separate classes happen to have the same name, then Rails defines only a single set of accessor methods and reuses those methods for all attributes. (I’m as surprised as you are that this optimization has a visible effect on performance—but in the case of Rails, it does.)

I’ll start with code from deep inside the attribute methods implementation:

gems/activerecord-4.1.0/lib/active_record/attribute_methods/read.rb

    module ActiveRecord
      module AttributeMethods
        module Read
          extend ActiveSupport::Concern

          module ClassMethods
            if Module.methods_transplantable?
              def define_method_attribute(name)
                method = ReaderMethodCache[name]
                generated_attribute_methods.module_eval { define_method name, method }
              end
            else
              def define_method_attribute(name)
                # ...
              end
            end

This code defines a method named define_method_attribute. This method will ultimately become a class method of ActiveRecord::Base, thanks to the mechanism we discussed in Chapter 10, Active Support’s Concern Module. Here, however, comes a twist: define_method_attribute is defined differently depending on the result of the Module.methods_transplantable? method.

Module.methods_transplantable? comes from the Active Support library, and it answers one very specific question: can I bind an UnboundMethod to an object of a different class? In Unbound Methods, I mentioned that you can only do that from Ruby 2.0 onward, so this code defines define_method_attribute in two different ways depending on whether you’re running Rails on Ruby 1.9 or 2.x.

Assume that you’re running Ruby 2.0 or later. In this case, define_method_attribute retrieves an UnboundMethod from a cache of methods, and it binds the method to the current module with define_method. The cache of methods is stored in a constant named ReaderMethodCache.

(The call to generated_attribute_methods might look confusing—it returns a Clean Room (Clean Room) that serializes method definitions happening in different threads.)
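The transplanting trick itself takes only a few lines of plain Ruby. In this sketch, the holder module and the Article and Comment classes are made up for illustration; the point is that one UnboundMethod taken from a module can be installed in unrelated classes with define_method (on Ruby 2.0 or later).

```ruby
# A holding module, playing the role of the method cache.
holder = Module.new do
  def description
    @attributes["description"]
  end
end

reader = holder.instance_method(:description)   # an UnboundMethod

class Article
  def initialize
    @attributes = { "description" => "An article" }
  end
end

class Comment
  def initialize
    @attributes = { "description" => "A comment" }
  end
end

# The same method object is installed in two unrelated classes.
[Article, Comment].each do |klass|
  klass.class_eval { define_method :description, reader }
end

Article.new.description   # => "An article"
Comment.new.description   # => "A comment"
```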

Let’s go see how ReaderMethodCache is initialized. The long comment gives an idea of how tricky it must have been to write this code:

gems/activerecord-4.1.0/lib/active_record/attribute_methods/read.rb

    module ActiveRecord
      module AttributeMethods
        module Read
          ReaderMethodCache = Class.new(AttributeMethodCache) {
            private
            # We want to generate the methods via module_eval rather than
            # define_method, because define_method is slower on dispatch.
            # Evaluating many similar methods may use more memory as the instruction
            # sequences are duplicated and cached (in MRI). define_method may
            # be slower on dispatch, but if you're careful about the closure
            # created, then define_method will consume much less memory.
            #
            # But sometimes the database might return columns with
            # characters that are not allowed in normal method names (like
            # 'my_column(omg)'. So to work around this we first define with
            # the __temp__ identifier, and then use alias method to rename
            # it to what we want.
            #
            # We are also defining a constant to hold the frozen string of
            # the attribute name. Using a constant means that we do not have
            # to allocate an object on each call to the attribute method.
            # Making it frozen means that it doesn't get duped when used to
            # key the @attributes_cache in read_attribute.
            def method_body(method_name, const_name)
              <<-EOMETHOD
              def #{method_name}
                name = ::ActiveRecord::AttributeMethods::AttrNames::ATTR_#{const_name}
                read_attribute(name) { |n| missing_attribute(n, caller) }
              end
              EOMETHOD
            end
          }.new

ReaderMethodCache is an instance of an anonymous class, a subclass of AttributeMethodCache. This class defines a single method that returns a String of Code (String of Code). (If you’re perplexed by the call to Class.new, take a look back at Quiz: Class Taboo. If you don’t understand the EOMETHOD lines, read about “here documents” in The REST Client Example.)
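If the shape of that definition still looks odd, this toy version may help. Greeter and FormalGreeter are invented names: as with ReaderMethodCache, the constant holds an instance of an anonymous subclass that overrides one private method and builds its string with a here document.

```ruby
class Greeter
  def greet(name)
    "#{salutation}, #{name}!"
  end

  private

  def salutation; raise NotImplementedError; end
end

# The constant refers to the object returned by .new, not to the class.
FormalGreeter = Class.new(Greeter) {
  private

  def salutation
    <<-EOTEXT.strip
      Good evening
    EOTEXT
  end
}.new

FormalGreeter.greet("Bill")      # => "Good evening, Bill!"
FormalGreeter.class.name         # => nil
FormalGreeter.class.superclass   # => Greeter
```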

Let’s leave ReaderMethodCache for a moment and move to the definition of its superclass AttributeMethodCache:

gems/activerecord-4.1.0/lib/active_record/attribute_methods.rb

    module ActiveRecord
      module AttributeMethods
        AttrNames = Module.new {
          def self.set_name_cache(name, value)
            const_name = "ATTR_#{name}"
            unless const_defined? const_name
              const_set const_name, value.dup.freeze
            end
          end
        }

        class AttributeMethodCache
          def initialize
            @module = Module.new
            @method_cache = ThreadSafe::Cache.new
          end

          def [](name)
            @method_cache.compute_if_absent(name) do
              safe_name = name.unpack('h*').first
              temp_method = "__temp__#{safe_name}"
              ActiveRecord::AttributeMethods::AttrNames.set_name_cache safe_name, name
              @module.module_eval method_body(temp_method, safe_name),
                                  __FILE__, __LINE__
              @module.instance_method temp_method
            end
          end

          private
          def method_body; raise NotImplementedError; end
        end

First, look at AttrNames: it’s a module with one single method, set_name_cache. Given a name and a value, set_name_cache defines a conventionally named constant with that value. For example, if you pass it the string "description", then it defines a constant named ATTR_description. AttrNames is somewhat similar to a Clean Room (Clean Room); it only exists to store constants that represent the names of attributes.

Now move down to AttributeMethodCache. Its [] method takes the name of an attribute, and it returns an accessor to that attribute as an UnboundMethod. It also takes care of at least one important special case: attribute accessors are Ruby methods, but not all attribute names are valid Ruby method names. (You can read one counterexample in the comment to ReaderMethodCache#method_body above.) This code solves that problem by encoding the attribute name as a sequence of hexadecimal digits and creating a conventional safe method name from it.

Once it has a safe name for the accessor, AttributeMethodCache#[] calls method_body to get a String of Code that defines the accessor’s body, and it defines the accessor inside a Clean Room named simply @module. (We discussed additional arguments to module_eval, such as __FILE__ and __LINE__, in The irb Example.) Finally, AttributeMethodCache#[] gets the newly created accessor method from the Clean Room and returns it as an UnboundMethod.

On subsequent calls with the same name, AttributeMethodCache#[] won’t need to define the method anymore. Instead, @method_cache.compute_if_absent will find the stored result and return it without running the block again. This policy shaves some time off in cases where the same accessor is defined on multiple classes.
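The whole flow (safe name, String of Code, Clean Room, cached UnboundMethod) fits in a short sketch. MiniMethodCache and Row are hypothetical; a plain Hash replaces the thread-safe cache, and the generated reader looks up @attributes directly instead of going through read_attribute and the AttrNames constants.

```ruby
class MiniMethodCache
  def initialize
    @module = Module.new   # Clean Room that hosts the generated methods
    @method_cache = {}     # plain Hash here, not a thread-safe cache
  end

  def [](name)
    @method_cache[name] ||= begin
      safe_name = name.unpack('h*').first   # hexadecimal digits only
      temp_method = "__temp__#{safe_name}"
      @module.module_eval method_body(temp_method, name), __FILE__, __LINE__
      @module.instance_method temp_method
    end
  end

  private

  def method_body(method_name, attr_name)
    <<-EOMETHOD
      def #{method_name}
        @attributes[#{attr_name.inspect}]
      end
    EOMETHOD
  end
end

class Row
  def initialize(attributes)
    @attributes = attributes
  end
end

cache = MiniMethodCache.new
reader = cache["my_column(omg)"]          # defined under a safe temporary name
cache["my_column(omg)"].equal?(reader)    # => true, the method is defined only once

# define_method accepts names that the def keyword would reject.
Row.class_eval { define_method "my_column(omg)", reader }
Row.new("my_column(omg)" => 42).public_send("my_column(omg)")   # => 42
```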

To close the loop, look back at the code of ReaderMethodCache. By overriding method_body and returning the String of Code for a read accessor, ReaderMethodCache turns the generic AttributeMethodCache into a cache for read accessors. As you might expect, there is also a WriterMethodCache class that takes care of write accessors.

Is your head spinning a little after this long explanation? Mine is. This example shows how deep and complex attribute methods have become, how many special cases they have covered, and how much they’ve changed since their simple beginnings. Now we can draw some general conclusions.

A Lesson Learned

Here is one question that developers often ask themselves: How many special cases should I cover in my code? On one extreme, you could always strive for code that is perfect right from the start and leaves no stones unturned. Let’s call this approach Do It Right the First Time. On the other extreme, you might put together some simple code that just solves your obvious problem today, and maybe make it more comprehensive later, as you uncover more special cases. Let’s call this approach Evolutionary Design. The act of designing code largely consists of striking the right balance between these two approaches.

What do Rails’ attribute methods teach us about design? In Rails 1, the code for accessor methods was so simple, you might consider it simplistic. While it was correct and good enough for simple cases, it ignored many nonobvious use cases, and its performance turned out to be problematic in large applications. As the needs of Rails users evolved, the authors of the framework kept working to make it more flexible. This is a great example of Evolutionary Design.

Think back to the optimization in Rails 2: Focus on Performance. Most attribute accessors, in particular those that are backed by database tables, start their lives as Ghost Methods (Ghost Method). When you access an attribute for the first time, Active Record takes the opportunity to turn most of those ghosts into Dynamic Methods (Dynamic Method). Some other accessors, such as accessors to calculated fields, never become real methods, and they remain ghosts forever.

This is one of a number of different possible designs. The authors of Active Record had no shortage of alternatives, including the following:

· Never define accessors dynamically, relying on Ghost Methods exclusively.

· Define accessors when you create the object, in the initialize method.

· Define accessors only for the attribute that is being accessed, not for the other attributes.

· Always define all accessors for each object, including accessors for calculated fields.

· Define accessors with define_method instead of a String of Code.

I don’t know about you, but I wouldn’t have been able to pick among all of these options just by guessing which ones are faster. How did the authors of Active Record settle on the current design? You can easily imagine them trying a few alternative designs, then profiling their code in a real-life system to discover where the performance bottlenecks were…and then optimizing.
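If you want to try this kind of comparison yourself, a rough harness could look like the following sketch. The Sample class, the method names, and the time_calls helper are all made up for this example, and a micro-benchmark like this says little about a real application, so treat any numbers it prints as a starting point for profiling rather than as a verdict.

```ruby
class Sample
  def initialize
    @attributes = { "description" => "Clean up garage" }
  end

  # Alternative 1: a String of Code evaluated with class_eval.
  class_eval "def description_via_string; @attributes['description']; end",
             __FILE__, __LINE__

  # Alternative 2: a Dynamic Method defined with a block.
  define_method(:description_via_block) { @attributes["description"] }

  # Alternative 3: a Ghost Method.
  def method_missing(name, *args)
    name == :description_via_ghost ? @attributes["description"] : super
  end

  def respond_to_missing?(name, include_private = false)
    name == :description_via_ghost || super
  end
end

# Hypothetical helper: seconds spent calling the method n times.
def time_calls(object, method_name, n)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { object.public_send(method_name) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

sample = Sample.new
%i[description_via_string description_via_block description_via_ghost].each do |name|
  puts format("%-24s %.4fs", name, time_calls(sample, name, 200_000))
end
```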

The previous example focused on optimizations, but the same principles apply to all aspects of Rails’ design. Think about the code in Rails 2 that prevents you from using method_missing to call a private method—or the code in Rails 4 that maps column names in the database to safe Ruby method names. You could certainly foresee special cases such as these, but catching them all could prove very hard. It’s arguably easier to cover a reasonable number of special cases like Rails 1 did, and then change your code as more special cases become visible.

Rails’ approach seems to be very much biased toward Evolutionary Design rather than Do It Right the First Time. There are two obvious reasons for that. First, Ruby is a flexible, pliable language, especially when you use metaprogramming, so it’s generally easy to evolve your code as you go. And second, writing perfect metaprogramming code up front can be hard, because it can be difficult to uncover every possible corner case.

To sum it all up in a single sentence: keep your code as simple as possible, and add complexity as you need it. When you start, strive to make your code correct in the general cases, and simple enough that you can add more special cases later. This is a good rule of thumb for most code, but it seems to be especially relevant when metaprogramming is involved.

This last consideration also leads us to a final, deeper lesson—one that has to do with the meaning of metaprogramming itself.