Active Record Associations - The Rails 4 Way (2014)


Chapter 7. Active Record Associations

Any time you can reify something, you can create something that embodies a concept, it gives you leverage to work with it more powerfully. That’s exactly what’s going on with has_many :through.

—Josh Susser

Active Record associations let you declaratively express relationships between model classes. The power and readability of the Associations API is an important part of what makes working with Rails so special.

This chapter covers the different kinds of Active Record associations available while highlighting use cases and available customizations for each of them. We also take a look at the classes that give us access to relationships themselves.

7.1 The Association Hierarchy

Associations typically appear as methods on Active Record model objects. For example, the method timesheets might represent the timesheets associated with a given user.

user.timesheets

However, people might get confused about the type of objects that these association methods return, because associations have a way of masquerading as plain old Ruby objects. For instance, in previous versions of Rails, an association collection would seem to return an array of objects, when in fact the return type was an association proxy. As of Rails 4, asking any association collection what its return type is will tell you that it is an ActiveRecord::Associations::CollectionProxy:

>> user.timesheets

=> #<ActiveRecord::Associations::CollectionProxy []>

It’s actually lying to you, albeit very innocently. Association methods for has_many associations are actually instances of HasManyAssociation.

The CollectionProxy acts like a middleman between the object that owns the association, and the actual associated object. Methods that are unknown to the proxy are sent to the target object via method_missing.
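To make the forwarding concrete, here is a minimal plain-Ruby sketch of a proxy that hands unknown messages to its target through method_missing. SketchCollectionProxy and its owner_name method are names I made up for illustration; this shows the general technique only and is not how Rails implements CollectionProxy.

```ruby
# A simplified stand-in for a collection proxy; not Rails' actual class.
class SketchCollectionProxy
  def initialize(owner, target)
    @owner = owner    # the object that owns the association
    @target = target  # the underlying array of associated objects
  end

  # A method the proxy defines itself, so it is never forwarded.
  def owner_name
    @owner
  end

  # Anything the proxy does not define is forwarded to the target array.
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

proxy = SketchCollectionProxy.new("obie", ["timesheet 1", "timesheet 2"])
proxy.size   # => 2, forwarded to the Array
proxy.first  # => "timesheet 1", forwarded to the Array
```

Methods that every Ruby object already has, such as class or inspect, never reach method_missing, which is part of why a proxy can report a class of its own.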

Fortunately, it’s not the Ruby way to care about the actual class of an object. What messages an object responds to is a lot more significant.

The parent class of all has_many associations is CollectionAssociation and most of the methods that it defines work similarly regardless of the options declared for the relationship. Before we get much further into the details of the association proxies, let’s delve into the most fundamental type of association that is commonly used in Rails applications: the has_many / belongs_to pair, used to define one-to-many relationships.

7.2 One-to-Many Relationships

In our recurring sample application, an example of a one-to-many relationship is the association between the User, Timesheet, and ExpenseReport classes:

class User < ActiveRecord::Base
  has_many :timesheets
  has_many :expense_reports
end

Timesheets and expense reports should be linked in the opposite direction as well, so that it is possible to reference the user to which a timesheet or expense report belongs.

class Timesheet < ActiveRecord::Base
  belongs_to :user
end

class ExpenseReport < ActiveRecord::Base
  belongs_to :user
end

When these relationship declarations are executed, Rails uses some metaprogramming magic to dynamically add code to your models. In particular, proxy collection objects are created that let you manipulate the relationship easily. To demonstrate, let’s play with these relationships in the console. First, I’ll create a user.

>> obie = User.create login: 'obie', password: '1234',

password_confirmation: '1234', email: 'obiefernandez@gmail.com'

=> #<User...>

Now I’ll verify that I have collections for timesheets and expense reports.

>> obie.timesheets

Timesheet Load (0.4ms) SELECT "timesheets".* FROM "timesheets" WHERE

"timesheets"."user_id" = ? [[nil, 1]]

SQLite3::SQLException: no such column: timesheets.user_id: SELECT

"timesheets".* FROM "timesheets" WHERE "timesheets"."user_id" = ?

As David might say, “Whoops!” I forgot to add the foreign key columns to the timesheets and expense_reports tables, so in order to go forward I’ll generate a migration for the changes:

$ rails generate migration add_user_foreign_keys

invoke active_record

create db/migrate/20130330201532_add_user_foreign_keys.rb

Then I’ll open db/migrate/20130330201532_add_user_foreign_keys.rb and add the missing columns. (Using change_table would mean writing many more lines of code, so we’ll stick with the traditional add_column syntax, which still works fine.)

class AddUserForeignKeys < ActiveRecord::Migration
  def change
    add_column :timesheets, :user_id, :integer
    add_column :expense_reports, :user_id, :integer
  end
end

Running rake db:migrate applies the changes:

$ rake db:migrate

== AddUserForeignKeys: migrating ========================================

-- add_column(:timesheets, :user_id, :integer)

-> 0.0011s

-- add_column(:expense_reports, :user_id, :integer)

-> 0.0005s

== AddUserForeignKeys: migrated (0.0018s) ==============================

Index associations for performance boost

Premature optimization is the root of all evil. However, most experienced Rails developers don’t mind adding indexes for foreign keys at the time that those are created. In the case of our migration example, you’d add the following statements:

add_index :timesheets, :user_id
add_index :expense_reports, :user_id

Loading of your associations (which is usually more common than creation of items) will get a big performance boost.

Now I should be able to add a new blank timesheet to my user and check timesheets again to make sure it’s there:

>> obie = User.find(1)

=> #<User id: 1...>

>> obie.timesheets << Timesheet.new

=> #<ActiveRecord::Associations::CollectionProxy [#<Timesheet id: 1 ...]>

>> obie.timesheets

=> #<ActiveRecord::Associations::CollectionProxy [#<Timesheet id: 1 ...]>

Notice that the Timesheet object gains an id immediately.

7.2.1 Adding Associated Objects to a Collection

As you can deduce from the previous example, appending an object to a has_many collection automatically saves that object. That is, unless the parent object (the owner of the collection) is not yet stored in the database. Let’s make sure that’s the case using Active Record’s reload method, which re-fetches the attributes of an object from the database:

>> obie.timesheets.reload

=> #<ActiveRecord::Associations::CollectionProxy [#<Timesheet id: 1, user_id: 1 ...]>

There it is. The foreign key, user_id, was automatically set by the << method. It takes one or more association objects to add to the collection, and since it flattens its argument list and inserts each record, push and concat behave identically.

In the blank timesheet example, I could have used the create method on the association proxy, and it would have worked essentially the same way:

>> obie.timesheets.create

=> #<Timesheet id: 2, user_id: 1 ...>

Even though at first glance << and create do the same thing, there are some important differences in how they’re implemented that are covered in the following section.

7.2.2 Association Collection Methods

Association collections are basically fancy wrappers around a Ruby array, and have all of a normal array’s methods. Named scopes and all of ActiveRecord::Base’s class methods are also available on association collections, including find, order, where, etc.

user.timesheets.where(submitted: true).order('updated_at desc')

user.timesheets.late # assuming a scope :late defined on the Timesheet class

The following methods of CollectionProxy are available to association collections:

7.2.2.1 <<(*records) and create(attributes = {})

Both methods will add either a single associated object or many, depending on whether you pass them an array or not. They both also trigger the :before_add and :after_add callbacks (covered in this chapter’s options section for has_many).

Finally, the return value behavior of both methods varies wildly. The create method returns the new instance created, which is what you’d expect given its counterpart in ActiveRecord::Base. The << method returns the association proxy, which allows chaining and is also natural behavior for a Ruby array.

However, << will return false and not itself if any of the records being added causes the operation to fail. You shouldn’t depend on the return value of << being an array that you can continue operating on in a chained fashion.

7.2.2.2 any? and many?

The any? method behaves like its Enumerable counterpart if you give it a block; otherwise it returns the opposite of what empty? returns. Its companion method many?, which is an ActiveSupport extension to Enumerable, returns true if the size of the collection is greater than one, or if a block is given, if two or more elements match the supplied criteria.
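The semantics of many? are easy to pin down in plain Ruby. The following is a sketch under my own assumptions: the top-level many? helper is hypothetical, takes the collection as an argument, and is not ActiveSupport's source.

```ruby
# Plain-Ruby sketch of the many? semantics described above
# (a hypothetical helper, not ActiveSupport's implementation).
def many?(collection)
  matches = 0
  collection.each do |element|
    # Without a block every element counts; with one, only matching elements do.
    matches += 1 if !block_given? || yield(element)
    return true if matches > 1  # stop as soon as a second match is found
  end
  false
end

hours = [8, 0, 6]
many?(hours)                # => true, there are three elements
many?(hours) { |h| h > 7 }  # => false, only one element matches
many?([8])                  # => false, a single element is not "many"
```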

7.2.2.3 average(column_name, options = {})

Convenience wrapper for calculate(:average, ...)

7.2.2.4 build(attributes={}, &block)

Traditionally, the build method has corresponded to the new method of Active Record classes, except that it presets the owner’s foreign key and appends it to the association collection in one operation. However, as of Rails 2.2, the new method has the same behavior and probably should be used instead of build.

user.timesheets.build(attributes)

user.timesheets.new(attributes) # same as calling build

One possible reason to still use build is that as a convenience, if the attributes parameter is an array of hashes (instead of just one) then build executes for each one. However, you would usually accomplish that kind of behavior using accepts_nested_attributes_for on the owning class, covered in Chapter 11, “All About Helpers”, in the section about fields_for.

7.2.2.5 calculate(operation, column_name, options = {})

Provides aggregate (:sum, :average, :minimum and :maximum) values within the scope of associated records. Covered in detail in Chapter 9, “Advanced Active Record”.

7.2.2.6 clear

The clear method is similar to invoking delete_all (covered later in this section); however, instead of returning an array of deleted objects, it is chainable.

7.2.2.7 count(column_name=nil, options={})

Counts all associated records in the database. The first parameter, column_name gives you the option of counting on a column instead of generating COUNT(*) in the resulting SQL. If the :counter_sql option is set for the association, it will be used for the query, otherwise you can pass a custom value via the options hash of this method.

Assuming that no :counter_sql or :finder_sql options are set on the association, nor passed to count, the target class’s count method is used, scoped to only count associated records.

7.2.2.8 create(attributes, &block) and create!(attributes, &block)

Instantiate a new record with its foreign key attribute set to the owner’s id, add it to the association collection, and save it, all in one method call. The bang variant raises ActiveRecord::RecordInvalid if saving fails, while the non-bang variant returns the new instance whether or not it could be saved, matching the behavior of create on ActiveRecord::Base.

The owning record must be saved in order to use create, otherwise an ActiveRecord::RecordNotSaved exception is raised.

>> User.new.timesheets.create

ActiveRecord::RecordNotSaved: You cannot call create unless the parent is saved

If a block is passed to create or create!, it will get yielded the newly-created instance after the passed-in attributes are assigned, but before saving the record to the database.

7.2.2.9 delete(*records) and delete_all

The delete and delete_all methods are used to sever specified associations, or all of them, respectively. Both methods operate transactionally.

Invoking delete_all executes a SQL UPDATE that sets foreign keys for all currently associated objects to nil, effectively disassociating them from their parent.

Note

The names of the delete and delete_all methods can be misleading. By default, they don’t delete anything from the database; they only sever associations by clearing the foreign key field of the associated record. This behavior is related to the :dependent option, which defaults to :nullify. If the association is configured with the :dependent option set to :delete_all or :destroy, then the associated records will actually be deleted from the database.
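The difference between nullifying and really deleting can be modelled with plain hashes standing in for rows of the timesheets table. This is only a sketch: sketch_delete_all is a hypothetical helper, and for simplicity it treats any strategy other than :nullify as removing the rows.

```ruby
# rows stands in for the timesheets table; strategy mirrors the :dependent option.
def sketch_delete_all(rows, owner_id, strategy = :nullify)
  if strategy == :nullify
    # Keep every row, but clear the foreign key on the owner's rows.
    rows.map { |row| row[:user_id] == owner_id ? row.merge(user_id: nil) : row }
  else
    # Any other strategy in this sketch removes the owner's rows altogether.
    rows.reject { |row| row[:user_id] == owner_id }
  end
end

rows = [{ id: 1, user_id: 1 }, { id: 2, user_id: 2 }]
sketch_delete_all(rows, 1)            # => [{ id: 1, user_id: nil }, { id: 2, user_id: 2 }]
sketch_delete_all(rows, 1, :destroy)  # => [{ id: 2, user_id: 2 }]
```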

7.2.2.10 destroy(*records) and destroy_all

The destroy and destroy_all methods are used to remove specified associations from the database, or all of them, respectively. Both methods operate transactionally.

The destroy_all method takes no parameters; it’s an all or nothing affair. When called, it begins a transaction and invokes destroy on each object in the association, causing them all to be deleted from the database with individual DELETE SQL statements. There are load issues to consider if you plan to use this method with large association collections, since many objects will be loaded into memory at once.

7.2.2.11 empty?

Simply calls size.zero?

7.2.2.12 find(id)

Find an associated record by id, a really common operation when dealing with nested RESTful resources. Raises ActiveRecord::RecordNotFound exception if either the id or foreign_key of the owner record is not found.

7.2.2.13 first(*args)

Returns the first associated record. Wondering how Active Record figures out whether to go to the database instead of loading the entire association collection into memory?

def fetch_first_or_last_using_find?(args)
  if args.first.is_a?(Hash)
    true
  else
    !(loaded? ||
      owner.new_record? ||
      options[:finder_sql] ||
      target.any? { |record| record.new_record? || record.changed? } ||
      args.first.kind_of?(Integer))
  end
end

Passing first an integer argument mimics the semantics of Ruby’s Array#first, returning that number of records.

>> c = Client.first

=> #<Client id: 1, name: "Taigan", code: "TAIGAN", created_at: "2010-01-24

03:18:58", updated_at: "2010-01-24 03:18:58">

>> c.billing_codes.first(2)

=> [#<BillingCode id: 1, client_id: 1, code: "MTG", description: "Meetings">,

#<BillingCode id: 2, client_id: 1, code: "DEV", description: "Development">]

7.2.2.14 ids

Convenience wrapper for pluck(primary_key), covered in detail in Chapter 9, “Advanced Active Record”.

7.2.2.15 include?(record)

Checks to see if the supplied record exists in the association collection and that it still exists in the underlying database table.

7.2.2.16 last(*args)

Returns the last associated record. Refer to description of first earlier in this section for more details—it behaves exactly the same except for the obvious.

7.2.2.17 length

Returns the size of the collection by loading it and calling size on the array.

7.2.2.18 maximum(column_name, options = {})

Convenience wrapper for calculate(:maximum, ...), covered in detail in Chapter 9, “Advanced Active Record”.

7.2.2.19 minimum(column_name, options = {})

Convenience wrapper for calculate(:minimum, ...), covered in detail in Chapter 9, “Advanced Active Record”.

7.2.2.20 new(attributes, &block)

Instantiate a new record with its foreign key attribute set to the owner’s id, and add it to the association collection, in one method call.

7.2.2.21 pluck(*column_names)

Returns an array of attribute values, covered in detail in Chapter 9, “Advanced Active Record”.

7.2.2.22 replace(other_array)

Replaces the collection with other_array. Works by deleting objects that exist in the current collection, but not in other_array and inserting (using concat) objects that don’t exist in the current collection, but do exist in other_array.
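The bookkeeping behind replace boils down to two array differences. Here is a small sketch of that calculation; replace_plan is a hypothetical helper of my own, and symbols stand in for records.

```ruby
# Hypothetical helper that computes the work replace would have to do.
def replace_plan(current, other_array)
  {
    delete: current - other_array,  # in the collection now, absent from the new list
    insert: other_array - current   # in the new list, not yet in the collection
  }
end

plan = replace_plan([:mtg, :dev, :travel], [:dev, :travel, :admin])
plan[:delete]  # => [:mtg]
plan[:insert]  # => [:admin]
```

Records that appear in both arrays are left alone, which is why replacing a collection with a mostly identical one stays cheap.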

7.2.2.23 select(select=nil, &block)

The select method allows the specification of one or many attributes to be selected for an association result set.

>> user.timesheets.select(:submitted).to_a

=> [#<Timesheet id: nil, submitted: false>,

#<Timesheet id: nil, submitted: true>]

>> user.timesheets.select([:id,:submitted]).to_a

=> [#<Timesheet id: 1, submitted: false>,

#<Timesheet id: 2, submitted: true>]

Keep in mind that only attributes specified will be populated in the resulting objects! For instance, continuing the first example, trying to access updated_at on any of the returned timesheets results in an ActiveModel::MissingAttributeError exception being raised.

>> timesheet = user.timesheets.select(:submitted).first

=> #<Timesheet id: nil, submitted: false>

>> timesheet.updated_at

ActiveModel::MissingAttributeError: missing attribute: updated_at

Alternatively, passing a block to the select method behaves similarly to Array#select. The result set from the database scope is converted into an array of objects, and iterated through using Array#select, including only objects where the specified block returns true.

7.2.2.24 size

If the collection has already been loaded, or its owner object has never been saved, the size method simply returns the size of the current underlying array of associated objects. Otherwise, assuming default options, a SELECT COUNT(*) query is executed to get the size of the associated collection without having to load any objects. The query is bounded to the :limit option of the association, if there is any set.

Note that if there is a counter_cache option set on the association, then its value is used instead of hitting the database.

When you know that you are starting from an unloaded state and it’s likely that there are associated records in the database that you will need to load no matter what, it’s more efficient to use length instead of size.

Some association options, such as :group and :uniq, come into play when calculating size—basically they will always force all objects to be loaded from the database so that the resulting size of the association array can be returned.
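The trade-off between size and length is easier to see with a toy collection that records each pretend query. Everything here is a sketch: SketchCollection and its methods are made-up names, an array stands in for the database table, and counter caches and association options are left out.

```ruby
# A toy collection that records its "queries"; names are illustrative only.
class SketchCollection
  attr_reader :queries

  def initialize(rows_in_db)
    @rows_in_db = rows_in_db  # stands in for the database table
    @target = nil             # nil means "not loaded yet"
    @queries = []
  end

  def loaded?
    !@target.nil?
  end

  # Loads every row, as iterating over the association would.
  def load_target
    unless loaded?
      @queries << :select_all
      @target = @rows_in_db.dup
    end
    @target
  end

  # length always works on the loaded array.
  def length
    load_target.size
  end

  # size avoids loading when a count query is enough.
  def size
    return @target.size if loaded?
    @queries << :select_count
    @rows_in_db.size
  end
end

unloaded = SketchCollection.new(%w[a b c])
unloaded.size         # a COUNT-style query, nothing loaded
unloaded.load_target  # a second query is still needed to get the rows

eager = SketchCollection.new(%w[a b c])
eager.length          # one query that also loads the rows
eager.size            # answered from memory, no extra query
```

In the sketch, the first collection ends up issuing two queries and the second only one, which is the saving you get from length when you know the records will be needed anyway.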

7.2.2.25 sum(column_name, options = {})

Convenience wrapper for calculate(:sum, ...), covered in detail in Chapter 9, “Advanced Active Record”.

7.2.2.26 uniq

Iterates over the target collection and populates an Array with the unique values present. Keep in mind that equality of Active Record objects is determined by identity, meaning that the value of the id attribute is the same for both objects being compared.
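As a rough illustration of uniqueness by id, the sketch below keeps the first object seen for each id. SketchRecord and unique_by_id are hypothetical names, and a Struct stands in for an Active Record object.

```ruby
# SketchRecord is a hypothetical stand-in for an Active Record object.
SketchRecord = Struct.new(:id, :description)

def unique_by_id(records)
  seen = {}
  records.select do |record|
    # Keep a record only the first time its id shows up.
    seen.key?(record.id) ? false : (seen[record.id] = true)
  end
end

records = [
  SketchRecord.new(1, "loaded first"),
  SketchRecord.new(2, "another row"),
  SketchRecord.new(1, "same row, loaded again")
]
unique_by_id(records).map(&:id)  # => [1, 2]
```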

A Warning About Association Names

Don’t create associations that have the same name as instance methods of ActiveRecord::Base. Since the association adds a method with that name to its model, it will override the inherited method and break things. For instance, attributes and connection would make really bad choices for association names.

7.3 The belongs_to Association

The belongs_to class method expresses a relationship from one Active Record object to a single associated object for which it has a foreign key attribute. The trick to remembering whether a class “belongs to” another one is considering which has the foreign key column in its database table.

Assigning an object to a belongs_to association will set its foreign key attribute to the owner object’s id, but will not save the record to the database automatically, as in the following example:

>> timesheet = Timesheet.create

=> #<Timesheet id: 1409, user_id: nil...>

>> timesheet.user = obie

=> #<User id: 1, login: "obie"...>

>> timesheet.user.login

=> "obie"

>> timesheet.reload

=> #<Timesheet id: 1409, user_id: nil...>

Defining a belongs_to relationship on a class creates a method with the same name on its instances. As mentioned earlier, the method is actually a proxy to the related Active Record object and adds capabilities useful for manipulating the relationship.

7.3.1 Reloading the Association

Just invoking the association method will query the database (if necessary) and return an instance of the related object. The method takes a force_reload parameter that tells Active Record whether to reload the related object, if it happens to have been cached already by a previous access.

In the following capture from my console, I look up a timesheet and take a peek at the object_id of its related user object. Notice that the second time I invoke the association via user, the object_id remains the same. The related object has been cached. However, passing true to the accessor reloads the relationship and I get a new instance.

>> ts = Timesheet.first

=> #<Timesheet id: 3, user_id: 1...>

>> ts.user.object_id

=> 70279541443160

>> ts.user.object_id

=> 70279541443160

>> ts.user(true).object_id

=> 70279549419740
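The caching behavior in that console session can be imitated with a memoized reader. This is a sketch only: SketchUser, CachedTimesheet, and load_user are hypothetical, and the pretend lookup builds a fresh object instead of querying a database.

```ruby
# Hypothetical stand-ins; a real association would query the database.
SketchUser = Struct.new(:id, :login)

class CachedTimesheet
  def initialize(user_id)
    @user_id = user_id
    @user = nil
  end

  # Mimics the accessor described above: cached unless force_reload is true.
  def user(force_reload = false)
    @user = nil if force_reload
    @user ||= load_user
  end

  private

  # Pretend database lookup that builds a fresh object every time.
  def load_user
    SketchUser.new(@user_id, "obie")
  end
end

ts = CachedTimesheet.new(1)
first_read = ts.user
ts.user.equal?(first_read)        # => true, the cached instance is reused
ts.user(true).equal?(first_read)  # => false, a new instance was loaded
```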

7.3.2 Building and Creating Related Objects via the Association

During the belongs_to method’s metaprogramming it also adds factory methods for creating new instances of the related class and attaching them via the foreign key automatically.

The build_association method does not save the new object, but the create_association method does. Both methods take an optional hash of attribute parameters with which to initialize the newly instantiated objects. Both are essentially one-line convenience methods, which I don’t find particularly useful. It just doesn’t usually make sense to create instances in that direction!

To illustrate, I’ll simply show the code for building a User from a Timesheet or creating a Client from a BillingCode, neither of which would ever happen in real code because it just doesn’t make sense to do so:

>> ts = Timesheet.first

=> #<Timesheet id: 3, user_id: 1...>

>> ts.build_user

=> #<User id: nil, email: nil...>

>> bc = BillingCode.first

=> #<BillingCode id: 1, code: "TRAVEL"...>

>> bc.create_client

=> #<Client id: 1, name: nil, code: nil...>

You’ll find yourself creating instances of belonging objects from the has_many side of the relationship much more often.

7.3.3 belongs_to Options

The following options can be passed in a hash to the belongs_to method.

7.3.3.1 autosave: true

Whether to automatically save the owning record whenever this record is saved. Defaults to false.

7.3.3.2 :class_name

Assume for a moment that we wanted to establish another belongs_to relationship from the Timesheet class to User, this time modelling the relationship to the approver of the timesheet. You might start by adding an approver_id column to the timesheets table and an authorized_approver column to the users table via a migration. Then you would add a second belongs_to declaration to the Timesheet class:

class Timesheet < ActiveRecord::Base
  belongs_to :approver
  belongs_to :user
  ...

Active Record won’t be able to figure out what class you’re trying to link with just the information provided, because you’ve (legitimately) acted against the Rails convention of naming a relationship according to the related class. It’s time for a :class_name parameter.

class Timesheet < ActiveRecord::Base
  belongs_to :approver, class_name: 'User'
  belongs_to :user
  ...

7.3.3.3 :counter_cache

Use this option to make Rails automatically update a counter field on the associated object with the number of belonging objects. The option value can be true, in which case the pluralized name of the belonging class plus _count is used, or you can supply your own column name to be used:

counter_cache: true

counter_cache: :number_of_children

If a significant percentage of your association collections will be empty at any given moment, you can optimize performance at the cost of some extra database storage by using counter caches liberally. The reason is that when the counter cache attribute is at zero, Rails won’t even try to query the database for the associated records!

Note

The value of the counter cache column must be set to zero by default in the database! Otherwise the counter caching won’t work at all. The reason is that Rails implements the counter caching behavior by adding a simple callback that goes directly to the database with an UPDATE command and increments the value of the counter. If you’re not careful, and neglect to set a default value of 0 for the counter cache column on the database, or misspell the column name, the counter cache will still seem to work! There is a magic method on all classes with has_many associations called collection_count, just like the counter cache. It will return a correct count value based on the in-memory object, even if you don’t have a counter cache option set or the counter cache column value is null!

In the case that a counter cache was altered on the database side, you may tell Active Record to reset a potentially stale value to the correct count via the class method reset_counters. Its parameters are the id of the object and a list of association names.

Timesheet.reset_counters(5, :weeks)

7.3.3.4 dependent: :destroy or :delete

Specifies a rule that the associated owner record should be destroyed or just deleted from the database, depending on the value of the option. When triggered, :destroy will call the dependent’s callbacks, whereas :delete will not.

Usage of this option might make sense in a has_one / belongs_to pairing. However, it is really unlikely that you want this behavior on a has_many / belongs_to relationship; it just doesn’t seem to make sense to code things that way. Additionally, if the owner record has its :dependent option set on the corresponding has_many association, then destroying one associated record will have the ripple effect of destroying all of its siblings.

7.3.3.5 foreign_key: column_name

Specifies the name of the foreign key column that should be used to find the associated object. Rails will normally infer this setting from the name of the association, by adding _id to it. You can override the inferred foreign key name with this option if necessary.

# without the explicit option, Rails would guess administrator_id

belongs_to :administrator, foreign_key: 'admin_user_id'
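The naming rule is simple enough to write down in a couple of lines. The helper below is hypothetical and only mirrors the convention described here; it is not the code Rails uses to derive foreign keys.

```ruby
# Hypothetical helper mirroring the naming rule described above.
def foreign_key_for(association_name, options = {})
  # An explicit :foreign_key wins; otherwise append _id to the association name.
  (options[:foreign_key] || "#{association_name}_id").to_s
end

foreign_key_for(:user)                                         # => "user_id"
foreign_key_for(:administrator)                                # => "administrator_id"
foreign_key_for(:administrator, foreign_key: 'admin_user_id')  # => "admin_user_id"
```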

7.3.3.6 inverse_of: name_of_has_association

Explicitly declares the name of the inverse association in a bi-directional relationship. Considered an optimization, use of this option allows Rails to return the same instance of an object no matter which side of the relationship it is accessed from.

Covered in detail in Section “inverse_of: name_of_belongs_to_association”.

7.3.3.7 polymorphic: true

Use the :polymorphic option to specify that an object is related to its association in a polymorphic way, which is the Rails way of saying that the type of the related object is stored in the database along with its foreign key. By making a belongs_to relationship polymorphic, you abstract out the association so that any other model in the system can fill it.

Polymorphic associations let you trade some measure of relational integrity for the convenience of implementation in child relationships that are reused across your application. Common examples are models such as photo attachments, comments, notes, line items, and so on.

Let’s illustrate by writing a Comment class that attaches to its subjects polymorphically. We’ll associate it to both expense reports and timesheets. Listing 7.1 has the schema information in migration code, followed by the code for the classes involved. Notice the :subject_type column, which stores the class name of the associated class.

Listing 7.1: Comment class using polymorphic belongs to relationship


create_table :comments do |t|
  t.text :body
  t.references :subject, polymorphic: true

  # references can be used as a shortcut for following two statements
  # t.integer :subject_id
  # t.string :subject_type

  t.timestamps
end

class Comment < ActiveRecord::Base
  belongs_to :subject, polymorphic: true
end

class ExpenseReport < ActiveRecord::Base
  belongs_to :user
  has_many :comments, as: :subject
end

class Timesheet < ActiveRecord::Base
  belongs_to :user
  has_many :comments, as: :subject
end


As you can see in the ExpenseReport and Timesheet classes of Listing 7.1, there is a corresponding syntax where you give Active Record a clue that the relationship is polymorphic by specifying as: :subject. We haven’t covered has_many’s options yet in this chapter, and polymorphic relationships have their own section in Chapter 9, “Advanced Active Record”.
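To show what storing the type alongside the foreign key buys you, here is a plain-Ruby sketch of the lookup. All of the Poly classes are hypothetical stand-ins with in-memory hashes instead of tables; the point is only that the stored class name decides where the id gets looked up.

```ruby
# Toy in-memory "tables"; every class here is a hypothetical stand-in.
class PolyModel
  attr_reader :id

  # Each subclass gets its own hash of rows, keyed by id.
  def self.rows
    @rows ||= {}
  end

  def self.find(id)
    rows.fetch(id)
  end

  def initialize(id)
    @id = id
    self.class.rows[id] = self
  end
end

class PolyTimesheet < PolyModel; end
class PolyExpenseReport < PolyModel; end

class PolyComment
  attr_reader :subject_id, :subject_type

  # Assigning a subject records both its id and its class name.
  def subject=(record)
    @subject_id = record.id
    @subject_type = record.class.name
  end

  # Reading it back turns the stored class name into a class, then finds by id.
  def subject
    Object.const_get(@subject_type).find(@subject_id)
  end
end

timesheet = PolyTimesheet.new(1)
report = PolyExpenseReport.new(1)

comment = PolyComment.new
comment.subject = report
comment.subject_type            # => "PolyExpenseReport"
comment.subject.equal?(report)  # => true, even though a timesheet shares id 1
```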

7.3.3.8 primary_key: column_name

You should never need to use this option, except perhaps with strange legacy database schemas. It allows you to specify a surrogate column on the owning record to use as the target of the foreign key, instead of the usual primary key.

7.3.3.9 touch: true or column_name

“Touches” the owning record’s updated_at timestamp, or a specific timestamp column specified by column_name, if it is supplied. Useful for caching schemes where timestamps are used to invalidate cached view content. The column_name option is particularly useful here, if you want to do fine-grained fragment caching of the owning record’s view.

For example, let’s set the foundation for doing just that with the User / Timesheet association:

$ rails generate migration AddTimesheetsUpdatedAtToUsers timesheets_updated_at:datetime

invoke active_record

create db/migrate/20130413175038_add_timesheets_updated_at_to_users.rb

$ rake db:migrate

== AddTimesheetsUpdatedAtToUsers: migrating ==================================

-- add_column(:users, :timesheets_updated_at, :datetime)

-> 0.0005s

== AddTimesheetsUpdatedAtToUsers: migrated (0.0005s) =========================

class Timesheet < ActiveRecord::Base
  belongs_to :user, touch: :timesheets_updated_at
  ...

7.3.3.10 validate: true

Defaults to false on belongs_to associations, contrary to its counterpart setting on has_many. Tells Active Record to validate the owner record, but only in circumstances where it would normally save the owning record, such as when the record is new and a save is required in order to get a foreign key value.

tip

Tim says…

Use validates_associated if you want association validation outside of automatic saving.

7.3.4 belongs_to Scopes

Sometimes the need arises to have a relationship that must satisfy certain conditions in order for it to be valid. To facilitate this, Rails allows us to supply chained query criteria, or a “scope”, to a relationship definition as an optional second argument in the form of a lambda. Active Record scopes are covered in detail in Chapter 9.

7.3.4.1 where(*conditions)

To illustrate supplying a condition to a belongs_to relationship, let’s assume that the users table has a column approver:

class Timesheet < ActiveRecord::Base
  belongs_to :approver,
    -> { where(approver: true) },
    class_name: 'User'
  ...
end

Now in order for the assignment of a user to the approver field to work, that user must be authorized. I’ll go ahead and add a spec that both indicates the intention of my code and shows it in action. I turn my attention to spec/models/timesheet_spec.rb:

require 'spec_helper'

describe Timesheet do
  subject(:timesheet) { Timesheet.create }

  describe '#approver' do
    it 'may have a user associated as an approver' do
      timesheet.approver = User.create(approver: true)
      expect(timesheet.approver).to be
    end
  end
end

It’s a good start, but I also want to make sure something happens to prevent the system from assigning a non-authorized user to the approver field, so I add another spec:

it 'cannot be associated with a non-authorized user' do
  timesheet.approver = User.create(approver: false)
  expect(timesheet.approver).to_not be
end

I have my suspicions about the validity of that spec, though, and as I half-expected, it doesn’t really work the way I want it to work:

1) Timesheet#approver cannot be associated with a non-authorized user

Failure/Error: expect(timesheet.approver).to_not be

expected #<User id: 1, approver: false ...> to evaluate to false

The problem is that Active Record (for better or worse, probably worse) allows me to make the invalid assignment. The scope option only applies during the query to get the association back from the database. I’ll have some more work ahead of me to achieve the desired behavior, but I’ll go ahead and prove out Rails’ actual behavior by fixing my specs. I’ll do so by passing true to the approver method’s optional force_reload argument, which tells it to reload its target object:

describe Timesheet do
  subject(:timesheet) { Timesheet.create }

  describe '#approver' do
    it 'may have a user associated as an approver' do
      timesheet.approver = User.create(approver: true)
      timesheet.save
      expect(timesheet.approver(true)).to be
    end

    it 'cannot be associated with a non-authorized user' do
      timesheet.approver = User.create(approver: false)
      timesheet.save
      expect(timesheet.approver(true)).to_not be
    end
  end
end

Those two specs do pass, but note that I went ahead and saved the timesheet, since just assigning a value to it will not save the record. Then, as mentioned, I took advantage of the force_reload parameter to make Rails reload approver from the database, rather than simply give me the same instance I originally assigned to it.

The lesson to learn is that providing a scope on a relationship never affects the assignment of associated objects, only how they’re read back from the database. To enforce the rule that a timesheet approver must be authorized, you’d need to add a before_save callback to the Timesheet class itself. Callbacks are covered in detail at the beginning of Chapter 9, “Advanced Active Record”.
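The enforcement idea can be sketched without a database. The following is a plain-Ruby illustration only, not Active Record itself: the User and Timesheet classes, the approver_authorized? helper, and the hand-written save method are hypothetical stand-ins that mimic how a Rails 4 before_save callback halts a save by returning false.

```ruby
# Plain-Ruby stand-ins (hypothetical names); no Rails required.
class User
  def initialize(approver:)
    @approver = approver
  end

  def approver?
    @approver
  end
end

class Timesheet
  attr_accessor :approver

  # Mimics a save that runs a guard first and persists only if it passes.
  def save
    return false unless approver_authorized?
    @persisted = true
  end

  def persisted?
    @persisted == true
  end

  private

  # Plays the role of the before_save callback: having no approver is fine,
  # but an assigned approver must carry the approver flag.
  def approver_authorized?
    approver.nil? || approver.approver?
  end
end

ok = Timesheet.new
ok.approver = User.new(approver: true)

rejected = Timesheet.new
rejected.approver = User.new(approver: false)

puts ok.save        # true
puts rejected.save  # false
```

Note that the assignment itself still succeeds in both cases, just as it does in the specs above; only the save is refused.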

7.3.4.2 includes

In previous versions of Rails, relationship definitions had an :include option that took a list of second-order association names (on the owning record) to be eagerly loaded when the current object was loaded. As of Rails 4, the way to do this is to supply an includes query method in the scope argument of a relationship.

belongs_to :post, -> { includes(:author) }

In general, this technique is used to knock N+1 select operations down to one query plus one more for each association being included. It is rare to use this technique on a belongs_to, rather than on the has_many side.

If necessary, due to conditions or orders referencing tables other than the main one, a SELECT statement with the necessary LEFT OUTER JOINS will be constructed on the fly so that all the data needed to construct a whole object graph is queried in one big database request.

With judicious use of a relationship scope to include second-order associations, and careful benchmarking, you can sometimes improve the performance of your application dramatically, mostly by eliminating N+1 queries. On the other hand, pulling lots of data from the database and instantiating large object trees can be very costly, so using an includes scope is no “silver bullet”. As they say, your mileage may vary.

7.3.4.3 select

Replaces the SQL select clause that is normally generated when loading this association, which usually takes the form table_name.*. It is just additional flexibility that is normally never needed.

7.3.4.4 readonly

Locks down the reference to the owning record so that you can’t modify it. Theoretically this might make sense in terms of constraining your programming contexts very specifically, but I’ve never had a use for it. Still, for illustrative purposes, here is an example where I’ve made the user association on Timesheet readonly:

class Timesheet < ActiveRecord::Base
  belongs_to :user, -> { readonly }
  ...

>> t = Timesheet.first
=> #<Timesheet id: 1, submitted: nil, user_id: 1...>

>> t.user
=> #<User id: 1, login: "admin"...>

>> t.user.save
ActiveRecord::ReadOnlyRecord: ActiveRecord::ReadOnlyRecord

7.4 The has_many Association

Just like it sounds, the has_many association allows you to define a relationship in which one model has many other models that belong to it. The sheer readability of code constructs such as has_many is a major reason that people fall in love with Rails.

The has_many class method is often used without additional options. If Rails can guess the type of class in the relationship from the name of the association, no additional configuration is necessary. This bit of code should look familiar by now:

class User < ActiveRecord::Base
  has_many :timesheets
  has_many :expense_reports
end

The names of the associations can be singularized and match the names of models in the application, so everything works as expected.

7.4.1 has_many Options

Despite the ease of use of has_many, there is a surprising amount of power and customization possible for those who know and understand the options available.

7.4.1.1 after_add: callback

Called after a record is added to the collection via the << method. It is not triggered by the collection’s create method, so careful consideration is needed when relying on association callbacks. A lambda callback gets called directly, whereas a symbol names a method on the owning record that takes the newly added child as a parameter. It’s also possible to pass an array of lambdas or symbols.

Add callback method options to a has_many by passing one or more symbols corresponding to method names, or Proc objects. See Listing 7.2 in the :before_add option for an example.

7.4.1.2 after_remove: callback

Called after a record has been removed from the collection with the delete method. A lambda callback gets called directly, whereas a symbol names a method on the owning record that takes the removed child as a parameter. It’s also possible to pass an array of lambdas or symbols. See Listing 7.2 in the :before_add option for an example.

7.4.1.3 as: association_name

Specifies the polymorphic belongs_to association to use on the related class. (See Chapter 9, “Advanced Active Record” for more about polymorphic relationships.)

7.4.1.4 autosave: true

Whether to automatically save all modified records in an association collection when the parent is saved. Defaults to false, but note that normal Active Record behavior is to save new associated records automatically when the parent is saved.

7.4.1.5 before_add: callback

Triggered when a record is added to the collection via the << method. (Remember that concat and push are aliases of <<.)

A lambda callback gets called directly, whereas a symbol names a method on the owning record that takes the newly added child as a parameter. It’s also possible to pass an array of lambdas or symbols.

Raising an exception in the callback will stop the object from getting added to the collection. (Basically, because the callback is triggered right after the type mismatch check, and there is no rescue clause to be found inside <<.)

Listing 7.2: A simple example of :before_add callback usage

has_many :unchangable_posts,
  class_name: "Post",
  before_add: :raise_exception

private

def raise_exception(object)
  raise "You can't add a post"
end

Of course, that code would have been a lot shorter using a Proc, since it’s a one-liner. The owner parameter is the object with the association. The record parameter is the object being added.

has_many :unchangable_posts,
  class_name: "Post",
  before_add: ->(owner, record) { raise "Can't do it!" }

One more time, with a proc created by the proc method, which, unlike a lambda, doesn’t check the arity of block parameters:

has_many :unchangable_posts,
  class_name: "Post",
  before_add: proc { raise "You can't add a post" }
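Since the callback receives both the owner and the record, the kind of callable you pass matters. The following plain-Ruby check needs no Rails and shows standard Ruby behavior: a lambda enforces its parameter count, while a proc created with proc (or Proc.new) quietly ignores extra arguments. The :owner and :record symbols are just placeholders for the two objects a callback would receive.

```ruby
strict  = lambda { :called }  # lambdas enforce arity
lenient = proc { :called }    # procs ignore extra arguments

# Called with two arguments, as an add callback would be.
lenient_result = lenient.call(:owner, :record)

strict_error =
  begin
    strict.call(:owner, :record)
    nil
  rescue ArgumentError => e
    e
  end

puts lenient_result.inspect  # :called
puts strict_error.class      # ArgumentError
```

A zero-parameter lambda used as a callback would therefore still abort the add, but with an ArgumentError instead of the message you intended.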

7.4.1.6 before_remove: callback

Called before a record is removed from a collection with the delete method. See before_add for more information. As with :before_add, raising an exception stops the remove operation.

class User < ActiveRecord::Base
  has_many :timesheets,
    before_remove: :check_timesheet_destruction,
    dependent: :destroy

  protected

  def check_timesheet_destruction(timesheet)
    if timesheet.submitted?
      raise TimesheetError, "Cannot destroy a submitted timesheet."
    end
  end
end

Note that this is a somewhat contrived example, because it violates my sense of good object-oriented principles. The User class shouldn’t really be responsible for knowing when it’s okay to delete a timesheet or not. The check_timesheet_destruction method would more properly be added as a before_destroy callback on the Timesheet class.

7.4.1.7 :class_name

The :class_name option is common to all of the associations. It allows you to specify, as a string, the name of the class of the association, and is needed when the class name cannot be inferred from the name of the association itself.

has_many :draft_timesheets, -> { where(submitted: false) },
  class_name: 'Timesheet'

7.4.1.8 dependent: :delete_all

All associated objects are deleted in one fell swoop using a single SQL command. Note: While this option is much faster than :destroy, it doesn’t trigger any destroy callbacks on the associated objects, so you should use this option very carefully. It should only be used on associations that depend solely on the parent object.

7.4.1.9 dependent: :destroy

All associated objects are destroyed along with the parent object, by iteratively calling their destroy methods.

7.4.1.10 dependent: :nullify

The default behavior when deleting a record with has_many associations is to leave those associated records alone. Their foreign key fields will still point at the record that was deleted. The :nullify option tells Active Record to nullify, or clear, the foreign key that joins them to the parent record.

7.4.1.11 dependent: :restrict_with_exception

If associated objects are present when the parent object is destroyed, Rails raises an ActiveRecord::DeleteRestrictionError exception.

7.4.1.12 dependent: :restrict_with_error

An error is added to the parent object if any associated objects are present, rolling back the deletion from the database.

7.4.1.13 foreign_key: column_name

Overrides the convention-based foreign key column name that would normally be used in the SQL statement that loads the association. Normally it would be the owning record’s class name with _id appended to it.

7.4.1.14 inverse_of: name_of_belongs_to_association

Explicitly declares the name of the inverse association in a bi-directional relationship. Considered an optimization, use of this option allows Rails to return the same instance of an object no matter which side of the relationship it is accessed from.

Consider the following, using our recurring example, without usage of inverse_of.

>> user = User.first
>> timesheet = user.timesheets.first
=> #<Timesheet id: 1, user_id: 1...>
>> timesheet.user.equal? user
=> false

If we add :inverse_of to the association declaration on User, like

has_many :timesheets, inverse_of: :user

then timesheet.user.equal? user will be true. Try something similar in one of your apps to see it for yourself.
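If the difference between “equal data” and “the same instance” feels abstract, here is a tiny plain-Ruby model of it. It is only an analogy under stated assumptions: the User struct and this Timesheet class are hypothetical stand-ins, and the optional owner argument plays the role that inverse_of plays in Active Record.

```ruby
User = Struct.new(:id)

class Timesheet
  def initialize(user_id, owner = nil)
    @user_id = user_id
    @owner = owner
  end

  # Without a known owner, every call "loads" a brand-new User object.
  def user
    @owner || User.new(@user_id)
  end
end

user = User.new(1)
without_inverse = Timesheet.new(user.id)
with_inverse    = Timesheet.new(user.id, user)

puts without_inverse.user == user       # true: same data
puts without_inverse.user.equal?(user)  # false: different object
puts with_inverse.user.equal?(user)     # true: same object
```

The practical consequence is that changes made to one in-memory copy are invisible to the other until something is reloaded.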

7.4.1.15 primary_key: column_name

Specifies a surrogate key to use instead of the owning record’s primary key, whose value should be used when querying to fill the association collection.

7.4.1.16 :source and :source_type

Used exclusively as additional options to assist in using has_many :through associations with polymorphic belongs_to. Covered in detail later in this chapter.

7.4.1.17 through: association_name

Creates an association collection via another association. See the section in this chapter entitled “has_many :through” for more information.

7.4.1.18 validate: false

In cases where the child records in the association collection would be automatically saved by Active Record, this option (true by default) dictates whether to ensure that they are valid. If you always want to check the validity of associated records when saving the owning record, then use validates_associated :association_name.

7.4.2 has_many scopes

The has_many association provides the ability to customize the query used by the database to retrieve the association collection. This is achieved by passing a scope block to the has_many method definition using any of the standard Active Record query methods, as covered in Chapter 5, “Working with Active Record”. In this section, we’ll cover the most common scope methods used with has_many associations.

7.4.2.1 where(*conditions)

Using the query method where, one could add extra conditions to the Active Record-generated SQL query that brings back the objects in the association.

You can apply extra conditions to an association for a variety of reasons. How about approval of comments?

has_many :comments, -> { where(approved: true) }

Plus, there’s no rule that you can’t have more than one has_many association exposing the same two related tables in different ways. Just remember that you’ll probably have to specify the class name too.

has_many :pending_comments, -> { where(approved: false) },
  class_name: 'Comment'

7.4.2.2 extending(*extending_modules)

Specifies one or many modules with methods that will extend the association collection proxy. Used as an alternative to defining additional methods in a block passed to the has_many method itself. Discussed in the section “Association Extensions”.

7.4.2.3 group(*args)

Adds a GROUP BY SQL clause to the queries used to load the contents of the association collection.

7.4.2.4 having(*clauses)

Must be used in conjunction with the group query method and adds extra conditions to the resulting SQL query used to load the contents of the association collection.

7.4.2.5 includes(*associations)

Takes a list of second-order association names that should be eager-loaded when this collection is loaded. With judicious use of the includes query method and careful benchmarking you can sometimes improve the performance of your application dramatically.

To illustrate, let’s analyze how includes affects the SQL generated while navigating relationships. We’ll use the following simplified versions of Timesheet, BillableWeek, and BillingCode:

class Timesheet < ActiveRecord::Base
  has_many :billable_weeks
end

class BillableWeek < ActiveRecord::Base
  belongs_to :timesheet
  belongs_to :billing_code
end

class BillingCode < ActiveRecord::Base
  belongs_to :client
  has_many :billable_weeks
end

First, I need to set up my test data, so I create a timesheet instance and add a couple of billable weeks to it. Then I assign a billing code to each billable week, which results in an object graph (with four objects linked together via associations).

Next I do a fancy one-line collect, which gives me an array of the billing codes associated with the timesheet:

>> Timesheet.find(3).billable_weeks.collect(&:code)
=> ["TRAVEL", "DEVELOPMENT"]

Without the includes scope method set on the billable_weeks association of Timesheet, that operation cost me the following four database hits (copied from log/development.log, and prettied up a little):

Timesheet Load (0.3ms) SELECT timesheets.* FROM timesheets
  WHERE (timesheets.id = 3) LIMIT 1
BillableWeek Load (1.3ms) SELECT billable_weeks.* FROM billable_weeks
  WHERE (billable_weeks.timesheet_id = 3)
BillingCode Load (1.2ms) SELECT billing_codes.* FROM billing_codes
  WHERE (billing_codes.id = 7) LIMIT 1
BillingCode Load (3.2ms) SELECT billing_codes.* FROM billing_codes
  WHERE (billing_codes.id = 8) LIMIT 1

This demonstrates the so-called “N+1 select” problem that inadvertently plagues many systems. One query loads the N billable weeks, and then each of those weeks costs me another select statement to retrieve its associated billing code. Now let’s provide the billable_weeks association a scope block using includes, after which the Timesheet class looks as follows:

class Timesheet < ActiveRecord::Base
  has_many :billable_weeks, -> { includes(:billing_code) }
end

Simple! Rerunning our test statement yields the same results in the console:

>> Timesheet.find(3).billable_weeks.collect(&:code)
=> ["TRAVEL", "DEVELOPMENT"]

But look at how different the generated SQL is:

Timesheet Load (0.4ms) SELECT timesheets.* FROM timesheets
  WHERE (timesheets.id = 3) LIMIT 1
BillableWeek Load (0.6ms) SELECT billable_weeks.* FROM billable_weeks
  WHERE (billable_weeks.timesheet_id = 3)
BillingCode Load (2.1ms) SELECT billing_codes.* FROM billing_codes
  WHERE (billing_codes.id IN (7,8))

Active Record smartly figures out exactly which BillingCode records it will need and pulls them in using one query. For large datasets, the performance improvement can be quite dramatic!
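The arithmetic behind the two logs can be reproduced with a toy query counter. This is a plain-Ruby simulation, not Active Record: the QueryCounter class and its find and find_all methods are hypothetical, and the mapping of ids 7 and 8 to particular codes is assumed for illustration.

```ruby
# In-memory stand-in for the billing_codes table (id-to-code mapping assumed).
BILLING_CODES = { 7 => "TRAVEL", 8 => "DEVELOPMENT" }
# The billing_code_id of each billable week on the timesheet.
WEEK_CODE_IDS = [7, 8]

class QueryCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  # One lookup per id, like "WHERE id = ? LIMIT 1".
  def find(id)
    @count += 1
    BILLING_CODES.fetch(id)
  end

  # One lookup for many ids, like "WHERE id IN (...)".
  def find_all(ids)
    @count += 1
    BILLING_CODES.values_at(*ids)
  end
end

lazy = QueryCounter.new
lazy_codes = WEEK_CODE_IDS.map { |id| lazy.find(id) }

eager = QueryCounter.new
eager_codes = eager.find_all(WEEK_CODE_IDS)

puts "lazy:  #{lazy_codes.inspect} in #{lazy.count} billing code queries"
puts "eager: #{eager_codes.inspect} in #{eager.count} billing code query"
```

With two billable weeks the saving is a single query, but the lazy count grows with every additional week while the eager count stays at one.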

It’s generally easy to find N+1 select issues just by watching the log scroll by while clicking through the different screens of your application. (Of course, make sure that you’re looking at realistic data or the exercise will be pointless.) Screens that might benefit from eager loading will cause a flurry of single-row SELECT statements, one for each record in a given association being used.

If you’re feeling particularly daring (perhaps masochistic is a better term) you can try including a deep hierarchy of associations, by mixing hashes into your includes query method, like in this fictional example from a bulletin board:

has_many :posts, -> { includes([:author, {comments: {author: :avatar }}]) }

That example snippet will grab not only all the comments for a Post, but all their authors and avatar pictures as well. You can mix and match symbols, arrays and hashes in any combination to describe the associations you want to load.

The biggest potential problem with so-called “deep” includes is pulling too much data out of the database. You should always start out with the simplest solution that will work, then use benchmarking and analysis to figure out if optimizations such as eager-loading help improve your performance.

Wilson says…

Let people learn eager loading by crawling across broken glass, like we did. It builds character!

7.4.2.6 limit(integer)

Appends a LIMIT clause to the SQL generated for loading this association. This option is potentially useful in capping the size of very large association collections. Use in conjunction with the order query method to make sure you’re grabbing the most relevant records.

7.4.2.7 offset(integer)

An integer determining the offset from where the rows should be fetched when loading the association collection. I assume this is here mostly for completeness, since it’s hard to envision a valid use case.

7.4.2.8 order(*clauses)

Specifies the order in which the associated objects are returned via an “ORDER BY” SQL fragment, such as "last_name, first_name DESC".

7.4.2.9 readonly

Sets all records in the association collection to read-only mode, which prevents saving them.

7.4.2.10 select(expression)

By default, this is * as in SELECT * FROM, but it can be changed if, for example, you want to add additional calculated columns or “piggyback” additional columns from joins onto the associated object as it’s loaded.

7.4.2.11 distinct

Strips duplicate objects from the collection. Sometimes useful in conjunction with has_many :through.

7.5 Many-to-Many Relationships

Associating persistent objects via a join table can be one of the trickier aspects of object-relational mapping to implement correctly in a framework. Rails has a couple of techniques that let you represent many-to-many relationships in your model. We’ll start with the older and simpler has_and_belongs_to_many and then cover the newer has_many :through.

7.5.1 has_and_belongs_to_many

Before proceeding with this section, I must clear my conscience by stating that has_and_belongs_to_many is practically obsolete in the minds of many Rails developers, including the authors of this book. Use has_many :through instead and your life should be a lot easier. The section is preserved in this edition almost exactly as it appeared in the previous editions, because it contains good techniques that enlighten the reader about nuances of Active Record behavior.

The has_and_belongs_to_many method establishes a link between two associated Active Record models via an intermediate join table. Unless the join table is explicitly specified as an option, Rails guesses its name by concatenating the table names of the joined classes, in alphabetical order and separated with an underscore.

For example, if I was using has_and_belongs_to_many (or habtm for short) to establish a relationship between Timesheet and BillingCode, the join table would be named billing_codes_timesheets and the relationship would be defined in the models. Both the migration class and models are listed:

class CreateBillingCodesTimesheets < ActiveRecord::Migration
  def change
    create_table :billing_codes_timesheets, id: false do |t|
      t.references :billing_code, null: false
      t.references :timesheet, null: false
    end
  end
end

class Timesheet < ActiveRecord::Base
  has_and_belongs_to_many :billing_codes
end

class BillingCode < ActiveRecord::Base
  has_and_belongs_to_many :timesheets
end

Note that an id primary key is not needed, hence the id: false option was passed to the create_table method. Also, since the foreign key columns are both needed, we pass them a null: false option. (In real code, you would also want to make sure both of the foreign key columns were indexed properly.)
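The naming convention for the join table described earlier (both table names, in alphabetical order, joined with an underscore) is simple enough to express in a line of Ruby. The default_join_table helper below is hypothetical and not part of Rails; Rails’ own derivation may handle additional cases that this sketch ignores.

```ruby
# Hypothetical helper: derive the conventional habtm join table name from
# two table names by sorting them and joining with an underscore.
def default_join_table(first_table, second_table)
  [first_table, second_table].map(&:to_s).sort.join("_")
end

puts default_join_table(:timesheets, :billing_codes)  # billing_codes_timesheets
```

The argument order doesn’t matter, which is why both models in the listing agree on the same table without any extra configuration.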

Kevin says…

A new migration method create_join_table was added to Rails 4 to create a join table using the order of the first two arguments. The migration in the preceding code example is equivalent to the following:

class CreateBillingCodesTimesheets < ActiveRecord::Migration
  def change
    create_join_table :billing_codes, :timesheets
  end
end

7.5.1.1 Self-Referential Relationship

What about self-referential many-to-many relationships? Linking a model to itself via a habtm relationship is easy—you just have to provide explicit options. In Listing 7.3, I’ve created a join table and established a link between related BillingCode objects. Again, both the migration and model class are listed:

Listing 7.3: Related billing codes

class CreateRelatedBillingCodes < ActiveRecord::Migration
  def change
    create_table :related_billing_codes, id: false do |t|
      t.column :first_billing_code_id, :integer, null: false
      t.column :second_billing_code_id, :integer, null: false
    end
  end
end

class BillingCode < ActiveRecord::Base
  has_and_belongs_to_many :related,
    join_table: 'related_billing_codes',
    foreign_key: 'first_billing_code_id',
    association_foreign_key: 'second_billing_code_id',
    class_name: 'BillingCode'
end

7.5.1.2 Bidirectional Relationships

It’s worth noting that the related relationship of the BillingCode in Listing 7.3 is not bidirectional. Just because you associate two objects in one direction does not mean they’ll be associated in the other direction. But what if you need to automatically establish a bidirectional relationship?

First let’s write a spec for the BillingCode class to prove our solution. When we add bidirectional behavior, we don’t want to break the normal behavior, so at first my spec example establishes that the normal habtm relationship works:

describe BillingCode do
  let(:travel_code) { BillingCode.create(code: 'TRAVEL') }
  let(:dev_code) { BillingCode.create(code: 'DEV') }

  it "has a working related habtm association" do
    travel_code.related << dev_code
    expect(travel_code.reload.related).to include(dev_code)
  end
end

I run the spec and it passes. Now I can modify the example to prove that the bidirectional behavior that we’re going to add works. It ends up looking very similar to the first example.

describe BillingCode do
  let(:travel_code) { BillingCode.create(code: 'TRAVEL') }
  let(:dev_code) { BillingCode.create(code: 'DEV') }

  it "has a bidirectional habtm association" do
    travel_code.related << dev_code
    expect(travel_code.reload.related).to include(dev_code)
    expect(dev_code.reload.related).to include(travel_code)
  end
end

Of course, the new version fails, since we haven’t added the new behavior yet. I’ll omit the output of running the spec, since it doesn’t tell us anything we don’t know already.
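The idea behind a fix can be sketched in plain Ruby, outside of Active Record: whenever one code is related to another, the reverse link is recorded too. The BillingCode class below is a hypothetical in-memory stand-in and relate is an invented method name; in a Rails model the same idea would presumably hang off an association callback such as after_add, but treat that as an assumption rather than a recipe.

```ruby
# Hypothetical in-memory stand-in; not the Active Record model.
class BillingCode
  attr_reader :code, :related

  def initialize(code)
    @code = code
    @related = []
  end

  # Link both directions. The include? guard stops the mutual calls from
  # recursing forever and keeps duplicates out of the list.
  def relate(other)
    return self if related.include?(other)
    related << other
    other.relate(self)
    self
  end
end

travel_code = BillingCode.new("TRAVEL")
dev_code = BillingCode.new("DEV")
travel_code.relate(dev_code)

puts travel_code.related.map(&:code).inspect  # ["DEV"]
puts dev_code.related.map(&:code).inspect     # ["TRAVEL"]
```

Whatever mechanism you choose in a real model, it needs the same kind of guard, or adding the reverse link will trigger the callback again in the other direction.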

7.5.1.3 Extra Columns on has_and_belongs_to_many Join Tables

Rails won’t have a problem with you adding as many extra columns as you want to habtm’s join table. The extra attributes will be read in and added onto model objects accessed via the habtm association. However, speaking from experience, the severe annoyances you will deal with in your application code make it really unattractive to go that route.

What kind of annoyances? For one, records returned from join tables with additional attributes will be marked as read-only, because it’s not possible to save changes to those additional attributes.

You should also consider that the way that Rails makes those extra columns of the join table available might cause problems in other parts of your codebase. Having extra attributes appear magically on an object is kind of cool, but what happens when you try to access those extra properties on an object that wasn’t fetched via the habtm association? Kaboom! Get ready for some potentially bewildering debugging exercises.

Methods of the habtm proxy act just as they would for a has_many relationship. Similarly, habtm shares options with has_many; only its :join_table option is unique. It allows customization of the join table name.

To sum up, habtm is a simple way to establish a many-to-many relationship using a join table. As long as you don’t need to capture additional data about the relationship, everything is fine. The problems with habtm begin once you want to add extra columns to the join table, after which you’ll want to upgrade the relationship to use has_many :through instead.

7.5.1.4 “Real Join Models” and habtm

The Rails documentation advises readers that: “It’s strongly recommended that you upgrade any [habtm] associations with attributes to a real join model.” Use of habtm, which was one of the original innovative features in Rails, fell out of favor once the ability to create real join models was introduced via the has_many :through association.

Realistically, habtm is not going to be removed from Rails, for a couple of sensible reasons. First of all, plenty of legacy Rails applications need it. Second, habtm provides a way to join classes without a primary key defined on the join table, which is occasionally useful. But most of the time you’ll find yourself wanting to model many-to-many relationships with has_many :through.

7.5.2 has_many :through

Well-known Rails guy Josh Susser is considered the expert on Active Record associations; even his blog is called has_many :through. His description of the :through association, written back when the feature was originally introduced in Rails 1.1, is so concise and well-written that I couldn’t hope to do any better. So here it is:

The has_many :through association allows you to specify a one-to-many relationship indirectly via an intermediate join table. In fact, you can specify more than one such relationship via the same table, which effectively makes it a replacement for has_and_belongs_to_many. The biggest advantage is that the join table contains full-fledged model objects complete with primary keys and ancillary data. No more push_with_attributes; join models just work the same way all your other Active Record models do.

7.5.2.1 Join Models

To illustrate the has_many :through association, we’ll set up a Client model so that it has many Timesheet objects, through a normal has_many association named billable_weeks.

class Client < ActiveRecord::Base
  has_many :billable_weeks
  has_many :timesheets, through: :billable_weeks
end

The BillableWeek class was already in our sample application and is ready to be used as a join model:

class BillableWeek < ActiveRecord::Base
  belongs_to :client
  belongs_to :timesheet
end

We can also set up the inverse relationship, from timesheets to clients, like this.

class Timesheet < ActiveRecord::Base
  has_many :billable_weeks
  has_many :clients, through: :billable_weeks
end

Notice that has_many :through is always used in conjunction with a normal has_many association. Also, notice that the normal has_many association will often have the same name on both classes that are being joined together, which means the :through option will read the same on both sides.

through: :billable_weeks

How about the join model; will it always have two belongs_to associations? No.

You can also use has_many :through to easily aggregate has_many or has_one associations on the join model. Forgive me for switching to a completely unrealistic domain for a moment; it’s only intended to clearly demonstrate what I’m trying to describe:

class Grandparent < ActiveRecord::Base
  has_many :parents
  has_many :grand_children, through: :parents, source: :children
end

class Parent < ActiveRecord::Base
  belongs_to :grandparent
  has_many :children
end

For the sake of clarity in later chapters, I’ll refer to this usage of has_many :through as aggregating.

Courtenay says…

We use has_many :through so much! It has pretty much replaced the old has_and_belongs_to_many, because it allows your join models to be upgraded to full objects. It’s like when you’re just dating someone and they start talking about the Relationship (or, eventually, Our Marriage). It’s an example of an association being promoted to something more important than the individual objects on each side.

7.5.2.2 Usage Considerations and Examples

You can use non-aggregating has_many :through associations in almost the same ways as any other has_many associations. For instance, appending an object to a has_many :through collection will save the object as expected:

>> c = Client.create(name: "Trotter's Tomahawks", code: "ttom")
=> #<Client id: 5 ...>
>> c.timesheets << Timesheet.new
=> #<ActiveRecord::Associations::CollectionProxy [#<Timesheet id: 2 ...>]>

The main benefit of has_many :through is that Active Record takes care of managing the instances of the join model for you. If we call reload on the billable_weeks association, we’ll see that there was a billable week object created for us:

>> c.billable_weeks.reload.to_a
=> [#<BillableWeek id: 2, tuesday_hours: nil, start_date: nil,
   timesheet_id: 2, billing_code_id: nil, sunday_hours: nil,
   friday_hours: nil, monday_hours: nil, client_id: 2, wednesday_hours: nil,
   saturday_hours: nil, thursday_hours: nil>]

The BillableWeek object that was created is properly associated with both the client and the Timesheet. Unfortunately, there are a lot of other attributes (e.g., start_date, and the hours columns) that were not populated.

One possible solution is to use create on the billable_weeks association instead, and include the new Timesheet object as one of the supplied properties.

>> bw = c.billable_weeks.create(start_date: Time.now,
                                timesheet: Timesheet.new)

7.5.2.3 Aggregating Associations

When you’re using has_many :through to aggregate multiple child associations, there are more significant limitations. Essentially, you can query to your heart’s content using find and friends, but you can’t append or create new records through them.

For example, let’s add a billable_weeks association to our sample User class:

class User < ActiveRecord::Base
  has_many :timesheets
  has_many :billable_weeks, through: :timesheets
  ...

The billable_weeks association aggregates all the billable week objects belonging to all of the user’s timesheets.

class Timesheet < ActiveRecord::Base
  belongs_to :user
  has_many :billable_weeks, -> { includes(:billing_code) }
  ...

Now let’s go into the Rails console and set up some example data so that we can use the new billable_weeks collection (on User).

>> quentin = User.first
=> #<User id: 1, login: "quentin" ...>
>> quentin.timesheets.to_a
=> []
>> ts1 = quentin.timesheets.create
=> #<Timesheet id: 1 ...>
>> ts2 = quentin.timesheets.create
=> #<Timesheet id: 2 ...>
>> ts1.billable_weeks.create(start_date: 1.week.ago)
=> #<BillableWeek id: 1, timesheet_id: 1 ...>
>> ts2.billable_weeks.create(start_date: 2.week.ago)
=> #<BillableWeek id: 2, timesheet_id: 2 ...>
>> quentin.billable_weeks.to_a
=> [#<BillableWeek id: 1, timesheet_id: 1 ...>, #<BillableWeek id: 2,
   timesheet_id: 2 ...>]

Just for fun, let’s see what happens if we try to create a BillableWeek with a User instance:

>> quentin.billable_weeks.create(start_date: 3.weeks.ago)
ActiveRecord::HasManyThroughCantAssociateThroughHasOneOrManyReflection:
  Cannot modify association 'User#billable_weeks' because the source
  reflection class 'BillableWeek' is associated to 'Timesheet' via :has_many.

There you go… since BillableWeek only belongs to a timesheet and not a user, Rails raises a HasManyThroughCantAssociateThroughHasOneOrManyReflection exception.
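To see why the aggregate is read-only, it can help to model the situation in plain Ruby. The sketch below is not Active Record; the SketchUser, SketchTimesheet, and SketchWeek names are stand-ins invented for illustration. It shows that a user's billable weeks are simply the flattened contents of each timesheet, so a new week has to be added through one specific timesheet.

```ruby
# Plain-Ruby stand-ins (hypothetical names, not Active Record models).
SketchWeek      = Struct.new(:start_date)
SketchTimesheet = Struct.new(:id, :billable_weeks)
SketchUser      = Struct.new(:timesheets) do
  # Aggregates the weeks of every timesheet, much like
  # has_many :billable_weeks, through: :timesheets does.
  def billable_weeks
    timesheets.flat_map(&:billable_weeks)
  end
end

ts1  = SketchTimesheet.new(1, [SketchWeek.new("2014-01-06")])
ts2  = SketchTimesheet.new(2, [SketchWeek.new("2014-01-13")])
user = SketchUser.new([ts1, ts2])

# Reading through the aggregate is fine.
user.billable_weeks.size

# Writing needs a concrete parent: pick a timesheet, then add to it.
ts2.billable_weeks << SketchWeek.new("2014-01-20")
user.billable_weeks.size
```

In the real application, the equivalent move is to call create on a particular timesheet's billable_weeks collection, as we did with ts1 and ts2 in the console session above.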

7.5.2.4 Join Models and Validations

When you append to a non-aggregating has_many :through association with <<, Active Record will always create a new join model, even if one already exists for the two records being joined. You can add validates_uniqueness_of constraints on the join model to keep duplicate joins from happening.

This is what such a constraint might look like on our BillableWeek join model.

validates_uniqueness_of :client_id, scope: :timesheet_id

That says, in effect: “There should only be one of each client per timesheet.”
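If the scoping rule is hard to picture, here is a minimal plain-Ruby sketch of what it enforces. It is only an in-memory analogy: SketchJoinTable is a hypothetical class, and the real validation checks the database rather than a Set.

```ruby
require "set"

# Hypothetical in-memory stand-in for the join table; not Active Record.
class SketchJoinTable
  def initialize
    @pairs = Set.new
  end

  # Returns true when the row is accepted, and false when the same client
  # is already joined to the same timesheet.
  def add(client_id, timesheet_id)
    # Set#add? returns nil when the element is already present.
    !@pairs.add?([client_id, timesheet_id]).nil?
  end
end

table = SketchJoinTable.new
table.add(1, 10) # first join of client 1 to timesheet 10 is accepted
table.add(1, 10) # a duplicate of the same pair is rejected
table.add(1, 11) # the same client on another timesheet is fine
```

The pair is what has to be unique, not either column on its own.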

If your join model has additional attributes with their own validation logic, then there’s another important consideration to keep in mind. Adding records directly to a has_many :through association causes a new join model to be automatically created with a blank set of attributes. Validations on additional columns of the join model will probably fail. If that happens, you’ll need to add new records by creating join model objects and associating them appropriately through their own association proxy.

timesheet.billable_weeks.create(start_date: 1.week.ago)

7.5.3 has_many :through Options

The options for has_many :through are the same as the options for has_many. Remember that :through is just an option on has_many! However, the use of some of has_many’s options changes or becomes more significant when :through is used.

First of all, the :class_name and :foreign_key options are no longer valid, since they are implied from the target association on the join model. The following are the rest of the options that have special significance together with has_many :through.

7.5.3.1 source: association_name

The :source option specifies which association to use on the associated class. This option is not mandatory because normally Active Record assumes that the target association is the singular (or plural) version of the has_many association name. If your association names don’t match up, then you have to set :source explicitly.

For example, the following code will use the BillableWeek’s sheet association to populate timesheets.

has_many :timesheets, through: :billable_weeks, source: :sheet

7.5.3.2 source_type: class_name

The :source_type option is needed when you establish a has_many :through to a polymorphic belongs_to association on the join model. Consider the following example concerning clients and contacts:

class Client < ActiveRecord::Base
  has_many :client_contacts
  has_many :contacts, through: :client_contacts
end

class ClientContact < ActiveRecord::Base
  belongs_to :client
  belongs_to :contact, polymorphic: true
end

In this somewhat contrived example, the most important fact is that a Client has many contacts, through their polymorphic relationship to the join model, ClientContact. There isn’t a Contact class. We just want to be able to refer to contacts in a polymorphic sense, meaning either a Person or a Business.

class Person < ActiveRecord::Base
  has_many :client_contacts, as: :contact
end

class Business < ActiveRecord::Base
  has_many :client_contacts, as: :contact
end

Now take a moment to consider the backflips that Active Record would have to perform in order to figure out which tables to query for a client’s contacts. Remember that there isn’t a contacts table!

>> Client.first.contacts

Active Record would theoretically need to be aware of every model class that is linked to the other end of the contacts polymorphic association. In fact, it cannot do those kinds of backflips, which is probably a good thing as far as performance is concerned:

>> Client.first.contacts

ActiveRecord::HasManyThroughAssociationPolymorphicSourceError: Cannot have a

has_many :through association 'Client#contacts' on the polymorphic object

'Contact#contact' without 'source_type'.

The only way to make this scenario work (somewhat) is to give Active Record some help by specifying which table it should search when you ask for the contacts collection. You do that with the source_type option, naming the target class, like this:

class Client < ActiveRecord::Base
  has_many :client_contacts
  has_many :people, through: :client_contacts,
    source: :contact, source_type: 'Person'

  has_many :businesses, through: :client_contacts,
    source: :contact, source_type: 'Business'
end

After the :source_type is specified, the association will work as expected, but sadly we don’t get a general purpose contacts collection to work with, as it seemed might be possible at first.

>> Client.first.people.create!

=> #<Person id: 1>

If you’re upset that you cannot associate people and businesses together in a contacts association, you could try writing your own accessor method for a client’s contacts:

class Client < ActiveRecord::Base
  def contacts
    # combine the two has_many :through collections declared on Client
    people + businesses
  end
end

Of course, you should be aware that calling that contacts method will result in at least two database requests and will return an Array, without the association proxy methods that you might expect it to have.

7.5.4 Unique Association Objects

The distinct scope method tells the association to include only unique objects. It is especially useful when using has_many :through, since two different BillableWeeks could reference the same Timesheet.

>> Client.first.timesheets.reload.to_a

=> [#<Timesheet id: 1...>, #<Timesheet id: 1...>]

It’s not extraordinary for two distinct model instances of the same database record to be in memory at the same time—it’s just not usually desirable.

class Client < ActiveRecord::Base
  has_many :timesheets, -> { distinct }, through: :billable_weeks
end

After adding the distinct scope to the has_many :through association, only one instance per record is returned.

>> Client.first.timesheets.reload.to_a

=> [#<Timesheet id: 1...>]
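As a rough in-memory analogy (Active Record itself applies DISTINCT in the generated SQL), the following plain-Ruby sketch shows the difference between object identity and record identity. SketchRow and distinct_rows are hypothetical names used only for this illustration.

```ruby
# Stand-in for a model instance; two objects can share one database id.
class SketchRow
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

# In-memory analogue of DISTINCT: keep the first object seen for each id.
def distinct_rows(rows)
  rows.uniq(&:id)
end

rows = [SketchRow.new(1), SketchRow.new(1), SketchRow.new(2)]
rows.uniq.size            # plain uniq compares objects, so nothing is dropped
distinct_rows(rows).size  # comparing by id collapses the duplicate record
```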

7.6 One-to-One Relationships

One of the most basic relationship types is a one-to-one object relationship. In Active Record we declare a one-to-one relationship using the has_one and belongs_to methods together. As in the case of a has_many relationship, you call belongs_to on the model whose database table contains the foreign key column linking the two records together.

7.6.1 has_one

Conceptually, has_one works almost exactly like has_many does, except that when the database query is executed to retrieve the related object, a LIMIT 1 clause is added to the generated SQL so that only one row is returned.

The name of a has_one relationship should be singular, which will make it read naturally, for example: has_one :last_timesheet, has_one :primary_account, has_one :profile_photo, and so on. Let’s take a look at has_one in action by adding avatars for our users.

class Avatar < ActiveRecord::Base
  belongs_to :user
end

class User < ActiveRecord::Base
  has_one :avatar
  # ... the rest of our User code ...
end

That’s simple enough. Firing this up in rails console, we can look at some of the new methods that has_one adds to User.

>> u = User.first

>> u.avatar

=> nil

>> u.build_avatar(url: '/avatars/smiling')

=> #<Avatar id: nil, url: "/avatars/smiling", user_id: 1>

>> u.avatar.save

=> true

As you can see, we can use build_avatar to build a new avatar object and associate it with the user. While it’s great that has_one will associate an avatar with the user, it isn’t really anything that has_many doesn’t already do. So let’s take a look at what happens when we assign a new avatar to the user.

>> u = User.first

>> u.avatar

=> #<Avatar id: 1, url: "/avatars/smiling", user_id: 1>

>> u.create_avatar(url: '/avatars/frowning')

=> #<Avatar id: 2, url: "/avatars/frowning", user_id: 1>

>> Avatar.all.to_a

=> [#<Avatar id: 1, url: "/avatars/smiling", user_id: nil>, #<Avatar id: 2, url:

"/avatars/frowning", user_id: 1>]

The last line from that console session is the most interesting, because it shows that our initial avatar is no longer associated with the user. However, the previous avatar was not removed from the database, and in this scenario we do want it removed. So, we’ll use the dependent: :destroy option to force avatars to be destroyed when they are no longer associated with a user.

class User < ActiveRecord::Base
  has_one :avatar, dependent: :destroy
end

With some additional fiddling around in the console, we can verify that it works as intended. In doing so, you might notice that Rails only destroys the avatar that was just removed from the user, so bad data that was in your database from before will still remain. Keep this in mind when you decide to add dependent: :destroy to your code and remember to manually clear orphaned data that might otherwise remain.

7.6.1.1 Using has_one together with has_many

As I alluded to earlier, has_one is sometimes used to single out one record of significance alongside an already established has_many relationship. For instance, let’s say we want to easily be able to access the last timesheet a user was working on:

class User < ActiveRecord::Base
  has_many :timesheets

  has_one :latest_sheet,
    -> { order('created_at desc') },
    class_name: 'Timesheet'
end

I had to specify a :class_name, so that Active Record knows what kind of object we’re associating. (It can’t figure it out based on the name of the association, :latest_sheet.)

When adding a has_one relationship to a model that already has a has_many defined to the same related model, it is not necessary to add another belongs_to method call to the target object, just for the new has_one. That might seem a little counterintuitive at first, but if you think about it, the same foreign key value is being used to read the data from the database.

7.6.1.2 has_one Options

The options for has_one associations are similar to the ones for has_many. For your convenience, we briefly cover the most relevant ones here.

7.6.1.3 :as

Allows you to set up a polymorphic association, covered in Chapter 9, “Advanced Active Record”.

7.6.1.4 :class_name

Allows you to specify the class this association uses. In has_one :latest_timesheet, class_name: 'Timesheet', the class_name option tells Active Record that latest_timesheet returns a Timesheet object associated with this user. Normally, the class is inferred by Rails from the name of the association.

7.6.1.5 :dependent

The :dependent option specifies how Active Record should treat associated objects when the parent object is deleted. (The default is to do nothing with associated objects, which will leave orphaned records in the database.) There are a few different values that you can pass and they work just like the :dependent option of has_many. If you pass :destroy to it, you tell Rails to destroy the associated object when it is no longer associated with the primary object. Setting the :dependent option to :delete will delete the associated object directly from the database without calling any of Rails’ normal hooks. Passing :restrict_with_exception causes Rails to throw an exception if there is any associated object present, while :restrict_with_error adds an error to the owner object, which stops it from being destroyed. Finally, :nullify will simply set the foreign key values to nil so that the relationship is broken.
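The differences between those values are easier to compare side by side. The following is a toy model in plain Ruby, not Rails code: SketchChild and apply_dependent are hypothetical names, and the flags merely record what would have happened to the associated row.

```ruby
# Hypothetical model of the :dependent choices; names are illustrative only.
SketchChild = Struct.new(:owner_id, :callbacks_ran, :deleted)

# Applies one :dependent policy to a child record while its owner is being
# removed. Returns true when the owner may go away, false when it must stay.
def apply_dependent(option, child, errors = [])
  case option
  when :destroy
    child.callbacks_ran = true # callbacks fire, then the row goes away
    child.deleted = true
  when :delete
    child.deleted = true       # row removed directly, callbacks skipped
  when :nullify
    child.owner_id = nil       # row kept, foreign key cleared
  when :restrict_with_exception
    raise "cannot delete owner: dependent record exists"
  when :restrict_with_error
    errors << "dependent record exists"
    return false               # owner is left in place
  end
  true
end

avatar = SketchChild.new(1, false, false)
apply_dependent(:nullify, avatar)
avatar.owner_id # the row survives, but no longer points at its owner
```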

7.6.2 has_one scopes

The scopes for has_one associations are similar to the ones for has_many. For your convenience, we briefly cover the most relevant ones here.

7.6.2.1 where(*conditions)

Allows you to specify conditions that the object must meet to be included in the association.

class User < ActiveRecord::Base
  has_one :manager, -> { where(type: 'manager') },
    class_name: 'Person'
end

Here manager is specified as a person object that has type = 'manager'. I tend to almost always use a where scope block in conjunction with has_one. When Active Record loads the association, it’s grabbing one of potentially many rows that have the right foreign key. Absent some explicit conditions (or perhaps an order scope), you’re leaving it in the hands of the database to pick a row.

7.6.2.2 order(*clauses)

Allows you to specify a SQL fragment that will be used to order the results. This is an especially useful option with has_one when trying to associate the latest of something or another.

class User < ActiveRecord::Base
  has_one :latest_timesheet,
    -> { order('created_at desc') },
    class_name: 'Timesheet'
end
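To make the point about deterministic selection concrete, here is a small plain-Ruby sketch of "filter, order, then take one". It is only an analogy for combining where and order scopes on a has_one; SketchPerson and pick_manager are hypothetical names, and created_at is an integer purely to keep the example short.

```ruby
# Hypothetical stand-in for rows that share the same foreign key.
SketchPerson = Struct.new(:name, :type, :created_at)

# Filter first, then order, then take one row. Without the filter and the
# ordering, which row comes back would be left up to the data store.
def pick_manager(people)
  people.select { |person| person.type == "manager" }
        .max_by(&:created_at)
end

people = [
  SketchPerson.new("Ann", "manager", 1),
  SketchPerson.new("Bob", "employee", 5),
  SketchPerson.new("Cy", "manager", 3)
]
pick_manager(people).name # the most recently created manager
```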

7.6.2.3 readonly

Sets the record in the association to read-only mode, which prevents saving it.

7.7 Working with Unsaved Objects and Associations

You can manipulate objects and associations before they are saved to the database, but there is some special behavior you should be aware of, mostly involving the saving of associated objects. Whether an object is considered unsaved is determined by calling new_record? on it.

7.7.1 One-to-One Associations

Assigning an object to a belongs_to association does not save the parent or the associated object.

Assigning an object to a has_one association automatically saves that object and the object being replaced (if there is one), so that their foreign key fields are updated. The exception to this behavior is if the parent object is unsaved, since that would mean that there is no foreign key value to set. If save fails for either of the objects being updated (due to one of them being invalid) the assignment operation returns false and the assignment is cancelled. That behavior makes sense (if you think about it), but it can be the cause of much confusion when you’re not aware of it. If you have an association that doesn’t seem to work, check the validation rules of the related objects.

7.7.2 Collections

Adding an object to has_many and has_and_belongs_to_many collections automatically saves it, unless the parent object (the owner of the collection) is not yet stored in the database.

If objects being added to a collection (via << or similar means) fail to save properly, then the addition operation will return false. If you want your code to be a little more explicit, or you want to add an object to a collection without automatically saving it, then you can use the collection’s build method. It’s exactly like create, except that it doesn’t save.

Members of a collection are automatically saved or updated when their parent is saved or updated, unless autosave: false is set on the association.

7.7.3 Deletion

Associations that are set with an autosave: true option are also afforded the ability to have their records deleted when an inverse record is saved. This is to allow the records from both sides of the association to get persisted within the same transaction, and is handled through the mark_for_destruction method. Consider our User and Timesheet models again:

class User < ActiveRecord::Base
  has_many :timesheets, autosave: true
end

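To have a Timesheet destroyed when its User is saved, mark it for destruction.

user = User.where(name: "Durran").first
# detect loads the collection, so the flagged record is the same in-memory
# object that autosave inspects (closed? is assumed to exist on Timesheet)
timesheet = user.timesheets.detect(&:closed?)
timesheet.mark_for_destruction # => Flags timesheet
user.save # => The timesheet gets deleted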

Since both are persisted in the same transaction, if the operation were to fail the database would not be in an inconsistent state. Do note that although the child record did not get deleted in that case, it still would be marked for destruction and any later attempts to save the inverse would once again attempt to delete it.
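The "still marked after a failed save" behavior is worth seeing in miniature. This plain-Ruby sketch is an assumption-laden toy, not Active Record's autosave code: SketchRecord and save_parent are hypothetical, and parent_valid stands in for whatever makes the real save succeed or fail.

```ruby
# Toy record that remembers whether it was flagged and whether it was deleted.
class SketchRecord
  attr_reader :deleted

  def initialize
    @marked = false
    @deleted = false
  end

  def mark_for_destruction
    @marked = true
  end

  def marked_for_destruction?
    @marked
  end

  def delete!
    @deleted = true
  end
end

# Saves the parent. Flagged children are removed only if the save succeeds,
# mimicking an all-or-nothing transaction.
def save_parent(children, parent_valid)
  return false unless parent_valid # "rollback": nothing is deleted
  children.select(&:marked_for_destruction?).each(&:delete!)
  true
end

timesheet = SketchRecord.new
timesheet.mark_for_destruction
save_parent([timesheet], false) # fails; the record survives but stays flagged
save_parent([timesheet], true)  # a later successful save deletes it
```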

7.8 Association Extensions

The proxy objects that handle access to associations can be extended with your own application code. You can add your own custom finders and factory methods to be used specifically with a particular association.

For example, let’s say you wanted a concise way to refer to an account’s people by name. You may create an extension on the association like the following:

Listing 7.4: An association extension on a people collection


class Account < ActiveRecord::Base
  has_many :people do
    def named(full_name)
      first_name, last_name = full_name.split(" ", 2)
      where(first_name: first_name, last_name: last_name).first_or_create
    end
  end
end


Now we have a named method available to use on the people collection.

account = Account.first
person = account.people.named("David Heinemeier Hansson")
person.first_name # => "David"
person.last_name # => "Heinemeier Hansson"
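The name parsing inside named relies on the limit argument of String#split, which you can try in plain Ruby without a database. The split_full_name helper below is a hypothetical wrapper added only for this illustration.

```ruby
# Hypothetical helper mirroring the parsing step inside `named`.
# A limit of 2 means "split at most once", so everything after the first
# run of whitespace stays together as the last name.
def split_full_name(full_name)
  full_name.split(" ", 2)
end

first_name, last_name = split_full_name("David Heinemeier Hansson")
first_name # => "David"
last_name  # => "Heinemeier Hansson"
```

Note that a single-word name yields a one-element array, so last_name would be nil in that case.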

If you need to share the same set of extensions between many associations, you can specify an extension module, instead of a block with method definitions. Here is the same feature shown in Listing 7.4, except broken out into its own Ruby module:

module ByNameExtension
  def named(full_name)
    first_name, last_name = full_name.split(" ", 2)
    where(first_name: first_name, last_name: last_name).first_or_create
  end
end

Now we can use it to extend many different relationships, as long as they’re compatible. (Our contract in the example consists of a model with columns first_name and last_name.)

class Account < ActiveRecord::Base
  has_many :people, -> { extending(ByNameExtension) }
end

class Company < ActiveRecord::Base
  has_many :people, -> { extending(ByNameExtension) }
end

If you need to use multiple named extension modules, you can pass a list of modules to the extending query method instead of a single module, like this:

has_many :people, -> { extending(ByNameExtension, ByRecentExtension) }

In the case of name conflicts, methods contained in modules added later in the list supersede those earlier in the list.

Consider a class method instead

Unless you have a valid reason to reuse the extension logic with more than one type of model, you’re probably better off leveraging the fact that class methods are automatically available on has_many associations.

class Person < ActiveRecord::Base
  belongs_to :account

  def self.named(full_name)
    first_name, last_name = full_name.split(" ", 2)
    where(first_name: first_name, last_name: last_name).first_or_create
  end
end

7.9 The CollectionProxy Class

CollectionProxy, the parent of all association proxies, contributes a handful of useful methods that apply to most kinds of associations and can come into play when you’re writing association extensions.

7.9.0.1 owner, reflection, and target

The owner method provides a reference to the parent object holding the association.

The reflection object is an instance of ActiveRecord::Reflection::AssociationReflection and contains all of the configuration options for the association. That includes both default settings and those that were passed to the association method when it was declared.

Finally, the target is the associated collection of objects (or associated object itself in the case of belongs_to and has_one).

It might not appear sane to expose these attributes publicly and allow their manipulation. However, without access to them it would be much more difficult to write advanced association extensions. The loaded?, loaded, target, and target= methods are public for similar reasons.

The following code sample demonstrates the use of owner within a published_prior_to extension method, originally contributed by Wilson Bilkovich:

class ArticleCategory < ActiveRecord::Base
  has_ancestry

  has_many :articles do
    def published_prior_to(date, options = {})
      # proxy_association.owner is the ArticleCategory holding this association
      if proxy_association.owner.is_root?
        Article.where('published_at < ? and category_id = ?', date, proxy_association.owner)
      else
        # self is the 'articles' association here so we inherit its scope
        self.all(options)
      end
    end
  end
end

The has_ancestry Active Record extension gem adds the ability to organize Active Record models as a tree structure. The self-referential association is based on an ancestry string column. The owner reference is used to check if the parent of this association is a “top-level” node in the tree.

7.9.0.2 reload and reset

The reset method puts the association proxy back in its initial state, which is unloaded (cached association objects are cleared). The reload method invokes reset, and then loads associated objects from the database.
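The relationship between the two methods can be sketched with a toy lazy-loading proxy. To be clear, this is not Rails' CollectionProxy: SketchProxy and its loader block are assumptions made for illustration, and only the reset/reload behavior described above is modeled.

```ruby
# A toy lazy-loading proxy; the loader block plays the role of the database.
class SketchProxy
  def initialize(&loader)
    @loader = loader
    reset
  end

  def loaded?
    @loaded
  end

  # Back to the initial, unloaded state; the cached target is dropped.
  def reset
    @loaded = false
    @target = nil
    self
  end

  # reset, then fetch fresh data right away.
  def reload
    reset
    load_target
    self
  end

  # Loads on first access and serves the cached copy afterward.
  def target
    load_target
  end

  private

  def load_target
    unless @loaded
      @target = @loader.call
      @loaded = true
    end
    @target
  end
end

queries = 0
proxy = SketchProxy.new { queries += 1; ["timesheet"] }
proxy.target # first access runs the loader
proxy.target # second access is served from the cache
proxy.reset  # cache cleared, but nothing is fetched yet
proxy.reload # cache cleared and the loader runs again
```

The practical difference is timing: reset defers the next query until someone asks for the target, while reload pays for it immediately.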

7.10 Conclusion

The ability to model associations is what makes Active Record more than just a data-access layer. The ease and elegance with which you can declare those associations are what make Active Record more than your ordinary object-relational mapper.

In this chapter, we covered the basics of how Active Record associations work. We started by taking a look at the class hierarchy of association classes, starting with CollectionProxy. Hopefully, by learning about how associations work under the hood, you’ve picked up some enhanced understanding about their power and flexibility.

Finally, the options and methods guide for each type of association should be a good reference guide for your day-to-day development activities.