XML - The Rails 4 Way (2014)


Chapter 22. XML

Structure is nothing if it is all you got. Skeletons spook people if they try to walk around on their own. I really wonder why XML does not.

—Erik Naggum

XML doesn’t get much respect from the Rails community. It’s enterprisey. In the Ruby world, that other markup language, YAML (YAML Ain’t Markup Language), and the data interchange format JSON (JavaScript Object Notation) get a heck of a lot more attention. However, XML is a fact of life for many projects, especially when it comes to interoperability with legacy systems. Luckily, Ruby on Rails gives us some pretty good functionality related to XML.

This chapter examines how to both generate and parse XML in your Rails applications, starting with a thorough examination of the to_xml method that most objects have in Rails.

22.1 The to_xml Method

Sometimes you just want an XML representation of an object, and Active Record models provide easy, automatic XML generation via the to_xml method. Let’s play with this method in the console and see what it can do.

I’ll fire up the console for my book-authoring sample application and find an Active Record object to manipulate.

>> User.find_by(login: 'obie')
=> #<User id: 8, login: "obie", email: "obie@example.com",
crypted_password: "4a6046804fc4dc3183ad9012fbfee91c85723d8c",
salt: "399754af1b01cf3d4b87da5478d82674b0438eb8",
created_at: "2010-05-18 19:31:40", updated_at: "2010-05-18 19:31:40",
remember_token: nil, remember_token_expires_at: nil,
authorized_approver: true, client_id: nil, timesheets_updated_at: nil>

There we go, a User instance. Let’s see that instance as its generic XML representation.

>> User.find_by(login: 'obie').to_xml

=> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<user>\n
<authorized-approver type=\"boolean\">true</authorized-approver>\n
<salt>399754af1b01cf3d4b87da5478d82674b0438eb8</salt>\n
<created-at type=\"datetime\">2010-05-18T19:31:40Z</created-at>\n
<crypted-password>4a6046804fc4dc3183ad9012fbfee91c85723d8c
</crypted-password>\n <remember-token-expires-at type=\"datetime\"
nil=\"true\"></remember-token-expires-at>\n
<updated-at type=\"datetime\">2010-05-18T19:31:40Z</updated-at>\n
<id type=\"integer\">8</id>\n <client-id type=\"integer\"
nil=\"true\"></client-id>\n <remember-token nil=\"true\">
</remember-token>\n <login>obie</login>\n
<email>obie@example.com</email>\n <timesheets-updated-at
type=\"datetime\" nil=\"true\"></timesheets-updated-at>\n</user>\n"

Ugh, that’s ugly. Ruby’s print function might help us out here.

>> print User.find_by(login: 'obie').to_xml

<?xml version="1.0" encoding="UTF-8"?>
<user>
  <authorized-approver type="boolean">true</authorized-approver>
  <salt>399754af1b01cf3d4b87da5478d82674b0438eb8</salt>
  <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
  <crypted-password>4a6046804fc4dc3183ad9012fbfee91c85723d8c</crypted-password>
  <remember-token-expires-at type="datetime" nil="true"></remember-token-expires-at>
  <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
  <id type="integer">8</id>
  <client-id type="integer" nil="true"></client-id>
  <remember-token nil="true"></remember-token>
  <login>obie</login>
  <email>obie@example.com</email>
  <timesheets-updated-at type="datetime" nil="true"></timesheets-updated-at>
</user>

Much better! So what do we have here? Looks like a fairly straightforward serialized representation of our User instance in XML.

22.1.1 Customizing to_xml Output

The standard XML declaration is at the top, followed by a root element named after the object’s class. The properties are represented as subelements, and fields holding non-string data get a type attribute. Mind you, this is the default behavior, and we can customize it with some additional parameters to the to_xml method.

We’ll strip down that XML representation of a user to just an email and login using the only parameter. It’s provided in a familiar options hash, with the value of the :only parameter as an array:

>> print User.find_by(login: 'obie').to_xml(only: [:email, :login])

<?xml version="1.0" encoding="UTF-8"?>
<user>
  <login>obie</login>
  <email>obie@example.com</email>
</user>

Following the familiar Rails convention, the only parameter is complemented by its inverse, except, which will exclude the specified properties. What if I want my user’s email and login as a snippet of XML that will be included in another document? Then let’s get rid of that pesky instruction too, using the skip_instruct parameter.

>> print User.find_by(login: 'obie').to_xml(only: [:email, :login], skip_instruct: true)

<user>
  <login>obie</login>
  <email>obie@example.com</email>
</user>
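The :only/:except selection described above boils down to set arithmetic on attribute names. Here is a minimal plain-Ruby sketch of that idea. It is not Rails’ actual code: the helper name filter_attributes and the USER_ATTRS sample are hypothetical, and the rule that :only wins when both options are passed reflects our reading of Active Model’s serializable_hash.

```ruby
# Hypothetical sample attributes, echoing the User record above.
USER_ATTRS = {
  "id" => 8, "login" => "obie", "email" => "obie@example.com",
  "crypted_password" => "4a6046804fc4dc3183ad9012fbfee91c85723d8c"
}.freeze

# Hypothetical helper approximating how :only and :except pick attributes.
# :only is checked first; :except is consulted only when :only is absent.
def filter_attributes(attributes, only: nil, except: nil)
  names = attributes.keys.map(&:to_s)
  if only
    names &= Array(only).map(&:to_s)
  elsif except
    names -= Array(except).map(&:to_s)
  end
  # select keeps the original attribute order, as the XML listings do.
  attributes.select { |key, _| names.include?(key.to_s) }
end

p filter_attributes(USER_ATTRS, only: [:email, :login])
p filter_attributes(USER_ATTRS, except: :crypted_password)
```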

We can change the root element in our XML representation of User and the indenting from two to four spaces by using the root and indent parameters respectively.

>> print User.find_by(login: 'obie').to_xml(root: 'employee', indent: 4)

<?xml version="1.0" encoding="UTF-8"?>
<employee>
    <authorized-approver type="boolean">true</authorized-approver>
    <salt>399754af1b01cf3d4b87da5478d82674b0438eb8</salt>
    <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
    <crypted-password>4a6046804fc4dc3183ad9012fbfee91c85723d8c</crypted-password>
    <remember-token-expires-at type="datetime" nil="true"></remember-token-expires-at>
    <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
    <id type="integer">8</id>
    <client-id type="integer" nil="true"></client-id>
    <remember-token nil="true"></remember-token>
    <login>obie</login>
    <email>obie@example.com</email>
    <timesheets-updated-at type="datetime" nil="true"></timesheets-updated-at>
</employee>
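To see how root, indent and skip_instruct shape the document, here is a hypothetical plain-Ruby sketch. render_document is an illustrative name, and Rails actually delegates this work to Builder. The sketch emits an optional XML declaration, then the root element, then one child per field, each indented by the requested number of spaces. Values are assumed to be pre-escaped.

```ruby
# Hypothetical sketch of the document layout to_xml produces; not Rails' code.
# Field values are assumed to be already XML-escaped.
def render_document(root, fields, indent: 2, skip_instruct: false)
  pad = " " * indent
  lines = []
  lines << '<?xml version="1.0" encoding="UTF-8"?>' unless skip_instruct
  lines << "<#{root}>"
  fields.each { |name, value| lines << "#{pad}<#{name}>#{value}</#{name}>" }
  lines << "</#{root}>"
  lines.join("\n") + "\n"
end

puts render_document("employee", { "login" => "obie", "email" => "obie@example.com" }, indent: 4)
```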

By default, Rails converts the underscores in attribute names to dashes, as in created-at and client-id. You can keep the underscores by setting the dasherize parameter to false.

>> print User.find_by(login: 'obie').to_xml(dasherize: false,
     only: [:created_at, :client_id])

<?xml version="1.0" encoding="UTF-8"?>
<user>
  <created_at type="datetime">2010-05-18T19:31:40Z</created_at>
  <client_id type="integer" nil="true"></client_id>
</user>

In the preceding output, each element carries a type attribute. You can suppress it with the skip_types parameter.

>> print User.find_by(login: 'obie').to_xml(skip_types: true,
     only: [:created_at, :client_id])

<?xml version="1.0" encoding="UTF-8"?>
<user>
  <created-at>2010-05-18T19:31:40Z</created-at>
  <client-id nil="true"></client-id>
</user>
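Both options act on each attribute individually. dasherize changes the element name and skip_types drops the type attribute, while nil="true" survives either way. The sketch below approximates that per-attribute rendering in plain Ruby. The attribute_tag helper and its small TYPE_NAMES table are assumptions made for illustration. Rails derives types from column metadata, which is why the sketch lets you pass type: explicitly for nil values.

```ruby
require "time"

# Hypothetical Ruby-class-to-type-name table covering only the listings above.
TYPE_NAMES = { Integer => "integer", TrueClass => "boolean",
               FalseClass => "boolean", Time => "datetime" }.freeze

# Sketch of rendering one attribute as one element (not Rails' implementation).
def attribute_tag(name, value, dasherize: true, skip_types: false, type: nil)
  tag = dasherize ? name.to_s.tr("_", "-") : name.to_s
  type ||= TYPE_NAMES[value.class]
  attrs = []
  attrs << %(type="#{type}") if type && !skip_types
  attrs << %(nil="true") if value.nil?        # kept even when skip_types is set
  head = ([tag] + attrs).join(" ")
  text = value.is_a?(Time) ? value.getutc.iso8601 : value.to_s
  "<#{head}>#{text}</#{tag}>"
end

puts attribute_tag(:client_id, nil, type: "integer")
puts attribute_tag(:client_id, nil, type: "integer", dasherize: false)
puts attribute_tag(:client_id, nil, type: "integer", skip_types: true)
```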

22.1.2 Associations and to_xml

So far we’ve only worked with a bare Active Record object and not with any of its associations. What if we wanted an XML representation of not just a user but also its associated timesheets? Rails provides the :include parameter for just this purpose. The :include parameter will also take an array of associations to represent in XML.

>> print User.find_by(login: 'obie').to_xml(include: :timesheets)

<?xml version="1.0" encoding="UTF-8"?>
<user>
  <authorized-approver type="boolean">true</authorized-approver>
  <salt>399754af1b01cf3d4b87da5478d82674b0438eb8</salt>
  <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
  <crypted-password>4a6046804fc4dc3183ad9012fbfee91c85723d8c</crypted-password>
  <remember-token-expires-at type="datetime" nil="true"></remember-token-expires-at>
  <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
  <id type="integer">8</id>
  <client-id type="integer" nil="true"></client-id>
  <remember-token nil="true"></remember-token>
  <login>obie</login>
  <email>obie@example.com</email>
  <timesheets-updated-at type="datetime" nil="true"></timesheets-updated-at>
  <timesheets type="array">
    <timesheet>
      <created-at type="datetime">2010-05-04T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">8</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">true</submitted>
      <approver-id type="integer">7</approver-id>
    </timesheet>
    <timesheet>
      <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">9</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">false</submitted>
      <approver-id type="integer" nil="true"></approver-id>
    </timesheet>
    <timesheet>
      <created-at type="datetime">2010-05-11T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">10</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">false</submitted>
      <approver-id type="integer" nil="true"></approver-id>
    </timesheet>
  </timesheets>
</user>
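Structurally, :include nests a type="array" container named after the association, with one singular-named child per associated record. A hypothetical sketch of that nesting follows. record_xml is an illustrative name, and type attributes on scalar fields are omitted for brevity. Singularizing by dropping a trailing "s" is a naive stand-in for Rails’ inflector.

```ruby
# Hypothetical sketch of :include-style nesting; not Rails' implementation.
# `included` maps an association name to an array of attribute hashes.
def record_xml(root, attributes, included = {}, depth = 0)
  pad = "  " * depth
  xml = +"#{pad}<#{root}>\n"
  attributes.each { |k, v| xml << "#{pad}  <#{k}>#{v}</#{k}>\n" }
  included.each do |assoc, records|
    xml << %(#{pad}  <#{assoc} type="array">\n)
    # Naive singularization: "timesheets" -> "timesheet".
    records.each { |r| xml << record_xml(assoc.to_s.chomp("s"), r, {}, depth + 2) }
    xml << "#{pad}  </#{assoc}>\n"
  end
  xml << "#{pad}</#{root}>\n"
end

print record_xml("user", { "login" => "obie" }, { "timesheets" => [{ "id" => 8 }, { "id" => 9 }] })
```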

Rails also adds to_xml to core classes such as Array and Hash. For example, arrays are easily serializable to XML, with element names inferred from the class of their elements:

>> print ['cat', 'dog', 'ferret'].to_xml

<?xml version="1.0" encoding="UTF-8"?>
<strings type="array">
  <string>cat</string>
  <string>dog</string>
  <string>ferret</string>
</strings>

If you have mixed types in the array, this is also reflected in the XML output:

>> print [3, 'cat', 'dog', :ferret].to_xml

<?xml version="1.0" encoding="UTF-8"?>
<objects type="array">
  <object type="integer">3</object>
  <object>cat</object>
  <object>dog</object>
  <object type="symbol">ferret</object>
</objects>

To construct a more semantic structure, the root option on to_xml triggers more expressive element names:

>> print ['cat', 'dog', 'ferret'].to_xml(root: 'pets')

<?xml version="1.0" encoding="UTF-8"?>
<pets type="array">
  <pet>cat</pet>
  <pet>dog</pet>
  <pet>ferret</pet>
</pets>
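The naming rules visible in these three listings fit in a few lines. This is a rough, hypothetical sketch, and array_element_names is not a Rails method. Downcasing the class name stands in for Rails’ underscore, and adding or dropping an "s" stands in for its pluralize and singularize inflections.

```ruby
# Hypothetical sketch of how the root and child element names appear to be
# chosen for arrays; returns [root_name, child_name].
def array_element_names(items, root: nil)
  if root
    [root, root.chomp("s")]                       # explicit root, singular children
  else
    classes = items.map(&:class).uniq
    single = classes.size == 1 ? classes.first.name.downcase : "object"
    ["#{single}s", single]                        # mixed classes fall back to "objects"
  end
end

p array_element_names(%w[cat dog ferret])
p array_element_names([3, "cat", "dog", :ferret])
p array_element_names(%w[cat dog ferret], root: "pets")
```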

Ruby hashes are naturally representable in XML, with keys corresponding to element names and values corresponding to element contents. Scalar values are rendered as strings (with a type attribute for types such as integers), while nested arrays and hashes are serialized recursively:

>> print({owners: ['Chad', 'Trixie'], pets: ['cat', 'dog', 'ferret'],
     id: 123}.to_xml(root: 'registry'))

<?xml version="1.0" encoding="UTF-8"?>
<registry>
  <owners type="array">
    <owner>Chad</owner>
    <owner>Trixie</owner>
  </owners>
  <pets type="array">
    <pet>cat</pet>
    <pet>dog</pet>
    <pet>ferret</pet>
  </pets>
  <id type="integer">123</id>
</registry>
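The hash case is naturally recursive. The following hypothetical sketch reproduces the shape of the registry output. value_xml is an illustrative name, not Rails’ implementation. Hashes nest, arrays become type="array" containers with naively singularized children, integers get a type attribute, and everything else is interpolated as a string without escaping.

```ruby
# Hypothetical recursive sketch of Hash#to_xml's output shape.
def value_xml(name, value, depth)
  pad = "  " * depth
  case value
  when Hash
    inner = value.map { |k, v| value_xml(k.to_s, v, depth + 1) }.join
    "#{pad}<#{name}>\n#{inner}#{pad}</#{name}>\n"
  when Array
    child = name.chomp("s")                    # naive singularization
    inner = value.map { |v| value_xml(child, v, depth + 1) }.join
    %(#{pad}<#{name} type="array">\n#{inner}#{pad}</#{name}>\n)
  when Integer
    %(#{pad}<#{name} type="integer">#{value}</#{name}>\n)
  else
    "#{pad}<#{name}>#{value}</#{name}>\n"     # to_s via interpolation, unescaped
  end
end

registry = { owners: %w[Chad Trixie], pets: %w[cat dog ferret], id: 123 }
print value_xml("registry", registry, 0)
```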


Josh G says…

This simplistic serialization may not be appropriate for certain interoperability contexts, especially if the output must pass XML Schema (XSD) validation, where the order of elements often matters. Since Ruby 1.9, the Hash class preserves insertion order, but relying on that ordering may not be adequate for producing output that matches an XSD. The section “The XML Builder” discusses Builder::XmlMarkup, which lets you write elements in exactly the order you need.
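A complementary approach is to reorder the data into an explicit, schema-defined sequence before serializing it, rather than trusting insertion order. This is sketched here as an assumption, not as anything Rails provides. The SCHEMA_ORDER list is hypothetical and would be copied by hand from the XSD’s xs:sequence. in_schema_order is an illustrative helper.

```ruby
# Hypothetical element order taken from an XSD <xs:sequence>.
SCHEMA_ORDER = %w[login email created_at].freeze

# Returns a new hash whose keys follow SCHEMA_ORDER regardless of insertion
# order. Unknown keys raise, so unexpected data fails loudly instead of
# silently producing XML the schema would reject.
def in_schema_order(attributes)
  unknown = attributes.keys.map(&:to_s) - SCHEMA_ORDER
  raise ArgumentError, "not in schema: #{unknown.join(', ')}" unless unknown.empty?

  SCHEMA_ORDER.each_with_object({}) do |name, ordered|
    key = attributes.key?(name) ? name : name.to_sym   # accept string or symbol keys
    ordered[name] = attributes[key] if attributes.key?(key)
  end
end

inserted = { "email" => "obie@example.com", "created_at" => "2010-05-18T19:31:40Z", "login" => "obie" }
p inserted.keys
p in_schema_order(inserted).keys
```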

The :include option of to_xml is not used on Array and Hash objects.

22.1.3 Advanced to_xml Usage

By default, Active Record’s to_xml method only serializes persistent attributes into XML. However, there are times when transient, derived, or calculated values need to be serialized out into XML form as well. For example, our User model has a method that returns only draft timesheets:

class User < ActiveRecord::Base
  ...
  def draft_timesheets
    timesheets.draft
  end
  ...
end

To include the result of this method when we serialize the XML, we use the :methods parameter:

>> print User.find_by(login: 'obie').to_xml(methods: :draft_timesheets)

<?xml version="1.0" encoding="UTF-8"?>
<user>
  <id type="integer">8</id>
  ...
  <draft-timesheets type="array">
    <draft-timesheet>
      <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">9</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">false</submitted>
      <approver-id type="integer" nil="true"></approver-id>
    </draft-timesheet>
    <draft-timesheet>
      <created-at type="datetime">2010-05-11T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">10</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">false</submitted>
      <approver-id type="integer" nil="true"></approver-id>
    </draft-timesheet>
  </draft-timesheets>
</user>

We could also set the methods parameter to an array of method names to be called.
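Conceptually, :methods calls each named method and serializes its return value under the method’s name, after the persistent attributes. The sketch below models that merge in plain Ruby. SketchUser, Timesheet and serializable_attributes are hypothetical stand-ins for the Active Record objects above. Drafts are defined as unsubmitted timesheets to match the listing.

```ruby
# Hypothetical stand-ins for the Active Record models in the text.
Timesheet = Struct.new(:id, :submitted)

class SketchUser
  attr_reader :attributes, :timesheets

  def initialize(attributes, timesheets)
    @attributes = attributes
    @timesheets = timesheets
  end

  # Mirrors draft_timesheets above: drafts are the unsubmitted timesheets.
  def draft_timesheets
    timesheets.reject(&:submitted)
  end

  # Approximates the :methods option: persistent attributes first, then the
  # return value of each named method stored under the method's name.
  def serializable_attributes(methods: [])
    Array(methods).each_with_object(attributes.dup) do |name, hash|
      hash[name.to_s] = public_send(name)
    end
  end
end

user = SketchUser.new({ "id" => 8, "login" => "obie" },
                      [Timesheet.new(8, true), Timesheet.new(9, false), Timesheet.new(10, false)])
p user.serializable_attributes(methods: :draft_timesheets)["draft_timesheets"].map(&:id) # prints [9, 10]
```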

22.1.4 Dynamic Runtime Attributes

In cases where we want to include extra elements unrelated to the object being serialized, we can pass to_xml a block, or use the :procs option.

If we are using the same logic in different to_xml calls, we can construct lambdas ahead of time and use one or more of them in the :procs option. They will be called with to_xml’s options hash, through which we access the underlying Builder::XmlMarkup instance. (Builder::XmlMarkup provides the principal means of XML generation in Rails.)

>> current_user = User.find_by(login: 'admin')
>> generated_at = lambda { |opts| opts[:builder].tag!('generated-at',
     Time.now.utc.iso8601) }
>> generated_by = lambda { |opts| opts[:builder].tag!('generated-by',
     current_user.email) }
>> print(User.find_by(login: 'obie').to_xml(procs: [generated_at,
     generated_by]))

<?xml version="1.0" encoding="UTF-8"?>
<user>
  ...
  <id type="integer">8</id>
  <client-id type="integer" nil="true"></client-id>
  <remember-token nil="true"></remember-token>
  <login>obie</login>
  <email>obie@example.com</email>
  <timesheets-updated-at type="datetime" nil="true"></timesheets-updated-at>
  <generated-at>2010-05-18T19:33:49Z</generated-at>
  <generated-by>admin@example.com</generated-by>
</user>

>> print Timesheet.all.to_xml(procs: [generated_at, generated_by])

<?xml version="1.0" encoding="UTF-8"?>
<timesheets type="array">
  <timesheet>
    ...
    <id type="integer">8</id>
    <user-id type="integer">8</user-id>
    <submitted type="boolean">true</submitted>
    <approver-id type="integer">7</approver-id>
    <generated-at>2010-05-18T20:18:30Z</generated-at>
    <generated-by>admin@example.com</generated-by>
  </timesheet>
  <timesheet>
    ...
    <id type="integer">9</id>
    <user-id type="integer">8</user-id>
    <submitted type="boolean">false</submitted>
    <approver-id type="integer" nil="true"></approver-id>
    <generated-at>2010-05-18T20:18:30Z</generated-at>
    <generated-by>admin@example.com</generated-by>
  </timesheet>
  <timesheet>
    ...
    <id type="integer">10</id>
    <user-id type="integer">8</user-id>
    <submitted type="boolean">false</submitted>
    <approver-id type="integer" nil="true"></approver-id>
    <generated-at>2010-05-18T20:18:30Z</generated-at>
    <generated-by>admin@example.com</generated-by>
  </timesheet>
</timesheets>

Note that the :procs are applied to each top-level resource in the collection (or the single resource if the top level is not a collection). Use the sample application to compare the output with the output from the following:

>> print User.all.to_xml(include: :timesheets, procs: [generated_at, generated_by])

To add custom elements only to the root node, pass to_xml a block; it yields the Builder::XmlMarkup instance:

>> print(User.all.to_xml { |xml| xml.tag! 'generated-by', current_user.email })

<?xml version="1.0" encoding="UTF-8"?>
<users type="array">
  <user>...</user>
  <user>...</user>
  <generated-by>admin@example.com</generated-by>
</users>
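The difference between the two hooks is when they run. This hypothetical sketch models the calling pattern described above. TagRecorder and collection_to_xml are illustrative names, not Rails classes. Every proc is called once per top-level record with an options hash whose :builder key holds the builder. The block is yielded that builder once, at the root, after the records.

```ruby
# Hypothetical stand-in for the builder handed to procs and blocks:
# it just records each tag written to it as a string.
class TagRecorder
  attr_reader :lines

  def initialize
    @lines = []
  end

  def tag!(name, text = nil)
    @lines << (text.nil? ? "<#{name}/>" : "<#{name}>#{text}</#{name}>")
  end
end

# Sketch of the calling pattern (not Rails' code). `records` are plain
# element names here to keep the example small.
def collection_to_xml(root, records, procs: [])
  builder = TagRecorder.new
  options = { builder: builder }
  builder.lines << "<#{root}>"
  records.each do |record|
    builder.lines << "<#{record}>"
    procs.each { |pr| pr.call(options) }   # once per record
    builder.lines << "</#{record}>"
  end
  yield builder if block_given?            # once, at the root
  builder.lines << "</#{root}>"
  builder.lines
end

generated_by = lambda { |opts| opts[:builder].tag!("generated-by", "admin@example.com") }
lines = collection_to_xml("users", %w[user user], procs: [generated_by]) { |xml| xml.tag!("count", 2) }
puts lines
```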

Unfortunately, both :procs and the optional block are hobbled by a puzzling limitation: The record being serialized is not exposed to the procs being passed in as arguments, so only data external to the object may be added in this fashion.

To gain complete control over the XML serialization of Rails objects, you need to override the to_xml method and implement it yourself.

22.1.5 Overriding to_xml

Sometimes you need to do something out of the ordinary when trying to represent data in XML form. In those situations you can create the XML by hand.

class User < ActiveRecord::Base
  ...
  def to_xml(options = {}, &block)
    xml = options[:builder] || ::Builder::XmlMarkup.new(options)
    xml.instruct! unless options[:skip_instruct]
    xml.user do
      xml.tag!(:email, email)
    end
  end
  ...
end

This would give the following result:

>> print User.first.to_xml

<?xml version="1.0" encoding="UTF-8"?><user><email>admin@example.com</email></user>

Of course, good object-oriented design might instead lead you to a separate class responsible for translating between your model and an external representation.
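A minimal sketch of that separate-class approach rests on a few assumptions. UserRecord is a Struct standing in for the User model, and UserXmlSerializer is a hypothetical name. Plain string building with a small escaping table replaces Builder, so the example runs without Rails. It produces the same document as the overridden to_xml above, but the model itself no longer knows anything about XML.

```ruby
# Hypothetical stand-in for the Active Record User model.
UserRecord = Struct.new(:login, :email)

# Hypothetical presenter-style serializer that owns the XML representation.
class UserXmlSerializer
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  def initialize(user)
    @user = user
  end

  def to_xml(skip_instruct: false)
    xml = +""
    xml << '<?xml version="1.0" encoding="UTF-8"?>' unless skip_instruct
    xml << "<user><email>#{escape(@user.email)}</email></user>"
  end

  private

  # Escapes the characters that are unsafe in element text.
  def escape(text)
    text.to_s.gsub(/[&<>"]/, ESCAPES)
  end
end

puts UserXmlSerializer.new(UserRecord.new("admin", "admin@example.com")).to_xml
```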

22.2 The XML Builder

Builder::XmlMarkup is the class used internally by Rails when it needs to generate XML. When to_xml is not enough and you need to generate custom XML, you will use Builder instances directly. Fortunately, the Builder API is one of the most powerful Ruby libraries available and is very easy to use, once you get the hang of it.

The API documentation says: “All (well, almost all) methods sent to an XmlMarkup object will be translated to the equivalent XML markup. Any method with a block will be treated as an XML markup tag with nested markup in the block.”

That is a very concise way of describing how Builder works, but it is easier to understand with some examples, again taken from Builder’s API documentation. The xm variable is a Builder::XmlMarkup instance:

xm.em("emphasized")             # => <em>emphasized</em>
xm.em { xm.b("emph & bold") }   # => <em><b>emph &amp; bold</b></em>

xm.a("foo", "href"=>"http://foo.org")
                                # => <a href="http://foo.org">foo</a>

xm.div { xm.br }                # => <div><br/></div>

xm.target("name"=>"foo", "option"=>"bar")
                                # => <target name="foo" option="bar"/>

xm.instruct!                    # <?xml version="1.0" encoding="UTF-8"?>

xm.html {                       # <html>
  xm.head {                     #   <head>
    xm.title("History")         #     <title>History</title>
  }                             #   </head>
  xm.body {                     #   <body>
    xm.comment! "HI"            #     <!-- HI -->
    xm.h1("Header")             #     <h1>Header</h1>
    xm.p("paragraph")           #     <p>paragraph</p>
  }                             #   </body>
}                               # </html>
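The “almost all methods become markup” behavior rests on Ruby’s method_missing hook. The toy class below is a hypothetical imitation written for illustration, not the builder gem’s code. It leaves out most of Builder’s features, including comment!, instruct! and indentation. It shows only three things: how an undefined method call turns into an element, how a hash argument becomes attributes, and how a block yields nested markup. It starts from BasicObject so that Kernel methods such as p do not get in the way.

```ruby
# Hypothetical toy imitating Builder's "unknown method => XML tag" trick.
class TinyMarkup < BasicObject
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  def initialize
    @out = +""
  end

  # The markup generated so far.
  def target!
    @out
  end

  # Every undefined call lands here: a trailing hash becomes attributes, a
  # block becomes nested content, a text argument becomes escaped content,
  # and no content at all yields an empty element such as <br/>.
  def method_missing(tag, *args, &block)
    attrs = args.last.is_a?(::Hash) ? args.pop : {}
    text = args.first
    attr_text = attrs.map { |k, v| " #{k}=\"#{escape(v)}\"" }.join
    if block
      @out << "<#{tag}#{attr_text}>"
      block.call
      @out << "</#{tag}>"
    elsif text.nil?
      @out << "<#{tag}#{attr_text}/>"
    else
      @out << "<#{tag}#{attr_text}>#{escape(text)}</#{tag}>"
    end
  end

  private

  def escape(value)
    value.to_s.gsub(/[&<>"]/, ESCAPES)
  end
end

xm = TinyMarkup.new
xm.em { xm.b("emph & bold") }
puts xm.target!
```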

A common use for Builder::XmlMarkup is to render XML in response to a request. Previously we talked about overriding to_xml on Active Record to generate our custom XML. Another way, though not as recommended, is to use an XML template.

We could alter our UsersController#show method to use an XML template by changing it from:

class UsersController < ApplicationController
  ...
  def show
    @user = User.find(params[:id])
    respond_to do |format|
      format.html
      format.xml { render xml: @user.to_xml }
    end
  end
  ...
end

to:

class UsersController < ApplicationController
  ...
  def show
    @user = User.find(params[:id])
    respond_to do |format|
      format.html
      format.xml
    end
  end
  ...
end

Now Rails will look for a file called show.xml.builder in the app/views/users directory. That file contains Builder::XmlMarkup code like this:

xml.user {                                # <user>
  xml.email @user.email                   #   <email>...</email>
  xml.timesheets {                        #   <timesheets>
    @user.timesheets.each { |timesheet|
      xml.timesheet {                     #     <timesheet>
        xml.draft !timesheet.submitted?   #       <draft>true</draft>
      }                                   #     </timesheet>
    }
  }                                       #   </timesheets>
}                                         # </user>

In this view, the variable xml is an instance of Builder::XmlMarkup. As in any other view, we have access to the instance variables set in our controller, in this case @user. Using Builder in a view can provide a convenient way to generate XML.

22.3 Parsing XML

The Ruby ecosystem has a full-featured XML library named Nokogiri, and covering it in any level of detail is outside the scope of this book. If you have basic parsing needs, such as parsing responses from web services, you can use the simple XML parsing capability built into Rails.

22.3.1 Turning XML into Hashes

Rails lets you turn arbitrary snippets of XML markup into Ruby hashes, with the from_xml method that it adds to the Hash class.

To demonstrate, we’ll throw together a string of simplistic XML and turn it into a hash:

>> xml = <<-XML
<pets>
  <cat>Franzi</cat>
  <dog>Susie</dog>
  <horse>Red</horse>
</pets>
XML

>> Hash.from_xml(xml)
=> {"pets"=>{"cat"=>"Franzi", "dog"=>"Susie", "horse"=>"Red"}}

There are no options for from_xml. You can also pass it an IO object:

>> Hash.from_xml(File.new('pets.xml'))

=> {"pets"=>{"cat"=>"Franzi", "dog"=>"Susie", "horse"=>"Red"}}
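For simple documents like this one, the conversion is essentially a recursive walk over the element tree. The sketch below uses REXML, which ships with Ruby. simple_from_xml is a hypothetical name. Unlike Rails’ Hash.from_xml, it ignores type attributes, and repeated sibling elements overwrite one another instead of being collected into arrays.

```ruby
require "rexml/document"

# Elements with child elements become nested hashes; leaf elements become
# their text content (nil for empty elements).
def element_to_value(element)
  return element.text unless element.has_elements?

  hash = {}
  element.each_element { |child| hash[child.name] = element_to_value(child) }
  hash
end

# Hypothetical, simplified stand-in for Hash.from_xml.
def simple_from_xml(xml)
  root = REXML::Document.new(xml).root
  { root.name => element_to_value(root) }
end

pets = <<-XML
<pets>
  <cat>Franzi</cat>
  <dog>Susie</dog>
  <horse>Red</horse>
</pets>
XML
p simple_from_xml(pets)
```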

22.3.2 Typecasting

Typecasting is done by using a type attribute in the XML elements. For example, here’s the auto-generated XML for a User object.

>> print User.first.to_xml

<?xml version="1.0" encoding="UTF-8"?>
<user>
  <authorized-approver type="boolean">true</authorized-approver>
  <salt>034fbec79d0ca2cd7d892f205d56ea95174ff557</salt>
  <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
  <crypted-password>98dfc463d9122a1af0a5dc817601de437c69f365</crypted-password>
  <remember-token-expires-at type="datetime" nil="true" />
  <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
  <id type="integer">7</id>
  <client-id type="integer" nil="true" />
  <remember-token nil="true" />
  <login>admin</login>
  <email>admin@example.com</email>
  <timesheets-updated-at type="datetime" nil="true" />
</user>

As part of the to_xml method, Rails sets attributes called type that identify the class of the value being serialized. If we take this XML and feed it to the from_xml method, Rails will typecast the strings to their corresponding Ruby objects:

>> Hash.from_xml(User.first.to_xml)
=> {"user"=>{"salt"=>"034fbec79d0ca2cd7d892f205d56ea95174ff557",
"authorized_approver"=>true,
"created_at"=>Tue May 18 19:31:40 UTC 2010, "remember_token_expires_at"=>nil,
"crypted_password"=>"98dfc463d9122a1af0a5dc817601de437c69f365",
"updated_at"=>Tue May 18 19:31:40 UTC 2010, "id"=>7, "client_id"=>nil,
"remember_token"=>nil, "login"=>"admin", "timesheets_updated_at"=>nil,
"email"=>"admin@example.com"}}

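The typecasting step can be pictured as a lookup keyed on the type attribute. This hypothetical sketch covers only the types that appear in the listing above: integer, boolean and datetime, plus nil="true". typecast and hash_key are illustrative names. Treating "1" as true is our assumption about Rails’ boolean parsing.

```ruby
require "time"

# Hypothetical sketch of type-attribute-driven casting (not Rails' code).
def typecast(text, type: nil, nil_attr: false)
  return nil if nil_attr                      # nil="true" wins over any type
  case type
  when "integer"  then Integer(text, 10)
  when "boolean"  then %w[true 1].include?(text.strip)
  when "datetime" then Time.iso8601(text.strip)
  else text                                   # untyped elements stay strings
  end
end

# Dashed element names come back as underscored hash keys.
def hash_key(element_name)
  element_name.tr("-", "_")
end

p typecast("7", type: "integer")
p typecast("true", type: "boolean")
p hash_key("client-id")
```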

22.4 Conclusion

In practice, the to_xml and from_xml methods meet the XML handling needs for most situations that the average Rails developer will ever encounter. Their simplicity masks a great degree of flexibility and power, and in this chapter we attempted to explain them in sufficient detail to inspire your own exploration of XML handling in the Ruby world.