Programming Ruby 1.9 & 2.0: The Pragmatic Programmers’ Guide (2013)

Part 2. Ruby in Its Setting

Chapter 16. Namespaces, Source Files, and Distribution

As your programs grow (and they all seem to grow over time), you’ll find that you’ll need to start organizing your code—simply putting everything into a single huge file becomes unworkable (and makes it hard to reuse chunks of code in other projects). So, we need to find a way to split our project into multiple files and then to knit those files together as our program runs.

There are two major aspects to this organization. The first is internal to your code: how do you prevent different things with the same name from clashing? The second area is related: how do you conveniently organize the source files in your project?

16.1 Namespaces

We’ve already encountered a way that Ruby helps you manage the names of things in your programs. If you define methods or constants in a class, Ruby ensures that their names can be used only in the context of that class (or its objects, in the case of instance methods):

	class Triangle
	SIDES = 3
	def area
	# ..
	end
	end

	class Square
	SIDES = 4
	def initialize(side_length)
	@side_length = side_length
	end
	def area
	@side_length * @side_length
	end
	end

	puts "A triangle has #{Triangle::SIDES} sides"

	sq = Square.new(3)
	puts "Area of square = #{sq.area}"

Produces:

	A triangle has 3 sides
	Area of square = 9

Both classes define a constant called SIDES and an instance method area , but these things don’t get confused. You access the instance method via objects created from the class, and you access the constant by prefixing it with the name of the class followed by a double colon. The double colon (::) is Ruby’s namespace resolution operator. The thing to the left must be a class or module, and the thing to the right is a constant defined in that class or module.[72]

So, putting code inside a module or class is a good way of separating it from other code. Ruby’s Math module is a good example—it defines constants such as Math::PI and Math::E and methods such as Math.sin and Math.cos . You can access these constants and methods via the Math module object:

	Math::E # => 2.718281828459045
	Math.sin(Math::PI/6.0) # => 0.49999999999999994

(Modules have another significant use—they implement Ruby’s mixin functionality, which we discussed Section 5.3, Mixins.)

Ruby has an interesting little secret. The names of classes and modules are themselves just constants.[73] And that means that if you define classes or modules inside other classes and modules, the names of those inner classes are just contants that follow the same namespacing rules as other constants:

	module Formatters
	class Html
	# ...
	end
	class Pdf
	# ...
	end
	end

	html_writer = Formatters::Html.new

You can nest classes and modules inside other classes and modules to any depth you want (although it’s rare to see them more than three deep).

So, now we know that we can use classes and modules to partition the names used by our programs. The second question to answer is, what do we do with the source code?

16.2 Organizing Your Source

This section covers two related issues: how do we split our source code into separate files, and where in the file system do we put those files?

Some languages, such as Java, make this easy. They dictate that each outer-level class should be in its own file and that file should be named according to the name of the class. Other languages, such as Ruby, have no rules relating source files and their content. In Ruby, you’re free to organize your code as you like.

But, in the real world, you’ll find that some kind of consistency really helps. It will make it easier for you to navigate your own projects, and it will also help when you read (or incorporate) other people’s code.

So, the Ruby community is gradually adopting a kind of de facto standard. In many ways, it follows the spirit of the Java model, but without some of the inconveniences suffered by our Java brethren. Let’s start with the basics.

Small Programs

Small, self-contained scripts can be in a single file. However, if you do this, you won’t easily be able to write automated tests for your program, because the test code won’t be able to load the file containing your source without the program itself running. So, if you want to write a small program that also has automated tests, split that program into a trivial driver that provides the external interface (the command-line part of the code) and one or more files containing the rest. Your tests can then exercise these separate files without actually running the main body of your program.

Let’s try this for real. Here’s a simple program that finds anagrams in a dictionary. Feed it one or more words, and it gives you the anagrams of each. Here’s an example:

	$ ruby anagram.rb teaching code
	Anagrams of teaching: cheating, teaching
	Anagrams of code: code, coed

If we were typing in this program for casual use, we might just enter it into a single file (perhaps anagram.rb). It would look something like this:[74]

packaging/anagram.rb
	#!/usr/bin/env ruby

	require 'optparse'

	dictionary = "/usr/share/dict/words"

	OptionParser.new do \|opts\|

	opts.banner = "Usage: anagram [ options ] word..."

	opts.on("-d", "--dict path", String, "Path to dictionary") do \|dict\|
	dictionary = dict
	end

	opts.on("-h", "--help", "Show this message") do
	puts opts
	exit
	end


	begin
	ARGV << "-h" if ARGV.empty?
	opts.parse!(ARGV)
	rescue OptionParser::ParseError => e
	STDERR.puts e.message, "\n", opts
	exit(-1)
	end
	end

	# convert "wombat" into "abmotw". All anagrams share a signature
	def signature_of(word)
	word.unpack("c").sort.pack("c")
	end

	signatures = Hash.new

	File.foreach(dictionary) do \|line\|
	word = line.chomp
	signature = signature_of(word)
	(signatures[signature] \|\|= []) << word
	end

	ARGV.each do \|word\|
	signature = signature_of(word)
	if signatures[signature]
	puts "Anagrams of #{word}: #{signatures[signature].join(', ')}"
	else
	puts "No anagrams of #{word} in #{dictionary}"
	end
	end

Then someone asks us for a copy, and we start to feel embarrassed. It has no tests, and it isn’t particularly well packaged.

Looking at the code, there are clearly three sections. The first twenty-five or so lines do option parsing, the next ten or so lines read and convert the dictionary, and the last few lines look up each command-line argument and report the result. Let’s split our file into four parts:

An option parser
A class to hold the lookup table for anagrams
A class that looks up words given on the command line
A trivial command-line interface

The first three of these are effectively library files, used by the fourth.

Where do we put all these files? The answer is driven by some strong Ruby conventions, first seen in Minero Aoki’s setup.rb and later enshrined in the RubyGems system. We’ll create a directory for our project containing (for now) three subdirectories:

	anagram/ <- top-level
	bin/ <- command-line interface goes here
	lib/ <- three library files go here
	test/ <- test files go here

Now let’s look at the library files. We know we’re going to be defining (at least) three classes. Right now, these classes will be used only inside our command-line program, but it’s conceivable that other people might want to include one or more of our libraries in their own code. This means that we should be polite and not pollute the top-level Ruby namespace with the names of all our classes and so on. We’ll create just one top-level module, Anagram, and then place all our classes inside this module. This means that the full name of (say) our options-parsing class will be Anagram::Options.

This choice informs our decision on where to put the corresponding source files. Because class Options is inside the module Anagram, it makes sense to put the corresponding file, options.rb, inside a directory named anagram/ in the lib/ directory. This helps people who read your code in the future; when they see a name like A::B::C, they know to look for c.rb in the b/ directory in the a/ directory of your library. So, we can now flesh out our directory structure with some files:

	anagram/
	bin/
	anagram <- command-line interface
	lib/
	anagram/
	finder.rb
	options.rb
	runner.rb
	test/
	... various test files

Let’s start with the option parser. Its job is to take an array of command-line options and return to us the path to the dictionary file and the list of words to look up as anagrams. The source, in lib/anagram/options.rb, looks like this: Notice how we define the Options class inside a top-level Anagram module.

packaging/anagram/lib/anagram/options.rb
	require 'optparse'

	module Anagram
	class Options
	DEFAULT_DICTIONARY = "/usr/share/dict/words"
	attr_reader :dictionary, :words_to_find

	def initialize(argv)
	@dictionary = DEFAULT_DICTIONARY
	parse(argv)
	@words_to_find = argv
	end

	private

	def parse(argv)
	OptionParser.new do \|opts\|
	opts.banner = "Usage: anagram [ options ] word..."

	opts.on("-d", "--dict path", String, "Path to dictionary") do \|dict\|
	@dictionary = dict
	end

	opts.on("-h", "--help", "Show this message") do
	puts opts
	exit
	end

	begin
	argv = ["-h"] if argv.empty?
	opts.parse!(argv)
	rescue OptionParser::ParseError => e
	STDERR.puts e.message, "\n", opts
	exit(-1)
	end
	end
	end
	end
	end

Let’s write some unit tests. This should be fairly easy, because options.rb is self-contained—the only dependency is to the standard Ruby OptionParser. We’ll use the Test::Unit framework, extended with the Shoulda gem.[75] We’ll put the source of this test in the file test/test_options.rb:

packaging/anagram/test/test_options.rb
	require 'test/unit'
	require 'shoulda'
	require_relative '../lib/anagram/options'

	class TestOptions < Test::Unit::TestCase

	context "specifying no dictionary" do
	should "return default" do
	opts = Anagram::Options.new(["someword"])
	assert_equal Anagram::Options::DEFAULT_DICTIONARY, opts.dictionary
	end
	end

	context "specifying a dictionary" do
	should "return it" do
	opts = Anagram::Options.new(["-d", "mydict", "someword"])
	assert_equal "mydict", opts.dictionary
	end
	end

	context "specifying words and no dictionary" do
	should "return the words" do
	opts = Anagram::Options.new(["word1", "word2"])
	assert_equal ["word1", "word2"], opts.words_to_find
	end
	end

	context "specifying words and a dictionary" do
	should "return the words" do
	opts = Anagram::Options.new(["-d", "mydict", "word1", "word2"])
	assert_equal ["word1", "word2"], opts.words_to_find
	end
	end
	end

The line to note in this file is as follows:

require_relative '../lib/anagram/options'

This is where we load the source of the Options class we just wrote. We use require_relative , as it always loads from a path relative to the directory of the file that invokes it.

	$ ruby test/test_options.rb
	Run options:
	# Running tests:
	....
	Finished tests in 0.010588s, 377.7862 tests/s, 377.7862 assertions/s.
	4 tests, 4 assertions, 0 failures, 0 errors, 0 skips
	ruby -v: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.2.0]

The finder code (in lib/anagram/finder.rb) is modified slightly from the original version. To make it easier to test, we’ll have the default constructor take a list of words, rather than a filename. We’ll then provide an additional factory method, from_file , that takes a filename and constructs a new Finder from that file’s contents:

packaging/anagram/lib/anagram/finder.rb
	module Anagram
	class Finder

	def self.from_file(file_name)
	new(File.readlines(file_name))
	end

	def initialize(dictionary_words)
	@signatures = Hash.new
	dictionary_words.each do \|line\|
	word = line.chomp
	signature = Finder.signature_of(word)
	(@signatures[signature] \|\|= []) << word
	end
	end

	def lookup(word)
	signature = Finder.signature_of(word)
	@signatures[signature]
	end

	def self.signature_of(word)
	word.unpack("c").sort.pack("c")
	end
	end
	end

Again, we embed the Finder class inside the top-level Anagram module. And, again, this code is self-contained, allowing us to write some simple unit tests:

packaging/anagram/test/test_finder.rb
	require 'test/unit'
	require 'shoulda'
	require_relative '../lib/anagram/finder'

	class TestFinder < Test::Unit::TestCase

	context "signature" do
	{ "cat" => "act", "act" => "act", "wombat" => "abmotw" }.each do
	\|word, signature\|
	should "be #{signature} for #{word}" do
	assert_equal signature, Anagram::Finder.signature_of(word)
	end
	end
	end

	context "lookup" do
	setup do
	@finder = Anagram::Finder.new(["cat", "wombat"])
	end

	should "return word if word given" do
	assert_equal ["cat"], @finder.lookup("cat")
	end

	should "return word if anagram given" do
	assert_equal ["cat"], @finder.lookup("act")
	assert_equal ["cat"], @finder.lookup("tca")
	end

	should "return nil if no word matches anagram" do
	assert_nil @finder.lookup("wibble")
	end
	end

	end

These go in test/test_finder.rb:

	$ ruby test/test_finder.rb
	Run options:
	# Running tests:
	......
	Finished tests in 0.009453s, 634.7191 tests/s, 740.5057 assertions/s.
	6 tests, 7 assertions, 0 failures, 0 errors, 0 skips
	ruby -v: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.2.0]

We now have all the support code in place. We just need to run it. We’ll make the command-line interface—the thing the end user actually executes—really thin. It’s in the bin/ directory in a file called anagram (no rb extension, because that would be unusual in a command).[76]

packaging/anagram/bin/anagram
	#! /usr/local/rubybook/bin/ruby
	require 'anagram/runner'

	runner = Anagram::Runner.new(ARGV)
	runner.run

The code that this script invokes (lib/anagram/runner.rb) knits our other libraries together:

packaging/anagram/lib/anagram/runner.rb
	require_relative 'finder'
	require_relative 'options'

	module Anagram
	class Runner
	def initialize(argv)
	@options = Options.new(argv)
	end

	def run
	finder = Finder.from_file(@options.dictionary)
	@options.words_to_find.each do \|word\|
	anagrams = finder.lookup(word)
	if anagrams
	puts "Anagrams of #{word}: #{anagrams.join(', ')}"
	else
	puts "No anagrams of #{word} in #{@options.dictionary}"
	end
	end
	end
	end
	end

In this case, the two libraries finder and options are in the same directory as the runner, so require_relative finds them perfectly.

Now that all our files are in place, we can run our program from the command line:

	$ ruby -I lib bin/anagram teaching code
	Anagrams of teaching: cheating, teaching
	Anagrams of code: code, coed

There’s nothing like a cheating coed teaching code.

16.3 Distributing and Installing Your Code

Now that we have our code a little tidier, it would be nice to be able to distribute it to others. We could just zip or tar it up and send them our files, but then they’d have to run the code the way we do, remembering to add the correct -I lib options and so on. They’d also have some problems if they wanted to reuse one of our library files—it would be sitting in some random directory on their hard drive, not in a standard location used by Ruby. Instead, we’re looking for a way to take our little application and install it in a standard way.

Now, Ruby already has a standard installation structure on your computer. When Ruby is installed, it puts its commands (ruby, ri, irb, and so on) into a directory of binary files. It puts its libraries into another directory tree and documentation somewhere else. So, one option would be to write an installation script that you distribute with your code that copies components of your application to the appropriate directories on the system that’s installing it.

Being a Good Packaging Citizen

So, I’ve ignored some stuff that you’d want to do before distributing your code to the world. Your distributed directory tree really should have a README file, outlining what it does and probably containing a copyright statement; an INSTALL file, giving installation instructions; and a LICENSE file, giving the license it is distributed under.

You’ll probably want to distribute some documentation, too. This would go in a directory called doc/, parallel with the bin and lib directories.

You might also want to distribute native C-language extensions with your library. These extensions would go into your project’s ext/ directory.

Using RubyGems

The RubyGems package management system (which is also just called Gems) has become the standard for distributing and managing Ruby code packages. As of Ruby 1.9, it comes bundled with Ruby itself.[77]

RubyGems is also a great way to package your own code. If you want to make your code available to the world, RubyGems is the way to go. Even if you’re just sending code to a few friends or within your company, RubyGems gives you dependency and installation management—one day you’ll be grateful for that.

RubyGems needs to know information about your project that isn’t contained in the directory structure. Instead, you have to write a short RubyGems specification: a GemSpec. Create this in a separate file named project-name.gemspec in the top-level directory of your application (in our case, the file is anagram.gemspec):

packaging/anagram/anagram.gemspec
	Gem::Specification.new do \|s\|
	s.name = "anagram"
	s.summary = "Find anagrams of words supplied on the command line"
	s.description = File.read(File.join(File.dirname(__FILE__), 'README'))
	s.requirements =
	[ 'An installed dictionary (most Unix systems have one)' ]
	s.version = "0.0.1"
	s.author = "Dave Thomas"
	s.email = "dave@pragprog.com"
	s.homepage = "http://pragdave.pragprog.com"
	s.platform = Gem::Platform::RUBY
	s.required_ruby_version = '>=1.9'
	s.files = Dir['/']
	s.executables = [ 'anagram' ]
	s.test_files = Dir["test/test.rb"*]
	s.has_rdoc = false
	end

The first line of the spec gives our gem a name. This is important—it will be used as part of the package name, and it will appear as the name of the gem when installed. Although it can be mixed case, we find that confusing, so do our poor brains a favor and use lowercase for gem names.

The version string is significant, because RubyGems will use it both for package naming and for dependency management. Stick to the x.y.z format.[78]

The platform field tells RubyGems that (in this case) our gem is pure Ruby code. It’s also possible to package (for example) Windows exe files inside a gem, in which case you’d use Gem::Platform::Win32.

The next line is also important (and oft-forgotten by package developers). Because we use require_relative , our gem will run only with Ruby 1.9 and newer.

We then tell RubyGems which files to include when creating the gem package. Here we’ve been lazy and included everything. You can be more specific.

The s.executables line tells RubyGems to install the anagram command-line script when the gem gets installed on a user’s machine.

To save space, we haven’t added RDoc documentation comments to our source files (RDoc is described in Chapter 19, Documenting Ruby). The last line of the spec tells RubyGems not to try to extract documentation when the gem is installed.

Obviously I’ve skipped a lot of details here. A full description of GemSpecs is available online,[79] along with other documents on RubyGems.[80]

Packaging Your RubyGem

Once the gem specification is complete, you’ll want to create the packaged gem file for distribution. This is as easy as navigating to the top level of your project and typing this:

	$ gem build anagram.gemspec
	WARNING: no rubyforge_project specified
	Successfully built RubyGem
	Name: anagram
	Version: 0.0.1
	File: anagram-0.0.1.gem

You’ll find you now have a file called anagram-0.0.1.gem.

	$ *ls gem**
	anagram-0.0.1.gem

You can install it:

	$ sudo gem install pkg/anagram-0.0.1.gem
	Successfully installed anagram-0.0.1
	1 gem installed

And check to see that it is there:

	$ gem list anagram -d
	* LOCAL GEMS *
	anagram (0.0.1)
	Author: Dave Thomas
	Homepage: http://pragdave.pragprog.com
	Installed at: /usr/local/lib/ruby/gems/1.9.0

	Find anagrams of words supplied on the command line

Now you can send your gem file to friends and colleagues or share it from a server. Or, you could go one better and share it from a RubyGems server.

If you have RubyGems installed on your local box, you can share them over the network to others. Simply run this:

	$ gem server
	Server started at http://[::ffff:0.0.0.0]:8808
	Server started at http://0.0.0.0:8808

This starts a server (by default on port 8808, but the --port option overrides that). Other people can connect to your server to list and retrieve RubyGems:

	$ gem list --remote --source http://dave.local:8808
	* REMOTE GEMS *
	anagram (0.0.1)
	builder (2.1.2, 0.1.1)
	..

This is particularly useful in a corporate environment.

You can speed up the serving of gems by creating a static index—see the help for gem generate_index for details.

Serving Public RubyGems

RubyGems.org ( http://rubygems.org ) has become the main repository for public Ruby libraries and projects. And, if you create a RubyGems.org account, you can push your gem file to their public servers.

	$ gem push anagram-0.0.1.gem
	Enter your RubyGems.org credentials.
	Email: dave@pragprog.com
	Password:
	Pushing gem to RubyGems.org...
	Successfully registered gem: anagram (0.0.1)

And, at that point, any Ruby user in the world can do this:

	$ gem search -r anagram
	* REMOTE GEMS *
	anagram (0.0.1)

and, even better, can do this:

$ gem install anagram

Adding Even More Automation

The Jeweler library[81] can create a new project skeleton that follows the layout guidelines in this chapter. It also provides a set of Rake tasks that will help create and manage your project as a gem.

If you’re a Rails user, you’ll have come across bundler, a utility that manages the gems used by your application. Bundler is more general than this: it can be used to manage the gems used by any piece of Ruby code.

Some folks like the extra features of these utilities, while others prefer the leaner “roll-your-own” approach. Whatever route you take, taking the time to package your applications and libraries will pay you back many times over.

See You on GitHub

Finally, if you’re developing a Ruby application or library that you’ll be sharing, you’ll probably want to store it on GitHub.[82] Although it started as a public Git repository, GitHub is now a community in its own right. It’s a home away from home for many in the Ruby community.

Footnotes

[72]	The thing to the right of the :: can also be a class or module method, but this use is falling out of favor—using a period makes it clearer that it’s just a regular old method call.
[73]	Remember that we said that most everything in Ruby is an object. Well, classes and modules are, too. The name that you use for a class, such as String, is really just a Ruby constant containing the object representing that class.
[74]	You might be wondering about the line word.unpack("c").sort.pack("c"). This uses the function unpack to break a string into an array of characters, which are then sorted and packed back into a string.
[75]	We talk about Shoulda in the Unit Testing chapter.
[76]	If you’re on Windows, you might want to wrap the invocation of this in a cmd file.
[77]	Prior to RubyGems, folks often distibuted a tool called setup.rb with their libraries. This would install the library into the standard Ruby directory structure on a user’s machine.
[78]	And read http://www.rubygems.org/read/chapter/7 for information on what the numbers mean.
[79]	http://www.rubygems.org/read/book/4
[80]	http://www.rubygems.org/
[81]	http://github.com/technicalpickles/jeweler
[82]	http://github.com