Test, Test, Test - Build Awesome Command-Line Applications in Ruby 2: Control Your Computer, Simplify Your Life (2013)

Chapter 8. Test, Test, Test

Writing perfect code isn’t easy. It might even be impossible. That’s why seasoned developers—including command-line application developers—write tests. Tests are the best tool we have to make sure our applications perform flawlessly, and the Ruby community is especially friendly to testing, thanks to the culture and tools established by Ruby on Rails.

Command-line applications often interact with various systems and environments, which produces a unique set of challenges for testing. If you’ve done any web application development, you are probably accustomed to having different “tiers,” such as a development tier and a testing tier. These tiers are complete systems, often using many servers to provide an environment for isolated testing. This is impractical for command-line apps, so we tend to develop command-line apps on the system where they are intended to run. As a result, if we tested adding a task with todo, it would add that task to our actual task list unless we took steps to prevent it.

What we’ll learn here is how to write and run tests for our command-line apps, as well as some techniques to keep our tests from causing problems on our system. We’ll do this by combining two types of tests: unit tests and acceptance tests. You are probably familiar with unit testing, which is useful in testing the small bits of logic that comprise our application. Ruby’s standard library includes all the tools we’ll need.

Acceptance testing takes the opposite approach. Acceptance tests simulate real user behavior and exercise the entire system. We’ll learn about this type of testing by using the popular Cucumber testing tool, along with Aruba, which is an extension to Cucumber designed to help test command-line applications. Acceptance tests are a good place to start because of the user-centered approach, so let’s jump in and see how they work. After we have a good set of acceptance tests, we’ll turn our attention to unit tests to test edge cases that might be hard to simulate using acceptance tests.

8.1 Testing User Behavior with Acceptance Tests

Unlike web applications, command-line apps have simple user interfaces, and their output is less complex. This makes the job of simulating input and testing output relatively simple. The problems come in when we consider what our apps do. db_backup.rb takes a long time to do its work and requires access to a database. todo makes irreversible changes to its task list. We need a way to run our apps that mimics as closely as possible the “real-world” scenarios they were written to solve but in a repeatable and predictable way that doesn’t cause permanent changes to our environment.

Rather than solving this problem while we learn the mechanics of testing, let’s take things one step at a time. If you recall, todo takes a global option that controls where the task list is. We can use that option to keep our tests from messing with the personal task list in our home directory, which is enough to get some tests going. After we see how to test our app in general, we’ll discuss some techniques to deal with the “tests messing with our personal task list” issue.

Understanding Acceptance Tests

Acceptance tests are tests that we can use to confirm, from a user’s perspective, that an app properly implements certain features. Acceptance tests typically cover only the subset of actions users are likely to attempt (the so-called happy path), not uncommon edge cases. Their job is to simulate the ways users are most likely to employ our tool to do their job. A todo user, for example, is likely to do the following:

· Add a new task

· List tasks

· Complete a task

· Get help

Each task maps to a command that todo accepts. To test these, we need to use todo just as a user would. We can also use todo to verify its own behavior. For example, we can run todo list, capture the list of tasks, add a new task via todo new, and then run todo list again, this time looking for our new task. We could use the same technique to test todo done.

While we could create another command-line app to run these tests, we don’t need to do so. The acceptance testing tool Cucumber (explained in great detail in The RSpec Book [CADH09]) can handle the basic infrastructure for running tests, and the Cucumber add-on library Aruba will provide us with the tools we need to run our app and verify its behavior. First we’ll see how these tools work, then we’ll set them up, and finally we’ll write our tests.

Understanding Cucumber and Aruba

Acceptance tests written in Cucumber don’t look like the tests you might be used to seeing. They’re written in what looks like plain English.[45] With Cucumber, you describe the behavior you want to test, in English, and then write code that runs under the covers to execute the procedure you have specified.

Now, you can’t just write free-form text; there is a structure that you’ll need to follow to make this work. Cucumber organizes tests into features, which contain multiple scenarios. A feature is what it sounds like: a feature of your application. “Adding a task” is a feature. A scenario exercises one aspect of a feature. For example, we might have a scenario to add a task to an existing task list and another to add a task for the very first time.

In Cucumber, you describe the feature in free-form English; this part of the test is mere documentation. The scenarios, however, must follow a strict format. Each scenario has a one-line description followed by steps. Steps start with “Given,” “When,” “Then,” “And,” or “But,” for example “Given the file /tmp/todo.txt exists” or “Then the output should contain Hello.” These steps are what Cucumber will actually execute to run your test. Before we get into the weeds of how this works under the covers, let’s look at a simple feature and scenario for todo:

tolerate_gracefully/todo/features/todo.feature

Feature: We can add new tasks
  As a busy developer with a lot of things to do
  I want to keep a list of tasks I need to work on

  Scenario: Add a new task
    Given the file "/tmp/todo.txt" doesn't exist
    When I successfully run `todo -f /tmp/todo.txt new 'Some new task'`
    Then I successfully run `todo -f /tmp/todo.txt list`
    And the stdout should contain "Some new task"

As you can see, we’ve followed Cucumber’s particular format, but the test is written in plain English. You could take these instructions and manually execute them on the command line to check that adding a task works correctly.

Seeing the text of this test gives us some insight as to the use of “Given,” “When,” and “Then.” These three words help to differentiate the three main parts of any good test: setup, action, and verification. More precisely:

Given

This sets up the conditions of the test. Most tests operate under a set of assumptions, and each “Given” step establishes one of them. In our case, we are establishing that the task list doesn’t exist so that when we later add a task to it, we can be sure that our action had an effect. Without this bit of setup, a task list containing the task we’re adding might already exist, and our test would pass even if our app were broken.

When

This performs the action or actions under test. In our case, we run todo new to, ideally, add a new task.

Then

This verifies that our action taken in a “When” had the desired outcome. For our earlier scenario, we verify that todo new worked by running todo list and examining its output.

And or But

These two words can be used anywhere and “extend” the keyword they follow. We’re using “And” in our “Then” section, because we need two steps to perform a full verification. The choice between “And” and “But” is purely cosmetic; use whichever reads best to you.

Back to our scenario: we now have a set of steps that, as we mentioned, we could manually execute to test our app. Of course, we don’t want to run our tests manually; we’re using Cucumber to automate all of this, so let’s dig a bit deeper to see how Cucumber can run this test. We’ll get into the specifics of where files go and what they’re named later; for now, let’s assume that we can ask Cucumber to run this feature for us.

The first step of our scenario is Given the file "/tmp/todo.txt" doesn’t exist. This is a setup step that ensures we don’t have a task list sitting around from a previous test run that might already contain our about-to-be-added task. Cucumber doesn’t know how to perform this setup step, so we need to give it the code to do so. We’ll define the step using the method Given, provided by Cucumber, which takes a regexp and a block. If the text of our step matches that regexp, the block is executed. Let’s see this step’s definition:

tolerate_gracefully/todo/features/step_definitions/cli_steps.rb

Given /^the file "([^"]*)" doesn't exist$/ do |file|
  # File.exist? (File.exists? is deprecated and was removed in Ruby 3.2)
  FileUtils.rm(file) if File.exist?(file)
end

You’ll notice that the regexp has a capture in it (this part: ([^"]*)). Cucumber will extract whatever matched that capture and provide it as an argument to our block. This means we could write another step, Given the file "/tmp/some_other_file.txt" doesn’t exist, and this step definition would work for it, too. Using this technique, you could build up a library of reusable step definitions. This is exactly what Aruba is: a set of general-purpose step definitions for writing scenarios to test command-line apps.
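To make the capture mechanism concrete, here is a toy dispatcher in plain Ruby. It is not how Cucumber is implemented, and define_step and run_step are hypothetical names. It only illustrates the idea: the step text is matched against each registered regexp, and the captured groups become the block's arguments.

```ruby
# Toy illustration only (NOT Cucumber's implementation): a list of
# [regexp, block] pairs, dispatched by matching step text.
STEP_DEFINITIONS = []

# Register a step definition: a regexp plus the block to run on a match.
def define_step(regexp, &block)
  STEP_DEFINITIONS << [regexp, block]
end

# Run the first definition whose regexp matches, passing the captures
# as block arguments; complain if nothing matches.
def run_step(text)
  STEP_DEFINITIONS.each do |regexp, block|
    match = regexp.match(text)
    return block.call(*match.captures) if match
  end
  raise "Undefined step: #{text}"
end

define_step(/^the file "([^"]*)" doesn't exist$/) do |file|
  "would delete #{file}"
end

# One definition serves any file name, thanks to the capture.
puts run_step(%(the file "/tmp/todo.txt" doesn't exist))
puts run_step(%(the file "/tmp/some_other_file.txt" doesn't exist))
```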

Using Aruba on Windows

As of this writing, Aruba does not work “out of the box” on Windows. Reports on the Internet describe varying degrees of success, but there doesn’t seem to be a strong consensus that Aruba works on Windows.

If you are adventurous, I encourage you to try getting the Cucumber tests in this section passing on Windows and to submit a patch. Windows is an important operating system for the Ruby community, yet it is currently sorely underserved when it comes to compatibility. This particular application, spawning processes and monitoring their behavior, is especially tricky because of the large differences between UNIX and Windows. Since OS X is based on UNIX, Macs tend to work just fine.

I still encourage you to follow along with this section, because it will teach you solid principles of acceptance testing your apps; you may just need to take a different approach until Aruba is more compatible with Windows.

The next two steps of our scenario, both of the form “I successfully run ‘some command,’” are defined by Aruba, meaning we don’t have to provide step definitions for those steps. This also demonstrates an interesting aspect of Cucumber. At a code level, Given, When, Then, And, and But are all treated the same. We could have every step start with “And” and Cucumber wouldn’t care; these keywords are for human eyes, not the computer. Further, when we define steps, Cucumber provides Given, When, and Then to do so, but the effect is the same regardless of which one we use. This is how we’re able to use the same step as both a “When” and a “Then.”

Coming back to the Aruba-provided step “I successfully run ‘some command,’” this does two things: it executes the command in the backticks, and it checks its exit status. If it’s zero, the test will proceed. If it’s nonzero, the test will halt with a failure (there is a less stringent version provided, “I run ‘some_command,’” that will just run the command and not check the exit status). The final step is also defined by Aruba and allows us to assert that the standard output of our app’s run contains a particular string.
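The essence of “I successfully run” can be approximated with Ruby's Open3 library: run the command, capture its output, and fail on a nonzero exit status. The run_successfully helper below is hypothetical and is not Aruba's code. It uses the running Ruby interpreter (RbConfig.ruby) as a stand-in command so the sketch runs anywhere.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: roughly what "I successfully run" checks.
# Runs the command without a shell, returns its stdout, and raises
# if the exit status is nonzero.
def run_successfully(*command)
  stdout, stderr, status = Open3.capture3(*command)
  unless status.success?
    raise "#{command.join(' ')} exited with #{status.exitstatus}: #{stderr}"
  end
  stdout
end

ruby = RbConfig.ruby

# Succeeds: the output is returned so later steps can inspect it.
puts run_successfully(ruby, '-e', 'print "Some new todo item"')

# Fails: a nonzero exit status halts the "test".
begin
  run_successfully(ruby, '-e', 'exit 1')
rescue RuntimeError => e
  puts "Step failed: #{e.message}"
end
```

The less strict “I run” step corresponds to dropping the status check and simply keeping the output.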

Now that you have an idea of what Cucumber is and a general sense of how it works, let’s actually get it set up for our project so we can run this feature.

Installing and Setting Up Cucumber and Aruba

To install Cucumber and Aruba, we need only add the aruba gem to our gemspec (as a development dependency; our app doesn’t require aruba to run). Because Aruba depends on Cucumber, when you update your gems with Bundler, Cucumber will be installed automatically (see the note on Windows at Using Aruba on Windows). Here’s the updated gemspec:

tolerate_gracefully/todo/todo.gemspec

spec = Gem::Specification.new do |s|
  s.name = 'todo'
  s.version = Todo::VERSION
  # rest of the gemspec...
  s.bindir = 'bin'
  s.executables << 'todo'
  # Needed only to run the tests, not to run the app
  s.add_development_dependency('aruba', '~> 0.5.3')
  s.add_dependency('gli')
end

Now, we tell Bundler to make sure our development dependencies are up-to-date:

$ bundle install
Resolving dependencies...
Using ffi (1.9.0)
Using childprocess (0.3.9)
Using builder (3.2.2)
Using diff-lcs (1.2.4)
Using multi_json (1.8.0)
Using gherkin (2.12.1)
Using multi_test (0.0.2)
Using cucumber (1.3.8)
Using rspec-expectations (2.14.3)
Using aruba (0.5.3)
Using gli (2.8.0)
Using todo (0.0.1) from source at .
Using bundler (1.3.5)
Your bundle is complete!
Use `bundle show [gemname]` to see where a bundled gem is installed.

Cucumber has a conventional file structure: the features directory (off the project’s root directory) contains the feature files holding our scenarios. Inside that directory, the step_definitions directory contains the Ruby code that defines our steps. The names of the files don’t matter; every file with an .rb extension will be loaded. Any Cucumber configuration goes in features/support/env.rb; we’ll see what this file is for later.
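As a quick reference, the layout just described can be created with FileUtils. The create_cucumber_layout helper is a hypothetical convenience for illustration; in practice you would usually create these directories by hand.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper that creates Cucumber's conventional layout under +root+:
#   features/                   *.feature files with our scenarios
#   features/step_definitions/  Ruby step definitions (.rb files)
#   features/support/env.rb     configuration such as hooks
def create_cucumber_layout(root)
  FileUtils.mkdir_p(File.join(root, 'features', 'step_definitions'))
  FileUtils.mkdir_p(File.join(root, 'features', 'support'))
  FileUtils.touch(File.join(root, 'features', 'support', 'env.rb'))
end

# Build the layout in a throwaway directory and list what was created.
Dir.mktmpdir do |project_root|
  create_cucumber_layout(project_root)
  puts Dir.glob('**/*', base: project_root).sort
end
```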

Finally, we need to add a task to our Rakefile to allow us to run the Cucumber scenarios:

tolerate_gracefully/todo/Rakefile

require 'cucumber'
require 'cucumber/rake/task'

Cucumber::Rake::Task.new(:features) do |t|
  t.cucumber_opts = "features --format pretty -x"
  t.fork = false
end

Now that we’ve seen what our tests look like, implemented the needed steps, and installed all the software we need, let’s run our tests.

Running Cucumber Tests

The task we added to our Rakefile creates a task named “features” that will run our Cucumber tests. Let’s run it now:

$ rake features
Feature: We can add new tasks
  As a busy developer with a lot of things to do
  I want to keep a list of tasks I need to work on

  Scenario: Add a new task
    Given the file "/tmp/todo.txt" doesn't exist
    When I successfully run `todo --filename=/tmp/todo.txt new 'Some new todo item'`
    And I successfully run `todo --filename=/tmp/todo.txt list`
    Then the stdout should contain "Some new todo item"

1 scenarios (1 passed)
4 steps (4 passed)
0m0.563s

If you’re running this locally, you’ll notice that the output is green. This means that everything is working and our test passed. Let’s introduce a bug to see what happens when our tests fail. Here’s our original, correct, CSV formatting code that we saw in Chapter 4, Play Well with Others:

play_well/todo/bin/todo

complete_flag = completed ? "C" : "U"
printf("%d,%s,%s,%s,%s\n",index,name,complete_flag,created,completed)

We’ll uppercase the name of the task in our CSV output, which should cause our test to fail, like so:

elsif options[:format] == 'csv'
  # Use the machine-readable CSV format
  complete_flag = completed ? "C" : "U"
  # Deliberate bug: name.upcase instead of name
  printf("%d,%s,%s,%s,%s\n",index,name.upcase,complete_flag,created,completed)
end

Now, when we run our feature, one of the steps in our scenario fails:

$ rake features
  Scenario: Add a new task
    Given the file "/tmp/todo.txt" doesn't exist
    When I successfully run `todo --filename=/tmp/todo.txt new 'Some new todo item'`
    And I successfully run `todo --filename=/tmp/todo.txt list`
    Then the stdout should contain "Some new todo item"
      expected "1,SOME NEW TODO ITEM,U,Thu Sep 22 08:44:02 -0400 2011,\n"
      to include "Some new todo item"
      Diff:
      @@ -1,2 +1,2 @@
      -Some new todo item
      +1,SOME NEW TODO ITEM,U,Thu Sep 22 08:44:02 -0400 2011,
      (RSpec::Expectations::ExpectationNotMetError)
      features/todo.feature:17:in `Then the stdout should contain "Some new todo item"'

Failing Scenarios:
cucumber features/todo.feature:13 # Scenario: Add a new task

1 scenarios (1 failed)
4 steps (1 failed, 3 passed)
0m0.576s
rake aborted!
Cucumber failed

If you’re running this locally, all of the error text will be displayed in red, giving a clear indication that something is wrong. If you look closely, you’ll notice that the output includes a diff of the expected and received output (produced by RSpec’s expectations, which Aruba uses under the covers), making it fairly easy to see the problem that caused our test to fail.
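To see why the step fails, we can rebuild the CSV line that the buggy code prints and apply a plain substring check. The variable values are sample data copied from the failure message, not values read from a real task list. The check is a case-sensitive String#include?, which is effectively what “the stdout should contain” does here.

```ruby
# Sample values copied from the failure message; in todo these come
# from the task list file.
index = 1
name = 'Some new todo item'
complete_flag = 'U'
created = 'Thu Sep 22 08:44:02 -0400 2011'
completed = nil # an incomplete task has no completion date

# The buggy CSV line (note name.upcase); %s formats nil as "".
line = sprintf("%d,%s,%s,%s,%s\n", index, name.upcase, complete_flag, created, completed)
print line
puts line.include?('Some new todo item') # case-sensitive, so false
puts line.include?('SOME NEW TODO ITEM') # true
```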

Testing Complex Behavior

Now that we have a basic path tested, let’s see how to test something a bit trickier: the default location of the task list. In our previous scenario, we used the --filename option to explicitly control where todo looked for the task list. This is important, because we don’t want our tests to mess with our actual task list, which lives in our home directory. Nevertheless, we do need to test that todo correctly uses the task list in our home directory by default. This presents us with a problem.

Testing db_backup.rb presents a similar problem; we need a real database to back up, and backing up a database potentially takes a long time. These are two examples of the challenges we face when testing command-line apps. There’s no silver bullet to solve these, but if we think creatively, we can handle most of them. To gain insight into how to approach problems like this in the future, let’s write tests for both todo’s task list defaulting to our home directory and db_backup.rb backing up a real database.

Testing Access to the Home Directory

It’s great that we have the --filename flag for todo; we can get a lot of test coverage without worrying about files in our actual home directory. We still need to verify, however, that the default location for the task list gets used. How can we do this without having our tests modify our actual task list?

First let’s write the scenario to test what we want and work from there.

tolerate_gracefully/todo/features/todo.feature

Scenario: The task list is in our home directory by default
  Given there is no task list in my home directory
  When I successfully run `todo new 'Some new todo item'`
  Then the task list should exist in my home directory
  When I successfully run `todo list`
  Then the stdout should contain "Some new todo item"

This should be a good test of the default location: we omit the --filename option, check that a file exists in our home directory, and then use the list command to make sure we’re reading from the right place. Before we see how to keep our home directory safe from our tests, let’s define all of our steps.

There are two steps in this scenario that we haven’t defined yet. We do have similar steps: we’ve implemented the file "xxx" doesn’t exist, and Aruba provides the step a file named "xxx" should exist. We can call these steps from inside our own, like so:

tolerate_gracefully/todo/features/step_definitions/cli_steps.rb

Given /^there is no task list in my home directory$/ do
  step %(the file "#{ENV['HOME']}/.todo.txt" doesn't exist)
end

Then /^the task list should exist in my home directory$/ do
  step %(a file named "#{ENV['HOME']}/.todo.txt" should exist)
end

You’ll note that we’re using ENV['HOME'], which is how we access the HOME environment variable. The system sets this variable to the user’s home directory (even on Windows). Assuming that our app uses this to access the user’s home directory, we can change its value to another directory that we control. Our tests and the app are still accessing “the user’s home directory” in a canonical way, but we can control the contents of that location.

Since apps that Aruba runs inherit the environment of the tests, all we need to do is modify the value of ENV['HOME'] before our scenario runs (and restore its correct value after the scenario exits). Cucumber provides hooks to do just that. The methods Before and After both accept blocks that will execute before and after (respectively) every scenario.

tolerate_gracefully/todo/features/support/env.rb

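require 'aruba/cucumber'
require 'fileutils'

Before do
  @real_home = ENV['HOME']
  fake_home = File.join('/tmp','fake_home')
  # Start every scenario with a fresh, empty "home directory"
  FileUtils.rm_rf fake_home, secure: true
  FileUtils.mkdir_p fake_home
  ENV['HOME'] = fake_home
end

After do
  # Restore the real home directory for whatever runs next
  ENV['HOME'] = @real_home
end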

As you can see, we create a fresh, empty directory and point ENV['HOME'] to it. Our app uses that same variable like so:

tolerate_gracefully/todo/bin/todo

desc "Path to the todo file"
arg_name "todo_file"
default_value File.join(ENV['HOME'],'.todo.txt')
flag [:f,:filename]

So, we’re able to verify the logic of the task list defaulting to our home directory, without actually using our home directory. Since the location of the “home directory” is really just shorthand for “whatever directory is in ENV['HOME'],” we now have test coverage without worrying that our personal task list will be touched. Let’s run our new test and make sure it passes. To prove that we aren’t touching our home directory, we’ll list our actual task list before and after running our feature.

$ todo list
1 - Design database schema
Created: Sun Oct 02 08:06:12 -0500 2011
2 - Get access to production logs
Created: Sun Oct 02 08:06:12 -0500 2011
3 - Code Review
Created: Sun Oct 02 08:06:12 -0500 2011

$ rake features
Feature: We can add new tasks
  As a busy developer with a lot of things to do
  I want to keep a list of tasks I need to work on

  Scenario: The task list is in our home directory by default
    Given there is no task list in my home directory
    When I successfully run `todo new 'Some new todo item'`
    Then the task list should exist in my home directory
    When I successfully run `todo list`
    Then the stdout should contain "Some new todo item"

1 scenarios (1 passed)
5 steps (5 passed)
0m0.793s

$ todo list
1 - Design database schema
Created: Sun Oct 02 08:06:12 -0500 2011
2 - Get access to production logs
Created: Sun Oct 02 08:06:12 -0500 2011
3 - Code Review
Created: Sun Oct 02 08:06:12 -0500 2011

Our test passed, but our task list wasn’t modified—everything worked!
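The Before/After pair boils down to saving, swapping, and restoring ENV['HOME']. The following sketch packs that into a hypothetical with_fake_home helper. It uses Ruby's ensure so that HOME is restored even if a test raises, and it uses Dir.mktmpdir rather than the fixed /tmp/fake_home directory (a deliberate substitution). It also checks the claim that child processes inherit the modified environment.

```ruby
require 'open3'
require 'rbconfig'
require 'tmpdir'

# Hypothetical helper combining the Before and After hooks: point HOME at a
# fresh temporary directory for the duration of the block, then restore the
# original value, even if the block raises.
def with_fake_home
  real_home = ENV['HOME']
  Dir.mktmpdir do |fake_home|
    ENV['HOME'] = fake_home
    yield fake_home
  end
ensure
  ENV['HOME'] = real_home # assigning nil removes HOME if it was unset
end

with_fake_home do |fake_home|
  # A child process sees our fake HOME, just as the apps Aruba runs do.
  child_home, _status = Open3.capture2(RbConfig.ruby, '-e', 'print ENV["HOME"]')
  puts child_home == fake_home
  # The default task list location todo would compute in this environment.
  puts File.join(ENV['HOME'], '.todo.txt')
end
```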

Manipulating the environment is a great technique for testing behavior like this; the environment works as a “middleman” that allows us to change things (like the location of the user’s home directory) without affecting the code of our tests or our app. What about testing that certain external commands were called, as in the case of db_backup.rb?

Testing Execution of External Commands

db_backup.rb is basically a specialized wrapper around mysqldump. If we ran db_backup.rb from a Cucumber test as is, it would require a live database and would perform an actual backup. This could be a problem, especially if we ask it to back up a particularly large database.

We could use environment variables again, by setting a special variable that tells db_backup.rb to not actually call mysqldump but instead just print out the command it would normally run.

require 'open3'

def run(command,exit_on_error_with)
  puts "Running '#{command}'"
  # Only shell out when DONT_RUN is not set
  unless ENV['DONT_RUN']
    stdout_str, stderr_str, status = Open3.capture3(command)
    puts stdout_str
    unless status.success?
      STDERR.puts "There was a problem running '#{command}'"
      STDERR.puts stderr_str
      exit exit_on_error_with
    end
  end
end

This isn’t a very good technique; we want to test that we’re calling mysqldump appropriately, and doing something like this skips it entirely. We really should test the entire system from end to end.

An acceptance test of the complete system will give you the best idea of how the app will behave in the hands of users. It’s also the most difficult to set up, since it requires a completely controlled testing environment. For the purposes of db_backup.rb, we can set up a database for testing, populate it with a small amount of data, and then run our app, checking that it did what we expected.

First let’s write out our scenarios. In this case, we’ll run two tests: one for the normal use (where the backup is compressed) and one where we do not compress the backup.

tolerate_gracefully/db_backup/features/system_test.feature

Scenario: End-to-end test using a real database
  Given the database backup_test exists
  When I successfully run `db_backup.rb --force -u root backup_test`
  Then the backup file should be gzipped

Scenario: End-to-end test using a real database, skipping gzip
  Given the database backup_test exists
  When I successfully run `db_backup.rb --force -u root --no-gzip backup_test`
  Then the backup file should NOT be gzipped

This should look familiar by now. As we’ve seen, Aruba provides the second step of these scenarios for us, but we have to define the rest. First we’ll define our “Given,” which sets the stage for our test. We need to set up an entire database in MySQL with some data in it. To do that, we’ll create an SQL file that will set up our database, create a table, and insert some data:

tolerate_gracefully/db_backup/setup_test.sql

drop database if exists backup_test;
create database backup_test;
use backup_test;
create table test_table(
  id int,
  name varchar(255)
);
insert into test_table(id,name) values (1,'Dave'), (2, 'Amy'), (3,'Rudy');

We include that in our project and can now reference it in our step definition, which looks like so:

tolerate_gracefully/db_backup/features/step_definitions/system_test_steps.rb

require 'open3'

MYSQL = ENV['DB_BACKUP_MYSQL'] || '/usr/local/bin/mysql'
USER = ENV['DB_BACKUP_USER'] || 'root'

Given /^the database backup_test exists$/ do
  test_sql_file = File.join(File.dirname(__FILE__),'..','..','setup_test.sql')
  command = "#{MYSQL} -u#{USER} < #{test_sql_file}"
  stdout,stderr,status = Open3.capture3(command)
  unless status.success?
    raise "Problem running #{command}, stderr was:\n#{stderr}"
  end
end

The MYSQL and USER constants set up some defaults for how we’re going to run mysql to load the database: the location of the mysql executable and the username we’d like to use when loading the data. We allow each to be overridden via an environment variable so that other developers who might have a different setup can run these tests. This shows some of the complication involved in doing true end-to-end tests. These environment variables should definitely be documented in our README file.
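One design note on this step: it builds a shell command string, so a path containing spaces or shell metacharacters would break it. A possible alternative passes the arguments separately and feeds the file through Open3's stdin_data option. This sketch assumes that mysql reads the SQL from standard input exactly as it does with `<`. The run_with_input_file helper is hypothetical, and the demo uses Ruby itself as a stand-in for mysql.

```ruby
require 'open3'
require 'rbconfig'
require 'tempfile'

# Hypothetical helper: run +command+ (an array of program and arguments,
# so no shell is involved) with the contents of +input_file+ on stdin.
# Raises with the command's stderr if it exits nonzero.
def run_with_input_file(command, input_file)
  stdout, stderr, status = Open3.capture3(*command, stdin_data: File.read(input_file))
  unless status.success?
    raise "Problem running #{command.join(' ')}, stderr was:\n#{stderr}"
  end
  stdout
end

# In the step definition this might read:
#   run_with_input_file([MYSQL, "-u#{USER}"], test_sql_file)
# Demo: a temp file whose name contains a space, with Ruby standing in for mysql.
Tempfile.create(['setup test', '.sql']) do |sql_file|
  sql_file.write("use backup_test;\n")
  sql_file.flush
  puts run_with_input_file([RbConfig.ruby, '-e', 'print STDIN.read.upcase'], sql_file.path)
end
```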

Next, we need to define the steps for our two “Thens” (that the backup file should, or should not, be gzipped). To do that, we’ll construct the filename we expect db_backup.rb to output, using a gzip extension or not, depending on the test. Since the filename will contain the current date, we’ll need to construct the filename dynamically:

tolerate_gracefully/db_backup/features/step_definitions/system_test_steps.rb

# Today's backup file name, e.g. backup_test-2013-09-05.sql
def expected_filename
  now = Time.now
  sprintf("backup_test-%4d-%02d-%02d.sql",now.year,now.month,now.day)
end

Then /^the backup file should be gzipped$/ do
  step %(a file named "#{expected_filename}.gz" should exist)
end

Then /^the backup file should NOT be gzipped$/ do
  step %(a file named "#{expected_filename}" should exist)
end

Using the expected_filename method, we can defer to an Aruba-provided step a file named "xxx" should exist to perform the actual check. Now, when we run our tests, everything passes:

$ rake features
Feature: Do a complete system test

  Scenario: End-to-end test using a real database
    Given the database backup_test exists
    When I successfully run `db_backup.rb --force -u root backup_test`
    Then the backup file should be gzipped

  Scenario: End-to-end test using a real database, skipping gzip
    Given the database backup_test exists
    When I successfully run `db_backup.rb --force -u root --no-gzip backup_test`
    Then the backup file should NOT be gzipped

2 scenarios (2 passed)
6 steps (6 passed)
0m0.745s
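expected_filename could equally be written with Time#strftime. Taking the time as an optional parameter also lets us check it against a fixed date. This variant is a sketch, not the book's code. Neither version guards against a run that straddles midnight, when db_backup.rb and the step could see different dates.

```ruby
# Variant of expected_filename: same format as the sprintf version
# (backup_test-YYYY-MM-DD.sql), but the time can be passed in.
def expected_filename(now = Time.now)
  now.strftime('backup_test-%Y-%m-%d.sql')
end

puts expected_filename(Time.new(2013, 9, 5))         # => backup_test-2013-09-05.sql
puts "#{expected_filename(Time.new(2013, 9, 5))}.gz" # => backup_test-2013-09-05.sql.gz
puts expected_filename                               # today's date
```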

It’s good to be able to run our apps exactly the way a user would; however, it won’t always be possible. Even an app as simple to set up as db_backup.rb is difficult to test this way: another developer will need to set up a MySQL database and make sure it’s running, just to run our tests.

You may be creating an even more complex app that interacts with other systems. If you can’t figure out a way to set up a good test environment with Aruba, it’s still worth writing the automated test but keeping it out of the list of features you run regularly. (Setting this up can be done with the “tags” feature of Cucumber. The documentation[46] should be able to get you started.) Whenever you are ready to release a new version or just want to do a full system test, you can manually set up the proper conditions and have Cucumber run your system test. It’s not ideal, but it works for complex apps that can’t easily be tested like this.

Everything we’ve talked about up to now has focused on testing our command-line apps by running them the way a user would. This gives us a clear picture of how an app is supposed to work, and we could even use our Cucumber features as supplemental documentation! Where the use of Cucumber starts to break down is when we need to test edge cases. Consider todo. What if the to-do list isn’t formatted correctly? What if the task list file isn’t writable when we add a new task? What would the app do? What should it do?

We went through a fair amount of effort faking out our home directory in testing todo (not to mention the effort we went to in order to test db_backup.rb!). It’s going to be even more difficult to set up the conditions that simulate every edge case; it may not even be possible. We still want to test these edge cases, but we aren’t going to be able to do it at the acceptance test level using Cucumber. We need to break down our code into smaller, testable units. When we do that, we can test bits of logic in isolation and can more easily simulate some strange error conditions simply in code. These types of tests are called unit tests.

8.2 Testing in Isolation with Unit Tests

In addition to allowing greater flexibility in simulating edge cases, unit tests have two other advantages: they run very quickly, since they don’t require any setup outside of Ruby (such as files, databases, and so on), and they force us to organize our code into small, testable units. Faster tests are good, since we can more quickly see the health of our app and more quickly and frequently run tests when writing new features and fixing bugs. Having small testable units is good, too, because it means our app will be easier to understand and maintain; instead of a big long block of code, we’ll have small units that do simple things, all glued together to form the app.

To write unit tests, we’ll need to break our code into units that can be tested. Since we have a few tests in place via Cucumber, we’ll have some assurance that the code changes we’re about to make don’t break anything. This process is called refactoring and is very difficult to do without a suite of tests. Let’s focus on todo and extract code out of bin/todo and into some files in lib that bin/todo can require. Our soon-to-be-created unit tests can then require these files for testing without having to execute todo.

Extracting Units from Existing Code

The source code for todo is organized around the GLI command methods and blocks. Each action block contains the core logic of our to-do app. This is the logic we need to extract so our unit tests can execute it. Here’s the action block for the new command:

tolerate_gracefully/todo/bin/todo

c.action do |global_options,options,task_names|
  File.open(global_options[:filename],'a+') do |todo_file|
    if task_names.empty?
      puts "Reading new tasks from stdin..."
      task_names = STDIN.readlines.map { |a| a.chomp }
    end
    tasks = 0
    task_names.each do |task|
      todo_file.puts [task,Time.now].join(',')
      tasks += 1
    end
    if tasks == 0
      raise "You must provide tasks on the command-line or standard input"
    end
  end
end

Since Ruby is an object-oriented language, it makes sense to put the code currently in the action block into a class or module and then use it inside the action block. Ultimately, we’d want a class named Task that handles all things task-related, but let’s take things one step at a time. All we need is to move the code out of our executable, so we’ll create a module named Todo and create a method called new_task inside it. new_task will be a straight copy of the file-writing code from our action block; the code that reads tasks from standard input stays in bin/todo:

tolerate_gracefully/todo_unit_tests/lib/todo/todo.rb

module Todo
  def new_task(filename,task_names)
    File.open(filename,'a+') do |todo_file|
      tasks = 0
      task_names.each do |task|
        todo_file.puts [task,Time.now].join(',')
        tasks += 1
      end
      if tasks == 0
        raise "You must provide tasks on the command-line or standard input"
      end
    end
  end
end

In Ruby, a module can be used for many things, but here, we’re using it as a place where code can live that isn’t naturally part of a class. Later in the book, we’ll make a proper class and have a better OO design for todo, but for now, a module will accomplish our goal of unit testing this code.

Next, we need to remove this code from bin/todo, require lib/todo/todo.rb, and include the Todo module so we can access our new_task method.

tolerate_gracefully/todo_unit_tests/bin/todo

$LOAD_PATH << File.expand_path(File.dirname(__FILE__) + '/../lib')  # make lib/ requireable
require 'gli'
require 'todo/version'
require 'todo/todo.rb'  # our extracted code
include Todo            # mix Todo's methods into this file

The $LOAD_PATH line isn’t new (GLI included it for us when we first generated our project), but it’s worth pointing out, because it is how bin/todo can reach our newly extracted code in lib/todo/todo.rb. All it does is add our lib directory to the load path (available in Ruby as $LOAD_PATH), which is the list of directories require searches.
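
To see what that line buys us, here is a small, self-contained sketch that builds a throwaway lib directory and requires from it. The demo_todo directory, the DemoTodo module, and its greeting method are hypothetical stand-ins for lib/todo/todo.rb and Todo; only the $LOAD_PATH, require, and include mechanics are the point.

```ruby
require 'tmpdir'
require 'fileutils'

# Build a throwaway layout: <root>/lib/demo_todo/todo.rb
# (demo_todo and DemoTodo are hypothetical stand-ins for todo and Todo).
root = Dir.mktmpdir
lib_dir = File.join(root, 'lib')
FileUtils.mkdir_p(File.join(lib_dir, 'demo_todo'))
File.write(File.join(lib_dir, 'demo_todo', 'todo.rb'), <<~RUBY)
  module DemoTodo
    def greeting
      "hello from lib"
    end
  end
RUBY

# Before lib_dir is on the load path, require cannot find the file.
begin
  require 'demo_todo/todo'
  found_before = true
rescue LoadError
  found_before = false
end

# This is the job the $LOAD_PATH line in bin/todo does for the real project.
$LOAD_PATH << lib_dir
require 'demo_todo/todo'

# At the top level of a script, include mixes the module into Object,
# so its methods can be called directly.
include DemoTodo

puts found_before  # prints "false"
puts greeting      # prints "hello from lib"

FileUtils.remove_entry(root)
```

The failed first require shows that require only looks in the directories listed in $LOAD_PATH (plus installed gems), which is why bin/todo adds lib before requiring anything from it.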

The require 'todo/todo.rb' line loads our code, and include Todo mixes the Todo module into the current context. This means that any method defined in the Todo module is now available directly in bin/todo. Now we can use new_task in the action block for the new command:

tolerate_gracefully/todo_unit_tests/bin/todo

c.action do |global_options,options,task_names|
  if task_names.empty?
    puts "Reading new tasks from stdin..."
    task_names = STDIN.readlines.map { |a| a.chomp }
  end
  new_task(global_options[:filename],task_names)
end

When we run our Cucumber tests via rake features, we’ll see that all the tests are still green (and thus still passing). This means that our app is still working in light of this fairly major change in its structure. Now, we can start testing this code.
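
The stdin-reading code that stayed behind in the action block still can’t be unit tested. One possible follow-up extraction, sketched here under our own assumptions, passes the input stream in as a parameter, so the action block could hand it STDIN while a test hands it a StringIO. Note that this is dependency injection rather than the stubbing approach used in the rest of this chapter, and the TodoInput module and read_task_names method are hypothetical, not part of todo as written.

```ruby
require 'stringio'

# Hypothetical module holding an extraction of the stdin-reading code.
module TodoInput
  # Returns task_names unchanged if any were given on the command line;
  # otherwise reads one task per line from input (STDIN by default).
  def self.read_task_names(task_names, input = STDIN)
    return task_names unless task_names.empty?
    input.readlines.map { |line| line.chomp }
  end
end

# In a test, a StringIO stands in for the terminal.
fake_stdin = StringIO.new("Buy milk\nWalk the dog\n")
p TodoInput.read_task_names([], fake_stdin)
# prints ["Buy milk", "Walk the dog"]

# Command-line tasks win; input is never read.
p TodoInput.read_task_names(['Call mom'], fake_stdin)
# prints ["Call mom"]
```

In bin/todo, the action block could then call TodoInput.read_task_names(task_names) after printing its "Reading new tasks from stdin..." message.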

Setting Up Our Environment to Run Unit Tests

GLI gave us the files and Rake tasks we need to start running unit tests, but let’s go over the basics so you can set up unit testing for any project. We’ll need to do two things: configure our Rakefile to run unit tests and create one or more files that contain our unit tests.

Setting it up in our Rakefile is simple, since Rake ships with the class Rake::TestTask, which defines a Rake task for running unit tests. We simply require the file that provides it and configure it like so:

tolerate_gracefully/todo_unit_tests/Rakefile

require 'rake/testtask'

Rake::TestTask.new do |t|
  t.libs << "test"
  t.test_files = FileList['test/tc_*.rb']
end

We can now run unit tests in any file in the directory test that starts with the prefix tc_ by typing rake test.
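
FileList expands patterns using the same glob syntax as Ruby’s Dir.glob, so one way to check which files the test/tc_*.rb pattern picks up is to glob a scratch directory. This is a minimal sketch; the file names are made up for illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical contents of a test directory.
names = %w[tc_task.rb tc_parser.rb test_helper.rb task_test.rb tc_notes.txt]

matched = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'test'))
  names.each { |n| FileUtils.touch(File.join(root, 'test', n)) }

  # Same pattern as FileList['test/tc_*.rb'], evaluated relative to root.
  Dir.chdir(root) { Dir.glob('test/tc_*.rb').sort }
end

p matched  # prints ["test/tc_parser.rb", "test/tc_task.rb"]
```

Files that don’t start with tc_ or don’t end in .rb are ignored, so helper files can live in the same directory without being run as tests.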

Next, we need to create at least one file to run. All we have to do is create a file that contains a class that extends Test::Unit::TestCase and has at least one method that starts with the prefix test_. When we do this, rake test will run each test_ method as a unit test. Let’s see it in action:

require 'test/unit'

class TaskTest < Test::Unit::TestCase
  def test_that_passes
    assert true
  end

  def test_that_fails
    assert false
  end
end

Now, we run our two tests and see what happens:

$ rake test
Started
F.
Finished in 0.003429 seconds.

1) Failure:
test_that_fails(TaskTest)
./test/tc_task.rb:36:in `test_that_fails'
<false> is not true.

2 tests, 2 assertions, 1 failures, 0 errors
rake aborted!
(See full trace by running task with --trace)

One test passed, and the other failed. This gives us an idea of what to expect when writing and running unit tests. Let’s remove these fake tests and write a real test for the code we just extracted.
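
It may help to see roughly how a naming convention like test_ can drive a test run. The toy runner below is a sketch of the idea only, not Test::Unit’s implementation: it uses reflection to find public methods whose names start with test_, runs each on a fresh instance, and records a failure when an assertion raises. AssertionFailed, MiniAssertions, and run_tests are hypothetical names.

```ruby
# Hypothetical stand-in for a framework's assertion error.
class AssertionFailed < StandardError; end

# Hypothetical minimal assertion, mimicking the shape of Test::Unit's assert.
module MiniAssertions
  def assert(condition, message = "expected true, got #{condition.inspect}")
    raise AssertionFailed, message unless condition
  end
end

class TaskTest
  include MiniAssertions

  def test_that_passes
    assert true
  end

  def test_that_fails
    assert false
  end

  def helper_not_a_test
    raise "never run: the name does not start with test_"
  end
end

# Find every test_ method defined in klass and run each on a new instance.
def run_tests(klass)
  results = {}
  klass.public_instance_methods(false).grep(/\Atest_/).sort.each do |name|
    begin
      klass.new.public_send(name)
      results[name] = :pass
    rescue AssertionFailed => e
      results[name] = "FAIL: #{e.message}"
    end
  end
  results
end

results = run_tests(TaskTest)
results.each { |name, outcome| puts "#{name}: #{outcome}" }
```

Because the runner sorts the names, it prints test_that_fails: FAIL: expected true, got false and then test_that_passes: pass; helper_not_a_test is never called.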

Writing Unit Tests

Unlike our acceptance tests, our unit tests should not interact with the outside world; we want to test our code in complete isolation. This is the only way we can be sure that every aspect of the tests we write can be controlled. For our purposes here, it means we have to figure out how to prevent the call to File.open from opening an actual file on the filesystem.

What we’ll do is stub the open call. Stubbing is a way to change the behavior of a method temporarily so that it behaves in a predictable way as part of a unit test. The open source library Mocha[47] allows us to do just that. When we include it in our tests, Mocha adds a stubs method to every single object in Ruby (including the class object File) that allows us to replace the default behavior of any method with new behavior.
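
To make stubbing concrete without any library, here is a rough sketch of the kind of swap a stub-and-restore pair performs: save the original class-level method, install a replacement, and put the original back afterward. The with_stub helper is hypothetical and far less capable than Mocha (no expectations, no verification); it is not how Mocha is implemented, only an illustration of the idea.

```ruby
require 'stringio'

# Hypothetical helper: temporarily replace a class-level (singleton)
# method on obj, run the block, and restore the original afterward,
# even if the block raises.
def with_stub(obj, name, replacement)
  meta = obj.singleton_class
  backup = :"__unstubbed_#{name}"
  meta.send(:alias_method, backup, name)         # keep the original
  meta.send(:define_method, name, &replacement)  # install the stub
  begin
    yield
  ensure
    meta.send(:alias_method, name, backup)       # put the original back
    meta.send(:remove_method, backup)            # drop the saved copy
  end
end

buffer = StringIO.new

# Roughly the effect of File.stubs(:open).yields(buffer): ignore the
# arguments and hand the caller's block our StringIO instead of a file.
with_stub(File, :open, ->(*_args, &blk) { blk.call(buffer) }) do
  File.open('no_such_dir/foo.txt', 'a+') { |f| f.puts 'written to memory' }
end

puts buffer.string               # prints "written to memory"
puts File.exist?('no_such_dir')  # prints "false": nothing touched the disk
```

The ensure clause plays the role that teardown and unstub play in a real test: the original File.open comes back even if the code under test raises.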

To see this in action, let’s test that new_task raises an exception when no tasks are passed in. Since File.open takes a block and we need that block to execute (that’s where all the code in new_task is), we’ll use the method yields, provided by Mocha, to stub open so that it simply calls the block it was given, passing it a value we choose. We’ll then pass an empty array to new_task and use the method assert_raises, provided by Test::Unit, to assert that new_task raises an exception.

tolerate_gracefully/todo_unit_tests/test/tc_task.rb

include Todo

def test_raises_error_when_no_tasks
  File.stubs(:open).yields("")
  ex = assert_raises RuntimeError do
    new_task("foo.txt",[])
  end
  expected = "You must provide tasks on the command-line or standard input"
  assert_equal expected, ex.message
end

Notice that we also include the Todo module here so that our tests have access to the method we’re testing. Back to the code: the first line of our test method does the stubbing. It tells File that when someone calls open, it should yield the empty string to the block given to open instead of doing what open normally does. In effect, the variable todo_file in new_task will be set to the empty string instead of a File. This isn’t a problem, since the path through the code we’re simulating never calls any methods on it: with no tasks to write, new_task skips the loop and raises an exception.

assert_raises verifies that the code inside the block given to it raises a RuntimeError. It also returns the exception that was raised. We then make sure that the message of that exception matches the message we expect.
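
The contract of assert_raises can be approximated in a few lines of plain Ruby. The expect_raise helper below is a hypothetical, simplified stand-in, not Test::Unit’s code: it runs the block, returns the exception if it is of the expected class, fails if nothing is raised, and lets any other exception class propagate.

```ruby
# Simplified, hypothetical stand-in for Test::Unit's assert_raises.
def expect_raise(klass)
  yield
rescue klass => e
  e  # hand the exception back so the caller can inspect its message
else
  # Reached only when the block raised nothing at all.
  raise "expected #{klass} to be raised, but nothing was"
end

ex = expect_raise(RuntimeError) do
  raise "You must provide tasks on the command-line or standard input"
end
puts ex.message
# prints "You must provide tasks on the command-line or standard input"
```

Because the rescue clause only matches klass, an exception of any other class (an Errno::EPERM, for instance) escapes the helper untouched.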

Since we’ve replaced open with our stub during the test, we also need to restore it once the test has run. Test::Unit runs the method teardown in our test class after each test method (even if that test fails), so we can put this code there, using Mocha’s unstub method to remove the stub we created.

tolerate_gracefully/todo_unit_tests/test/tc_task.rb

def teardown
  File.unstub(:open)
end

Now we can run our test and see what happens:

$ rake test
Started
.
Finished in 0.000577 seconds.

1 tests, 2 assertions, 0 failures, 0 errors

Everything passed! Now that we know how to fake out File.open, we can write a complete set of tests for our new_task method. There are two more cases we need to cover: the normal execution of adding a new task, and the case where we don’t have permission to write to the file.

To test the normal case, we need to verify that the tasks we pass to new_task are written to the file. Since we don’t want to actually write to the file, we’ll use our newfound ability to stub the File.open method to capture what new_task writes out. We’ll do this by yielding an instance of StringIO to the block given to File.open. StringIO behaves like a real file, but it keeps its data in memory instead of on the filesystem, and we can pull that data out and examine it. That’s exactly what we need, so let’s see the test:

tolerate_gracefully/todo_unit_tests/test/tc_task.rb

def test_proper_working
  string_io = StringIO.new
  File.stubs(:open).yields(string_io)
  new_task("foo.txt",["This is a task"])
  # Parentheses avoid Ruby's "ambiguous first argument" warning for a regexp literal.
  assert_match(/^This is a task,/, string_io.string)
end

When we call new_task now, the file that gets yielded will be our variable string_io. When new_task calls puts on it, StringIO saves the string internally, and we can examine it via the string method on string_io. We assert that the string matches a regular expression containing our task name (we use a regexp here because the current date and time are also written out).
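
StringIO is part of Ruby’s standard library (require 'stringio') and supports the usual writing methods such as puts. This short sketch writes a line in the same task,time format new_task uses, to show why the test only anchors on the task name: the timestamp portion changes on every run.

```ruby
require 'stringio'

io = StringIO.new

# Same line format new_task writes: "<task>,<current time>"
io.puts [ 'This is a task', Time.now ].join(',')

# The timestamp varies, so only the prefix is predictable,
# e.g. "This is a task,2013-01-05 10:42:17 -0500\n"
puts io.string.inspect

# Match the predictable part and ignore the rest.
puts(io.string =~ /^This is a task,/ ? 'matched' : 'no match')  # prints "matched"
```

Nothing here touches the filesystem; io.string simply returns everything written so far.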

Let’s run this test and see what happens:

$ rake test
Started
..
Finished in 0.001007 seconds.

2 tests, 3 assertions, 0 failures, 0 errors

This test also passed. To prove that File.open did not create a file, we’ll see if foo.txt is in our current directory:

$ ls foo.txt
ls: foo.txt: No such file or directory

The last case is the trickiest one and is the reason we started writing unit tests: we want to make sure new_task gives a reasonable error message when the task list file cannot be written to. If this happened in real life, File.open would raise one of Ruby’s Errno:: exceptions, which are named after the C library’s errno constants. On Unix-like systems, being denied access to a file usually surfaces as Errno::EACCES; we’ll simulate the closely related Errno::EPERM (“operation not permitted”), which our code should handle the same way, by stubbing File.open to raise it. We don’t want new_task to pass that exception along, however. We want it to raise a RuntimeError, and we want that exception to have a useful message, including the message from the underlying exception. Here’s the test:

tolerate_gracefully/todo_unit_tests/test/tc_task.rb

def test_cannot_open_file
  ex_msg = "Operation not permitted"
  File.stubs(:open).raises(Errno::EPERM.new(ex_msg))
  ex = assert_raises RuntimeError do
    new_task("foo.txt",["This is a task"])
  end
  assert_match(/^Couldn't open foo.txt for appending: #{ex_msg}/, ex.message)
end
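
One detail about the stubbed exception: a SystemCallError subclass builds its message by putting the operating system’s description of the error in front of whatever text we pass, joined with " - ". So Errno::EPERM.new(ex_msg) does not carry ex_msg alone, and the pattern above still matches because it only checks the beginning of the message. A quick check (the description text comes from the platform’s C library; the string shown is the usual one on Linux and macOS):

```ruby
err = Errno::EPERM.new('Operation not permitted')

# The system description comes first, then " - ", then our text.
puts err.message
# prints "Operation not permitted - Operation not permitted" on Linux and macOS

# Both permission-related errors share the SystemCallError superclass.
puts Errno::EPERM.ancestors.include?(SystemCallError)   # prints "true"
puts Errno::EACCES.ancestors.include?(SystemCallError)  # prints "true"
```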

Now, when we run our unit test, it fails:

$ rake test
Started
F..
Finished in 0.008249 seconds.

1) Failure:
test_error(TaskTest)
[./test/tc_task.rb:44:in `test_error'
mocha/integration/test_unit/ruby_version_186_and_above.rb:22:in `__send__'
mocha/integration/test_unit/ruby_version_186_and_above.rb:22:in `run']:
<RuntimeError> exception expected but was
Class: <Errno::EPERM>
Message: <"Operation not permitted">
---Backtrace---
lib/mocha/exception_raiser.rb:12:in `evaluate'
lib/mocha/return_values.rb:20:in `next'
lib/mocha/expectation.rb:472:in `invoke'
lib/mocha/mock.rb:157:in `method_missing'
lib/mocha/class_method.rb:46:in `open'
./lib/todo/task.rb:4:in `new_task'
./test/tc_task.rb:45:in `test_error'
./test/tc_task.rb:44:in `test_error'
mocha/integration/test_unit/ruby_version_186_and_above.rb:22:in `__send__'
mocha/integration/test_unit/ruby_version_186_and_above.rb:22:in `run'
---------------

3 tests, 4 assertions, 1 failures, 0 errors
rake aborted!

We get a big, nasty backtrace, and we see that instead of getting a RuntimeError, we got an Errno::EPERM. This isn’t surprising, since our test forced that to happen. What’s missing is the code to translate that exception into a RuntimeError. We’ll fix it by rescuing SystemCallError (the superclass of all the Errno:: exceptions) and raising a RuntimeError with a more helpful message.

tolerate_gracefully/todo_unit_tests/lib/todo/todo.rb

def new_task(filename,task_names)
  File.open(filename,'a+') do |todo_file|
    tasks = 0
    task_names.each do |task|
      todo_file.puts [task,Time.now].join(',')
      tasks += 1
    end
    if tasks == 0
      raise "You must provide tasks on the command-line or standard input"
    end
  end
rescue SystemCallError => ex  # new: translate Errno:: errors
  raise RuntimeError,"Couldn't open #{filename} for appending: #{ex.message}"
end

Now, our test passes with flying colors:

$ rake test
Started
...
Finished in 0.00192 seconds.

3 tests, 5 assertions, 0 failures, 0 errors

We’ve covered all the paths through this method. To continue testing todo with unit tests, we’ll continue extracting code into testable units and writing tests. The ability to stub out methods is very powerful and enables us to get very good test coverage. This is one of the benefits of working with a dynamic language like Ruby.
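
Because the rescue clause catches any SystemCallError, not just Errno::EPERM, we can also watch it handle a real failure with no stubbing at all. In this sketch, a path inside a directory that doesn’t exist makes File.open raise Errno::ENOENT. The new_task body mirrors the final version in lib/todo/todo.rb, but it lives in a module named TodoSketch, a hypothetical name chosen so it won’t clash with the real Todo.

```ruby
require 'tmpdir'

# TodoSketch is a hypothetical stand-in for Todo; new_task mirrors the
# final version of Todo#new_task, including the SystemCallError rescue.
module TodoSketch
  def self.new_task(filename, task_names)
    File.open(filename, 'a+') do |todo_file|
      tasks = 0
      task_names.each do |task|
        todo_file.puts([task, Time.now].join(','))
        tasks += 1
      end
      if tasks == 0
        raise "You must provide tasks on the command-line or standard input"
      end
    end
  rescue SystemCallError => ex
    # Any Errno:: error (EPERM, EACCES, ENOENT, ...) becomes a RuntimeError.
    raise RuntimeError, "Couldn't open #{filename} for appending: #{ex.message}"
  end
end

captured = nil
Dir.mktmpdir do |dir|
  # The parent directory doesn't exist, so File.open raises Errno::ENOENT.
  missing = File.join(dir, 'no_such_subdir', 'todo.txt')
  begin
    TodoSketch.new_task(missing, ['Buy milk'])
  rescue RuntimeError => e
    captured = e.message
  end
end

puts captured
```

The printed message should begin with "Couldn't open", followed by the path and the operating system’s "No such file or directory" description; the exact wording after that varies by Ruby version.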

8.3 A Word About Test-Driven Development

We’ve built our apps a bit backward from the accepted practice in the Ruby community. You should be writing your tests first, using them to drive the development of features. We didn’t do that; we started with code and added tests afterward. This was done purely to make it easier to learn concepts about command-line app development. We needed to know what to test before learning how to test. To be clear, we are not endorsing “test-last development.”

To write command-line apps using Test-Driven Development (TDD), you can apply the same principles we’ve learned here, but start with the tests instead of the code. (See Kent Beck’s book on TDD [Bec02].) The simplest approach is to start with Cucumber and Aruba to identify the user interface and user-facing features of your app. Write one scenario at a time, get that working, and move on to a new scenario. Repeat this process until you have the basic “happy paths” through your app working. Simulate a few error cases if you can, but at that point, you’ll want to turn your attention to unit tests, extracting your mostly working code into testable units and putting them through the wringer to iron out all the edge cases.

8.4 Moving On

We only scratched the surface of the art of software testing, but we went through a whirlwind tour of everything you’ll need to get started for testing command-line apps. We saw some real challenges with testing our apps, as well as several techniques to deal with them. By manipulating the environment, setting up test-specific infrastructure, and mocking system calls, we can simulate almost anything that might happen when our app runs.

Toward the end, when we learned about unit testing, we talked briefly about refactoring. Refactoring is difficult without tests, but with a good suite of tests, we can safely change the internal design of our code. We got a taste of that when we extracted our business logic out of bin/todo and put it into lib/todo/todo.rb so we could unit test it. In the next chapter, we’ll learn some patterns and techniques for organizing our code so that it’s easy to maintain, test, and enhance.

Footnotes

[45] Cucumber supports other human languages as well, from Arabic to Vietnamese.

[46] https://github.com/cucumber/cucumber/wiki/Tags

[47] http://mocha.rubyforge.org/