THE RUBY WAY, Third Edition (2015)

Chapter 14. Scripting and System Administration

Thus spake the master programmer: “Though a program be but three lines long, someday it will have to be maintained.”

—Geoffrey James, The Tao of Programming

Programmers often need to “glue” programs together with little scripts that talk to the operating system at a fairly high level and run external programs. This is especially true in the UNIX world, which daily relies on shell scripts for countless tasks.

For programmers already proficient in Ruby, the language can be an extremely convenient way to create small, useful scripts. In fact, one of Matz’s original motivations when creating Ruby, inspired by Perl, was to make it easier to create small scripts to accomplish useful tasks.

In many cases, you might just as well use one of the more traditional languages for this purpose. The advantage that Ruby has, of course, is that it really is a general-purpose, full-featured language and one that’s truly object oriented. Because some people want to use Ruby to talk to the OS at this level, we present here a few tricks that might prove useful.

Much of what could be covered in this chapter is actually dealt with in other chapters entirely. Refer in particular to Chapter 10, “I/O and Data Storage,” which covers file I/O and attributes of files; these features are frequently used in scripts of the kind discussed in the present chapter.

14.1 Running External Programs

A language can’t be a glue language unless it can run external programs. Ruby offers more than one way to do this.

I can’t resist mentioning here that if you are going to run an external program, make sure that you know what that program does. I’m thinking about viruses and other potentially destructive programs. Don’t just run any old command string, especially if it came from a source outside the program. This is true regardless of whether the application is web based.

14.1.1 Using system and exec

The system method (in Kernel) is equivalent to the C call of the same name. It will execute the given command in a subshell:

system("date")
# Output goes to stdout as usual...

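Note that additional parameters, if present, will be used as a list of arguments; in most cases, the arguments can also be specified as part of the command string with the same effect. The difference is that a single command string is handed to the shell, so filename expansion and other shell processing are applied to it; when the arguments are passed separately, the program is run directly and no expansion is done on them: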

system("rm", "/tmp/file1")
system("rm /tmp/file2")
# Both the above work fine.

# However, below, there's a difference...
system("echo *")    # Print list of all files
system("echo", "*") # Print an asterisk (no filename expansion done)

# More complex command lines also work.
system("ls -l | head -n 1")

Note that if you want to capture the output (for example, in a variable), system isn’t the right way. See the next section.
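
What system does give back is a success flag. The following is a minimal sketch, assuming a UNIX-like system where the true, false, and sh programs are available; the command name no-such-command-xyz is made up for illustration. It shows the three possible return values and how the exit status can be read from $? afterward:

```ruby
ok      = system("true")                # true: the command exited with status 0
failed  = system("false")               # false: the command exited with a nonzero status
missing = system("no-such-command-xyz") # nil: the command could not be started at all

# The status of the most recent child process is available in $?
system("sh", "-c", "exit 3")
code = $?.exitstatus

p [ok, failed, missing, code]
```

Checking the return value is usually worth the extra line in a script, because a failed command is otherwise silent.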

I’ll also mention exec here. The exec method behaves much the same as system, except that the new process actually replaces the current one. Thus, any code following the exec won’t be executed:

puts "Here's a directory listing:"
exec("ls", "-l")

puts "This line is never reached!"

14.1.2 Capturing Command Output

The simplest way to capture command output is to use the backtick (also called backquote or grave accent) to delimit the command. Here are a couple of examples:

listing = `ls -l` # Multiple lines in one string
now = `date`      # "Mon Mar 12 16:50:11 CST 2001"

The generalized delimiter %x calls the backquote operator (which is really a Kernel method). It works essentially the same way:

listing = %x(ls -l)
now = %x(date)

The %x form is often useful when the string to be executed contains characters such as single and double quotes.

Because the backquote method really is (in some sense) a method, it is possible to override it. Here, we change the functionality so that we return an array of lines rather than a single string (of course, we have to save an alias to the old method so that we can call it):

alias old_execute `

def `(cmd)
  out = old_execute(cmd) # Call the old backtick method
  out.split("\n")        # Return an array of strings!
end

entries = `ls -l /tmp`
num = entries.size # 95

first3lines = %x(ls -l | head -n 3)
how_many = first3lines.size # 3

Note that, as shown here, the functionality of %x is affected when we perform this redefinition.

In the following example, we append a “shellism” to the end of the command to ensure that standard error is mixed with standard output:

alias old_execute `

def `(cmd)
  old_execute(cmd + " 2>&1")
end

entries = `ls -l /tmp/foobar`
# "/tmp/foobar: No such file or directory\n"

Of course, most of the time it is more useful to create a method of our own that implements desired behaviors rather than overwrite the backquote method itself.
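
As a rough sketch of that alternative, here is a small helper built on Open3.capture2e from the standard library, which returns the combined standard output and standard error along with a Process::Status. The method name run_lines is our own invention, and the echo command is assumed to be available:

```ruby
require "open3"

# Hypothetical helper: run a command (no shell involved when the
# arguments are passed separately) and return its combined
# stdout/stderr as an array of lines, plus a success flag.
def run_lines(*cmd)
  output, status = Open3.capture2e(*cmd)
  [output.split("\n"), status.success?]
end

lines, ok = run_lines("echo", "hello")
p lines # ["hello"]
p ok    # true
```

This leaves the backquote operator untouched, so other code that relies on its normal behavior is not affected.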

14.1.3 Manipulating Processes

We discuss process manipulation in this section, even though a new process might not involve calling an external program. The principal way to create a new process is the fork method, which takes its name from the UNIX tradition’s idea of a fork in the path of execution, like a fork in the road. (Note, however, that Ruby does not support the fork method on Windows platforms.)

The fork method in Kernel (also found in the Process module) shouldn’t, of course, be confused with the Thread class method of the same name.

There are two ways to invoke the fork method. The first is the more UNIX-like way: Simply call it and test its return value. If that value is nil, we are in the child process; otherwise, we execute the parent code. The value returned to the parent is actually the process ID (or pid) of the child:

pid = fork
if pid.nil?
  puts "Ah, I must be the child."
  puts "I guess I'll speak as a child."
else
  puts "I'm the parent."
  puts "Time to put away childish things."
end

In this unrealistic example, the output might be interleaved, or the parent’s output might appear first. For the purposes of this example, it’s irrelevant.

We should also note that the child process might outlive the parent. We’ve seen that this isn’t the case with Ruby threads, but system-level processes are entirely different.

The second form of fork takes a block. The code in the block comprises the child process. The previous example could thus be rewritten in this simpler way:

fork do
  puts "Ah, I must be the child."
  puts "I guess I'll speak as a child."
end

puts "I'm the parent."
puts "Time to put away childish things."

The pid is still returned, of course. We just don’t show it in the example.

When we want to wait for a process to finish, we can call the wait method in the Process module. It waits for any child to exit and returns the process ID of that child. The wait2 method behaves similarly except that it returns a two-value array consisting of the pid and a Process::Status object holding the pid and exit status code:

pid1 = fork { sleep 2; exit 3 }
pid2 = fork { sleep 1; exit 3 }

pid2_again = Process.wait # Returns pid2
pid1_and_status = Process.wait2 # Returns [pid1, #<Process::Status exit 3>]

To wait for a specific child, use waitpid and waitpid2, respectively:

pid3 = fork { sleep 2; exit 3 }
pid4 = fork { sleep 1; exit 3 }

sleep 3 # Give the child processes time to finish

pid4_again = Process.waitpid(pid4, Process::WNOHANG)
pid3_array = Process.waitpid2(pid3, Process::WNOHANG)
# pid3_array is now [pid3, #<Process::Status exit 3>]

If the second parameter is unspecified, the call blocks until the child exits (and raises Errno::ECHILD if no such child exists). With Process::WNOHANG, the call returns immediately, giving nil if the child has not yet exited. The flag can be ORed with Process::WUNTRACED to catch child processes that have been stopped. This second parameter is rather OS sensitive; experiment before relying on its behavior.

The exit method can be passed true, false, or an integer. The UNIX standard is to exit with status 0 for success, and 1 or greater for failure. Thus, passing true will exit with a status of 0, and false will exit with a status of 1. Passing any integer will exit with a status of that value.

The exit! method exits immediately from a process (bypassing any exit handlers). Any given integer will be used as the exit code, but the default is 1 (not 0):

pid1 = fork { exit! } # Return 1 exit code
pid2 = fork { exit! 0 } # Return 0 exit code
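
To see these status codes from the parent’s side, we can fork a child for each case and collect its status with waitpid2. This is only a sketch; child_status is a hypothetical helper, and it assumes a platform where fork is available:

```ruby
# Hypothetical helper: run the block in a child process and
# return the exit status that the parent observes.
def child_status(&block)
  pid = fork(&block)
  _, status = Process.waitpid2(pid)
  status.exitstatus
end

codes = [
  child_status { exit true },  # success
  child_status { exit false }, # generic failure
  child_status { exit 7 },     # explicit integer
  child_status { exit! }       # exit! with no argument
]

p codes # [0, 1, 7, 1]
```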

The pid and ppid methods will return the process ID of the current process and the parent process, respectively:

proc1 = Process.pid
fork do
  if Process.ppid == proc1
    puts "proc1 is my parent" # Prints this message
  else
    puts "What's going on?"
  end
end

The kill method can be used to send a UNIX-style signal to a process. The first parameter can be an integer, a POSIX signal name including the SIG prefix, or a non-prefixed signal name. The second parameter represents a pid; if it is zero, the signal is sent to every process in the current process group (which includes the current process):

Process.kill(1, pid1)        # Send signal 1 to process pid1
Process.kill("HUP", pid2)    # Send SIGHUP to pid2
Process.kill("SIGHUP", pid2) # Send SIGHUP to pid2
Process.kill("SIGHUP", 0)    # Send SIGHUP to our own process group

The Kernel.trap method can be used to handle such signals. It typically takes a signal number or name and a block to be executed:

trap(1) do
  puts "OUCH!"
  puts "Caught signal 1"
end

Process.kill(1,0) # Send to self

The trap method can be used to allow complex control of your process. For more information, consult Ruby and UNIX references on process signals.
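
One common pattern is to keep the handler tiny (just record that the signal arrived) and to restore the previous handler afterward; trap returns the old handler for exactly this purpose. The sketch below assumes a UNIX-like platform that supports SIGUSR1, and the variable names are ours:

```ruby
received = []

# trap returns the previously installed handler, so we can restore it later
previous = trap("USR1") { received << "USR1" }

# Send the signal to this process only (not the whole process group)
Process.kill("USR1", Process.pid)

# The handler runs asynchronously; give it a moment to fire
20.times do
  break unless received.empty?
  sleep 0.05
end

trap("USR1", previous) # Put the old handler back

p received
```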

The Process module also has methods for examining and setting such attributes as userid, effective userid, priority, and others. Consult any Ruby reference for details.

14.1.4 Manipulating Standard Input and Output

You saw how IO.popen and IO.pipe work in Chapter 10, but there is a little library we haven’t looked at that can prove handy at times.

The Open3 library contains a method called popen3 that will yield (or return) three IO objects, followed by a thread that waits for the child process. The IO objects correspond to the standard input, standard output, and standard error for the process kicked off by the popen3 call. Here’s an example:

require "open3"

filenames = %w[ file1 file2 this that another one_more ]
output, errout = [], []

Open3.popen3("xargs", "ls", "-l") do |inp, out, err|
  filenames.each { |f| inp.puts f } # Write to the process's stdin
  inp.close                         # Close is necessary!

  output = out.readlines # Read from its stdout
  errout = err.readlines # Also read from its stderr
end

puts "Sent #{filenames.size} lines of input."
puts "Got back #{output.size} lines from stdout"
puts "and #{errout.size} lines from stderr."

This contrived example does an ls -l on each of the specified filenames and captures the standard output and standard error separately. Note that closing the input is needed so that the subprocess will be aware that the input is complete. Also note that older versions of Open3 (in Ruby 1.8) relied on fork, which doesn’t exist on Windows; on that platform, you had to use the win32-open3 library (written and maintained by Daniel Berger and Park Heesob). Since Ruby 1.9, Open3 is built on spawn rather than fork.
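
When the input is small and known in advance, the same library offers a simpler call, Open3.capture3, which feeds a string to the child’s standard input and returns its standard output, standard error, and exit status in one step. A minimal sketch, assuming the UNIX sort utility is on the path:

```ruby
require "open3"

# stdin_data is written to the child's standard input, which is then closed
out, err, status = Open3.capture3("sort", stdin_data: "pear\napple\nfig\n")

puts out                # The sorted lines
puts "stderr: #{err.inspect}"
puts "success: #{status.success?}"
```

Because capture3 reads both output streams for us, there is no need to manage the order of reads and closes by hand.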

See also Section 14.3, “The Shell Library.”

14.2 Command-Line Options and Arguments

Rumors of the death of the command line are greatly exaggerated. Although we live in the age of the GUI, every day thousands of us use older text-based interfaces for one reason or another.

Ruby has many of its roots in UNIX, as we’ve said. Yet even in the Windows world, there is such a thing as a command line, and, frankly, we don’t see it going away any time soon.

When operating at this level, you use parameters and switches to communicate with the program at the time of its invocation. This section shows how to deal with these parameters (or arguments) and switches (or options).

14.2.1 Working with ARGV

The global constant ARGV represents the list of arguments passed to the Ruby program via the command line. This is essentially just an array:

n = ARGV.size
argstr = %{"#{ARGV * ', '}"}
puts "I was given #{n} arguments..."
puts "They are: #{argstr}"
puts "Note that ARGV[0] = #{ARGV[0]}"

Assume that we invoke this program with the argument string red green blue on the command line. It then produces this output:

I was given 3 arguments...
They are: "red, green, blue"
Note that ARGV[0] = red

Whereas some languages also supply a separate count of arguments (such as argc in C), there is no need for that in Ruby because that information is part of the array.

Another thing that might trip up old-timers is the assignment of the zeroth argument to an actual argument (rather than, for example, the script name). The arguments themselves are zero-based rather than one-based as in C and the various shell languages.

14.2.2 Working with ARGF

The special global constant ARGF represents the pseudo-file resulting from a concatenation of every file named on the command line. It behaves like an IO object in most ways.

When you have a “bare” input method (without a receiver), you are typically using a method mixed in from the Kernel module. (Examples are gets and readlines.) The actual source of input will default to STDIN if no files are on the command line. If there are files, however, input will be taken from them. End of file will of course be reached only at the end of the last file.

If you prefer, you can access ARGF explicitly using the following fragment:

# Copy named files to stdout, just like 'cat'
puts ARGF.readlines

Perhaps contrary to expectations, end of file is set after each file. The previous code fragment will output all the files. This one will output only the first:

puts ARGF.gets until ARGF.eof?

Whether this is a bug or a feature, we will leave it to you to decide. Of course, other surprises might actually be pleasant. The input isn’t simply a stream of bytes flowing through our program; we can actually perform operations such as seek and rewind on ARGF as though it were a “real file.”

There is also a file method associated with ARGF; it returns an IO object corresponding to the file currently being processed. As such, the value it returns will change as the files on the command line are processed in sequence.

What if we don’t want command-line arguments to be interpreted as files? The solution is to not use the “bare” (receiverless) call of the input methods. If you want to read standard input, call methods on STDIN, and all will work as expected.

14.2.3 Parsing Command-Line Options

Ruby has had a long and varied history of command-line parsing libraries, and today it provides the OptionParser library. Essentially, OptionParser gives you a simple domain-specific language (DSL) for describing just how you want the arguments to your program parsed.

This is probably best explained with an example. Suppose we have a tool with these options: -h or --help will print help information; -f or --file will specify a filename argument; -l or --lines will truncate the output after the specified number of lines (defaulting to 100). We could begin in this way:

require 'optparse'

args = {lines: 100}

OptionParser.new do |opts|
  opts.banner = "Usage: tool [options] COMMAND"

  opts.on("-f", "--file FILE") do |file|
    args[:file] = file
  end

  opts.on("-l", "--lines [LINES]", Integer,
    "Number of lines to output (default 100)"
  ) do |lines|
    args[:lines] = lines
  end

  opts.on_tail("-h", "--help", "Show this help") do
    puts opts
    exit
  end
end.parse!

p args
p ARGV

With this code saved into a file named tool.rb, running it produces the output that one would (hopefully) expect:

$ ruby tool.rb -h
Usage: tool [options] COMMAND
    -f, --file FILE
    -l, --lines [LINES]              Number of lines to output (default 100)
    -h, --help                       Show this help

$ ruby tool.rb --file book.txt
{:lines=>100, :file=>"book.txt"}
[]

$ ruby tool.rb -f book.txt --lines 10 print
{:lines=>10, :file=>"book.txt"}
["print"]

As you can see from the example, the work involved in using OptionParser is mostly about building a new instance, and most of that goes into calls to the on method. The idea is that each call to on describes one option that the parser will recognize.

You need to supply two key bits to the on method: First, you need the name of the option, which can be either short (as in -f) or long (as in --file). In the second string, surrounding the name of the argument with square brackets indicates to OptionParser that the argument is, well, optional, and doesn’t need to be given.

As you can see in the lines parameter, it is possible to give both a type and a description of a particular option. The string from the command-line argument will be converted into the given type, and the description will be printed as part of the help message.

Second, you need to provide a block of code, which will be executed any time OptionParser sees that option. The code block can either do something directly, as the -h option does in the example, or simply save some data for later, as the --file and --lines blocks do.

When the parse! method is finally called, the parser examines the contents of ARGV, removes any entries that it recognizes as options, and runs the blocks for those options. Afterwards, the args hash contains the options that were passed, and the ARGV array contains only arguments that were not part of an option.
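
For testing, it can be handy to parse an explicit array instead of ARGV. The non-destructive parse method accepts any array and returns the leftover (non-option) arguments. The following sketch wraps that in a method of our own, parse_tool_args (a made-up name), and also shows that a value that cannot be converted to the requested type raises OptionParser::InvalidArgument:

```ruby
require "optparse"

# Hypothetical wrapper: returns the options hash and the leftover arguments
def parse_tool_args(argv)
  args = { lines: 100 }
  parser = OptionParser.new do |opts|
    opts.on("-f", "--file FILE") { |file| args[:file] = file }
    opts.on("-l", "--lines LINES", Integer) { |n| args[:lines] = n }
  end
  rest = parser.parse(argv) # Leaves argv itself unchanged
  [args, rest]
end

args, rest = parse_tool_args(%w[-f book.txt --lines 10 print])
p args[:file]  # "book.txt"
p args[:lines] # 10
p rest         # ["print"]

begin
  parse_tool_args(%w[--lines abc])
rescue OptionParser::InvalidArgument => e
  bad_input_message = e.message
end
```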

Many gems are available that provide various approaches to parsing command-line flags and arguments. You may find highline, slop, cocaine, thor, or some other library more suitable for your use case.

14.3 The Shell Library

Ruby isn’t necessarily convenient to use as a scripting language in every situation. For example, a bash script can execute external programs simply by naming them, with no extraneous syntax.

The power and flexibility of Ruby has given it a more complex syntax than the average shell language. Additionally, its functionality is segmented into different classes, modules, and libraries.

This situation motivated the creation of the Shell library. This library makes it easier to do things such as connecting commands with pipes and redirecting output to files. It also consolidates functionality from several different sources so that they are transparently accessible from a Shell object. (It doesn’t always work well on Windows.)

14.3.1 Using Shell for I/O Redirection

The Shell class has two methods—new and cd—for instantiating a new object. The former creates a shell object associated with the current directory; the latter creates a shell object whose working directory will be the one specified:

require "shell"

sh1 = Shell.new # Work in the current directory
sh2 = Shell.cd("/tmp/hal") # Work in /tmp/hal

The Shell library defines a few built-in commands as methods, such as echo, cat, and tee. These always return objects of class Filter (as do the user-defined commands that we’ll look at shortly).

The nice thing about a Filter is that it understands I/O redirection. The methods (or operators) <, >, and | are defined so that they behave more or less as we expect after using them in shell scripts.

If a redirection method has a string as a parameter, that string is taken to be the name of a file. If it has an IO object as a parameter, that object is used for the input or output operation. Here are some small examples:

sh = Shell.new

# Print the readme.txt file to stdout
sh.cat("readme.txt") > STDOUT

# Print it again
(sh.cat < "readme.txt") > STDOUT
(sh.echo "This is a test") > "myfile.txt"

# Cat two files to stdout, tee-ing to a third
(sh.cat "myfile.txt", "readme.txt") | (sh.tee "file3.txt") > STDOUT

Note that the > operator binds tightly. The parentheses that you see in the preceding code are necessary in most cases. Here are two correct usages and one incorrect one:

# Ruby parser understands this...
sh.cat("readme.txt") > STDOUT

# ...and this also.
(sh.cat "readme.txt") > STDOUT

# But not this: TypeError! (a precedence problem)
sh.cat "readme.txt" > STDOUT

Note that it’s also possible to add system commands of your own choosing. The method def_system_command will accomplish this. For example, here we define two methods—ls and ll—which will list files in the current directory (short and long listings, respectively):

# Method name is identical to command...
# only one parameter necessary
Shell.def_system_command "ls"

# Two parameters needed here
Shell.def_system_command "ll", "ls -l"

sh = Shell.new
sh.ls > STDOUT # Short listing
sh.ll > STDOUT # Long listing

You will notice that in many cases, we explicitly send output to STDOUT. This is because output from a Shell command doesn’t automatically go anywhere. It’s simply associated with the Filter object until that object is connected to a file or an IO object.

14.3.2 Other Notes on Shell

The transact method will execute a block in the context of the shell instance. Therefore, we can use the following shorthand:

sh = Shell.new
sh.transact do
echo("A line of data") > "somefile.txt"
cat("somefile.txt", "otherfile.txt") > "thirdfile"
cat("thirdfile") | tee("file4") > STDOUT
end

The iterator foreach will take either a file or a directory as a parameter. If it is a file, it will iterate over the lines of that file; if it is a directory, it will iterate over the filenames in that directory:

sh = Shell.new

# List all lines in /tmp/foo
sh.foreach("/tmp/foo") {|l| puts l }

# List all files in /tmp
sh.foreach("/tmp") {|f| puts f }

The pushdir and popdir methods will save and restore the current directory, respectively. Aliases are pushd and popd. The method pwd will determine the current working directory; aliases are getwd, cwd, and dir:

sh = Shell.cd "/home"

puts sh.pwd # /home
sh.pushd "/tmp"
puts sh.pwd # /tmp

sh.popd
puts sh.pwd # /home

For convenience, numerous methods are imported into Shell from various sources, including File and FileUtils. This saves the trouble of doing requires, includes, creating objects, qualifying method calls, and so on:

sh = Shell.new
flag1 = sh.exist? "myfile" # Test file existence
sh.delete "somefile" # Delete a file

There are other features of the Shell library that we don’t cover here. See the class documentation for more details.

14.4 Accessing Environment Variables

Occasionally we need to access environment variables as a link between our program and the outer world. An environment variable is essentially a label referring to a piece of text (typically a small piece); environment variables can be used to store configuration information such as paths, usernames, and so on.

The notion of an environment variable is common in the UNIX world. The Windows world has borrowed it from UNIX (by way of MS-DOS), so the code we show here should run on variants of both Windows and UNIX.

14.4.1 Getting and Setting Environment Variables

The global constant ENV can be used as a hash for the purposes of retrieving and assigning values. In the following code, we retrieve the value of an environment variable and split it into its component directories. (On Windows, the directories are separated by a semicolon rather than a colon.)

mypath = ENV["PATH"]
# Let's get an array now...
dirs = mypath.split(":")
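
If the script has to run on both kinds of systems, one option is to let Ruby supply the separator through the constant File::PATH_SEPARATOR. The following is a small sketch; using fetch with a default is our own precaution for the unlikely case that PATH is unset:

```ruby
# File::PATH_SEPARATOR is ":" on UNIX-like systems and ";" on Windows
dirs = ENV.fetch("PATH", "").split(File::PATH_SEPARATOR)

puts "PATH has #{dirs.size} entries"
dirs.first(3).each { |d| puts "  #{d}" }
```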

Here’s an example of setting a variable. We take the trouble to fork another process to illustrate two facts. First, a child process inherits the environment variables that its parent knows. Second, an environment variable set by a child is not propagated back up to the parent:

ENV["alpha"] = "123"
ENV["beta"] = "456"
puts "Parent: alpha = #{ENV['alpha']}"
puts "Parent: beta = #{ENV['beta']}"

fork do # Child code...
  x = ENV["alpha"]
  ENV["beta"] = "789"
  y = ENV["beta"]
  puts " Child: alpha = #{x}"
  puts " Child: beta = #{y}"
end

Process.wait
a = ENV["alpha"]
b = ENV["beta"]
puts "Parent: alpha = #{a}"
puts "Parent: beta = #{b}"

Here is the output:

Parent: alpha = 123
Parent: beta = 456
 Child: alpha = 123
 Child: beta = 789
Parent: alpha = 123
Parent: beta = 456

There is a consequence of the fact that parent processes don’t know about their children’s variables. Because a Ruby program runs as a child process of the shell that started it, any variables changed during execution will not be reflected in that shell after execution has terminated.

14.4.2 Storing Environment Variables as an Array or Hash

It’s important to realize that ENV isn’t really a hash; it just looks like one. For example, ENV.is_a?(Hash) is false, and some Hash methods are missing from it: calling merge on it gives us a NoMethodError because there is no such method. The reason for this implementation is the close tie between the ENV object and the underlying operating system; setting a value has an actual impact on the OS, a behavior that a mere hash can’t mimic.

However, we can call the to_hash method to give us a real live hash:

envhash = ENV.to_hash
val2var = envhash.invert

Of course, after we have a hash, we can convert it to any other form we prefer (for example, an array):

envarr = ENV.to_hash.to_a

It’s not possible to directly reassign a hash to ENV, but we can fake it easily if we need to:

envhash = ENV.to_hash
# Manipulate as needed... then assign back.
envhash.each {|k,v| ENV[k] = v }
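
The distinction between ENV and its hash copy is easy to check. In this sketch, RUBY_WAY_DEMO is a made-up variable name chosen only for illustration; the copy can be changed freely without touching the real environment, and ENV.update writes the values back in one call:

```ruby
ENV["RUBY_WAY_DEMO"] = "original" # Hypothetical variable name

is_hash  = ENV.is_a?(Hash)        # false: ENV only looks like a hash
snapshot = ENV.to_hash            # A real Hash, detached from the OS

snapshot["RUBY_WAY_DEMO"] = "changed"
before = ENV["RUBY_WAY_DEMO"]     # The real environment is untouched

ENV.update(snapshot)              # Write the values back
after = ENV["RUBY_WAY_DEMO"]

p [is_hash, before, after]        # [false, "original", "changed"]
```

Note that writing values back this way adds or changes variables but does not remove any that were deleted from the copy.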

14.5 Working with Files, Directories, and Trees

A broad area of everyday scripting is to work with files and directories, including entire subtrees of files. We mentioned some ways to work with files in Chapter 10, “I/O and Data Storage,” so we will provide more depth here.

Because I/O is a fairly system-dependent thing, many tricks will vary from one operating system to another. When in doubt, check the documentation and always try experimenting.

14.5.1 A Few Words on Text Filters

Many tools we use every day (both vendor supplied and homegrown) are simply text filters—that is, they accept textual input, process or transform it in some way, and output it again. Classic examples of text filters in the UNIX world are sort and uniq, among others.

Sometimes a file is small enough to be read into memory. This allows processing that might otherwise be difficult:

lines = File.open(filename){|f| f.readlines }
# Manipulate as needed...
lines.each {|x| puts x }

Sometimes we’ll need to process it a line at a time:

File.open(filename) do |file|
  file.each_line do |line|
    # Manipulate as needed...
    puts line
  end
end

Finally, don’t forget that any filenames on the command line are automatically gathered into ARGF, representing a concatenation of all input (see Section 14.2.2, “Working with ARGF”). In this case, we can use calls such as ARGF.readlines just as if ARGF were an IO object. All output will go to standard output, as usual.
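
To make the idea of a filter concrete, here is a sketch of a tiny one that behaves roughly like sort piped into uniq. The method name sort_uniq is ours; it takes any readable and any writable IO-like object, so the example can use StringIO in place of real files:

```ruby
require "stringio"

# Hypothetical filter: read lines, drop duplicates, write them out sorted
def sort_uniq(input, output)
  input.each_line.map(&:chomp).uniq.sort.each { |line| output.puts line }
end

input  = StringIO.new("pear\napple\npear\nfig\n")
output = StringIO.new
sort_uniq(input, output)

result = output.string
print result
```

In a real script, the same method could be called with ARGF as the input and STDOUT as the output.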

14.5.2 Copying a Directory Tree

Suppose that you want to copy an entire directory structure to a new location. There are various ways of performing this operation, but if the tree has internal symbolic links, it becomes more difficult.

Listing 14.1 shows a recursive solution with a little added user-friendliness. It is smart enough to check the most basic error conditions and also to print a usage message.

Listing 14.1 Copying a Directory Tree

require "fileutils"

def recurse(src, dst)
  Dir.mkdir(dst)
  Dir.foreach(src) do |e|
    # Don't bother with . and ..
    next if [".",".."].include? e
    fullname = src + "/" + e
    newname = fullname.sub(Regexp.new(Regexp.escape(src)), dst)
    # Test for symlinks first, because File.directory? follows links
    if File.symlink?(fullname)
      linkname = `ls -l #{fullname}`.sub(/.* -> /,"").chomp
      newlink = linkname.dup
      n = newlink.index($oldname)
      # Skip links whose target doesn't mention the old tree
      next if n == nil
      n2 = n + $oldname.length - 1
      newlink[n..n2] = $newname
      newlink.sub!(/\/\//,"/")
      File.symlink(newlink, newname)
    elsif File.directory?(fullname)
      recurse(fullname, newname)
    elsif File.file?(fullname)
      FileUtils.copy(fullname, newname)
    else
      puts "??? : #{fullname}"
    end
  end
end

# "Main"

if ARGV.size != 2
  puts "Usage: copytree oldname newname"
  exit
end

oldname = ARGV[0]
newname = ARGV[1]

if !File.directory?(oldname)
  puts "Error: First parameter must be an existing directory."
  exit
end

if File.exist?(newname)
  puts "Error: #{newname} already exists."
  exit
end

oldname = File.expand_path(oldname)
newname = File.expand_path(newname)

$oldname=oldname
$newname=newname

recurse(oldname, newname)

Whereas modern UNIX variants such as Mac OS X provide a cp -R command that will preserve symlinks, older UNIX variants did not. Listing 14.1 was written to address that need in a real-life situation.
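
When the link targets don’t need to be rewritten, the standard library’s FileUtils.cp_r is usually enough: to our knowledge, it copies symbolic links inside the tree as links rather than following them. The sketch below builds a throwaway tree in a temporary directory to check that behavior; all file and directory names are made up for the example, and it assumes a platform that supports symlinks:

```ruby
require "fileutils"
require "tmpdir"

link_kept = link_target = copied_text = nil

Dir.mktmpdir do |tmp|
  src = File.join(tmp, "src")
  dst = File.join(tmp, "dst")

  # Build a small tree: src/sub/a.txt plus a relative link to it
  FileUtils.mkdir_p(File.join(src, "sub"))
  File.write(File.join(src, "sub", "a.txt"), "hello")
  File.symlink("sub/a.txt", File.join(src, "link"))

  FileUtils.cp_r(src, dst) # dst must not exist yet

  link_kept   = File.symlink?(File.join(dst, "link"))
  link_target = File.readlink(File.join(dst, "link"))
  copied_text = File.read(File.join(dst, "sub", "a.txt"))
end

p [link_kept, link_target, copied_text]
```

Unlike Listing 14.1, this approach leaves absolute link targets pointing at the original tree.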

14.5.3 Deleting Files by Age or Other Criteria

Imagine that you want to scan through a directory and delete the oldest files. This directory might be some kind of repository for temporary files, log files, browser cache files, or similar data.

Here, we present a little code fragment that will remove all the files older than a certain timestamp (passed in as a Time object):

def delete_older(dir, time)
  Dir.chdir(dir) do
    Dir.foreach(".") do |entry|
      # We're not handling directories here
      next if File.stat(entry).directory?
      # Use the modification time
      File.delete(entry) if File.mtime(entry) < time
    end
  end
end

delete_older("/tmp", Time.local(2014,1,1,0,0,0))

This is nice, but let’s generalize it. Let’s make a similar method called delete_if that takes a block that will evaluate to true or false. Let’s then delete the file only if it fits the given criteria:

def delete_if(dir)
  Dir.chdir(dir) do
    Dir.foreach(".") do |entry|
      # We're not handling directories here
      next if File.stat(entry).directory?
      File.delete(entry) if yield entry
    end
  end
end

# Delete all files over 300 megabytes
delete_if("/tmp") { |f| File.size(f) > 300*1024*1024 }

# Delete all files with extensions LOG or BAK
delete_if("/tmp") { |f| f =~ /\.(log|bak)$/i }
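
Before deleting anything for real, it can be reassuring to do a dry run. The following sketch uses a hypothetical companion method, files_matching, that takes the same kind of block but only reports which plain files would be selected. It relies on Dir.children (available in Ruby 2.5 and later) and tries itself out on a temporary directory with made-up filenames:

```ruby
require "tmpdir"

# Hypothetical dry-run helper: names of the plain files the block selects
def files_matching(dir)
  Dir.children(dir).sort.select do |entry|
    path = File.join(dir, entry)
    File.file?(path) && yield(path) # Directories are skipped
  end
end

matches = nil
Dir.mktmpdir do |tmp|
  %w[app.log data.txt old.BAK].each do |name|
    File.write(File.join(tmp, name), "x")
  end
  Dir.mkdir(File.join(tmp, "logs.bak")) # A directory with a matching name

  matches = files_matching(tmp) { |path| path =~ /\.(log|bak)$/i }
end

p matches # ["app.log", "old.BAK"]
```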

14.5.4 Determining Free Space on a Disk

Suppose that you want to know how many gigabytes are free on a certain drive. The following code example is a crude way of doing this, by running a system utility:

def freespace(device=".")
  lines = %x(df -k #{device}).split("\n")
  (lines.last.split[3].to_f / 1024 / 1024).round(2)
end

puts freespace("/") # 48.7

On Windows, there is a somewhat more elegant solution (supplied by Daniel Berger):

require 'Win32API'

GetDiskFreeSpaceEx = Win32API.new('kernel32', 'GetDiskFreeSpaceEx',
                                  'PPPP', 'I')

def freespace(dir=".")
  total_bytes = [0].pack('Q')
  total_free = [0].pack('Q')
  GetDiskFreeSpaceEx.call(dir, 0, total_bytes, total_free)

  total_bytes = total_bytes.unpack('Q').first
  total_free = total_free.unpack('Q').first
end

puts freespace("C:") # 5340389376

14.6 Other Scripting Tasks

The remaining tasks did not have a section of their own, so we have grouped them together here in the uncreatively named “other” section.

14.6.1 Distributing Ruby Programs

There are occasions where you might want to distribute your Ruby program so that others can use it. If your intended recipients are running Mac OS X, you’re in luck, because Apple includes Ruby with Mac OS X (version 2.0.0 as of this writing).

For Windows users, rubyinstaller.org provides a Ruby installer package, also version 2.0.0. Any Linux user should be able to install Ruby via their distribution’s package manager.

Keep in mind, however, that these options limit you to older versions of Ruby (version 2.1.2 is the latest, and these options mostly only offer 2.0.0). To install Ruby for development work, refer to Section 21.6, “Ruby Version Managers,” in Chapter 21, “Ruby Development Tools.”

Besides asking end users to install their own copies of Ruby, there is a tool called Omnibus (by the creators of the Chef configuration management system) that allows building an entire self-contained package that includes Ruby itself, and any Rubygems that are needed as well.

Beyond Omnibus, other options are available on the Web. Each option has various tradeoffs, and I encourage you to investigate before deciding on one for yourself.

14.6.2 Piping into the Ruby Interpreter

Because the Ruby interpreter is a single-pass translator, it is possible to pipe code into it and have it executed. One conceivable purpose for this is to use Ruby for more complex tasks when you are required by circumstance to work in a traditional scripting language such as bash.

Listing 14.2, for example, is a bash script that uses Ruby (via a here-document) to calculate the elapsed time in seconds between two dates. The Ruby program prints a single value to standard output, which is then captured by the shell script.

Listing 14.2 bash Scripts Invoking Ruby

# Let bash find the difference in seconds
# between two dates using Ruby...

export time1="2007-04-02 15:56:12"
export time2="2007-12-08 12:03:19"

# Capture Ruby's standard output in a shell variable.
# The here-document supplies the program on Ruby's standard input.
elapsed=$(ruby <<EOF
require "time"

time1 = ENV["time1"]
time2 = ENV["time2"]

t1 = Time.parse(time1)
t2 = Time.parse(time2)

diff = t2 - t1
puts diff
EOF
)

echo "Elapsed seconds = " $elapsed

Note that the two input values in this case are passed as environment variables (which must be exported). The two lines that retrieve these values could also be coded in this way:

time1="$time1" # Embed the shell variable directly
time2="$time2" # into a string...

However, the difficulties are obvious. It could get very confusing whether a certain string represents a bash variable or a Ruby global variable, and there could be a host of problems with quoting and escaping.
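One way to keep the Ruby side of such a script tidy is sketched below. It is only an illustration, not part of Listing 14.2: the method name elapsed_seconds is made up for this example, and it assumes the two timestamps arrive in the environment variables time1 and time2, as in the listing.

```ruby
require "time"

# Hypothetical helper (not from the listing): reads two timestamps
# from an environment-like object and returns the difference in seconds.
def elapsed_seconds(env = ENV)
  # fetch raises KeyError if a variable was never exported, which is
  # easier to diagnose than a nil handed to Time.parse
  t1 = Time.parse(env.fetch("time1"))
  t2 = Time.parse(env.fetch("time2"))
  t2 - t1   # Time minus Time yields a Float number of seconds
end
```

Ending the piped program with puts elapsed_seconds would print the single value for the shell to capture. Because the method accepts any object that responds to fetch, it can also be exercised with an ordinary hash instead of the real environment.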

It’s also possible to use a Ruby “one-liner” with the -e option. Here’s a little script that reverses a string using Ruby:

#!/usr/bin/env bash
string="Francis Bacon"
reversed=$(ruby -e "puts '$string'.reverse")
echo $reversed # "nocaB sicnarF"

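In fact, Ruby provides multiple options for one-liners. To automatically run the given code once per line of input, add the -n option, which places each line in $_ in turn. Each line still carries its trailing newline, so reversing it moves the newline to the front and the output begins with a blank line. With that caveat, we can reverse each line of input: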

$ echo -e "Knowledge\nis\npower\n" | ruby -ne 'print $_.reverse'

egdelwonK
si
rewop

Simplifying things even further, the -p option acts like -n, but with an added print statement for each line after the code has run:

$ echo -e "France\nis\nBacon\n" | ruby -pe '$_.reverse!'

ecnarF
si
nocaB

UNIX geeks will note that awk has been used in a similar way since time immemorial.
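For comparison, the per-line loop that -n and -p supply can be written out in ordinary Ruby. The following is a rough sketch rather than a description of how the interpreter implements those options; the method name reverse_lines is invented for the example, and it chomps each line first so that the newline stays at the end.

```ruby
require "stringio"

# Roughly what a -n one-liner does for us: read one line at a time
# and run the same code on each.
def reverse_lines(input = $stdin, output = $stdout)
  input.each_line do |line|
    # Remove the newline before reversing, then let puts add it back
    output.puts line.chomp.reverse
  end
end

# Usage with in-memory streams instead of a pipe
out = StringIO.new
reverse_lines(StringIO.new("Knowledge\nis\npower\n"), out)
print out.string
```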

14.6.3 Testing Whether a Program Is Running Interactively

A good way to determine whether a program is interactive is to test its standard input. The method tty? (historically, a “teletype”) will tell us whether the device is an interactive one as opposed to a disk file or socket (though this is not available on Windows):

if STDIN.tty?
  puts "Hi! Looks like you're typing at me."
else
  puts "Input is not from a keyboard."
end
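A common use of this test is to print a prompt only when a person is there to read it. Here is a minimal sketch of that idea; the method names interactive? and read_name are hypothetical, and the StringIO objects simply stand in for redirected input and output.

```ruby
require "stringio"

# Hypothetical helper: true only when the stream is a terminal.
def interactive?(io = $stdin)
  io.respond_to?(:tty?) && io.tty?
end

# Prompt only when input comes from a terminal; otherwise read silently.
def read_name(input = $stdin, output = $stdout)
  output.print "Your name? " if interactive?(input)
  line = input.gets
  line && line.chomp   # nil at end of input
end

# With redirected (non-terminal) input, no prompt is written
out = StringIO.new
name = read_name(StringIO.new("Hal\n"), out)
puts "Read #{name.inspect}, prompt text: #{out.string.inspect}"
```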

14.6.4 Determining the Current Platform or Operating System

If a program wants to know what operating system it’s running on, it can use the RbConfig::CONFIG hash to retrieve the 'host_os'. This will return a semi-cryptic string (like darwin13.3.0, cygwin, or solaris2) indicating the operating system where this copy of Ruby was built.

Ruby runs primarily on Mac OS X (Darwin), UNIX variants like Linux and Solaris, and Windows (whether XP, Vista, 7, or 8). As a result, it is possible to distinguish between platforms by matching this string against a few very simple regular expressions.
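A minimal sketch of such a check follows. The method name os_family, the particular patterns, and the family names it returns are assumptions made for illustration (for instance, Cygwin is lumped in with Windows here); adjust them to the platforms you care about.

```ruby
require "rbconfig"

# Hypothetical helper: maps a host_os string to a coarse family name.
def os_family(host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /darwin/i                      then :osx
  when /mswin|mingw|cygwin|windows/i  then :windows
  when /linux|solaris|bsd|aix/i       then :unix
  else                                     :unknown
  end
end

puts "host_os = #{RbConfig::CONFIG['host_os']}, family = #{os_family}"
```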

Of course, this is only a clumsy way of determining OS-specific information. Even if you correctly determine the OS family, that might not always imply the availability (or absence) of any specific feature.

14.6.5 Using the Etc Module

The Etc module retrieves useful information from the /etc/passwd and /etc/group files (which, to be fair, is only useful in a UNIX environment).

The getlogin method will return the login name of the user. If it fails, getpwuid might work (taking an optional parameter, which is the uid):

require 'etc'

myself = Etc.getlogin # That's me!
root_name = Etc.getpwuid(0).name # Root's name

# Without a parameter, getpwuid calls
# getuid internally...
me2 = Etc.getpwuid.name # Me again!
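The fallback just described can be wrapped in a small method. This is a sketch under a few assumptions: the name current_user_name is made up, getlogin is taken to return nil when it cannot determine a login, and getpwuid is taken to raise ArgumentError when the uid has no passwd entry.

```ruby
require "etc"

# Hypothetical helper: the login name, or nil if neither call succeeds.
def current_user_name
  name = Etc.getlogin
  return name if name

  entry = Etc.getpwuid          # no argument: look up the current uid
  entry && entry.name
rescue ArgumentError
  # Raised when the uid has no entry in the passwd database
  nil
end

puts "Running as: #{current_user_name.inspect}"
```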

The getpwnam method returns a passwd struct, which contains relevant entries such as name, dir (home directory), shell (login shell), and others:

rootshell = Etc.getpwnam("root").shell # /sbin/sh
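To show several fields of the struct at once, here is a hedged sketch. The method name passwd_summary is hypothetical; the sketch assumes that getpwnam raises ArgumentError for an unknown user and may return nil on platforms without a passwd database.

```ruby
require "etc"

# Hypothetical helper: selected passwd fields for a login name,
# or nil when no such user can be found.
def passwd_summary(login)
  entry = Etc.getpwnam(login)
  return nil unless entry        # no passwd support on this platform

  { name: entry.name, uid: entry.uid, dir: entry.dir, shell: entry.shell }
rescue ArgumentError
  nil                            # unknown user name
end

p passwd_summary("root")
```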

At the group level, getgrgid and getgrnam behave similarly. They will return a group struct consisting of group name, group passwd, and so on.
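The group lookups can be sketched the same way. The method name group_summary is invented for this example, and the sketch assumes that both calls raise ArgumentError when the group does not exist.

```ruby
require "etc"

# Hypothetical helper: looks a group up by numeric gid or by name and
# returns a few fields of the group struct, or nil if it is not found.
def group_summary(id_or_name)
  entry = if id_or_name.is_a?(Integer)
            Etc.getgrgid(id_or_name)
          else
            Etc.getgrnam(id_or_name)
          end
  return nil unless entry        # no group support on this platform

  { name: entry.name, gid: entry.gid, members: entry.mem }
rescue ArgumentError
  nil                            # unknown gid or group name
end

p group_summary(0)
```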

The iterator passwd will iterate over all entries in the /etc/passwd file. Each entry passed into the block is a passwd struct:

require 'etc'

all_users = []
Etc.passwd { |entry| all_users << entry.name }

There is an analogous iterator group for group entries.
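As a brief sketch of the group iterator, the following collects each group's member list into a hash keyed by group name. The variable name members_by_group is arbitrary, and on a system without an /etc/group database the hash simply stays empty.

```ruby
require "etc"

members_by_group = {}
# Each entry passed into the block is a group struct
Etc.group { |entry| members_by_group[entry.name] = entry.mem }

puts "Found #{members_by_group.size} groups"
```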

14.7 Conclusion

That ends our discussion of Ruby scripting for everyday automation tasks. We’ve seen how to get information in and out of a program by way of environment variables and standard I/O. We’ve seen how to perform many common “glue” operations to get other pieces of software to talk to each other. We’ve also looked at how to interact with the operating system at various levels.

Because much of this material is operating system dependent, I urge you to experiment on your own. There are differences between Mac OS X, Linux, and Windows, and there are even differences in behavior depending on the particular version involved.

Our next topic is a similarly broad one. We’ll look at using Ruby to process various kinds of data formats, from image files to XML.