
UNIX: The Complete Reference (2007)

Part V: Tools and Programming

24: C and C++ Programming Tools

Overview

This chapter describes the tools that C and C++ programmers need to develop software under UNIX. It assumes that you already know either C or C++ but need to learn the tools for compiling, debugging, and project management. Unlike C and C++ development under other operating systems, UNIX software development typically involves using several different command-line programs. Learning the syntax and arguments for these commands can be intimidating at first, but they have become the standard because they are highly configurable, are quick and efficient to execute, and have benefited from years of open-source support. Once you have mastered the command-line tools, you will be able to use the knowledge on any UNIX system, across many platforms. Even if you decide to use a custom IDE, it will almost certainly use some of the command-line tools behind the scenes. If you know how they work, you can take advantage of this to configure the command-line tools within your IDE. This chapter shows you how to

§ Obtain C/C++ development tools

§ Compile C and C++ programs with gcc

§ Manage compiling large projects with make

§ Debug your programs with gdb

§ Manage your source files with cvs

§ Write your own man pages

Obtaining C/C++ Development Tools

The three main tools that you need to develop C or C++ software under UNIX are gcc, make, and gdb. gcc is a collection of compilers, make is a tool for handling dependencies in large projects, and gdb is a debugger. All three are open source and are distributed by the Free Software Foundation under the GNU General Public License. The GNU tools are widely used and have an active community constantly improving and fixing them.

You can download and install gcc, gdb, and make from http://www.gnu.org/ or http://prep.ai.mit.edu/. Most Linux distributions come with these tools as part of their standard installation. On other UNIX systems, you might have to download and install them yourself.

While this chapter focuses on the three GNU tools, there are many other development tools available for UNIX, such as the compiler cc. You could, for example, substitute cc for gcc, as much of the command-line syntax is the same.

The gcc Compiler

gcc is the “GNU Compiler Collection.” gcc started out as a C compiler but now supports languages such as C++, Java, Fortran, and Ada as well. gcc runs C and C++ source code through a preprocessor, syntactic analyzer, compiler, optimizer, assembler, and linker to generate an executable that you can run on your machine.

Compiling C Programs Using gcc

We’ll start off with an example of how to use gcc to compile a simple C program. Using your favorite text editor, create a file called hello.c with the following contents:

#include <stdio.h>

int main()

{

printf ("Hello, world.\n");

return 0;

}

You can then create an executable program with the gcc command by typing

$ gcc hello.c

This will compile hello.c as a C file and will link in the standard C libraries to create an executable called a.out. You can run the program by typing

$ ./a.out

Hello, world.

The name a.out (for assembler output) is a historical convention. You can specify a name with the -o option. The command

$ gcc -o hello hello.c

will create an executable program called hello.

Most programs have code spread over many source files. gcc can compile multiple files at once. The command

$ gcc -o hello hello.c file2.c file3.c

will compile the three source files (hello.c, file2.c, and file3.c) and produce an executable called hello.

You can call gcc with the -c argument to compile source files into object files. An object file is an intermediate file format that stores a compiled form of your source. gcc's linker can then combine these object files together with source files to form your executable. Using object files allows you to compile your program in stages, via multiple calls to gcc. These stages allow you to recompile only those files that you have changed. In large projects, not having to recompile everything can save you a great deal of time. However, if you modify a source file and forget to update the corresponding object file, the code changes will not take effect. make, which is described later in this chapter, can help you with this problem.

The command

$ gcc -c file2.c file3.c

will generate two binary files called file2.o and file3.o. You can now generate the same hello executable using these object files instead of the source files. The gcc command

$ gcc -o hello hello.c file2.o file3.o

will compile hello.c, link it along with the object files file2.o and file3.o, and generate an executable called hello.

Compiling C++ Programs Using gcc and g++

gcc will compile files with the extensions .C, .cc, .cpp, .c++, or .cxx as C++ files. However, if you create a file called hello.cpp with the contents

#include <iostream>

int main()

{

std::cout << "Hello, world.\n";

return 0;

}

and run the command

$ gcc -o helloc++ hello.cpp

then you will get a series of link errors. This is because gcc doesn’t link against the standard C++ library by default. You can tell gcc that you would like to link against the C++ library by including the argument -lstdc++. You will be able to successfully build the program using the command

$ gcc -o helloc++ -lstdc++ hello.cpp

Running helloc++ will give you the expected output:

$ ./helloc++

Hello, world.

Though this will work for most C++ programs, the GCC installation also has a C++-specific compiler called g++. You could also generate helloc++ with the command

$ g++ -o helloc++ hello.cpp

While you can use either gcc or g++ to compile C++ programs, g++ is recommended because it sets the default environment to C++ and automatically links against the standard C++ library.

Useful gcc Options

gcc and g++ support over a thousand command-line options. You can customize your settings for the programming language, warnings, debug information, code generation and optimization, preprocessor, assembler, linker, directories, target binary, and target machine. There is a list of the options in the GNU online documentation at http://gcc.gnu.org/onlinedocs/gcc/Option-Index.html, and in the gcc man page. You can also get an abbreviated list with the command

$ gcc --help

Table 24–1 lists some of the most useful command line options. The -Wall command line option is strongly recommended, because by default gcc suppresses most compiler warnings.

Table 24–1: gcc Command Line Options

Option

Description

-c

Compile/assemble source files, but do not link. This generates object files.

-D

#define a macro with a default value of 1.

-E

Stop after preprocessing, do not compile. The output goes to standard output.

-g

Generate debug information.

-I

Add a directory to the header file search path.

-l

Include a library when linking.

-L

Add a directory to the library search path.

-o

Specify output file-name, regardless of the type of output.

-O

Generate optimized code. There are several levels, from the default -O0 to -O3.

-S

Stop after compiling but do not assemble. This generates assembly files.

-v

Shows verbose information on standard error.

-Wall

Generate all compile warnings.

Creating and Including Libraries

All UNIX platforms support statically linked libraries, and most support dynamically linked libraries. When you generate a program, the linker copies the code out of static libraries and puts it into your program's binary. The contents of dynamically linked libraries, on the other hand, are loaded when your binary is run.

Statically Linked Libraries

Statically linked libraries are also called archives (.a). Most of the libraries that you generate for your programs will be static libraries. The ar command allows you to group multiple object files (.o) into an archive. For example, you can generate an archive called libRoutine.a that contains two object files called routine1.o and routine2.o with the command

$ ar crs libRoutine.a routine1.o routine2.o

The c parameter tells ar to create the archive if it doesn’t exist, the r parameter tells ar to replace existing members, and the s parameter tells ar to regenerate the symbol table. (On some systems you may have to use ranlib to generate the symbols.)
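You can check which object files ended up in an archive with ar's t (table of contents) option. For the libRoutine.a example above, the output should look something like

$ ar t libRoutine.a

routine1.o

routine2.o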

Once you create an archive, you can link the library on the gcc or g++ command line in the same way that you would include an object file (.o). For example, you could use the command

$ g++ -Wall -o BinaryName main.o libRoutine.a

More often, however, you will include libraries from other directories using the -L and -l command-line options. In order for -l to work, the library name must start with the prefix lib. If you had libRoutine.a in a subdirectory called routineDir, you could replace the preceding example with

$ g++ -Wall -o BinaryName main.o -LroutineDir -lRoutine

Dynamically Linked Libraries

Dynamically linked libraries are also called shared objects (.so). gcc will by default build your application using dynamically linked libraries for the standard C/C++ libraries, since they can be shared by many processes. This is recommended because it saves considerable disk space and memory. However, you can override the use of shared libraries in gcc with the -static command-line argument. You can view an executable's shared library dependencies using the ldd command followed by the executable name.

In order to generate a dynamically linked library, you must first compile your object files (.o) with the -fPIC flag. This tells the compiler to generate position-independent code for dynamic linking. You can then generate the shared objects (.so) with the -shared option. For example,

$ gcc -shared -fPIC -Wall -o libRoutine.so routine1.o routine2.o

will generate a shared library called libRoutine.so. You then generate an executable in the same manner as you would for a static library by using the (.so) shared object on the command line, or using the -l and -L compile options.
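For example, assuming libRoutine.so sits in a subdirectory called routineDir, as in the static library example, a link line like the following should work:

$ gcc -Wall -o BinaryName main.o -LroutineDir -lRoutine

Note that if both libRoutine.a and libRoutine.so exist in the same search path, the linker will normally prefer the shared version unless you link with -static.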

When you run the executable, you will likely see

$ ./BinaryName

./BinaryName: error while loading shared libraries: libRoutine.so: cannot

open shared object file: No such file or directory

This is because the system doesn’t know where to find the shared library. You have to add the directory that contains your shared object to the LD_LIBRARY_PATH environment variable. To add /path/to/so, you would run the following command in csh or tcsh:

% setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/path/to/so

If you are running ksh or bash, then the command would be

$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/so
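With the variable set, the program should start normally. You can also confirm that the dynamic loader now finds the library by running ldd on the executable and checking that libRoutine.so resolves to a path rather than showing "not found":

$ ldd ./BinaryName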

Inside gcc

As we mentioned earlier, the gcc command runs the source code through a preprocessor, syntactic analyzer, compiler, optimizer, assembler, and linker to generate your program. This section describes each of these steps in a little more detail.

Preprocessor

The preprocessor strips out comments, reads in files specified in #include directives, and substitutes and evaluates #define and -D macros. cpp is a stand-alone preprocessor command that you can run on your source files. It should give you the same output as running gcc with the -E argument. While gcc and cpp share the same preprocessor source code, currently gcc doesn’t directly invoke cpp.
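For example, you can inspect the preprocessed form of the earlier hello.c with either of these commands; both print the expanded source (which will be quite long, because of the included stdio.h) to standard output:

$ gcc -E hello.c

$ cpp hello.c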

Syntactic Analyzer

The syntactic analyzer parses and evaluates the source code. Early UNIX C compilers weren't nearly as robust as gcc is today. They missed function type mismatches and didn't provide warnings. In order to check their source code, developers often had to run a separate command called lint before compiling. Because gcc provides much of the same functionality, lint is now largely obsolete.

By default, the compiler mainly displays errors. To turn on compiler warnings, use the -Wall command-line option. Alternatively, specific warnings can be enabled individually. gcc supports some warnings that are not included in -Wall; these are excluded because they may flag valid code.

Compiler

gcc next compiles the syntactic analyzer’s output into assembly language. You can make gcc generate assembly language files and exit with the -S option. Assembly language files have the .s suffix. This is not likely to be useful unless you need to port your code to a hardware device that isn’t supported by gcc.
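For example, the command

$ gcc -S hello.c

stops after compilation and writes the assembly for hello.c to a file called hello.s.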

Optimizer

gcc supports four levels of optimization. By default, it runs with no optimization, which is equivalent to -O0. When given the -O or -O1 option, gcc invokes an optimizer that makes the resulting executable run faster and more efficiently. -O2 and -O3 provide higher levels of optimization, and -Os will optimize for size. There are also individual command-line arguments that allow you to pick and choose your optimizations. Keep in mind that turning on optimization may make debugging more difficult.
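For example, to build the earlier hello.c with a moderate level of optimization, you could type

$ gcc -O2 -o hello hello.c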

Assembler

The next step is for gcc to convert the assembly language into machine language by calling as, the assembler. The result of the as step is an object file whose filename is based on the original source file, but with the .o suffix. For example, the source file hello.c results in an object file hello.o. The -c argument can be used to make gcc stop after this step.

Linker

gcc completes the final step by using the linker, ld. The linker, which can also be invoked separately, creates a single executable that combines the code from all the object files (.o) and libraries (.a). It edits the object code, replacing external function references with their actual addresses in the executable program. If you find that you're having trouble tracking down link errors, you can view the symbols in an object file or library by using the nm command.
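For example, the command

$ nm hello.o

lists each symbol in hello.o along with a one-letter type code; symbols marked U (such as printf) are undefined in that file and must be supplied by another object file or library at link time.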

Makefiles

make is a tool that helps you compile and track dependencies for projects with many source files. To use make, you must specify your program's dependencies and compile options in a file. By default, make looks for a file called makefile or Makefile in the current directory. (You can use a different filename by passing make the -f argument along with the filename.)

make allows you to compile your project with a single short command, and saves you time by recompiling only the source files that have changed since your last compile. make will also recompile any files that depend on the changed files. make determines whether or not a file should be recompiled by comparing the file's modification date with the modification dates of the files on which it depends. While there are multiple versions of the make utility, this section focuses on GNU make.

A Short Makefile

Here is an example of a makefile for a program containing a single C file and a single header:

# makefile for a single C file and a single header

SOURCES=hello.c

INCLUDE=include/hello.h

PRODUCT=$(HOME)/bin/hello

CC=gcc

CFLAGS=-g -O

# Running the command "make" will use this rule

$(PRODUCT): $(SOURCES) $(INCLUDE)

$(CC) $(CFLAGS) -o $(PRODUCT) $(SOURCES)

# Running the command "make clean" will use this rule

.PHONY: clean

clean:

rm -f *.o $(PRODUCT)

Running make on this makefile will use gcc with debugging symbols and optimization to compile the file hello.c and create a binary called ~/bin/hello.

If you save this file as makefile in the same directory as a file called hello.c, place a header file called hello.h in the subdirectory include, and run the command make,

$ make

gcc -g -O -o /home/nate/bin/hello hello.c

it will display and execute a line like the one shown and generate ~/bin/hello. If you run the make command a second time, then you will get a response like

$ make

make: '/home/nate/bin/hello' is up to date.

because your executable will be up to date. If you execute the command

$ make clean

rm -f *.o /home/nate/bin/hello

then it will delete any object files in this directory as well as the generated program file.

While generating a makefile is a lot more work than just typing in a command to compile hello.c, the time investment really pays off as your project grows. You can also reuse your makefiles for new projects, just by changing the file names.

Makefile Syntax

Makefile syntax is similar to that of shell scripts. Makefiles can contain comments, variables, dependencies, and commands.

Comments

You can insert comments into a makefile with a # (pound sign). Everything on the line after the # is ignored by make.

Variables

The make program allows you to define named variables similar to those used in the shell. For example, if you define SOURCES=hello.c, the value of that variable, $(SOURCES), is hello.c.

You can also do pattern replacement in variable assignments. The assignment

OBJECTS=${SOURCES:.cpp=.o}

will take a list of files from the variable SOURCES and will assign it to OBJECTS with the .cpp extensions replaced with .o.

The make program has some built-in knowledge about program development. You can get a listing of the built-in rules and variable values by running make -p. make knows that files ending in a .c suffix should be built as C source files, those ending in .cpp, .cc, or .C are C++ source files, those ending in .o are object files, and those ending in .s are assembler files. Although make allows you to choose your own variable names, it will use default values for variables such as CC and CFLAGS if they are not defined.
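Because make expands variables when it runs, you can also override them for a single build without editing the makefile. For example, assuming your makefile uses the conventional CC and CFLAGS names, the command

$ make CC=gcc CFLAGS='-g -Wall'

replaces both values for that invocation only.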

Dependencies

After assigning variables, our example specifies dependencies. You specify dependencies by placing a target filename on the left, followed by a colon, and then a list of the filenames on which the target file depends. In our example,

$(PRODUCT): $(SOURCES) $(INCLUDE)

the “PRODUCT” variable depends on the “SOURCES” variable and on the “INCLUDE” variable. After substituting the variables, this says that the file $HOME/bin/hello depends on the files hello.c and include/hello.h. If the target file doesn’t exist or is older than a file that it depends on, then make will attempt to rebuild the target.

It's also possible to create target names that don't generate a file. These targets are called phony. If you mark a target as .PHONY, make will not look for a file with that name and will always run its commands when the target is requested. For example,

.PHONY: clean

will cause the clean target to run even if there is a file named clean in the directory that is up to date.

Dependencies combined with commands, which are described in the section that follows, form rules.

Commands

The dependency line can be followed by one or more shell command lines that are executed if any of the dependencies have changed. These commands are often used to rebuild the target. For example,

gcc -g -O -o hello hello.c

is a command to build hello.

Command lines must be indented at least one tab stop from the left margin. (Tabs are required; the equivalent number of spaces won’t work.) Indenting these lines with spaces will result in an error message:

Makefile:8: *** missing separator (did you mean TAB instead of 8 spaces?). Stop.

A rule consists of a dependency line and the commands that follow it. They are often used to remake a target file, but they can also perform an arbitrary command. For example,

clean:

rm -f *.o $(PRODUCT)

executes the command rm when make clean is run.

Makefiles with Multiple Dependencies

If you have multiple source files, you could extend the preceding example by adding the extra .c files to the end of the SOURCES line. Unfortunately, this approach would force a full recompile whenever any of the source files are changed, even if you only modify a single source file. It is more efficient to instead make the program target depend on object files (.o), so that make can reuse objects if their corresponding sources have not changed.

If you leave out a rule to explain how to get an object file (.o) from your source C or C++ files, then make will use a built-in implicit rule to automatically build your object files. For C files, the built-in command would look like

$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c

and for C++ files, the line would look like

$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c

If you want to specify how to build the object files (.o), you could manually type in a rule for each object file/source file pair, or you could include in your makefile an explicit rule for building all files of that type. For C and C++ files, the explicit rules would look something like

.c.o:

$(CC) -c $(CFLAGS) -o $@ $<

.cpp.o:

$(CXX) -c $(CXXFLAGS) -o $@ $<

The .c.o and .cpp.o dependency lines are suffix rules that tell make that their rule applies to object files (.o) generated from .c and .cpp source files respectively. $@ and $< are automatic variables. The $@ variable stands for the filename of the target, and the $< variable stands for the name of the out-of-date dependency file. So, for example, if make were compiling the file hello.cpp, it would use the second rule, and $@ would be hello.o and $< would be hello.cpp.

Table 24–2 lists some of the automatic variables used inside of makefiles.

Table 24–2: Makefile Automatic Variables

Variable

Description

$@

The target filename.

$<

The name of the dependent file that is out of date.

$*

The stem of the dependent filename without the pattern matching elements.

$?

The list of all out-of-date dependent files. (All those that must be recompiled.)

$^

The list of all unique dependent files.

$+

The list of all dependent files with duplicates in the order they were given.

$%

Only applies to library archives. The target member name when the target is an archive.

Regardless of whether you use an implicit or an explicit rule to generate your object files (.o), you will still need to explicitly list header file dependencies. To tell make that an object file must be rebuilt if a specific header file changes, you just add another dependency line without a trailing command. For example, the line

hello.o: headerfile1.h headerfile2.h

will tell make that hello.o should be rebuilt if either headerfile1.h or headerfile2.h has changed. Since manually entering header file dependencies is time consuming and error prone, there are utilities available such as makedepend, which will parse your source files and recursively follow #include directives to generate the header file dependency lines for your makefile.
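gcc itself can also generate these dependency lines. The -MM option prints a make-style rule for each source file, listing the headers it includes (system headers are omitted). For example, assuming hello.c includes a header from the include subdirectory, a command like

$ gcc -MM -Iinclude hello.c

should print a rule of the form hello.o: hello.c include/hello.h, which you can redirect to a file and include from your makefile.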

A Complex C++ Makefile

What follows is an example of a more complex C++ makefile that generates a program called Executable. The program contains two source files, main.cpp and rest.cpp; a header file, private.h, which is in a subdirectory called include; and a library, libRoutines.a, with source files routine1.cpp and routine2.cpp.

# A more complicated makefile to combine c++ sources

# private header files, and libraries.

HEADERS=include/private.h

SOURCES=main.cpp rest.cpp

OBJECTS=${SOURCES:.cpp=.o}

PRODUCT=Executable

LIB=libRoutines.a

LIBSOURCES=routine1.cpp routine2.cpp

LIBOBJECTS=${LIBSOURCES:.cpp=.o}

INCLUDE=include

CXX=g++

CXXFLAGS=-g -Wall -O

all: $(PRODUCT)

$(PRODUCT): $(OBJECTS) $(LIB)

$(CXX) $(CXXFLAGS) -o $(PRODUCT) $(OBJECTS) $(LIB)

.cpp.o:

$(CXX) -c $(CXXFLAGS) -I$(INCLUDE) $<

$(LIB): $(LIBOBJECTS)

ar crs $(LIB) $^

$(OBJECTS): $(HEADERS)

.PHONY: clean

clean:

rm -f *.o $(PRODUCT) $(LIB)

If you have all of these files and run make, then the output would be

$ make

g++ -c -g -Wall -O -Iinclude main.cpp

g++ -c -g -Wall -O -Iinclude rest.cpp

g++ -c -g -Wall -O -Iinclude routine1.cpp

g++ -c -g -Wall -O -Iinclude routine2.cpp

ar crs libRoutines.a routine1.o routine2.o

g++ -g -Wall -O -o Executable main.o rest.o libRoutines.a

Non-Programming Makefiles

The make command can be used to update other types of projects that have internal dependencies, such as documentation. Here’s a sample makefile that shows the basic structure to use make in a text writing project:

# Makefile for book version

PRINTER = lp

FILES = intro chap1 chap2 chap3 chap4 chap5 chap6 appendix glossary

book:

troff -Tpost -mm $(FILES) | $(PRINTER)

draft: $(FILES)

nl -bt -p $? | pr -d | $(PRINTER)

To print the current version of the complete document, you would type make book or make draft.

The gdb Debugger

Most programs have bugs. You can attempt to prevent them by developing your code iteratively, and you can attempt to track them down by code inspection, printf() statements, or log files. Often, however, you have no option besides looking at a crash or stepping through your code in a debugger. gdb is a command-line debugger that allows you to do just that.

If you want to be able to debug your code, you need to compile it with cc, gcc, or g++ using the -g command-line option. (If you're using a makefile, edit the CFLAGS or CXXFLAGS variable to include -g.) Without the -g option, the compiler will not include the debugging symbols that gdb needs to map your executable program back to your source code. The -g option makes your binary much larger and much easier to reverse-engineer, so it's highly recommended that you not use it when compiling your final executable.

sdb and dbx are two alternate UNIX debuggers that you might find on your system. They are both less frequently used, so we focus here on gdb.

Launching gdb

You have several options for debugging your programs using gdb. You can launch gdb and run your program from inside it, you can attach gdb to a currently running version of your program, and you can use gdb to debug a core file from a crash.

Launching Programs Inside gdb

You can launch gdb with your application name as a command-line parameter. gdb will load up the program’s symbol information and allow you to launch the program. For example, the command

$ gdb debugprogram

will launch gdb and load debugprogram. It will give you a (gdb) prompt to indicate that it is ready for input. If you forgot to compile debugprogram with the -g flag, then you will see (no debugging symbols found) as part of your output.

If you type

(gdb) run

then gdb will launch your application. Where possible, gdb tries to save you some typing; you can abbreviate commands as long as the text is not ambiguous. In the case of run, you could type either r or ru and it would still launch your program.

You can also launch gdb without a command-line argument and load your program with the command

(gdb) file debugprogram

This will attempt to load debugprogram out of the current working directory. You can load a file from a separate directory by including the path with the filename.

Attaching to a Process Using gdb

You can also attach gdb to a running process. You may want to do this if you notice a problem while your program is running. You can get a list of your running processes with ps -u username. It should look something like

$ ps -u nate

PID TTY TIME CMD

7240 pts/34 00:00:00 bash

9033 pts/34 00:00:04 debugprogram

9238 pts/34 00:00:00 ps

The first column tells you that debugprogram is process ID 9033. You can then attach to your process using gdb by running it with the program name and the process ID. For example,

$ gdb debugprogram 9033

As long as there isn’t a file named 9033, this command will attach to the running process. It will halt the execution of the process at the current line of source. When you quit out of gdb, as long as you haven’t killed the program, gdb will detach the debugger from the process.

Using gdb to Debug a Core File

You can configure the UNIX environment to dump a core when an application crashes. The core is a core image, a file containing an image of the failed process, including its stack and heap at the moment of failure. (The term "core image" dates back to a time when the main memory of most computers was known as core memory, because it was built from donut-shaped magnets called inductor cores.) The core image can be used by a debugger such as gdb to determine where the program was when it dropped core, and how (that is, by what sequence of function calls) it got there. gdb can also determine the values of variables at the moment the program failed, the statements and operations being executed at the time, and the argument(s) each function was called with.

Because core files can take up a lot of disk space, most UNIX systems by default will not dump a core when a program crashes. You can change this by running a shell command to increase or remove the limit on core dump sizes. In csh or tcsh, the shell command

% unlimit coredumpsize

will remove the core size restriction. In bash or ksh, this can be achieved with the command

$ ulimit -c unlimited

If you want to generate cores, you will need to add this line to a start-up script (such as your .bashrc) or run it each time you create a shell.
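In bash or ksh, you can check the current setting by running ulimit -c with no size argument; it prints 0 when core dumps are disabled and unlimited once the restriction has been removed:

$ ulimit -c

unlimited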

When you have core dumping enabled, if your program crashes through a bad pointer access, you should see a line of output that says

Segmentation fault (core dumped)

If you run ls in the program's working directory, then you should see a core file. On some systems, the core will have a name like core.20472, where 20472 is the number of the process that crashed.

You can debug a core file in gdb by specifying the application name followed by the name of the core file. For example,

$ gdb debugprogram core.20472

will launch gdb with the executable debugprogram and the core file core.20472. gdb will show the line of code that the program was executing when it dumped core.

Common gdb Commands

Table 24–3 lists some of the most commonly used gdb commands. As mentioned earlier, you can also enter abbreviated commands, such as p for print.

Table 24–3: gdb Commands

Command

Description

break

Sets a breakpoint in a file, function, or method.

bt

Shows the backtrace (calling stack) of your program.

c

Continue running the program.

delete

Remove breakpoints.

file

Sets the executable that you would like to debug.

help

Displays help information and help subtopics.

kill

Stop the process being debugged.

list

Show the source code around the line that the debugger is stopped on.

next

Step over the next program line. Executes functions and moves on.

print

Displays the value of a variable or expression.

quit

Exits gdb.

set

Sets program or environment variables.

step

Step into the next program instruction. Will step into functions.

run

Runs your program. You can provide it command line arguments as well.

You can get more information and a description of the parameters that you can use for any of these commands with the help command. For example,

(gdb) help step

Step program until it reaches a different source line.

Argument N means do this N times (or till program stops for another reason).

Debugging with gdb

Once you launch gdb, it will print some copyright information to your terminal. If you attached to a process or are debugging a core file, it will print the symbols that it read, the function and line that it is currently debugging, and finally a (gdb) command prompt. The output should look something like this:

#0 0x080483b2 in main () at myfile.c:7

7 *x=80;

(gdb)

In this example, the debugger is stopped in the function main() on line 7 of myfile.c, and that line contains an assignment to *x.

If you aren't attaching to a process or debugging a core, you will start out with a (gdb) command prompt. As described earlier, you can start a program with the run command. If you would like to get a command prompt while the program is running, then you can interrupt the program's execution by typing CTRL-C (holding down the CTRL key and pressing the C key). This should give you the file and line at which you stopped the program, as just shown. If you would like to continue the program's execution, you can enter the c command.

Text editing for the gdb command prompt works like the emacs editor by default. It will allow you to cycle through the list of past commands using the arrow keys, cut the text to the end of the current line with CTRL-K, and paste the copied text with CTRL-Y. See Chapter 5 for details on how to use emacs.

A gdb Example

This section provides a short walkthrough of gdb. It's probably most helpful if you are able to try it out on your machine while you read through the material. Start off by entering the following text into a file called debugtest.c:

#include <stdio.h>

void foo()

{

unsigned int i = 0;

unsigned int * p = &i;

while(p) {

i++;

}

}

int main()

{

foo();

return 0;

}

Then compile it with the command

$ gcc -Wall -g -o debugtest debugtest.c

and launch gdb with the command

$ gdb debugtest

You should get a (gdb) command prompt. Enter the run (or r) command. It should look like this:

(gdb) run

Starting program: /home/nate/debugtest

Reading symbols from shared object read from target memory...done.

Loaded system supplied DSO at 0xfc8000

The preceding code will go into an infinite loop. You should interrupt the program with CTRL-C. You may then see

Program received signal SIGINT, Interrupt.

0x0804835f in foo () at debugtest.c:9

9 i++;

(gdb)

It may also stop on the line above it. You can look at the value of the variables i and p with the command print (or p):

(gdb) print i

$1 = 2340516348

(gdb) print p

$2 = (unsigned int *) 0xbfa563d0

(gdb) p *p

$3 = 2340516348

You can also get a listing of the source code near this line with the command list (or l) and the call stack with the command bt:

(gdb) list

4 {

5 unsigned int i=0;

6 unsigned int * p=&i;

7

8 while(p) {

9 i++;

10 }

11 }

12

13 int main()

(gdb) bt

#0 0x0804835f in foo () at debugtest.c:9

#1 0x08048385 in main () at debugtest.c:15

You can break out of the loop by changing the value of p with the set command. Try entering the following three commands:

(gdb) set variable p = 0

(gdb) p *p

Cannot access memory at address 0x0

(gdb) c

Continuing.

Program exited normally.

Using CTRL-C to interrupt your program works well in the case of an infinite loop, but often you want to look at why a specific piece of code isn’t working. You can force the debugger to stop on a particular line by setting a breakpoint using the break (or b) command. Next enter

(gdb) b debugtest.c:6

Breakpoint 1 at 0x8048355: file debugtest.c, line 6.

Now if you run the program again, you will see

(gdb) r

Starting program: /home/nate/debugtest

Reading symbols from shared object read from target memory...done.

Loaded system supplied DSO at 0xa48000

Breakpoint 1, foo () at debugtest.c:6

6 unsigned int * p = &i;

(gdb)

You can remove the breakpoint with the delete (or d) command:

(gdb) d 1

You can then step (or s) through program execution one line at a time:

(gdb) s

8 while(p) {

(gdb) s

9 i++;

(gdb) s

8 while(p) {

When you are done with your debugging session, run the quit (or q) command. If you launched the program inside the debugger and it’s still running, you will get a prompt:

(gdb) quit

The program is running. Exit anyway? (y or n)

Saying y will kill the program.

Source Control with cvs

Source control lets you store and manage multiple versions of your files. It is essential for managing the concurrent work of many developers, and it allows you to track changes to your files over time. It is not specific to C and C++ development; it can be used on files of any type. Source control programs track your changes by storing a history of your files in a central repository. The source control program can use this history to reconstruct any version of the file, which allows you to roll back if you make a mistake.

Modern source control programs allow each user to check out their own local copy of the source. Once you are ready to save your local changes, you can commit the changes back to the repository. If you are modifying the most recent version of the file, then you can commit directly, but if someone else has modified the file since you checked it out, then you must merge your changes with the repository changes before submitting. Advanced source control programs also allow you to keep track of isolated code branches and to merge changes to source files across these branches.

There are many options available for source control on UNIX. Some of the most commonly used are SCCS, RCS, CVS, Subversion, and Perforce. SCCS and RCS are restrictive because they only allow you to lock files. This lock allows only a single user at a time to modify each file. CVS evolved from RCS. It also allows locking, but it supports concurrent work on the same file as well. For large projects, this is generally much more useful, as it is possible to merge multiple people's work on the same file. Subversion is an evolution of CVS but isn't as widely adopted. Perforce is a commercially available product with robust branching and merging support. This section focuses on CVS (Concurrent Versions System), since it is arguably the most popular source control program used on UNIX systems.

Obtaining CVS

You can run the command which cvs to find out if you have CVS preinstalled. If you need to install CVS, you can download the binaries and source from ftp.gnu.org/non-gnu/cvs/. This chapter gives the basics for using CVS; for more information, there is a central CVS web site at http://www.nongnu.org/cvs/.

Configuring Your Environment and Repository

To use CVS, you will either need to use an existing repository or create a new one. If you need to create a new repository, then you should create a directory for it in a central location on your file system. You could also use a remote server, but setting up remote authentication for a CVS server is beyond the scope of this book.

Once you have decided on a location for your repository, you should set the environment variable $CVSROOT to the path for that directory. If you stored your repository locally at /usr/local/cvsdepot/, for example, you would run the following command in csh or tcsh:

% setenv CVSROOT /usr/local/cvsdepot

If instead you are running ksh or bash, then the command would be

$ export CVSROOT=/usr/local/cvsdepot

You will likely want to add this to a start-up script such as your .profile or .login file. Another environment variable that you might want to modify is CVSEDITOR, which specifies which text editor cvs opens when you need to enter a commit message. CVS currently uses vi by default.
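For example, to have cvs bring up emacs instead of vi for commit messages, you would add one of the following lines to your start-up script, depending on your shell:

% setenv CVSEDITOR emacs

$ export CVSEDITOR=emacs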

You can generate a new repository in your directory by typing the init command:

$ cvs -d /usr/local/cvsdepot init

This command will generate a directory called CVSROOT inside /usr/local/cvsdepot. The init command initializes the repository, so that you can add projects.

Adding a Project

Once you have a repository established, you will want to add your existing project source files to it. It's possible to import your repository history from another source control program, but more commonly you will start a new history for your files. To do that, cd to your current source code base directory. If you run the cvs import command like

$ cvs import -m "Imported project files." projectname vendortag releasetag

then cvs will add all of the files in the current directory to a directory in the repository called projectname. The initial version of the files will have the note "Imported project files." If you have subdirectories in that base directory, then they will also be recursively added, along with their files, to the repository. Both a vendortag and a releasetag string must be included. These tags are intended to track multiple releases of third-party code but often go unused. If the import command succeeds, then the source should now be in the repository. Once you have carefully verified that you can check out a full copy of the source (see the next section) and made a backup, you might want to remove the old source directory that you imported from, since it will not be under source control.

Checking Out Source Files

Once you have imported your project into the repository, each developer will want to check out their own local version of the source to work with. Change directory to where you would like the root directory of your source code to be located and type

$ cvs checkout projectname

This command will add a directory called projectname and populate it with your source files. Each project directory will have a subdirectory called CVS, which has information that CVS uses to keep track of which version you have of each file.

Working with Files

By default, when you check out your source files from CVS, they are writable. When you are ready to save your local changes to the repository, you should cd to your project's root directory. You can commit all of the files in that directory and all of its subdirectories by running the command

$ cvs commit

You can also commit specific files by specifying them as command-line arguments. If you are up to date with the most recent repository version of your modified files, then cvs will bring up an editor asking you to type in a message about the changes that you made to each directory. After you exit the editor, it will save your changes back into the repository. This makes your version of these files the one that other users will get with a checkout command.
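If you prefer to skip the editor, cvs commit also accepts the log message directly on the command line with the -m option. For example,

$ cvs commit -m "Fixed the output string." hello.c

commits only hello.c and records the quoted text as its log message.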

Your committed files are now also available to other users via the update command. An update copies the most recent version of each file from the repository and replaces your local copy of that file. To update, you can either specify which files you would like to update, or you can run the command

$ cvs update

which gives you all of the changes in the current directory and in any recursive subdirectory. The only files that will be modified by an update are the ones that other users have checked in since you last ran a checkout or an update. cvs will display a special character and the filename for each of the files that are affected by the update. Table 24–4 lists the meaning of each character.

Table 24–4: cvs update Character Flags

Character

Description

?

This file is in your current directory but not in the repository.

A

This file is pending add when you commit.

C

Your changes to this file conflict with the changes from the repository.

M

This file was merged. Either your local changes and the repository changes didn’t conflict, or it’s a file that you have modified and the repository hasn’t.

P

This file has been patched; same as U.

R

This file is pending removal when you commit.

U

This file has been updated or added, and you haven’t modified it locally.

If you have locally modified files that another user has committed since your last update, then cvs update will need to merge your changes with the repository changes. When this is the case, cvs update displays an M before the filename. It will also show an M if you modified a file locally and the repository version hasn't changed. If you see a line like

Merging differences between 1.1.1.1 and 1.2 into hello.c

then cvs automatically merged the lines of your hello.c file and the repository version of the hello.c file together. cvs should make a backup of your pre-merge file called .#filename.revision. In the preceding example, it would create a file called .#hello.c.1.1.1.1. Because the filename starts with a period, you will have to run ls -a to list it.

If you modified the same lines of a text file as another user who already committed to the repository, then cvs can’t automatically merge the results. This is called a conflict. cvs should save a backup of your old file in the same manner as it does for a merge, but in the case of a conflict, you must manually resolve the conflicting lines by editing the file. cvs will place both copies of the conflicting text into the file separated by conflict markers. Conflicting lines in the source file will look like

<<<<<<< hello.c

printf ("This is one version of the file.\n");

=======

printf ("This is another version of the file.\n");

>>>>>>> 1.4

You must edit the conflicting file, fix the conflicts, and remove the conflict markers. CVS requires you to modify the file before submitting. Although it’s not advisable, CVS will not prevent you from submitting a file with unresolved conflict markers as long as you have made some modifications.

cvs update has several important command-line options. If you run cvs update with -d, it will tell cvs to create new directories that you don't have in your local version of the repository. You can revert a locally modified file with the -C option. This will overwrite the local file with the copy from the repository. (cvs should back the file up in the same manner as it does for merges.) Finally, the -r option allows you to specify a revision that you would like to get. This allows you to pull an old version from a file's history.
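For example, assuming hello.c has a revision 1.2 in its history, the command

$ cvs update -r 1.2 hello.c

replaces your local copy with that revision. Keep in mind that -r leaves a sticky tag on the file, so later updates will not return it to the newest revision until you clear the tag with cvs update -A.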

The commands we described are the minimum set that you need to know to get up and running in CVS. Table 24–5 includes some additional cvs commands that may be useful to you.

Table 24–5: cvs Commands

Command

Description

add

Adds a file or directory to the repository. Can also undo a remove. Doesn’t take effect until the add is committed to the repository.

admin

Performs administrative and RCS commands. Allows you to lock and unlock files and to delete revision ranges.

checkout

Creates a new local project in the current directory. If one is already there, then it will update the contents, although update is recommended for this instead.

commit

Saves local changes into the repository. Can commit all changed files or you can specify a list. Will ask you for a message to accompany the change.

diff

Compares two revisions of a file and displays the differences. By default it compares the local version with the repository version.

edit

Makes a file that is being watched writable.

log

Displays the log information for the listed files. This includes the location of the RCS file, tags, version authors, and the commit messages.

remove

Removes a file or directory from the repository. Can also undo an uncommitted add. Doesn’t take effect until committed.

status

Shows whether the specified files are up to date or have an unresolved conflict with the repository as well as other revision information.

update

Copies changes that were committed to the repository to the local copy of the source. Can also get a specific version of a file and bring in new directories.

unedit

Cancels an edit command and reverts the file to the repository version.

watch

Causes files to be checked out read-only. A user must run the edit command before modifying these files. Can also set up file status change notifications.

You can get a full list of cvs commands by running

$ cvs --help-commands

Manual Pages

You’ve seen that the man command will give you a description of a program and its options. The man command also provides information about C/C++ system calls and standard library functions. Often, UNIX commands (such as printf) have the same name as library functions. You can see all listings associated with a given name with man -a.

You can also create your own man pages. man pages are text documents that use special formatting via nroff/troff macros (see the companion web site for more information about troff and nroff). An example man page for a command called widget might look something like this:

. \" An example comment

.TH WIDGET 1 "July 2006"

.SH NAME

widget \- run a shell widget

.SH SYNOPSIS

.B widget

[

.I \-options

] [

.I arguments

\&...]

.SH DESCRIPTION

This is a description of what the widget does.

.SH SEE ALSO

WidgetFactory is a related command.

.RS

.B WidgetFactory

.RE

If you type this into a file called widget.1 and run the command

$ man ./widget.1

then man will display a page like Figure 24–1. You can also save man pages in compressed files (e.g., widget.1.gz).

Figure 24–1: Sample man page

If you want to install your man page so that any user can find information about your application, you must place your file in a directory that man searches. On many systems, man pages are stored in /usr/man or /usr/share/man. You can list the paths that man searches by typing

$ man --path

These directories contain subdirectories such as man1, man2, etc. These subdirectories each store a section, which groups together man pages of a similar type. For example, most user commands belong in section 1, in the directory man1. (For a description of the man page sections, see http://en.wikipedia.org/wiki/Man_page.) The name of your man page file should also indicate which section it belongs to. For example, since widget belongs in section 1, it is called widget.1 and goes in the directory /usr/man/man1. Once you have placed it there, you can view the page with the command

$ man widget

Other Development Tools

This chapter describes key tools for developing C and C++ programs under UNIX, but there are many other useful tools available.

Integrated development environments, or IDEs, generally provide a graphical user interface for source code editing, project management, and debugging. Visual SlickEdit (http://www.slickedit.com/) is a commercially available IDE that runs on most UNIX systems. It supports workspaces and projects; is flexible enough to use external tools; and has a highly configurable editor, a graphical debugger, and a tagging system that lets you quickly browse through your code. Two freely available IDEs for Linux are Anjuta (http://www.anjuta.org/) and KDevelop (http://www.kdevelop.org/). Anjuta is intended for GNOME development, and KDevelop is intended for KDE development.

ddd (http://www.gnu.org/software/ddd/) is a GNU tool that provides a graphical user interface that builds on top of a command-line debugger. You can use it on top of gdb or dbx as well as debuggers for other languages like Java, Perl, and Python. ddd allows you to view the source text and see breakpoints, and it has an interactive graphical data display.

There are many UNIX tools available for memory tracking and performance profiling. Valgrind (http://www.valgrind.org/) is an open-source memory tracking and performance profiling tool that is currently only supported on Linux, though it has experimental ports on other platforms. ElectricFence (http://perens.com/FreeSoftware/ElectricFence/) is a freely available memory bounds checker for Linux and UNIX. Purify (http://www.ibm.com/software/awdtools/purify/unix/) does memory leak and corruption detection and is commercially available for both Linux and UNIX. VTune (http://www.intel.com/cd/software/products/asmo-na/eng/vtune/vlin/index.htm) is a commercially available profiler that supports both event-based sampling and call graph analysis for Linux.

If you need to build a graphical user interface for an X Window application, you will probably want to build on top of a widget toolkit. GTK+ (http://www.gtk.org/) and QT (http://www.trolltech.com/products/qt) are currently the most popular widget libraries. GTK+ is an open-source GNU project. It was used in building the GNOME desktop environment. QT is only freely available for open-source software. It was used in building KDE. If you are building 3D applications under UNIX, you will probably want to use OpenGL (http://www.opengl.org/).

DejaGnu (http://www.gnu.org/software/dejagnu/) is a testing framework. It helps you to build a test harness that allows you to run multiple tests on your programs. It supports both system and unit testing. DejaGnu tests are usually written in Expect using Tcl.

lex, flex, yacc, and GNU bison are tools that are useful if you are doing complex text interpretation such as writing your own source code compiler. lex and flex are tools that, given a specification file with regular expressions, will generate C source files that, when compiled, will perform lexical analysis. Lexical analysis takes an input string of characters and breaks it up into a series of symbols called tokens. You could use these tokens in your programs, but more often they are passed on to a parser. yacc and bison are tools that, given a grammar, will generate C source files that, when compiled, will parse tokens. A parser analyzes tokens and generates a structured tree from the tokens. This structured tree is most often used by a compiler to turn a program into assembly. flex and bison are both open source. They are available from http://flex.sourceforge.net/ and http://www.gnu.org/software/bison/. Traditionally, lex and yacc were proprietary, but their source is now available from http://cvs.opensolaris.org/source/xref/on/usr/src/cmd/sgs/.

Summary

This chapter described how to build C and C++ programs with the gcc compiler, how to use make to manage dependencies, how to debug with gdb, how to manage your source files with cvs, and how to use man to write your own documentation. You are now familiar with all the tools that you need to develop complex C and C++ programs under UNIX.

This chapter did not cover the APIs for using UNIX system calls. UNIX provides functions that go far beyond the standard C/C++ libraries. UNIX system calls allow you to get information out of the environment, get the system time, interact with the file system, manage processes, communicate between processes, send/receive information over a network, access shared memory, handle signals, and use semaphores. There are also thread packages available to allow multithreaded programming.

If you find that you need more information about UNIX system calls, you can use the man command to look up function parameters and return values. While there is a wealth of knowledge in the man pages, it can be difficult to know what function name to search for. You can often find the name of the function that you’re looking for with a web search in your favorite search engine, or from the SEE ALSO section at the bottom of the UNIX man pages.

How to Find Out More

There are many books dedicated to each of the tools covered in this chapter. The following references are good places to start:

· Gough, Brian J., foreword by Richard M. Stallman. An Introduction to GCC. Bristol, United Kingdom: Network Theory Limited, 2004.

· Mecklenburg, Robert. Managing Projects with GNU Make. 3rd ed. Sebastopol, CA: O’Reilly Media, Inc., 2004.

· Stallman, Richard M., Roland Pesch, Stan Shebs, et al. Debugging with GDB: The GNU Source-Level Debugger. 9th ed. Boston, MA: GNU Press, 2002.

· Vesperman, Jennifer. Essential CVS. Sebastopol, CA: O’Reilly & Associates, Inc., 2003.

The following book is a good reference for UNIX-specific system calls:

· Stevens, W. Richard, and Stephen A. Rago. Advanced Programming in the UNIX Environment. 2nd ed. Reading, MA: Addison-Wesley, 2005.

This book serves as an in-depth reference for many programming tools, including source control, GNU make, and gdb:

· Robbins, Arnold. UNIX in a Nutshell. 4th ed. Sebastopol, CA: O'Reilly Media, Inc., 2005.