Practical C Programming, 3rd Edition (2011)

Part III. Advanced Programming Concepts

Chapter 20. Portability Problems

Wherein I spake of most disastrous changes,

Of moving accidents by flood and field,

Of hair-breadth ’scapes i’ the imminent deadly breach...

—Shakespeare, on program porting [Othello, Act 1, Scene III]

You’ve just completed work on your great masterpiece, a ray-tracing program that renders complex three-dimensional shaded graphics on a Cray supercomputer using 300MB of memory and 50GB of disk space. What do you do when someone comes in and asks you to port this program to an IBM PC with 640K of memory and 100MB of disk space? Killing him is out; not only is it illegal, but it is considered unprofessional. Your only choice is to whimper and start the port. During this process, you will find that your nice, working program exhibits all sorts of strange and mysterious problems.

C programs are supposed to be portable; however, C contains many machine-dependent features. Also, because of the vast difference between UNIX and MS-DOS/Windows, system deficiencies can frequently cause portability problems with many programs.

This chapter discusses some of the problems associated with writing truly portable programs as well as some of the traps you might encounter.

Modularity

One of the tricks to writing portable programs is to put all the nonportable code into a separate module. For example, screen handling differs greatly on MS-DOS/Windows and UNIX. To design a portable program, you’d have to write machine-specific modules that update the screen.

For example, the HP-98752A terminal has a set of function keys labeled F1 to F8. The PC terminal also has a set of function keys. The problem is that they don’t send out the same set of codes. The HP terminal sends “<esc>p<return>” for F1 and the PC sends “<NULL>;”. In this case, you would want to write a get_code routine that gets a character (or function-key string) from the keyboard and translates function keys. Because the translation is different for both machines, a machine-dependent module would be needed for each one. For the HP machine, you would put together the program with main.c and hp-tty.c, while for the PC you would use main.c and pc-tty.c.
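A minimal sketch of how the two machine-dependent modules might expose the same interface follows. The names KEY_F1, KEY_UNKNOWN, translate_hp, and translate_pc are hypothetical, the sketch assumes that <return> means a carriage-return character, and only the F1 sequences quoted above are handled. A real get_code routine would also read the characters from the terminal.

```c
#include <stddef.h>
#include <string.h>

/* Machine-independent key codes seen by main.c (hypothetical values,
 * chosen to lie outside the range of an ordinary character). */
#define KEY_F1      0x101
#define KEY_UNKNOWN (-1)

/* hp-tty.c side: the HP terminal sends <esc>p<return> for F1.
 * Assumption: <return> is the carriage-return character '\r'. */
int translate_hp(const char *seq, size_t len)
{
    static const char f1[] = { '\033', 'p', '\r' };

    if (len == sizeof(f1) && memcmp(seq, f1, sizeof(f1)) == 0)
        return KEY_F1;
    return KEY_UNKNOWN;
}

/* pc-tty.c side: the PC sends a NUL byte followed by ';' for F1.
 * The length is passed explicitly because the sequence contains a NUL. */
int translate_pc(const char *seq, size_t len)
{
    static const char f1[] = { '\0', ';' };

    if (len == sizeof(f1) && memcmp(seq, f1, sizeof(f1)) == 0)
        return KEY_F1;
    return KEY_UNKNOWN;
}
```

Because both functions return the same key codes, main.c never needs to know which terminal it is talking to; only the module linked in changes.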

Word Size

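On the machines discussed here, a long int is 32 bits, a short int is 16 bits, and a normal int can be 16 or 32 bits, depending on the machine. (The C standard itself guarantees only minimum sizes: at least 16 bits for short int and int, and at least 32 bits for long int.) This disparity can lead to some unexpected problems. For example, the following code works on a 32-bit UNIX system, but fails when ported to MS-DOS/Windows: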

int zip;

zip = 92126;

printf("Zip code %d\n", zip);

The problem is that on MS-DOS/Windows, zip is only 16 bits—too small for 92126. To fix the problem, we declare zip as a 32-bit integer:

long int zip;

zip = 92126;

printf("Zip code %d\n", zip);

Now zip is 32 bits and can hold 92126.

Question 20-1: Why do we still have a problem? zip does not print correctly on a PC. (See Answer 20-1 in the Answers section at the end of this chapter.)
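The range of a plain int on a given machine can be checked rather than guessed. The following sketch uses the INT_MIN and INT_MAX limits from the standard header <limits.h>; the function name fits_in_int is hypothetical and is not part of the original example.

```c
#include <limits.h>

/* Returns 1 if value can be stored in a plain int on this machine,
 * 0 otherwise.  On a machine with 16-bit ints, INT_MAX is 32767,
 * so 92126 does not fit. */
int fits_in_int(long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

A call such as fits_in_int(92126L) returns 0 on a machine with 16-bit ints and 1 on a machine with 32-bit ints, which is exactly the difference that breaks the first version of the zip example.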

Byte Order Problem

A short int consists of 2 bytes. Consider the number 0x1234. The 2 bytes have the values 0x12 and 0x34. Which value is stored in the first byte? The answer is machine dependent.

This uncertainty can cause considerable trouble when you are trying to write portable binary files. The Motorola 68000-series machines store the most significant byte first (ABCD), while Intel machines and the Digital Equipment Corporation VAX store the least significant byte first (DCBA).

One solution to the problem of portable binary files is to avoid them. Put an option in your program to read and write ASCII files. ASCII offers the dual advantages of being far more portable and human readable.

The disadvantage is that text files are larger. Some files may be too big for ASCII. In that case, the magic number at the beginning of a file may be useful. Suppose the magic number is 0x11223344 (a bad magic number, but a good example). When the program reads the magic number, it can check against the correct number as well as the byte-swapped version (0x44332211). The program can automatically fix the file problem:

#define MAGIC 0x11223344L /* file identification number */

#define SWAP_MAGIC 0x44332211L /* magic number byte swapped */

FILE *in_file; /* file containing binary data */

long int magic; /* magic number from file */

in_file = fopen("data", "rb");

fread((char *)&magic, sizeof(magic), 1, in_file);

switch (magic) {

case MAGIC:

/* No problem */

break;

case SWAP_MAGIC:

printf("Converting file, please wait\n");

convert_file(in_file);

break;

default:

fprintf(stderr,"Error:Bad magic number %lx\n", magic);

exit (8);

}
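A different technique, not used in the example above, is to fix the byte order of the file format and build each value one byte at a time. The file then reads the same way on every machine and no conversion pass is needed. The sketch below assumes a format that stores the most significant byte first; the names put_be32 and get_be32 are hypothetical.

```c
/* Store the low 32 bits of value in buf, most significant byte first. */
void put_be32(unsigned char buf[4], unsigned long value)
{
    buf[0] = (unsigned char)((value >> 24) & 0xFF);
    buf[1] = (unsigned char)((value >> 16) & 0xFF);
    buf[2] = (unsigned char)((value >> 8) & 0xFF);
    buf[3] = (unsigned char)(value & 0xFF);
}

/* Rebuild a 32-bit value from 4 bytes stored most significant byte first.
 * The shifts work on values, not on memory, so the result does not
 * depend on the byte order of the machine running the code. */
unsigned long get_be32(const unsigned char buf[4])
{
    return ((unsigned long)buf[0] << 24) |
           ((unsigned long)buf[1] << 16) |
           ((unsigned long)buf[2] << 8)  |
            (unsigned long)buf[3];
}
```

The 4-byte buffer would be written with fwrite and read with fread in place of the long int itself.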

Alignment Problem

Some computers limit the addresses that can be used for integers and other types of data. For example, the 68000 series requires that all integers start on a 2-byte boundary. If you attempt to access an integer using an odd address, you will generate an error. Some processors have no alignment rules, while some are even more restrictive, requiring integers to be aligned on a 4-byte boundary.

Alignment restrictions are not limited to integers. Floating-point numbers and pointers must also be aligned correctly.

C hides the alignment restrictions from you. For example, if you declare the following structure on a 68000 series:

struct funny {

char flag; /* type of data following */

long int value; /* value of the parameter*/

};

C will allocate storage for this structure as shown on the left in Figure 20-1.


Figure 20-1. Structure on 68000 and 8086 architectures

On an 8086-class machine with no alignment restrictions, this storage will be allocated as shown on the right in Figure 20-1. The problem is that the size of the structure changes from machine to machine. On a 68000, the structure size is 6 bytes, and on the 8086 it is 5 bytes. So if you write a binary file containing 100 records on a 68000, it will be 600 bytes long, while on an 8086, it will be only 500 bytes long. Obviously, the file is not written the same way on both machines.

One way around this problem is to use ASCII files. As we have said before, there are many problems with binary files. Another solution is to explicitly declare a pad byte:

struct new_funny {

char flag; /* type of data following */

char pad; /* not used */

long int value; /* value of the parameter*/

};

The pad character makes the field value align correctly on a 68000 machine while making the structure the correct size on an 8086-class machine.

Using pad characters is difficult and error prone. For example, although new_funny is portable between machines with 1- and 2-byte alignment for 32-bit integers, it is not portable to any machine with a 4-byte integer alignment such as a Sun SPARC system.
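One alternative to pad bytes, offered here as a sketch rather than as the method used in this chapter, is to never write the structure as a block of memory. Each field is copied into a byte buffer of fixed layout, so compiler padding never reaches the file. The names FUNNY_RECORD_SIZE, pack_funny, and unpack_funny are hypothetical, and the sketch assumes that value fits in 32 bits.

```c
#define FUNNY_RECORD_SIZE 5     /* 1 flag byte + 4 value bytes */

struct funny {
    char flag;                  /* type of data following */
    long int value;             /* value of the parameter */
};

/* Copy the fields into a 5-byte record, value stored high byte first. */
void pack_funny(const struct funny *rec, unsigned char out[FUNNY_RECORD_SIZE])
{
    unsigned long v = (unsigned long)rec->value & 0xFFFFFFFFUL;

    out[0] = (unsigned char)rec->flag;
    out[1] = (unsigned char)(v >> 24);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 8) & 0xFF);
    out[4] = (unsigned char)(v & 0xFF);
}

/* Rebuild the structure from a 5-byte record written by pack_funny. */
void unpack_funny(const unsigned char in[FUNNY_RECORD_SIZE], struct funny *rec)
{
    unsigned long v = ((unsigned long)in[1] << 24) |
                      ((unsigned long)in[2] << 16) |
                      ((unsigned long)in[3] << 8)  |
                       (unsigned long)in[4];

    rec->flag = (char)in[0];
    if (v & 0x80000000UL)       /* sign bit set: restore a negative value */
        rec->value = -(long)(~v & 0xFFFFFFFFUL) - 1;
    else
        rec->value = (long)v;
}
```

With this layout, 100 records occupy 500 bytes on every machine, whatever alignment the compiler chooses for struct funny in memory.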

NULL Pointer Problem

Many programs and utilities were written using UNIX on VAX computers. On this computer, the first byte of any program is 0. Many programs written on this computer contain a bug—they use the null pointer as a string.

For example:

#ifndef NULL

#define NULL ((char *)0)

#endif /* NULL */

char *string;

string = NULL;

printf("String is '%s'\n", string);

This code is actually an illegal use of string. Null pointers should never be dereferenced. On the VAX, this error causes no problems. Because the first byte of the program is 0, string points to a null string. This result is due to luck, not design.

On a VAX, the following result is produced:

String is ''

On a Celerity computer, the first byte of the program is a 'Q'. When this program is run on a C1200, it produces:

String is 'Q'

On other computers, this type of code can generate unexpected results. Many of the utilities ported from a VAX to a Celerity exhibited the 'Q' bug.

Many of the newer compilers (more precisely, the printf routines in their libraries) will now check for NULL and print:

String is (null)

This message does not mean that printing NULL is not an error. It means that the error is so common that the compiler makers decided to give the programmer a safety net. The idea is that when the error occurs, the program prints something reasonable instead of crashing.
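Rather than rely on that safety net, a program can make the check itself before printing. The following is a minimal sketch; the helper names safe_str and print_string are hypothetical.

```c
#include <stdio.h>

/* Return s itself, or a printable placeholder when s is a null pointer. */
const char *safe_str(const char *s)
{
    return (s != NULL) ? s : "(null)";
}

/* Print a string that may legitimately be missing. */
void print_string(const char *string)
{
    printf("String is '%s'\n", safe_str(string));
}
```

With this helper the null pointer is never handed to printf, so the output no longer depends on what happens to be stored at address 0 or on how a particular library treats the error.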

Filename Problems

UNIX specifies files as /root/sub/file, while MS-DOS/Windows specifies files as \root\sub\file. When porting from UNIX to MS-DOS/Windows, filenames must be changed. For example:

#ifndef __MSDOS__

#include <sys/stat.h> /* UNIX version of the file */

#else /* __MSDOS__ */

#include <sys\stat.h> /* DOS version of the file */

#endif /* __MSDOS__ */

Question 20-2: The following program works on UNIX, but when we run it on MS-DOS/Windows, we get the following message (see Answer 20-2 in the Answers section at the end of this chapter):

oot

ew able: file not found.

FILE *in_file;

#ifndef __MSDOS__

const char NAME[] = "/root/new/table";

#else /* __MSDOS__ */

const char NAME[] = "\root\new\table";

#endif /* __MSDOS__ */

in_file = fopen(NAME, "r");

if (in_file == NULL) {

fprintf(stderr,"%s: file not found\n", NAME);

exit(8);

}

File Types

In UNIX there is only one file type. In MS-DOS/Windows there are two: text and binary. The flags O_BINARY and O_TEXT are used in MS-DOS/Windows to indicate file type. These flags are not defined for UNIX.

When you port from MS-DOS/Windows to UNIX, you will need to do something about the flags. One solution is to use the preprocessor to define them if they have not already been defined:

#ifndef O_BINARY /* If we don't have a flag already */

#define O_BINARY 0 /* Define it to be a harmless value */

#define O_TEXT 0 /* Define it to be a harmless value */

#endif /* O_BINARY */

This method allows you to use the same opens for both UNIX and MS-DOS/Windows. However, going the other way may present some problems. In UNIX a file is a file. No additional flags are needed. Frequently none are supplied. However, when you get to MS-DOS/Windows, you need the extra flags and will have to put them in.
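One way to avoid the second problem is to supply the flag on every open from the start, even on UNIX where it does nothing. The sketch below shows the idea for a binary input file. The function name open_binary_input is hypothetical, and the header shown is the UNIX one; some MS-DOS/Windows compilers declare open in a different header such as <io.h>.

```c
#include <fcntl.h>      /* open, O_RDONLY (UNIX location) */

#ifndef O_BINARY        /* If we don't have a flag already */
#define O_BINARY 0      /* Define it to be a harmless value */
#endif /* O_BINARY */

/* Open a binary file for reading.  Returns a file descriptor,
 * or -1 if the file cannot be opened. */
int open_binary_input(const char *name)
{
    return open(name, O_RDONLY | O_BINARY);
}
```

Because the flag is always present in the source, nothing has to be added when the code later moves to MS-DOS/Windows.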

Summary

You can write portable programs in C. However, because C runs on many different types of machines that use different operating systems, making programs portable is not easy. Still, if you keep portability in mind when creating the code, you can minimize the problems.

Answers

Answer 20-1: The variable zip is a long int. The printf specification %d is for a normal int, not a long int. The correct specification is %ld to indicate a long:

printf("Zip code %ld\n", zip);

Answer 20-2: The problem is that C uses the backslash (\) as an escape character. The character \r is a carriage return, \n is newline, and \t is a tab. What we really have for a name is:

<return>oot<newline>ew<tab>able

The name should be specified as:

const char NAME[] = "\\root\\new\\table";

NOTE

The #include uses a filename, not a C string. While you must use double backslashes (\\) in a C string, in an #include file, you use single backslashes (\). The following two lines are both correct:

const char NAME[] = "\\root\\new\\table";

#include "\root\new\defs.h"
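The separator itself can also be kept in one place so that filenames are assembled rather than typed out for each system. The following is a sketch only; the names PATH_SEPARATOR and join_path are hypothetical, and the __MSDOS__ test is the same one used in the examples earlier in this chapter.

```c
#include <stddef.h>
#include <string.h>

#ifdef __MSDOS__
#define PATH_SEPARATOR '\\'     /* one backslash, written with an escape */
#else /* __MSDOS__ */
#define PATH_SEPARATOR '/'
#endif /* __MSDOS__ */

/* Write dir, the separator, and file into out.
 * Returns 0 on success, or -1 if out is too small. */
int join_path(char *out, size_t out_size, const char *dir, const char *file)
{
    size_t dir_len = strlen(dir);
    size_t file_len = strlen(file);

    /* Room is needed for both names, the separator, and the final '\0'. */
    if (dir_len + 1 + file_len + 1 > out_size)
        return -1;

    memcpy(out, dir, dir_len);
    out[dir_len] = PATH_SEPARATOR;
    memcpy(out + dir_len + 1, file, file_len + 1);  /* copies the '\0' too */
    return 0;
}
```

Only the definition of PATH_SEPARATOR differs between systems; the code that builds and opens filenames stays the same.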