The C++ Programming Language (2013)

Part IV: The Standard Library

43. The C Standard Library

C is a strongly typed, weakly checked language.

– D. M. Ritchie

• Introduction

• Files

• The printf() Family

• C-Style Strings

• Memory

• Date and Time

• Etc.

• Advice

43.1. Introduction

The standard library for the C language is, with very minor modifications, incorporated into the C++ standard library. The C standard library provides quite a few functions that have proven useful over the years in a wide variety of contexts, especially for relatively low-level programming.

There are more C standard-library functions than are presented here; see a good C textbook, such as “Kernighan and Ritchie” [Kernighan,1988] or the ISO C standard [C,2011], if you need to know more.

43.2. Files

The <cstdio> I/O system (stdio) is based on files. A file (a FILE*) can refer to a file or to one of the standard input and output streams: stdin, stdout, and stderr. The standard streams are available by default; other files need to be opened with fopen().

A file opened with fopen() must be closed by fclose() or the file will remain open until the operating system closes it. If that is a problem (is considered a leak), use an fstream (§38.2.1).

A mode is a C-style string containing one or more characters specifying how a file is to be opened (and used after opening):

There may be (and usually are) more options on a specific system. For example, x is sometimes used to mean “the file must not exist before this open operation.” Some options can be combined, for example, fopen("foo","rb") tries to open a file called foo for binary reading. The I/O modes should be the same for stdio and iostreams (§38.2.1).
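As a minimal sketch of the fopen()/fclose() pairing, the following wraps a FILE* in a unique_ptr so that fclose() is called on every exit path. The names File_closer, File_ptr, write_text(), and read_text() are hypothetical helpers invented for this illustration; only fopen(), fclose(), fputs(), and fgetc() come from the C library. A write error that shows up only at fclose() is not reported here.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>

// Deleter that calls fclose(); unique_ptr invokes it only for a non-null FILE*.
struct File_closer {
    void operator()(std::FILE* f) const { std::fclose(f); }
};
using File_ptr = std::unique_ptr<std::FILE,File_closer>;

// Write text to the named file; mode "w" discards any previous contents.
bool write_text(const char* name, const std::string& text)
{
    assert(name);
    File_ptr f {std::fopen(name,"w")};
    if (!f) return false;                          // the open failed
    return std::fputs(text.c_str(),f.get())>=0;    // fclose() runs when f goes out of scope
}

// Read the whole file back; mode "r" opens it for reading.
std::string read_text(const char* name)
{
    assert(name);
    File_ptr f {std::fopen(name,"r")};
    std::string result;
    if (!f) return result;                         // the open failed: return an empty string
    for (int ch; (ch=std::fgetc(f.get()))!=EOF; )
        result += static_cast<char>(ch);
    return result;
}
```

For example, after write_text("notes.txt","hello\n") succeeds, read_text("notes.txt") yields "hello\n" (the file name is arbitrary).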

43.3. The printf() Family

The most popular C standard library functions are the output functions. However, I prefer iostreams because that library is type-safe and extensible. The formatted output function, printf(), is widely used (also in C++ programs) and widely imitated in other programming languages:

For each version, n is the number of characters written or a negative number if the output failed. The return value from printf() is essentially always ignored.

The declaration of printf() is:

int printf(const char* format ...);

In other words, it takes a C-style string (typically a string literal) followed by an arbitrary number of arguments of arbitrary type. The meaning of those “extra arguments” is controlled by conversion specifications, such as %c (print as character) and %d (print as decimal integer), in the format string. For example:

int x = 5;
const char* s = "Pedersen";
printf("the value of x is '%d' and the value of s is '%s'\n",x,s);

A character following a % controls the handling of an argument. The first % applies to the first “extra argument” (here, %d applies to x), the second % to the second “extra argument” (here, %s applies to s), and so on. In particular, the output of that call to printf() is

the value of x is '5' and the value of s is 'Pedersen'

followed by a newline.

In general, the correspondence between a % conversion directive and the type to which it is applied cannot be checked, and when it can, it usually is not. For example:

printf("the value of x is '%s' and the value of s is '%x'\n",x,s); // oops

The set of conversion specifications is quite large (and growing over the years) and provides a great degree of flexibility. Various systems support options beyond the ones offered by the C standard. See also the set of options used for strftime() formatting (§43.6). Following the %, there may be:

an optional minus sign that specifies left-adjustment of the converted value in the field;

an optional plus sign that specifies that a value of a signed type will always begin with a + or – sign;

an optional zero that specifies that leading zeros are used for padding of a numeric value. If – or a precision is specified this 0 is ignored;

an optional # that specifies that floating-point values will be printed with a decimal point even if no nonzero digits follow, that trailing zeros will be printed, that octal values will be printed with an initial 0, and that hexadecimal values will be printed with an initial 0x or 0X;

an optional digit string specifying a field width; if the converted value has fewer characters than the field width, it will be blank-padded on the left (or right, if the left-adjustment indicator has been given) to make up the field width; if the field width begins with a zero, zero-padding will be done instead of blank-padding;

an optional period that serves to separate the field width from the next digit string;

an optional digit string specifying a precision that specifies the number of digits to appear after the decimal point, for e- and f-conversion, or the maximum number of characters to be printed from a string;

a field width or precision may be * instead of a digit string. In this case an integer argument supplies the field width or precision;

an optional character h, specifying that a following d, i, o, u, x, or X corresponds to a (signed or unsigned) short integer argument;

an optional pair of characters hh, specifying that a following d, i, o, u, x, or X argument is treated as a (signed or unsigned) char argument;

an optional character l (ell), specifying that a following d, i, o, u, x, or X corresponds to a (signed or unsigned) long integer argument;

an optional pair of characters ll (ell ell), specifying that a following d, i, o, u, x, or X corresponds to a (signed or unsigned) long long integer argument;

an optional character L, specifying that a following a, A, e, E, f, F, g, or G corresponds to a long double argument;

an optional character j, specifying that a following d, i, o, u, x, or X corresponds to an intmax_t or uintmax_t argument;

an optional character z, specifying that a following d, i, o, u, x, or X corresponds to a size_t argument;

an optional character t, specifying that a following d, i, o, u, x, or X corresponds to a ptrdiff_t argument;

a %, indicating that the character % is to be printed; no argument is used;

a character that indicates the type of conversion to be applied. The conversion characters and their meanings are:

d: The integer argument is converted to decimal notation;

o: The integer argument is converted to octal notation;

x: The integer argument is converted to hexadecimal notation;

f: The float or double argument is converted to decimal notation in the style [–]ddd.ddd. The number of d’s after the decimal point is equal to the precision for the argument. If necessary, the number is rounded. If the precision is missing, six digits are given; if the precision is explicitly 0 and # isn’t specified, no decimal point is printed;

F: Like %f but uses capital letters for INF, INFINITY, and NAN;

e: The float or double argument is converted to decimal notation in the scientific style [–]d.ddde+dd or [–]d.ddde–dd, where there is one digit before the decimal point and the number of digits after the decimal point is equal to the precision specification for the argument. If necessary, the number is rounded. If the precision is missing, six digits are given; if the precision is explicitly 0 and # isn’t specified, no digits and no decimal point are printed;

E: As e, but with an uppercase E used to identify the exponent;

g: The float or double argument is printed in style f or in style e, whichever gives the greatest precision in minimum space;

G: As g, but with an uppercase E used to identify the exponent;

a: The double argument is printed in the hexadecimal format [–]0xh.hhhhp+d or [–]0xh.hhhhp–d;

A: Like %a but using X and P instead of x and p;

c: The character argument is printed. Null characters are ignored;

s: The argument is taken to be a string (character pointer), and characters from the string are printed until a null character or until the number of characters indicated by the precision specification is reached; if the precision is missing, all characters up to a null are printed;

p: The argument is taken to be a pointer. The representation printed is implementation-dependent;

u: The unsigned integer argument is converted to decimal notation;

n: The number of characters written so far by the call of printf(), fprintf(), or sprintf() is written to the int pointed to by the pointer to int argument.

In no case does a nonexistent or small field width cause truncation of a field; padding takes place only if the specified field width exceeds the actual width.
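To make a few of these flags, widths, and precisions concrete, here is a small sketch that formats a single value through snprintf() and returns the text. The helper fmt() and its 64-character buffer are assumptions made for this illustration, not part of the standard library; like printf() itself, it cannot check that the format matches the argument type.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format one value according to format and return the result as a string.
// The buffer size is an arbitrary choice; longer output would be truncated.
template<typename T>
std::string fmt(const char* format, T value)
{
    assert(format);
    char buf[64];
    int n = std::snprintf(buf,sizeof(buf),format,value);
    return (n<0) ? std::string{} : std::string{buf};
}

// fmt("%5d",42)          yields "   42"   (field width, blank-padded on the left)
// fmt("%-5d|",42)        yields "42   |"  (left-adjusted)
// fmt("%+d",42)          yields "+42"     (explicit sign)
// fmt("%05d",42)         yields "00042"   (zero-padded)
// fmt("%#x",255)         yields "0xff"    (alternate form for hexadecimal)
// fmt("%.2f",3.14159)    yields "3.14"    (precision for f-conversion)
// fmt("%.3s","Pedersen") yields "Ped"     (precision limits a string)
// fmt("%2d",12345)       yields "12345"   (a small field width never truncates)
```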

Here is a more elaborate example:

const char* line_format = "#line %d \"%s\"\n";
int line = 13;
const char* file_name = "C++/main.c";

printf("int a;\n");
printf(line_format,line,file_name);

which produces:

int a;
#line 13 "C++/main.c"

Using printf() is unsafe in the sense that type checking is not done. For example, here is a well-known way of getting unpredictable output, a segmentation fault, or worse:

char x = 'q';
printf("bad input char: %s",x); // %s should have been %c

The printf() function does, however, provide great flexibility in a form that is familiar to C programmers.

Because C does not have user-defined types in the sense that C++ has, there are no provisions for defining output formats for user-defined types, such as complex, vector, or string. The format for strftime() (§43.6) is an example of the contortions you can get into by trying to design yet another set of format specifiers.

The C standard output, stdout, corresponds to cout. The C standard input, stdin, corresponds to cin. The C standard error output, stderr, corresponds to cerr. This correspondence between C standard I/O and C++ I/O streams is so close that C-style I/O and I/O streams can share a buffer. For example, a mix of cout and stdout operations can be used to produce a single output stream (that’s not uncommon in mixed C and C++ code). This flexibility carries a cost. For better performance, don’t mix stdio and iostream operations for a single stream. To turn the sharing off, call ios_base::sync_with_stdio(false) before the first I/O operation (§38.4.4).

The stdio library provides a function, scanf(), that is an input operation with a style that mimics printf(). For example:

int x;
char s[buf_size];
int i = scanf("the value of x is '%d' and the value of s is '%s'\n",&x,s);

Here, scanf() tries to read an integer into x and a sequence of non-whitespace characters into s. A non-format character specifies that the input should contain that character. For example:

the value of x is '123' and the value of s is 'string '\n"

will read 123 into x and string followed by a 0 into s. If the call of scanf() succeeds, the resulting value (i in the call above) will be the number of argument pointers assigned to (hopefully 2 in the example); otherwise, EOF. This way of specifying input is error-prone (e.g., what would happen if you forgot the space after string on that input line?). All arguments to scanf() must be pointers. I strongly recommend against the use of scanf().

So what can we do for input if we are obliged to use stdio? One popular answer is “use the standard-library function gets()”:

// very dangerous code:
char s[buf_size];
char* p = gets(s); // read a line into s

The call p=gets(s) reads characters into s until a newline or an end-of-file is encountered and a '\0' is placed after the last character written to s. If an end-of-file is encountered or if an error occurred, p is set to the nullptr; otherwise, it is set to s. Never use gets(s) or its rough equivalent (scanf("%s",s))! For years, they were the favorites of virus writers: By providing an input that overflows the input buffer (s in the example), a program can be corrupted and a computer potentially taken over by an attacker. The sprintf() function can suffer similar buffer-overflow problems. The C11 version of the C standard library offers a whole alternate set of stdio input functions that take an extra argument to defend against overflow, such as gets_s(p,n). As for iostream’s unformatted input, that leaves the user with the problem of deciding exactly which termination condition was encountered (§38.4.1.2; e.g., too many characters, a terminator character, or an end-of-file).
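If stdio must be used for line input, one bounded alternative is fgets(), which is not discussed in the text above but is part of the C standard library and never writes more than the number of characters it is given. The sketch below builds a line of any length from fixed-size pieces; read_line() and its 16-character buffer are hypothetical choices for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Read one line from f into s, dropping the trailing newline.
// Returns false at end-of-file (with nothing read) or on error.
bool read_line(std::FILE* f, std::string& s)
{
    assert(f);
    s.clear();
    char buf[16];                  // deliberately small so that long lines take several reads
    bool got_any = false;
    while (std::fgets(buf,static_cast<int>(sizeof(buf)),f)) {  // reads at most sizeof(buf)-1 characters
        got_any = true;
        std::size_t n = std::strlen(buf);
        if (n>0 && buf[n-1]=='\n') {
            s.append(buf,n-1);     // complete line: drop the newline
            return true;
        }
        s.append(buf,n);           // partial line: keep reading
    }
    return got_any;                // last line without a newline, or nothing at all
}
```

Because the destination is a std::string, the caller does not have to guess a maximum line length.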

The stdio library also provides simple and useful character read and write functions:

The result of these operations is an int (not a char, or EOF could not be returned). For example, this is a typical C-style input loop:

int ch; // note: not "char ch;"
while ((ch=getchar())!=EOF) { /* do something */ }

Don’t do two consecutive ungetc()s on a stream. The result of that is undefined and non-portable.
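A short sketch of the character functions in use: peek_char() looks at the next character by reading it with getc() and pushing it back with a single ungetc(), and count_chars() is the typical input loop. Both function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdio>

// Return the next character of f without consuming it, or EOF if there is none.
int peek_char(std::FILE* f)
{
    assert(f);
    int ch = std::getc(f);              // int rather than char, so that EOF can be represented
    if (ch!=EOF) std::ungetc(ch,f);     // exactly one pushback after a read
    return ch;
}

// Count the characters remaining in f.
long count_chars(std::FILE* f)
{
    assert(f);
    long n = 0;
    int ch;
    while ((ch=std::getc(f))!=EOF)
        ++n;
    return n;
}
```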

There are many more stdio functions; see a good C textbook (e.g., “K&R”) if you need to know more.

43.4. C-Style Strings

A C-style string is a zero-terminated array of char. This notion of a string is supported by a set of functions defined in <cstring> (or <string.h>; note: not <string>) and <cstdlib>. These functions operate on C-style strings through char* pointers (const char* pointers for memory that is only read, but not unsigned char* pointers):

Note that in C++, strchr() and strstr() are duplicated to make them type-safe (they can’t turn a const char* into a char* the way the C equivalents can). See also §36.3.2, §36.3.3, and §36.3.7.

The conversions to floating-point values set errno to ERANGE (§40.3) if their result doesn’t fit into the target type. See also §36.3.5.
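The ERANGE convention can be sketched with strtod() from <cstdlib>. The wrapper to_double() is a hypothetical helper for this illustration; note that errno must be cleared before the call, because the library sets it on failure but never resets it on success.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Convert the start of s to a double.
// Returns false if no number was found or if the value does not fit in a double.
bool to_double(const char* s, double& result)
{
    assert(s);
    char* end = nullptr;
    errno = 0;                          // clear any earlier error
    double d = std::strtod(s,&end);
    if (end==s) return false;           // no characters were converted
    if (errno==ERANGE) return false;    // result out of range for double
    result = d;
    return true;
}
```

With this helper, "3.5" converts successfully, whereas "1e999" is rejected as out of range and "abc" is rejected because no conversion took place.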

43.5. Memory

The memory manipulation functions operate on “raw memory” (no type known) through void* pointers (const void* pointers for memory that is only read):

Note that malloc(), etc., does not invoke constructors and free() doesn’t invoke destructors. Do not use these functions for types with constructors or destructors. Also, memset() should never be used for any type with a constructor.

Note that realloc(p,n) will reallocate (that is, copy) the data stored, from p onward, when it needs more memory than is available starting from p. For example:

int max = 1024;
char* p = static_cast<char*>(malloc(max));
char* current_word = nullptr;
bool in_word = false;
int i=0;
while (cin.get(p[i])) {
    if (isalpha(p[i])) {
        if (!in_word)
            current_word = &p[i];    // remember where the word starts
        in_word = true;
    }
    else
        in_word = false;
    if (++i==max)
        p = static_cast<char*>(realloc(p,max*=2)); // double allocation
    // ...
}

I hope you spotted the nasty bug: if realloc() was called, current_word may (may not) point to a location outside the current allocation pointed to by p.

Most uses of realloc() are better done using a vector (§31.4.1). The mem* functions are found in <cstring> and the allocation functions in <cstdlib>.
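One way to avoid the dangling pointer is sketched below: let a vector own the buffer and remember positions as indices rather than pointers, since an index stays meaningful when the vector reallocates. The function last_word() is a hypothetical example, not a rewrite of the code above in every detail; it returns the last run of letters read from a stream.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Read all characters from is and return the last word (maximal run of letters) seen.
std::string last_word(std::istream& is)
{
    std::vector<char> buf;
    std::size_t word_start = 0;    // an index, not a pointer: remains valid when buf grows
    std::size_t word_end = 0;      // one past the last letter of the word
    bool in_word = false;
    for (char ch; is.get(ch); ) {
        buf.push_back(ch);         // may reallocate and move the characters
        if (std::isalpha(static_cast<unsigned char>(ch))) {
            if (!in_word) word_start = buf.size()-1;
            in_word = true;
            word_end = buf.size();
        }
        else
            in_word = false;
    }
    assert(word_start<=word_end);
    return std::string(buf.begin()+word_start,buf.begin()+word_end);
}
```

Given the input "hello big world!", the result is "world"; for input without letters, it is the empty string.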

43.6. Date and Time

In <ctime>, you can find several types and functions related to date and time:

The struct tm is defined like this:

struct tm {
int tm_sec; // second of minute [0:61]; 60 and 61 represent leap seconds
int tm_min; // minute of hour [0:59]
int tm_hour; // hour of day [0:23]
int tm_mday; // day of month [1:31]
int tm_mon; // month of year [0:11]; 0 means January (note: not [1:12])
int tm_year; // year since 1900; 0 means year 1900, and 115 means 2015
int tm_wday; // days since Sunday [0:6]; 0 means Sunday
int tm_yday; // days since January 1 [0:365]; 0 means January 1
int tm_isdst; // daylight saving time flag: positive if in effect, zero if not, negative if unknown
};
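The zero-based and offset fields are easy to get wrong, so here is a sketch of filling in a tm by hand. It relies on mktime() from <ctime>, which interprets the fields as local time, normalizes them, and computes tm_wday and tm_yday. The function name make_example_time() and the chosen date are assumptions for this illustration.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Build the tm for 16 September 1973, 01:03:52 local time.
std::tm make_example_time()
{
    std::tm t {};                 // zero-initialize every field
    t.tm_year = 1973-1900;        // years since 1900
    t.tm_mon = 8;                 // September: months count from 0
    t.tm_mday = 16;               // days of the month count from 1
    t.tm_hour = 1;
    t.tm_min = 3;
    t.tm_sec = 52;
    t.tm_isdst = -1;              // negative: let mktime() decide about daylight saving time
    std::time_t tt = std::mktime(&t);   // fills in tm_wday and tm_yday
    assert(tt!=std::time_t(-1));        // time_t(-1) means "cannot be represented"
    return t;
}
```

After the call, tm_wday is 0 (Sunday) and tm_yday is 258, and the result can be passed to asctime() or strftime().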

A system clock is supported by the function clock(), together with a few functions giving meaning to its return type clock_t:

An example of the result of a call of asctime() is

"Sun Sep 16 01:03:52 1973\n"

Here is an example of how clock() can be used to time a function:

int main(int argc, char* argv[])
{
    int n = atoi(argv[1]);

    clock_t t1 = clock();
    if (t1 == clock_t(-1)) {    // clock_t(-1) means "clock() didn't work"
        cerr << "sorry, no clock\n";
        exit(1);
    }

    for (int i = 0; i<n; i++)
        do_something();    // timing loop
    clock_t t2 = clock();
    if (t2 == clock_t(-1)) {
        cerr << "sorry, clock overflow\n";
        exit(2);
    }
    cout << "do_something() " << n << " times took "
         << double(t2-t1)/CLOCKS_PER_SEC << " seconds"
         << " (measurement granularity: " << CLOCKS_PER_SEC
         << " of a second)\n";
}

The explicit conversion double(t2-t1) before dividing is necessary because clock_t might be an integer. For values t1 and t2 returned by clock(), double(t2-t1)/CLOCKS_PER_SEC is the system’s best approximation of the time in seconds between the two calls.

Compare <ctime> with the facilities provided in <chrono>; see §35.2.

If clock() isn’t provided for a processor or if a time interval is too long to measure, clock() returns clock_t(-1).
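For comparison, the same kind of measurement can be sketched with <chrono>. The function template time_calls() is a hypothetical helper; steady_clock is used on the assumption that a monotonic clock is what a timing loop wants, and there is no clock_t(-1) case to test for.

```cpp
#include <cassert>
#include <chrono>

// Call f n times and return the elapsed time in seconds.
template<typename F>
double time_calls(int n, F f)
{
    assert(n>=0);
    using namespace std::chrono;
    auto t1 = steady_clock::now();
    for (int i = 0; i<n; ++i)
        f();                                    // timing loop
    auto t2 = steady_clock::now();
    return duration<double>(t2-t1).count();     // the conversion to seconds is explicit in the type
}
```

A call such as time_calls(1000,do_something) would replace the two clock() calls and the division by CLOCKS_PER_SEC.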

The strftime() function uses a printf()-style format string to control the output of a tm. For example:

void almost_C()
{
    const int max = 80;
    char str[max];
    time_t t = time(nullptr);
    tm* pt = localtime(&t);
    strftime(str,max,"%D, %H:%M (%I:%M%p)\n",pt);
    printf("%s",str);    // str is data, not a format string
}

The output is something like:

06/28/12, 15:38 (03:38PM)

The strftime() formatting characters almost constitute a small programming language:

The locale referred to is the program’s global locale.

Some conversion specifiers can be modified by an E or O modifier, indicating alternative implementation-specific and locale-specific formatting. For example:

The strftime() formatting is used by the time_put facet (§39.4.4.1).

For C++-style time facilities, see §35.2.

43.7. Etc.

In <cstdlib> we find:

The comparison function (cmp) used by qsort() and bsearch() must have the type

int (*cmp)(const void* p, const void* q);

That is, no type information is known to the sort functions that simply “see” their array arguments as sequences of bytes. The integer returned is

• Negative if *p is considered less than *q

• Zero if *p is considered equal to *q

• Positive if *p is considered greater than *q

This differs from sort(), which uses a conventional <.
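The difference can be sketched by sorting the same vector both ways. The function names below (cmp_int(), sort_c_style(), sort_cpp_style(), found()) are hypothetical and exist only for this illustration; qsort(), bsearch(), and sort() are the standard functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Three-way comparison for qsort() and bsearch(): the arguments arrive as const void*.
int cmp_int(const void* p, const void* q)
{
    assert(p && q);
    int a = *static_cast<const int*>(p);
    int b = *static_cast<const int*>(q);
    return (a<b) ? -1 : (a>b) ? 1 : 0;    // avoids the overflow risk of returning a-b
}

void sort_c_style(std::vector<int>& v)
{
    if (v.empty()) return;                // keep the pointer argument valid
    std::qsort(v.data(),v.size(),sizeof(int),cmp_int);
}

void sort_cpp_style(std::vector<int>& v)
{
    std::sort(v.begin(),v.end());         // element type known; uses < by default
}

// Binary search in a sorted vector using the same comparison function.
bool found(const std::vector<int>& sorted, int key)
{
    if (sorted.empty()) return false;
    return std::bsearch(&key,sorted.data(),sorted.size(),sizeof(int),cmp_int)!=nullptr;
}
```

The C-style version needs the element size and a hand-written cast in the comparison; the sort() version gets both from the type system.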

Note that exit() and abort() do not invoke destructors. If you want destructors called for constructed objects, throw an exception (§13.5.1).

Similarly, longjmp() from <csetjmp> is a nonlocal goto that unravels the stack until it finds the result of a matching setjmp(). It does not invoke destructors. Its behavior is undefined if a destructor would be invoked by a throw from the same point of a program. Never use setjmp() in a C++ program.
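A minimal sketch of the alternative: report the fatal condition with an exception and catch it at the top level, so that stack unwinding runs the destructors that exit(), abort(), and longjmp() would skip. Guard, do_work(), run(), check(), and the counter cleanups are all hypothetical names for this illustration.

```cpp
#include <cassert>
#include <stdexcept>

int cleanups = 0;    // counts destructor calls (for illustration only)

struct Guard {
    ~Guard() { ++cleanups; }    // stands in for releasing a resource
};

// Signal a fatal problem by throwing rather than calling exit():
// g is destroyed during stack unwinding.
void do_work(bool fail)
{
    Guard g;
    if (fail) throw std::runtime_error{"cannot continue"};
}

// What main() might do: catch at the top level and turn the error into a status.
int run(bool fail)
{
    try {
        do_work(fail);
        return 0;
    }
    catch (const std::exception&) {
        return 1;
    }
}

// Self-check: the destructor runs whether or not an exception is thrown.
void check()
{
    cleanups = 0;
    assert(run(true)==1 && cleanups==1);
    assert(run(false)==0 && cleanups==2);
}
```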

For more C standard library functions see [Kernighan,1988] or some other reputable C language reference.

In <cstdint>, we find int_fast16_t and other standard integer aliases:

Also in <cstdint>, we find type aliases for the largest signed and unsigned integer types for an implementation. For example:

typedef long long intmax_t; // largest signed integer type
typedef unsigned long long uintmax_t; // largest unsigned integer type
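These aliases connect to the printf() length modifiers of §43.3: an intmax_t is printed with the j modifier. The sketch below checks two guarantees at compile time and formats the largest intmax_t value; max_as_text() and the 64-character buffer are assumptions for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <string>

// Guarantees that hold on every conforming implementation:
static_assert(sizeof(std::intmax_t)>=sizeof(long long),"intmax_t is the largest signed integer type");
static_assert(sizeof(std::int_fast16_t)*CHAR_BIT>=16,"int_fast16_t has at least 16 bits");

// Format the largest intmax_t value using the j length modifier.
std::string max_as_text()
{
    std::intmax_t big = std::numeric_limits<std::intmax_t>::max();
    char buf[64];
    int n = std::snprintf(buf,sizeof(buf),"%jd",big);
    assert(n>0 && n<static_cast<int>(sizeof(buf)));    // the output fitted in buf
    return std::string{buf};
}
```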

43.8. Advice

[1] Use fstreams rather than fopen()/fclose() if you worry about resource leaks; §43.2.

[2] Prefer <iostream> to <cstdio> for reasons of type safety and extensibility; §43.3.

[3] Never use gets() or scanf("%s",s); §43.3.

[4] Prefer <string> to <cstring> for reasons of ease of use and simplicity of resource management; §43.4.

[5] Use the C memory management routines, such as memcpy(), only for raw memory; §43.5.

[6] Prefer vector to uses of malloc() and realloc(); §43.5.

[7] Beware that the C standard library does not know about constructors and destructors; §43.5.

[8] Prefer <chrono> to <ctime> for timing; §43.6.

[9] For flexibility, ease of use, and performance, prefer sort() over qsort(); §43.7.

[10] Don’t use exit(); instead, throw an exception; §43.7.

[11] Don’t use longjmp(); instead, throw an exception; §43.7.