Effective awk Programming (2015)

Part I. The awk Language

Chapter 2. Running awk and gawk

This chapter covers how to run awk, both POSIX-standard and gawk-specific command-line options, and what awk and gawk do with nonoption arguments. It then proceeds to cover how gawk searches for source files, reading standard input along with other files, gawk’s environment variables, gawk’s exit status, using include files, and obsolete and undocumented options and/or features.

Many of the options and features described here are discussed in more detail later in the book; feel free to skip over things in this chapter that don’t interest you right now.

Invoking awk

There are two ways to run awk—with an explicit program or with one or more program files. Here are templates for both of them; items enclosed in [...] in these templates are optional:

awk [options] -f progfile [--] file …
awk [options] [--] 'program' file …
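As a minimal sketch of the two templates, the following shell session runs the same one-rule program both ways. The names data.txt and first.awk and the scratch directory are assumptions made for illustration; only standard awk features are used.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$tmp/data.txt"

# Form 1: the program text is the first nonoption argument.
inline=$(awk '{ print $1 }' "$tmp/data.txt")

# Form 2: the same program stored in a file and named with -f.
printf '{ print $1 }\n' > "$tmp/first.awk"
fromfile=$(awk -f "$tmp/first.awk" "$tmp/data.txt")

printf '%s\n' "$inline"
printf '%s\n' "$fromfile"
rm -rf "$tmp"
```

Both invocations should print the first field of every input line, so the two captured results are expected to be identical.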

In addition to traditional one-letter POSIX-style options, gawk also supports GNU long options.

It is possible to invoke awk with an empty program:

awk '' datafile1 datafile2

Doing so makes little sense, though; awk exits silently when given an empty program. (d.c.) If --lint has been specified on the command line, gawk issues a warning that the program is empty.

Command-Line Options

Options begin with a dash and consist of a single character. GNU-style long options consist of two dashes and a keyword. The keyword can be abbreviated, as long as the abbreviation allows the option to be uniquely identified. If the option takes an argument, either the keyword is immediately followed by an equals sign (‘=’) and the argument’s value, or the keyword and the argument’s value are separated by whitespace. If a particular option with a value is given more than once, it is the last value that counts.

Each long option for gawk has a corresponding POSIX-style short option. The long and short options are interchangeable in all contexts. The following list describes options mandated by the POSIX standard:

-F fs
--field-separator fs

Set the FS variable to fs (see Specifying How Fields Are Separated).

-f source-file
--file source-file

Read the awk program source from source-file instead of from the first nonoption argument. This option may be given multiple times; the awk program consists of the concatenation of the contents of each specified source-file.

-v var=val
--assign var=val

Set the variable var to the value val before execution of the program begins. Such variable values are available inside the BEGIN rule (see Other Command-Line Arguments).

The -v option can only set one variable, but it can be used more than once, setting another variable each time, like this: ‘awk -v foo=1 -v bar=2 …’.

CAUTION

Using -v to set the values of the built-in variables may lead to surprising results. awk will reset the values of those variables as it needs to, possibly ignoring any initial value you may have given.

-W gawk-opt

Provide an implementation-specific option. This is the POSIX convention for providing implementation-specific options. These options also have corresponding GNU-style long options. Note that the long options may be abbreviated, as long as the abbreviations remain unique. The full list of gawk-specific options is provided next.

--

Signal the end of the command-line options. The following arguments are not treated as options even if they begin with ‘-’. This interpretation of -- follows the POSIX argument parsing conventions.

This is useful if you have filenames that start with ‘-’, or in shell scripts, if you have filenames that will be specified by the user that could start with ‘-’. It is also useful for passing options on to the awk program; see Processing Command-Line Options.
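The POSIX options in the preceding list can be tried out with a short session such as the one below. It is only a sketch: the sample record, the variable names, and the helper file show.awk are hypothetical, chosen for illustration.

```shell
tmp=$(mktemp -d)

# -F: split records on colons and print the second field.
second=$(printf 'root:x:0\n' | awk -F: '{ print $2 }')

# -v: one variable per option; both are set before the BEGIN rule runs.
sum=$(awk -v foo=1 -v bar=2 'BEGIN { print foo + bar }')

# --: everything after it is an ordinary argument, so -q is not parsed
# as an awk option and reaches the program through ARGV instead.
printf 'BEGIN { print ARGV[1] }\n' > "$tmp/show.awk"
arg=$(awk -f "$tmp/show.awk" -- -q)

printf '%s %s %s\n' "$second" "$sum" "$arg"
rm -rf "$tmp"
```

Because show.awk consists only of a BEGIN rule, awk never tries to open -q as an input file; it simply reports the argument it was handed.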

The following list describes gawk-specific options:

-b
--characters-as-bytes

Cause gawk to treat all input data as single-byte characters. In addition, all output written with print or printf is treated as single-byte characters.

Normally, gawk follows the POSIX standard and attempts to process its input data according to the current locale (see Where You Are Makes a Difference). This can often involve converting multibyte characters into wide characters (internally), and can lead to problems or confusion if the input data does not contain valid multibyte characters. This option is an easy way to tell gawk, “Hands off my data!”

-c
--traditional

Specify compatibility mode, in which the GNU extensions to the awk language are disabled, so that gawk behaves just like BWK awk. See Extensions in gawk Not in POSIX awk, which summarizes the extensions.

-C
--copyright

Print the short version of the General Public License and then exit.

-d[file]
--dump-variables[=file]

Print a sorted list of global variables, their types, and final values to file. If no file is provided, print this list to a file named awkvars.out in the current directory. No space is allowed between the -d and file, if file is supplied.

Having a list of all global variables is a good way to look for typographical errors in your programs. You would also use this option if you have a large program with a lot of functions, and you want to be sure that your functions don’t inadvertently use global variables that you meant to be local. (This is a particularly easy mistake to make with simple variable names like i, j, etc.)

-D[file]
--debug[=file]

Enable debugging of awk programs (see Introduction to the gawk Debugger). By default, the debugger reads commands interactively from the keyboard (standard input). The optional file argument allows you to specify a file with a list of commands for the debugger to execute noninteractively. No space is allowed between the -D and file, if file is supplied.

-e program-text
--source program-text

Provide program source code in the program-text. This option allows you to mix source code in files with source code that you enter on the command line. This is particularly useful when you have library functions that you want to use from your command-line programs (see The AWKPATH Environment Variable).

-E file
--exec file

Similar to -f, read awk program text from file. There are two differences from -f:

§ This option terminates option processing; anything else on the command line is passed on directly to the awk program.

§ Command-line variable assignments of the form ‘var=value’ are disallowed.

This option is particularly necessary for World Wide Web CGI applications that pass arguments through the URL; using this option prevents a malicious (or other) user from passing in options, assignments, or awk source code (via -e) to the CGI application.[9] This option should be used with ‘#!’ scripts (see Executable awk Programs), like so:

#! /usr/local/bin/gawk -E
awk program here …

-g
--gen-pot

Analyze the source program and generate a GNU gettext portable object template file on standard output for all string constants that have been marked for translation. See Chapter 13 for information about this option.

-h
--help

Print a “usage” message summarizing the short- and long-style options that gawk accepts and then exit.

-i source-file
--include source-file

Read an awk source library from source-file. This option is completely equivalent to using the @include directive inside your program. It is very similar to the -f option, but there are two important differences. First, when -i is used, the program source is not loaded if it has been previously loaded, whereas with -f, gawk always loads the file. Second, because this option is intended to be used with code libraries, gawk does not recognize such files as constituting main program input. Thus, after processing an -i argument, gawk still expects to find the main source code via the -f option or on the command line.

-l ext
--load ext

Load a dynamic extension named ext. Extensions are stored as system shared libraries. This option searches for the library using the AWKLIBPATH environment variable. The correct library suffix for your platform will be supplied by default, so it need not be specified in the extension name. The extension initialization routine should be named dl_load(). An alternative is to use the @load keyword inside the program to load a shared library. This advanced feature is described in detail in Chapter 16.

-L[value]
--lint[=value]

Warn about constructs that are dubious or nonportable to other awk implementations. No space is allowed between the -L and value, if value is supplied. Some warnings are issued when gawk first reads your program. Others are issued at runtime, as your program executes. With an optional argument of ‘fatal’, lint warnings become fatal errors. This may be drastic, but its use will certainly encourage the development of cleaner awk programs. With an optional argument of ‘invalid’, only warnings about things that are actually invalid are issued. (This is not fully implemented yet.)

Some warnings are only printed once, even if the dubious constructs they warn about occur multiple times in your awk program. Thus, when eliminating problems pointed out by --lint, you should take care to search for all occurrences of each inappropriate construct. As awk programs are usually short, doing so is not burdensome.

-M
--bignum

Force arbitrary-precision arithmetic on numbers. This option has no effect if gawk is not compiled to use the GNU MPFR and MP libraries (see Chapter 15).

-n
--non-decimal-data

Enable automatic interpretation of octal and hexadecimal values in input data (see Allowing Nondecimal Input Data).

CAUTION

This option can severely break old programs. Use with care. Also note that this option may disappear in a future version of gawk.

-N
--use-lc-numeric

Force the use of the locale’s decimal point character when parsing numeric input data (see Where You Are Makes a Difference).

-o[file]
--pretty-print[=file]

Enable pretty-printing of awk programs. By default, the output program is created in a file named awkprof.out (see Profiling Your awk Programs). The optional file argument allows you to specify a different filename for the output. No space is allowed between the -o and file, if file is supplied.

NOTE

Due to the way gawk has evolved, with this option your program still executes. This will change in the next major release, such that gawk will only pretty-print the program and not run it.

-O
--optimize

Enable some optimizations on the internal representation of the program. At the moment, this includes just simple constant folding.

-p[file]
--profile[=file]

Enable profiling of awk programs (see Profiling Your awk Programs). By default, profiles are created in a file named awkprof.out. The optional file argument allows you to specify a different filename for the profile file. No space is allowed between the -p and file, if file is supplied.

The profile contains execution counts for each statement in the program in the left margin, and function call counts for each function.

-P
--posix

Operate in strict POSIX mode. This disables all gawk extensions (just like --traditional) and disables all extensions not allowed by POSIX. See Common Extensions Summary for a summary of the extensions in gawk that are disabled by this option. Also, the following additional restrictions apply:

§ Newlines do not act as whitespace to separate fields when FS is equal to a single space (see Examining Fields).

§ Newlines are not allowed after ‘?’ or ‘:’ (see Conditional Expressions).

§ Specifying ‘-Ft’ on the command line does not set the value of FS to be a single TAB character (see Specifying How Fields Are Separated).

§ The locale’s decimal point character is used for parsing input data (see Where You Are Makes a Difference).

If you supply both --traditional and --posix on the command line, --posix takes precedence. gawk issues a warning if both options are supplied.

-r
--re-interval

Allow interval expressions (see Regular Expression Operators) in regexps. This is now gawk’s default behavior. Nevertheless, this option remains (both for backward compatibility and for use in combination with --traditional).

-S
--sandbox

Disable the system() function, input redirections with getline, output redirections with print and printf, and dynamic extensions. This is particularly useful when you want to run awk scripts from questionable sources and need to make sure the scripts can’t access your system (other than the specified input datafile).

-t
--lint-old

Warn about constructs that are not available in the original version of awk from Version 7 Unix (see Major Changes Between V7 and SVR3.1).

-V
--version

Print version information for this particular copy of gawk. This allows you to determine if your copy of gawk is up to date with respect to whatever the Free Software Foundation is currently distributing. It is also useful for bug reports (see Reporting Problems and Bugs).

As long as program text has been supplied, any other options are flagged as invalid with a warning message but are otherwise ignored.

In compatibility mode, as a special case, if the value of fs supplied to the -F option is ‘t’, then FS is set to the TAB character ("\t"). This is true only for --traditional and not for --posix (see Specifying How Fields Are Separated).

The -f option may be used more than once on the command line. If it is, awk reads its program source from all of the named files, as if they had been concatenated together into one big file. This is useful for creating libraries of awk functions. These functions can be written once and then retrieved from a standard place, instead of having to be included in each individual program. The -i option is similar in this regard. (As mentioned in Function Definition Syntax, function names must be unique.)
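A small sketch of the library idea, assuming two hypothetical files, lib.awk and main.awk, created in a scratch directory. The function name twice is likewise invented for the example.

```shell
tmp=$(mktemp -d)
# Hypothetical library file: defines a function but has no rules of its own.
printf 'function twice(n) { return 2 * n }\n' > "$tmp/lib.awk"
# Main program file that calls the library function.
printf 'BEGIN { print twice(21) }\n' > "$tmp/main.awk"

# awk reads both files as if they were one concatenated program.
answer=$(awk -f "$tmp/lib.awk" -f "$tmp/main.awk")
printf '%s\n' "$answer"
rm -rf "$tmp"
```

The order of the two -f options does not matter here, because awk reads the whole program before running any of it.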

With standard awk, library functions can still be used, even if the program is entered at the keyboard, by specifying ‘-f /dev/tty’. After typing your program, type Ctrl-d (the end-of-file character) to terminate it. (You may also use ‘-f -’ to read program source from the standard input, but then you will not be able to also use the standard input as a source of data.)

Because it is clumsy using the standard awk mechanisms to mix source file and command-line awk programs, gawk provides the -e option. This does not require you to preempt the standard input for your source code; it allows you to easily mix command-line and library source code (see The AWKPATH Environment Variable). As with -f, the -e and -i options may also be used multiple times on the command line.
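The following sketch mixes a library file with command-line program text using -e. Since -e is specific to gawk, the example checks that a gawk binary is on the PATH before using it; the file lib.awk and the function twice are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'function twice(n) { return 2 * n }\n' > "$tmp/lib.awk"

if command -v gawk >/dev/null 2>&1; then
    # -f supplies the library; -e supplies the main program text.
    mixed=$(gawk -f "$tmp/lib.awk" -e 'BEGIN { print twice(21) }')
else
    # Assumption: fall back to a marker string when gawk is unavailable.
    mixed="gawk not installed"
fi
printf '%s\n' "$mixed"
rm -rf "$tmp"
```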

If no -f or -e option is specified, then gawk uses the first nonoption command-line argument as the text of the program source code.

If the environment variable POSIXLY_CORRECT exists, then gawk behaves in strict POSIX mode, exactly as if you had supplied --posix. Many GNU programs look for this environment variable to suppress extensions that conflict with POSIX, but gawk behaves differently: it suppresses all extensions, even those that do not conflict with POSIX, and behaves in strict POSIX mode. If --lint is supplied on the command line and gawk turns on POSIX mode because of POSIXLY_CORRECT, then it issues a warning message indicating that POSIX mode is in effect. You would typically set this variable in your shell’s startup file. For a Bourne-compatible shell (such as Bash), you would add these lines to the .profile file in your home directory:

POSIXLY_CORRECT=true
export POSIXLY_CORRECT

For a C shell-compatible shell,[10] you would add this line to the .login file in your home directory:

setenv POSIXLY_CORRECT true

Having POSIXLY_CORRECT set is not recommended for daily use, but it is good for testing the portability of your programs to other environments.

Other Command-Line Arguments

Any additional arguments on the command line are normally treated as input files to be processed in the order specified. However, an argument that has the form var=value assigns the value value to the variable var; it does not specify a file at all. (See Assigning variables on the command line.) In the following example, count=1 is a variable assignment, not a filename:

awk -f program.awk file1 count=1 file2

All the command-line arguments are made available to your awk program in the ARGV array (see Predefined Variables). Command-line options and the program text (if present) are omitted from ARGV. All other arguments, including variable assignments, are included. As each element of ARGV is processed, gawk sets ARGIND to the index in ARGV of the current element.

Changing ARGC and ARGV in your awk program lets you control how awk processes the input files; this is described in more detail in Using ARGC and ARGV.
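To see what ends up in ARGV, a BEGIN-only program can simply list the array. This is an illustrative sketch; the argument names are placeholders, and the loop starts at 1 because the content of ARGV[0] varies between awk implementations.

```shell
# A BEGIN-only program never opens its arguments, so the names need not exist.
args=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' file1 count=1 file2)
printf '%s\n' "$args"
```

The program text itself does not appear in the listing, while the assignment count=1 does, alongside the two filenames.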

The distinction between filename arguments and variable-assignment arguments is made when awk is about to open the next input file. At that point in execution, it checks the filename to see whether it is really a variable assignment; if so, awk sets the variable instead of reading a file.

Therefore, the variables actually receive the given values after all previously specified files have been read. In particular, the values of variables assigned in this fashion are not available inside a BEGIN rule (see The BEGIN and END Special Patterns), because such rules are run before awk begins scanning the argument list.
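The timing difference can be observed directly. In this sketch, the variable names n and v and the file data.txt are assumptions for illustration: n is assigned as a command-line argument, and v is assigned with -v for comparison.

```shell
tmp=$(mktemp -d)
printf 'one line\n' > "$tmp/data.txt"

# n is assigned as an argument, so BEGIN sees it unset;
# v is assigned with -v, so it is set before BEGIN runs.
seen=$(awk -v v=early '
    BEGIN { print "BEGIN: n=[" n "] v=[" v "]" }
          { print "main: n=[" n "] v=[" v "]" }
' n=late "$tmp/data.txt")
printf '%s\n' "$seen"
rm -rf "$tmp"
```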

The variable values given on the command line are processed for escape sequences (see Escape Sequences). (d.c.)
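As a brief illustration of escape-sequence processing, the sketch below passes a backslash-t inside a command-line assignment. The variable name msg and the one-line input file are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'x\n' > "$tmp/data.txt"

# Inside single quotes the shell leaves the backslash and t untouched;
# awk itself turns \t into a TAB when it performs the assignment.
line=$(awk '{ print msg }' 'msg=a\tb' "$tmp/data.txt")
printf '%s\n' "$line"
rm -rf "$tmp"
```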

In some very early implementations of awk, when a variable assignment occurred before any filenames, the assignment would happen before the BEGIN rule was executed. awk’s behavior was thus inconsistent; some command-line assignments were available inside the BEGIN rule, while others were not. Unfortunately, some applications came to depend upon this “feature.” When awk was changed to be more consistent, the -v option was added to accommodate applications that depended upon the old behavior.

The variable assignment feature is most useful for assigning to variables such as RS, OFS, and ORS, which control input and output formats, before scanning the datafiles. It is also useful for controlling state if multiple passes are needed over a datafile. For example:

awk 'pass == 1 { pass 1 stuff }
     pass == 2 { pass 2 stuff }' pass=1 mydata pass=2 mydata
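A concrete version of the two-pass pattern might look like the following. The task (reporting each value as a percentage of the total) and the sample data are assumptions chosen to keep the sketch short.

```shell
tmp=$(mktemp -d)
printf '1\n1\n2\n' > "$tmp/mydata"

# Pass 1 accumulates the total; pass 2 rereads the same file and uses it.
shares=$(awk '
    pass == 1 { total += $1 }
    pass == 2 { printf "%d is %d%% of %d\n", $1, 100 * $1 / total, total }
' pass=1 "$tmp/mydata" pass=2 "$tmp/mydata")
printf '%s\n' "$shares"
rm -rf "$tmp"
```

Each pass=N assignment takes effect just before awk opens the file that follows it, which is what lets one program treat the same file differently on each reading.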

Given the variable assignment feature, the -F option for setting the value of FS is not strictly necessary. It remains for historical compatibility.
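The equivalence can be checked with a short sketch such as this one, which splits a hypothetical colon-separated record once with -F and once with an FS assignment placed before the filename.

```shell
tmp=$(mktemp -d)
printf 'root:x:0\n' > "$tmp/data.txt"

with_option=$(awk -F: '{ print $3 }' "$tmp/data.txt")
# The assignment is processed before data.txt is opened,
# so FS is already ":" when the first record is split.
with_assign=$(awk '{ print $3 }' FS=: "$tmp/data.txt")
printf '%s %s\n' "$with_option" "$with_assign"
rm -rf "$tmp"
```

One difference remains: a value set with -F is in effect inside a BEGIN rule, whereas the FS=: assignment is not.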

Naming Standard Input

Often, you may wish to read standard input together with other files. For example, you may wish to read one file, read standard input coming from a pipe, and then read another file.

The way to name the standard input, with all versions of awk, is to use a single, standalone minus sign or dash, ‘-’. For example:

some_command | awk -f myprog.awk file1 - file2

Here, awk first reads file1, then it reads the output of some_command, and finally it reads file2.
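A runnable variant of this pipeline, with printf standing in for some_command and two scratch files standing in for file1 and file2 (all assumptions for illustration), shows the reading order:

```shell
tmp=$(mktemp -d)
printf 'from file1\n' > "$tmp/file1"
printf 'from file2\n' > "$tmp/file2"

# The pipe supplies standard input; "-" marks where awk should read it.
order=$(printf 'from the pipe\n' | awk '{ print }' "$tmp/file1" - "$tmp/file2")
printf '%s\n' "$order"
rm -rf "$tmp"
```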

You may also use "-" to name standard input when reading files with getline (see Using getline from a File).

In addition, gawk allows you to specify the special filename /dev/stdin, both on the command line and with getline. Some other versions of awk also support this, but it is not standard. (Some operating systems provide a /dev/stdin file in the filesystem; however, gawk always processes this filename itself.)

The Environment Variables gawk Uses

A number of environment variables influence how gawk behaves.

The AWKPATH Environment Variable

In most awk implementations, you must supply a precise pathname for each program file, unless the file is in the current directory. But with gawk, if the filename supplied to the -f or -i options does not contain a directory separator ‘/’, then gawk searches a list of directories (called the search path) one by one, looking for a file with the specified name.

The search path is a string consisting of directory names separated by colons.[11] gawk gets its search path from the AWKPATH environment variable. If that variable does not exist, or if it has an empty value, gawk uses a default path (described shortly).

The search path feature is particularly helpful for building libraries of useful awk functions. The library files can be placed in a standard directory in the default path and then specified on the command line with a short filename. Otherwise, you would have to type the full filename for each file.

By using the -i or -f options, your command-line awk programs can use facilities in awk library files (see Chapter 10). Path searching is not done if gawk is in compatibility mode. This is true for both --traditional and --posix. See Command-Line Options.

If the source code file is not found after the initial search, the path is searched again after adding the suffix ‘.awk’ to the filename.

gawk’s path search mechanism is similar to the shell’s. (See The Bourne-Again SHell manual.) It treats a null entry in the path as indicating the current directory. (A null entry is indicated by starting or ending the path with a colon or by placing two colons next to each other [‘::’].)

NOTE

To include the current directory in the path, either place . as an entry in the path or write a null entry in the path.

Different past versions of gawk would also look explicitly in the current directory, either before or after the path search. As of version 4.1.2, this no longer happens; if you wish to look in the current directory, you must include . either as a separate entry or as a null entry in the search path.

The default value for AWKPATH is ‘.:/usr/local/share/awk’.[12] Since . is included at the beginning, gawk searches first in the current directory and then in /usr/local/share/awk. In practice, this means that you will rarely need to change the value of AWKPATH.

gawk places the value of the search path that it used into ENVIRON["AWKPATH"]. This provides access to the actual search path value from within an awk program.

Although you can change ENVIRON["AWKPATH"] within your awk program, this has no effect on the running program’s behavior. This makes sense: the AWKPATH environment variable is used to find the program source files. Once your program is running, all the files have been found, and gawk no longer needs to use AWKPATH.
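The path search can be sketched as follows. The directory layout and the file hello.awk are hypothetical, and because AWKPATH is a gawk feature, the example first checks that gawk is installed.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/lib"
# Hypothetical library file that lives only in $tmp/lib.
printf 'BEGIN { print "found via AWKPATH" }\n' > "$tmp/lib/hello.awk"

if command -v gawk >/dev/null 2>&1; then
    # hello.awk contains no slash, so gawk looks for it along AWKPATH.
    found=$(AWKPATH="$tmp/lib" gawk -f hello.awk)
else
    # Assumption: fall back to a marker string when gawk is unavailable.
    found="gawk not installed"
fi
printf '%s\n' "$found"
rm -rf "$tmp"
```

Setting AWKPATH on the command's own line limits the change to that single gawk invocation, leaving the rest of the shell session alone.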

The AWKLIBPATH Environment Variable

The AWKLIBPATH environment variable is similar to the AWKPATH variable, but it is used to search for loadable extensions (stored as system shared libraries) specified with the -l option rather than for source files. If the extension is not found, the path is searched again after adding the appropriate shared library suffix for the platform. For example, on GNU/Linux systems, the suffix ‘.so’ is used. The search path specified is also used for extensions loaded via the @load keyword (see Loading Dynamic Extensions into Your Program).

If AWKLIBPATH does not exist in the environment, or if it has an empty value, gawk uses a default path; this is typically ‘/usr/local/lib/gawk’, although it can vary depending upon how gawk was built.

gawk places the value of the search path that it used into ENVIRON["AWKLIBPATH"]. This provides access to the actual search path value from within an awk program.

Other Environment Variables

A number of other environment variables affect gawk’s behavior, but they are more specialized. Those in the following list are meant to be used by regular users:

GAWK_MSEC_SLEEP

Specifies the interval between connection retries, in milliseconds. On systems that do not support the usleep() system call, the value is rounded up to an integral number of seconds.

GAWK_READ_TIMEOUT

Specifies the time, in milliseconds, for gawk to wait for input before returning with an error. See Reading Input with a Timeout.

GAWK_SOCK_RETRIES

Controls the number of times gawk attempts to retry a two-way TCP/IP (socket) connection before giving up. See Using gawk for Network Programming.

POSIXLY_CORRECT

Causes gawk to switch to POSIX-compatibility mode, disabling all traditional and GNU extensions. See Command-Line Options.

The environment variables in the following list are meant for use by the gawk developers for testing and tuning. They are subject to change. The variables are:

AWKBUFSIZE

This variable only affects gawk on POSIX-compliant systems. With a value of ‘exact’, gawk uses the size of each input file as the size of the memory buffer to allocate for I/O. Otherwise, the value should be a number, and gawk uses that number as the size of the buffer to allocate. (When this variable is not set, gawk uses the smaller of the file’s size and the “default” blocksize, which is usually the filesystem’s I/O blocksize.)

AWK_HASH

If this variable exists with a value of ‘gst’, gawk switches to using the hash function from GNU Smalltalk for managing arrays. This function may be marginally faster than the standard function.

AWKREADFUNC

If this variable exists, gawk switches to reading source files one line at a time, instead of reading in blocks. This exists for debugging problems on filesystems on non-POSIX operating systems where I/O is performed in records, not in blocks.

GAWK_MSG_SRC

If this variable exists, gawk includes the filename and line number within the gawk source code from which warning and/or fatal messages are generated. Its purpose is to help isolate the source of a message, as there are multiple places that produce the same warning or error message.

GAWK_NO_DFA

If this variable exists, gawk does not use the DFA regexp matcher for “does it match” kinds of tests. This can cause gawk to be slower. Its purpose is to help isolate differences between the two regexp matchers that gawk uses internally. (There aren’t supposed to be differences, but occasionally theory and practice don’t coordinate with each other.)

GAWK_NO_PP_RUN

When gawk is invoked with the --pretty-print option, it will not run the program if this environment variable exists.

CAUTION

This variable will not survive into the next major release.

GAWK_STACKSIZE

This specifies the amount by which gawk should grow its internal evaluation stack, when needed.

INT_CHAIN_MAX

This specifies the intended maximum number of items gawk will maintain on a hash chain for managing arrays indexed by integers.

STR_CHAIN_MAX

This specifies the intended maximum number of items gawk will maintain on a hash chain for managing arrays indexed by strings.

TIDYMEM

If this variable exists, gawk uses the mtrace() library calls from the GNU C Library to help track down possible memory leaks.

gawk’s Exit Status

If the exit statement is used with a value (see The exit Statement), then gawk exits with the numeric value given to it.

Otherwise, if there were no problems during execution, gawk exits with the value of the C constant EXIT_SUCCESS. This is usually zero.

If an error occurs, gawk exits with the value of the C constant EXIT_FAILURE. This is usually one.

If gawk exits because of a fatal error, the exit status is two. On non-POSIX systems, this value may be mapped to EXIT_FAILURE.
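The first two cases are easy to observe from the shell. This is a minimal sketch; the value 3 is arbitrary, and the `||` form is used only so that the nonzero status is captured rather than treated as a failure by a shell running with errexit.

```shell
# exit with an explicit value: that value becomes the exit status.
explicit=0
awk 'BEGIN { exit 3 }' || explicit=$?

# No exit statement and no errors: EXIT_SUCCESS, normally 0.
clean=0
awk 'BEGIN { x = 1 }' || clean=$?

printf 'explicit=%s clean=%s\n' "$explicit" "$clean"
```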

Including Other Files into Your Program

This section describes a feature that is specific to gawk.

The @include keyword can be used to read external awk source files. This gives you the ability to split large awk source files into smaller, more manageable pieces, and also lets you reuse common awk code from various awk scripts. In other words, you can group together awk functions used to carry out specific tasks into external files. These files can be used just like function libraries, using the @include keyword in conjunction with the AWKPATH environment variable. Note that source files may also be included using the -i option.

Let’s see an example. We’ll start with two (trivial) awk scripts, namely test1 and test2. Here is the test1 script:

BEGIN {
    print "This is script test1."
}

and here is test2:

@include "test1"
BEGIN {
    print "This is script test2."
}

Running gawk with test2 produces the following result:

$ gawk -f test2
This is script test1.
This is script test2.

gawk runs the test2 script, which includes test1 using the @include keyword. So, to include external awk source files, you just use @include followed by the name of the file to be included, enclosed in double quotes.

NOTE

Keep in mind that this is a language construct and the filename cannot be a string variable, but rather just a literal string constant in double quotes.

The files to be included may be nested; e.g., given a third script, namely test3:

@include "test2"
BEGIN {
    print "This is script test3."
}

Running gawk with the test3 script produces the following results:

$ gawk -f test3
This is script test1.
This is script test2.
This is script test3.

The filename can, of course, be a pathname. For example:

@include "../io_funcs"

and:

@include "/usr/awklib/network"

are both valid. The AWKPATH environment variable can be of great value when using @include. The same rules for the use of the AWKPATH variable in command-line file searches (see The AWKPATH Environment Variable) apply to @include also.

This is very helpful in constructing gawk function libraries. If you have a large script with useful, general-purpose awk functions, you can break it down into library files and put those files in a special directory. You can then include those “libraries,” either by using the full pathnames of the files, or by setting the AWKPATH environment variable accordingly and then using @include with just the file part of the full pathname. Of course, you can keep library files in more than one directory; the more complex the working environment is, the more directories you may need to organize the files to be included.

Given the ability to specify multiple -f options, the @include mechanism is not strictly necessary. However, the @include keyword can help you in constructing self-contained gawk programs, thus reducing the need for writing complex and tedious command lines. In particular, @include is very useful for writing CGI scripts to be run from web pages.

As mentioned in The AWKPATH Environment Variable, the default search path begins with the current directory, so unless AWKPATH has been changed, the current directory is searched first for source files; this also applies to files named with @include.

Loading Dynamic Extensions into Your Program

This section describes a feature that is specific to gawk.

The @load keyword can be used to read external awk extensions (stored as system shared libraries). This allows you to link in compiled code that may offer superior performance and/or give you access to extended capabilities not supported by the awk language. The AWKLIBPATH variable is used to search for the extension. Using @load is completely equivalent to using the -l command-line option.

If the extension is not initially found in AWKLIBPATH, another search is conducted after appending the platform’s default shared library suffix to the filename. For example, on GNU/Linux systems, the suffix ‘.so’ is used:

$ gawk '@load "ordchr"; BEGIN {print chr(65)}'
A

This is equivalent to the following example:

$ gawk -lordchr 'BEGIN {print chr(65)}'
A

For command-line usage, the -l option is more convenient, but @load is useful for embedding inside an awk source file that requires access to an extension.

Chapter 16 describes how to write extensions (in C or C++) that can be loaded with either @load or the -l option. It also describes the ordchr extension.

Obsolete Options and/or Features

This section describes features and/or command-line options from previous releases of gawk that either are not available in the current version or are still supported but deprecated (meaning that they will not be in the next release).

The process-related special files /dev/pid, /dev/ppid, /dev/pgrpid, and /dev/user were deprecated in gawk 3.1, but still worked. As of version 4.0, they are no longer interpreted specially by gawk. (Use PROCINFO instead; see Built-in Variables That Convey Information.)

Undocumented Options and Features

Use the Source, Luke!

—Obi-Wan

This section intentionally left blank.

Summary

§ Use either ‘awk 'program' files’ or ‘awk -f program-file files’ to run awk.

§ The three standard options for all versions of awk are -f, -F, and -v. gawk supplies these and many others, as well as corresponding GNU-style long options.

§ Nonoption command-line arguments are usually treated as filenames, unless they have the form ‘var=value’, in which case they are taken as variable assignments to be performed at that point in processing the input.

§ All nonoption command-line arguments, excluding the program text, are placed in the ARGV array. Adjusting ARGC and ARGV affects how awk processes input.

§ You can use a single minus sign (‘-’) to refer to standard input on the command line. gawk also lets you use the special filename /dev/stdin.

§ gawk pays attention to a number of environment variables. AWKPATH, AWKLIBPATH, and POSIXLY_CORRECT are the most important ones.

§ gawk’s exit status conveys information to the program that invoked it. Use the exit statement from within an awk program to set the exit status.

§ gawk allows you to include other awk source files into your program using the @include statement and/or the -i and -f command-line options.

§ gawk allows you to load additional functions written in C or C++ using the @load statement and/or the -l option. (This advanced feature is described later, in Chapter 16.)


[9] For more detail, please see Section 4.4 of RFC 3875. Also see the explanatory note sent to the gawk bug mailing list.

[10] Not recommended.

[11] Semicolons on MS-Windows and MS-DOS.

[12] Your version of gawk may use a different directory; it will depend upon how gawk was built and installed. The actual directory is the value of $(datadir) generated when gawk was configured. You probably don’t need to worry about this, though.