Effective awk Programming (2015)

Part IV. Appendices

Appendix B. Installing gawk

This appendix provides instructions for installing gawk on the various platforms that are supported by the developers. The primary developer supports GNU/Linux (and Unix), whereas the other ports are contributed. See Reporting Problems and Bugs for the email addresses of the people who maintain the respective ports.

The gawk Distribution

This section describes how to get the gawk distribution, how to extract it, and then what is in the various files and subdirectories.

Getting the gawk Distribution

There are two ways to get GNU software:

§ Copy it from someone else who already has it.

§ Retrieve gawk from the Internet host ftp.gnu.org, in the directory /gnu/gawk. Both anonymous ftp and http access are supported. If you have the wget program, you can use a command like the following:

wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.2.tar.gz

The GNU software archive is mirrored around the world. The up-to-date list of mirror sites is available from the main FSF website. Try to use one of the mirrors; they will be less busy, and you can usually find one closer to your site.

Extracting the Distribution

gawk is distributed as several tar files compressed with different compression programs: gzip, bzip2, and xz. For simplicity, the rest of these instructions assume you are using the one compressed with the GNU Gzip program (gzip).

Once you have the distribution (e.g., gawk-4.1.2.tar.gz), use gzip to expand the file and then use tar to extract it. You can use the following pipeline to produce the gawk distribution:

gzip -d -c gawk-4.1.2.tar.gz | tar -xvpf -

On a system with GNU tar, you can let tar do the decompression for you:

tar -xvpzf gawk-4.1.2.tar.gz

Extracting the archive creates a directory named gawk-4.1.2 in the current directory.
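The same pipeline pattern applies to the other compressed tar files. The following is a minimal sketch, not part of the gawk distribution: extract_pipeline is a hypothetical helper, and the .tar.bz2 and .tar.xz suffixes are assumed from the compression programs named earlier. It only prints the command it would suggest and runs nothing:

```sh
#!/bin/sh
# Hypothetical helper: print an extraction pipeline for a gawk tar file,
# chosen from the filename suffix. It prints the command; it runs nothing.
extract_pipeline() {
    case $1 in
        *.tar.gz)  echo "gzip -d -c $1 | tar -xvpf -" ;;
        *.tar.bz2) echo "bzip2 -d -c $1 | tar -xvpf -" ;;
        *.tar.xz)  echo "xz -d -c $1 | tar -xvpf -" ;;
        *)         echo "unrecognized archive: $1" >&2; return 1 ;;
    esac
}

extract_pipeline gawk-4.1.2.tar.xz
```

Printing the command rather than running it lets you inspect the pipeline before anything is unpacked.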

The distribution filename is of the form gawk-V.R.P.tar.gz. The V represents the major version of gawk, the R represents the current release of version V, and the P represents a patch level, meaning that minor bugs have been fixed in the release. The current patch level is 2, but when retrieving distributions, you should get the version with the highest version, release, and patch level. (Note, however, that patch levels greater than or equal to 70 denote “beta” or nonproduction software; you might not want to retrieve such a version unless you don’t mind experimenting.)

If you are not on a Unix or GNU/Linux system, you need to make other arrangements for getting and extracting the gawk distribution. You should consult a local expert.
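The V.R.P naming scheme can be checked mechanically. The following sketch is only an illustration: gawk_dist_kind is a hypothetical helper, not something shipped with gawk. It pulls the patch level out of a distribution filename and applies the "70 or greater means beta" rule:

```sh
#!/bin/sh
# Hypothetical helper: classify a distribution filename of the form
# gawk-V.R.P.tar.gz (or another .tar.* suffix) as "release" or "beta".
gawk_dist_kind() {
    name=${1##*/}          # drop any leading directory
    ver=${name#gawk-}      # strip the "gawk-" prefix
    ver=${ver%.tar.*}      # strip the ".tar.gz"-style suffix
    patch=${ver##*.}       # P is the last dot-separated field
    case $patch in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if [ "$patch" -ge 70 ]; then
        echo "beta"
    else
        echo "release"
    fi
}

gawk_dist_kind gawk-4.1.2.tar.gz     # prints "release"
gawk_dist_kind gawk-4.1.70.tar.gz    # prints "beta"
```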

Contents of the gawk Distribution

The gawk distribution has a number of C source files, documentation files, subdirectories, and files related to the configuration process (see Compiling and Installing gawk on Unix-Like Systems), as well as several subdirectories related to different non-Unix operating systems:

Various ‘.c’, ‘.y’, and ‘.h’ files

These files contain the actual gawk source code.

ABOUT-NLS

A file containing information about GNU gettext and translations.

AUTHORS

A file with some information about the authorship of gawk. It exists only to satisfy the pedants at the Free Software Foundation.

README
README_d/README.*

Descriptive files: README for gawk under Unix and the rest for the various hardware and software combinations.

INSTALL

A file providing an overview of the configuration and installation process.

ChangeLog

A detailed list of source code changes as bugs are fixed or improvements made.

ChangeLog.0

An older list of source code changes.

NEWS

A list of changes to gawk since the last release or patch.

NEWS.0

An older list of changes to gawk.

COPYING

The GNU General Public License.

POSIX.STD

A description of behaviors in the POSIX standard for awk that are left undefined, or where gawk may not comply fully, as well as a list of things that the POSIX standard should describe but does not.

doc/awkforai.txt

Pointers to the original draft of a short article describing why gawk is a good language for artificial intelligence (AI) programming.

doc/bc_notes

A brief description of gawk’s “byte code” internals.

doc/README.card
doc/ad.block
doc/awkcard.in
doc/cardfonts
doc/colors
doc/macros
doc/no.colors
doc/setter.outline

The troff source for a five-color awk reference card. A modern version of troff such as GNU troff (groff) is needed to produce the color version. See the file README.card for instructions if you have an older troff.

doc/gawk.1

The troff source for a manual page describing gawk. This is distributed for the convenience of Unix users.

doc/gawktexi.in
doc/sidebar.awk

The Texinfo source file for this book. It should be processed by doc/sidebar.awk before processing with texi2dvi or texi2pdf to produce a printed document, and with makeinfo to produce an Info or HTML file. The Makefile takes care of this processing and produces printable output via texi2dvi or texi2pdf.

doc/gawk.texi

The file produced after processing gawktexi.in with sidebar.awk.

doc/gawk.info

The generated Info file for this book.

doc/gawkinet.texi

The Texinfo source file for TCP/IP Internetworking with gawk. It should be processed with TeX (via texi2dvi or texi2pdf) to produce a printed document and with makeinfo to produce an Info or HTML file.

doc/gawkinet.info

The generated Info file for TCP/IP Internetworking with gawk.

doc/igawk.1

The troff source for a manual page describing the igawk program presented in An Easy Way to Use Library Functions.

doc/Makefile.in

The input file used during the configuration process to generate the actual Makefile for creating the documentation.

Makefile.am
*/Makefile.am

Files used by the GNU Automake software for generating the Makefile.in files used by Autoconf and configure.

Makefile.in
aclocal.m4
bisonfix.awk
config.guess
configh.in
configure.ac
configure
custom.h
depcomp
install-sh
missing_d/*
mkinstalldirs
m4/*

These files and subdirectories are used when configuring and compiling gawk for various Unix systems. Most of them are explained in Compiling and Installing gawk on Unix-Like Systems. The rest are there to support the main infrastructure.

po/*

The po library contains message translations.

awklib/extract.awk
awklib/Makefile.am
awklib/Makefile.in
awklib/eg/*

The awklib directory contains a copy of extract.awk (see Extracting Programs from Texinfo Source Files), which can be used to extract the sample programs from the Texinfo source file for this book. It also contains a Makefile.in file, which configure uses to generate a Makefile. Makefile.am is used by GNU Automake to create Makefile.in. The library functions from Chapter 10 and the igawk program from An Easy Way to Use Library Functions are included as ready-to-use files in the gawk distribution. They are installed as part of the installation process. The rest of the programs in this book are available in appropriate subdirectories of awklib/eg.

extension/*

The source code, manual pages, and infrastructure files for the sample extensions included with gawk. See Chapter 16 for more information.

posix/*

Files needed for building gawk on POSIX-compliant systems.

pc/*

Files needed for building gawk under MS-Windows (see Installation on PC Operating Systems for details).

vms/*

Files needed for building gawk under Vax/VMS and OpenVMS (see Compiling and Installing gawk on Vax/VMS and OpenVMS for details).

test/*

A test suite for gawk. You can use ‘make check’ from the top-level gawk directory to run your version of gawk against the test suite. If gawk successfully passes ‘make check’, then you can be confident of a successful port.

Compiling and Installing gawk on Unix-Like Systems

Usually, you can compile and install gawk by typing only two commands. However, if you use an unusual system, you may need to configure gawk for your system yourself.

Compiling gawk for Unix-Like Systems

The normal installation steps should work on all modern commercial Unix-derived systems, GNU/Linux, BSD-based systems, and the Cygwin environment for MS-Windows.

After you have extracted the gawk distribution, cd to gawk-4.1.2. As with most GNU software, you configure gawk for your system by running the configure program. This program is a Bourne shell script that is generated automatically using GNU Autoconf. (The Autoconf software is described fully in Autoconf—Generating Automatic Configuration Scripts, which can be found online at the Free Software Foundation’s website.)

To configure gawk, simply run configure:

sh ./configure

This produces a Makefile and config.h tailored to your system. The config.h file describes various facts about your system. You might want to edit the Makefile to change the CFLAGS variable, which controls the command-line options that are passed to the C compiler (such as optimization levels or compiling for debugging).

Alternatively, you can add your own values for most make variables on the command line, such as CC and CFLAGS, when running configure:

CC=cc CFLAGS=-g sh ./configure

See the file INSTALL in the gawk distribution for all the details.

After you have run configure and possibly edited the Makefile, type:

make

Shortly thereafter, you should have an executable version of gawk. That’s all there is to it! To verify that gawk is working properly, run ‘make check’. All of the tests should succeed. If these steps do not work, or if any of the tests fail, check the files in the README_d directory to see if you’ve found a known problem. If the failure is not described there, send in a bug report (see Reporting Problems and Bugs).

Once you’ve built gawk, you will probably want to install it. To do so, run the command ‘make install’ as a user with the appropriate permissions. How to do this varies by system, but on many systems you can use the sudo command, in which case the command becomes ‘sudo make install’. You will likely be asked for your password, and you must already have been set up as a user who is allowed to run the sudo command.

Additional Configuration Options

There are several additional options you may use on the configure command line when compiling gawk from scratch, including:

--disable-extensions

Disable configuring and building the sample extensions in the extension directory. This is useful for cross-compiling. The default action is to dynamically check if the extensions can be configured and compiled.

--disable-lint

Disable all lint checking within gawk. The --lint and --lint-old options (see Command-Line Options) are accepted, but silently do nothing. Similarly, setting the LINT variable (see Built-in Variables That Control awk) has no effect on the running awk program.

When used with the GNU Compiler Collection’s (GCC’s) automatic dead code elimination, this option cuts almost 23K bytes off the size of the gawk executable on GNU/Linux x86_64 systems. Results on other systems and with other compilers are likely to vary. Using this option may bring you some slight performance improvement.

CAUTION

Using this option will cause some of the tests in the test suite to fail. This option may be removed at a later date.

--disable-nls

Disable all message-translation facilities. This is usually not desirable, but it may bring you some slight performance improvement.

--with-whiny-user-strftime

Force use of the included version of the C strftime() function for deficient systems.

Use the command ‘./configure --help’ to see the full list of options supplied by configure.

The Configuration Process

This section is of interest only if you know something about using the C language and Unix-like operating systems.

The source code for gawk generally attempts to adhere to formal standards wherever possible. This means that gawk uses library routines that are specified by the ISO C standard and by the POSIX operating system interface standard. The gawk source code requires using an ISO C compiler (the 1990 standard).

Many Unix systems do not support all of either the ISO or the POSIX standards. The missing_d subdirectory in the gawk distribution contains replacement versions of those functions that are most likely to be missing.

The config.h file that configure creates contains definitions that describe features of the particular operating system where you are attempting to compile gawk. The three things described by this file are: what header files are available, so that they can be correctly included, what (supposedly) standard functions are actually available in your C libraries, and various miscellaneous facts about your operating system. For example, there may not be an st_blksize element in the stat structure. In this case, ‘HAVE_STRUCT_STAT_ST_BLKSIZE’ is undefined.

It is possible for your C compiler to lie to configure. It may do so by not exiting with an error when a library function is not available. To get around this, edit the custom.h file. Use an ‘#ifdef’ that is appropriate for your system, and either #define any constants that configure should have defined but didn’t, or #undef any constants that configure defined and should not have. The custom.h file is automatically included by the config.h file.

It is also possible that the configure program generated by Autoconf will not work on your system in some other fashion. If you do have a problem, the configure.ac file is the input for Autoconf. You may be able to change this file and generate a new version of configure that works on your system (see Reporting Problems and Bugs for information on how to report problems in configuring gawk). The same mechanism may be used to send in updates to configure.ac and/or custom.h.

Installation on Other Operating Systems

This section describes how to install gawk on various non-Unix systems.

Installation on PC Operating Systems

This section covers installation and usage of gawk on Intel architecture machines running MS-DOS and any version of MS-Windows. In this section, the term “Windows32” refers to any of Microsoft Windows 95/98/ME/NT/2000/XP/Vista/7/8.

The limitations of MS-DOS (and MS-DOS shells under the other operating systems) have meant that various “DOS extenders” are often used with programs such as gawk. The varying capabilities of Microsoft Windows 3.1 and Windows32 can add to the confusion. For an overview of the considerations, refer to README_d/README.pc in the distribution.

Compiling gawk for PC operating systems

gawk can be compiled for MS-DOS and Windows32 using the GNU development tools from DJ Delorie (DJGPP: MS-DOS only) or MinGW (Windows32). The file README_d/README.pc in the gawk distribution contains additional notes, and pc/Makefile contains important information on compilation options.

To build gawk for MS-DOS and Windows32, copy the files in the pc directory (except for ChangeLog) to the directory with the rest of the gawk sources, then invoke make with the appropriate target name as an argument to build gawk. The Makefile copied from the pc directory contains a configuration section with comments and may need to be edited in order to work with your make utility.

The Makefile supports a number of targets for building various MS-DOS and Windows32 versions. A list of targets is printed if the make command is given without a target. As an example, to build gawk using the DJGPP tools, enter ‘make djgpp’. (The DJGPP tools needed for the build may be found at ftp://ftp.delorie.com/pub/djgpp/current/v2gnu/.) To build a native MS-Windows binary of gawk using the MinGW tools, type ‘make mingw32’.

Testing gawk on PC operating systems

Using make to run the standard tests and to install gawk requires additional Unix-like tools, including sh, sed, and cp. In order to run the tests, the test/*.ok files may need to be converted so that they have the usual MS-DOS-style end-of-line markers. Alternatively, run make check CMP="diff -a" to use GNU diff in text mode instead of cmp to compare the resulting files.

Using gawk on PC operating systems

Under MS-DOS and MS-Windows, the Cygwin and MinGW environments support both the ‘|&’ operator and TCP/IP networking (see Using gawk for Network Programming).

The MS-DOS and MS-Windows versions of gawk search for program files as described in The AWKPATH Environment Variable. However, semicolons (rather than colons) separate elements in the AWKPATH variable. If AWKPATH is not set or is empty, then the default search path is ‘.;c:/lib/awk;c:/gnu/lib/awk’.
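To see how such a path breaks into directories, here is a small sketch for an sh-like shell. The function list_awkpath is a hypothetical helper, not part of gawk. It splits AWKPATH on semicolons and falls back to the default path quoted above when the variable is unset or empty:

```sh
#!/bin/sh
# Hypothetical helper: print each element of a semicolon-separated
# AWKPATH on its own line, using the documented default when AWKPATH
# is unset or empty.
list_awkpath() {
    default_path='.;c:/lib/awk;c:/gnu/lib/awk'
    path=${AWKPATH:-$default_path}
    old_ifs=$IFS
    IFS=';'
    set -f                      # no filename expansion while splitting
    for dir in $path; do
        printf '%s\n' "$dir"
    done
    set +f
    IFS=$old_ifs
}

# Example use:
#   AWKPATH='.;d:/awk/lib'
#   list_awkpath
```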

An sh-like shell (as opposed to command.com under MS-DOS or cmd.exe under MS-Windows) may be useful for awk programming. The DJGPP collection of tools includes an MS-DOS port of Bash.

Under MS-Windows and MS-DOS, gawk (and many other text programs) silently translates end-of-line ‘\r\n’ to ‘\n’ on input and ‘\n’ to ‘\r\n’ on output. A special BINMODE variable (c.e.) allows control over these translations and is interpreted as follows:

§ If BINMODE is "r" or one, then binary mode is set on read (i.e., no translations on reads).

§ If BINMODE is "w" or two, then binary mode is set on write (i.e., no translations on writes).

§ If BINMODE is "rw" or "wr" or three, binary mode is set for both read and write.

§ BINMODE=non-null-string is the same as ‘BINMODE=3’ (i.e., no translations on reads or writes). However, gawk issues a warning message if the string is not one of "rw" or "wr".
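The rules in this list can be summarized as a small lookup. The sketch below is an illustration only: binmode_meaning is a hypothetical helper, and it describes the documented settings rather than calling gawk. Treating an empty value or zero as "no binary mode" is an assumption based on the default translation behavior described earlier. Other numeric values are not covered by the list, so the sketch says so rather than guessing:

```sh
#!/bin/sh
# Hypothetical helper: describe what a given BINMODE setting means,
# following the list above.
binmode_meaning() {
    case $1 in
        r|1)      echo "binary on read" ;;
        w|2)      echo "binary on write" ;;
        rw|wr|3)  echo "binary on read and write" ;;
        ''|0)     echo "no binary mode" ;;   # assumption: the default
        *[!0-9]*) echo "binary on read and write (with a warning)" ;;
        *)        echo "not covered by the list" ;;
    esac
}

binmode_meaning rw      # prints "binary on read and write"
binmode_meaning bogus   # prints "binary on read and write (with a warning)"
```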

The modes for standard input and standard output are set one time only (after the command line is read, but before processing any of the awk program). Setting BINMODE for standard input or standard output is accomplished by using an appropriate ‘-v BINMODE=N’ option on the command line. BINMODE is set at the time a file or pipe is opened and cannot be changed midstream.

The name BINMODE was chosen to match mawk (see Other Freely Available awk Implementations). mawk and gawk handle BINMODE similarly; however, mawk adds a ‘-W BINMODE=N’ option and an environment variable that can set BINMODE, RS, and ORS. The files binmode[1-3].awk (under gnu/lib/awk in some of the prepared binary distributions) have been chosen to match mawk’s ‘-W BINMODE=N’ option. These can be changed or discarded; in particular, the setting of RS giving the fewest “surprises” is open to debate. mawk uses ‘RS = "\r\n"’ if binary mode is set on read, which is appropriate for files with the MS-DOS-style end-of-line.

To illustrate, the following examples set binary mode on writes for standard output and other files, and set ORS as the “usual” MS-DOS-style end-of-line:

gawk -v BINMODE=2 -v ORS="\r\n" …

or:

gawk -v BINMODE=w -f binmode2.awk …

These give the same result as the ‘-W BINMODE=2’ option in mawk. The following changes the record separator to "\r\n" and sets binary mode on reads, but does not affect the mode on standard input:

gawk -v RS="\r\n" -e "BEGIN { BINMODE = 1 }" …

or:

gawk -f binmode1.awk …

With proper quoting, in the first example the setting of RS can be moved into the BEGIN rule.

Using gawk in the Cygwin environment

gawk can be built and used “out of the box” under MS-Windows if you are using the Cygwin environment. This environment provides an excellent simulation of GNU/Linux, using Bash, GCC, GNU Make, and other GNU programs. Compilation and installation for Cygwin is the same as for a Unix system:

tar -xvpzf gawk-4.1.2.tar.gz

cd gawk-4.1.2

./configure

make && make check

When compared to GNU/Linux on the same system, the ‘configure’ step on Cygwin takes considerably longer. However, it does finish, and then the ‘make’ proceeds as usual.

Using gawk in the MSYS environment

In the MSYS environment under MS-Windows, gawk automatically uses binary mode for reading and writing files. Thus, there is no need to use the BINMODE variable.

This can cause problems with other Unix-like components that have been ported to MS-Windows that expect gawk to do automatic translation of "\r\n", because it won’t.

Compiling and Installing gawk on Vax/VMS and OpenVMS

This subsection describes how to compile and install gawk under VMS. The older designation “VMS” is used throughout to refer to OpenVMS.

Compiling gawk on VMS

To compile gawk under VMS, there is a DCL command procedure that issues all the necessary CC and LINK commands. There is also a Makefile for use with the MMS and MMK utilities. From the source directory, use either:

$ @[.vms]vmsbuild.com

or:

$ MMS/DESCRIPTION=[.vms]descrip.mms gawk

or:

$ MMK/DESCRIPTION=[.vms]descrip.mms gawk

MMK is an open source, free, near-clone of MMS and can better handle ODS-5 volumes with upper- and lowercase filenames. MMK is available from https://github.com/endlesssoftware/mmk.

With ODS-5 volumes and extended parsing enabled, the case of the target parameter may need to be exact.

gawk has been tested under VAX/VMS 7.3 and Alpha/VMS 7.3-1 using Compaq C V6.4, and under Alpha/VMS 7.3, Alpha/VMS 7.3-2, and IA64/VMS 8.3. The most recent builds used HP C V7.3 on Alpha VMS 8.3 and on both Alpha and IA64 VMS 8.4.

See The VMS GNV project for information on building gawk as a PCSI kit that is compatible with the GNV product.

Compiling gawk dynamic extensions on VMS

The extensions that have been ported to VMS can be built using one of the following commands:

$ MMS/DESCRIPTION=[.vms]descrip.mms extensions

or:

$ MMK/DESCRIPTION=[.vms]descrip.mms extensions

gawk uses AWKLIBPATH as either an environment variable or a logical name to find the dynamic extensions.

Dynamic extensions need to be compiled with the same compiler options for floating-point, pointer size, and symbol name handling as were used to compile gawk itself. Alpha and Itanium should use IEEE floating point. The pointer size is 32 bits, and the symbol name handling should be exact case with CRC shortening for symbols longer than 31 characters.

For Alpha and Itanium:

/name=(as_is,short)

/float=ieee/ieee_mode=denorm_results

For VAX:

/name=(as_is,short)

Compile-time macros need to be defined before the first VMS-supplied header file is included, as follows:

#if (__CRTL_VER >= 70200000) && !defined (__VAX)
#define _LARGEFILE 1
#endif

#ifndef __VAX
#ifdef __CRTL_VER
#if __CRTL_VER >= 80200000
#define _USE_STD_STAT 1
#endif
#endif
#endif

If you are writing your own extensions to run on VMS, you must supply these definitions yourself. The config.h file created when building gawk on VMS does this for you; if instead you use that file or a similar one, then you must remember to include it before any VMS-supplied header files.

Installing gawk on VMS

To use gawk, all you need is a “foreign” command, which is a DCL symbol whose value begins with a dollar sign. For example:

$ GAWK :== $disk1:[gnubin]gawk

Substitute the actual location of gawk.exe for ‘$disk1:[gnubin]’. The symbol should be placed in the login.com of any user who wants to run gawk, so that it is defined every time the user logs on. Alternatively, the symbol may be placed in the system-wide sylogin.com procedure, which allows all users to run gawk.

If your gawk was installed by a PCSI kit into the GNV$GNU: directory tree, the program will be known as GNV$GNU:[bin]gnv$gawk.exe and the help file will be GNV$GNU:[vms_help]gawk.hlp.

The PCSI kit also installs a GNV$GNU:[vms_bin]gawk_verb.cld file that can be used to add gawk and awk as DCL commands.

For just the current process you can use:

$ set command gnv$gnu:[vms_bin]gawk_verb.cld

Alternatively, the system manager can use GNV$GNU:[vms_bin]gawk_verb.cld to add the gawk and awk commands to the system-wide ‘DCLTABLES’.

The DCL syntax is documented in the gawk.hlp file.

Optionally, the gawk.hlp entry can be loaded into a VMS help library:

$ LIBRARY/HELP sys$help:helplib [.vms]gawk.hlp

(You may want to substitute a site-specific help library rather than the standard VMS library ‘HELPLIB’.) After loading the help text, the command:

$ HELP GAWK

provides information about both the gawk implementation and the awk programming language.

The logical name ‘AWK_LIBRARY’ can designate a default location for awk program files. For the -f option, if the specified filename has no device or directory path information in it, gawk looks in the current directory first, then in the directory specified by the translation of ‘AWK_LIBRARY’ if the file is not found. If, after searching in both directories, the file still is not found, gawk appends the suffix ‘.awk’ to the filename and retries the file search. If ‘AWK_LIBRARY’ has no definition, a default value of ‘SYS$LIBRARY:’ is used for it.
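The search order just described can be sketched with Unix-style paths. This is only an illustration of the order of the lookups, not the VMS implementation: find_awk_program is a hypothetical helper, and a plain directory argument stands in for the translation of ‘AWK_LIBRARY’:

```sh
#!/bin/sh
# Hypothetical helper: mimic the lookup order described above.
#   $1 = program filename with no directory part
#   $2 = directory standing in for AWK_LIBRARY
# Try the name as given in the current directory, then in the library
# directory; if neither exists, append ".awk" and search both again.
find_awk_program() {
    file=$1
    lib=$2
    for name in "$file" "$file.awk"; do
        for dir in . "$lib"; do
            if [ -f "$dir/$name" ]; then
                printf '%s\n' "$dir/$name"
                return 0
            fi
        done
    done
    return 1
}

# Example use:
#   find_awk_program join /usr/local/share/awk
```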

Running gawk on VMS

Command-line parsing and quoting conventions are significantly different on VMS, so examples in this book or from other sources often need minor changes. They are minor though, and all awk programs should run correctly.

Here are a couple of trivial tests:

$ gawk -- "BEGIN {print ""Hello, World!""}"

$ gawk -"W" version

! could also be -"W version" or "-W version"

Note that uppercase and mixed-case text must be quoted.

The VMS port of gawk includes a DCL-style interface in addition to the original shell-style interface (see the help entry for details). One side effect of dual command-line parsing is that if there is only a single parameter (as in the quoted string program), the command becomes ambiguous. To work around this, the normally optional -- flag is required to force Unix-style parsing rather than DCL parsing. If any other dash-type options (or multiple parameters such as datafiles to process) are present, there is no ambiguity and -- can be omitted.

The exit value is a Unix-style value and is encoded into a VMS exit status value when the program exits.

The VMS severity bits will be set based on the exit value. A failure is indicated by 1, and VMS sets the ERROR status. A fatal error is indicated by 2, and VMS sets the FATAL status. All other values will have the SUCCESS status. The exit value is encoded to comply with VMS coding standards and will have the C_FACILITY_NO of 0x350000 with the constant 0xA000 added to the number shifted over by 3 bits to make room for the severity codes.

To extract the actual gawk exit code from the VMS status, use:

unix_status = (vms_status .and. %x7f8) / 8
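The arithmetic can be tried out in any POSIX shell. The following sketch makes some assumptions that are not stated in the text: vms_encode and vms_decode are hypothetical names, the severity values (1 for SUCCESS, 2 for ERROR, 4 for FATAL) are the usual VMS codes, and any other bits that a real VMS status might carry are ignored:

```sh
#!/bin/sh
# Hypothetical helpers illustrating the encoding described above.
vms_encode() {              # $1 = Unix-style exit value (0-255)
    case $1 in
        1) sev=2 ;;         # failure     -> ERROR   (assumed code 2)
        2) sev=4 ;;         # fatal error -> FATAL   (assumed code 4)
        *) sev=1 ;;         # all others  -> SUCCESS (assumed code 1)
    esac
    # Facility 0x350000, constant 0xA000, exit value shifted left 3 bits
    echo $(( 0x350000 + 0xA000 + ($1 << 3) + sev ))
}

vms_decode() {              # $1 = VMS status value
    # Mask off bits 3 through 10, then shift right by 3 (divide by 8)
    echo $(( ($1 & 0x7f8) / 8 ))
}

status=$(vms_encode 2)
vms_decode "$status"        # prints 2
```

The mask 0x7f8 selects exactly the eight bits above the three severity bits, which is why dividing by 8 recovers the original value.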

A C program that uses exec() to call gawk will get the original Unix-style exit value.

Older versions of gawk for VMS treated a Unix exit code 0 as 1, a failure as 2, a fatal error as 4, and passed all the other numbers through. This violated the VMS exit status coding requirements.

VAX/VMS floating point uses unbiased rounding. See Rounding Numbers.

VMS reports time values in GMT unless one of the SYS$TIMEZONE_RULE or TZ logical names is set. Older versions of VMS, such as VAX/VMS 7.3, do not set these logical names.

The default search path, when looking for awk program files specified by the -f option, is "SYS$DISK:[],AWK_LIBRARY:". The logical name AWKPATH can be used to override this default. The format of AWKPATH is a comma-separated list of directory specifications. When defining it, the value should be quoted so that it retains a single translation and not a multitranslation RMS searchlist.

The VMS GNV project

The VMS GNV package provides a build environment similar to POSIX with ports of a collection of open source tools. The gawk found in the GNV base kit is an older port. Currently, the GNV project is being reorganized to supply individual PCSI packages for each component. See https://sourceforge.net/p/gnv/wiki/InstallingGNVPackages/.

The normal build procedure for gawk produces a program that is suitable for use with GNV.

The file vms/gawk_build_steps.txt in the distribution documents the procedure for building a VMS PCSI kit that is compatible with GNV.

Some VMS systems have an old version of gawk

Some versions of VMS have an old version of gawk. To access it, define a symbol, as follows:

$ gawk :== $sys$common:[syshlp.examples.tcpip.snmp]gawk.exe

This is apparently version 2.15.6, which is extremely old. We recommend compiling and using the current version.

Reporting Problems and Bugs

There is nothing more dangerous than a bored archaeologist.

—Douglas Adams, The Hitchhiker’s Guide to the Galaxy

If you have problems with gawk or think that you have found a bug, report it to the developers; we cannot promise to do anything, but we might well want to fix it.

Before reporting a bug, make sure you have really found a genuine bug. Carefully reread the documentation and see if it says you can do what you’re trying to do. If it’s not clear whether you should be able to do something or not, report that too; it’s a bug in the documentation!

Before reporting a bug or trying to fix it yourself, try to isolate it to the smallest possible awk program and input datafile that reproduce the problem. Then send us the program and datafile, some idea of what kind of Unix system you’re using, the compiler you used to compile gawk, and the exact results gawk gave you. Also say what you expected to occur; this helps us decide whether the problem is really in the documentation.

Make sure to include the version number of gawk you are using. You can get this information with the command ‘gawk --version’.
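A few lines of shell can gather the basics for a report. This is only a sketch: bug_report_info is a hypothetical helper, and it collects just the version line and the system identification. You still need to add the compiler, the test program, the input data, and the expected and actual results yourself:

```sh
#!/bin/sh
# Hypothetical helper: print the gawk version line and basic system
# information for pasting into a bug report.
#   $1 = name of the gawk executable (defaults to "gawk")
bug_report_info() {
    awk_cmd=${1:-gawk}
    if command -v "$awk_cmd" >/dev/null 2>&1; then
        # Only the first line of the --version output is kept
        echo "gawk version: $("$awk_cmd" --version 2>/dev/null </dev/null | sed 1q)"
    else
        echo "gawk version: ($awk_cmd not found)"
    fi
    echo "System: $(uname -srm)"
}

bug_report_info
```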

Once you have a precise problem description, send email to bug-gawk@gnu.org.

The gawk maintainers subscribe to this address, and thus they will receive your bug report. Although you can send mail to the maintainers directly, the bug reporting address is preferred because the email list is archived at the GNU Project. All email must be in English. This is the only language understood in common by all the maintainers.

CAUTION

Do not try to report bugs in gawk by posting to the Usenet/Internet newsgroup comp.lang.awk. The gawk developers do occasionally read this newsgroup, but there is no guarantee that we will see your posting. The steps described here are the only officially recognized way for reporting bugs. Really.

NOTE

Many distributions of GNU/Linux and the various BSD-based operating systems have their own bug reporting systems. If you report a bug using your distribution’s bug reporting system, you should also send a copy to bug-gawk@gnu.org.

This is for two reasons. First, although some distributions forward bug reports “upstream” to the GNU mailing list, many don’t, so there is a good chance that the gawk maintainers won’t even see the bug report! Second, mail to the GNU list is archived, and having everything at the GNU Project keeps things self-contained and not dependent on other organizations.

Non-bug suggestions are always welcome as well. If you have questions about things that are unclear in the documentation or are just obscure features, ask on the bug list; we will try to help you out if we can.

If you find bugs in one of the non-Unix ports of gawk, send an email to the bug list, with a copy to the person who maintains that port. The maintainers are named in the following list, as well as in the README file in the gawk distribution. Information in the README file should be considered authoritative if it conflicts with this book.

The people maintaining the various gawk ports are:

Unix and POSIX systems

Arnold Robbins, arnold@skeeve.com

MS-DOS with DJGPP

Scott Deifik, scottd.mail@sbcglobal.net

MS-Windows with MinGW

Eli Zaretskii, eliz@gnu.org

OS/2

Andreas Buening, andreas.buening@nexgo.de

VMS

John Malmberg, wb8tyw@qsl.net

z/OS (OS/390)

Dave Pitts, dpitts@cozx.com

If your bug is also reproducible under Unix, send a copy of your report to the bug-gawk@gnu.org email list as well.

Other Freely Available awk Implementations

It’s kind of fun to put comments like this in your awk code:
// Do C++ comments work? answer: yes! of course

—Michael Brennan

There are a number of other freely available awk implementations. This section briefly describes where to get them:

Unix awk

Brian Kernighan, one of the original designers of Unix awk, has made his implementation of awk freely available. You can retrieve this version via his home page. It is available in several archive formats:

Shell archive

http://www.cs.princeton.edu/~bwk/btl.mirror/awk.shar

Compressed tar file

http://www.cs.princeton.edu/~bwk/btl.mirror/awk.tar.gz

Zip file

http://www.cs.princeton.edu/~bwk/btl.mirror/awk.zip

You can also retrieve it from GitHub:

git clone https://github.com/onetrueawk/awk bwkawk

This command creates a copy of the Git repository in a directory named bwkawk. If you leave that argument off the git command line, the repository copy is created in a directory named awk.

This version requires an ISO C (1990 standard) compiler; the C compiler from GCC (the GNU Compiler Collection) works quite nicely.

See Common Extensions Summary for a list of extensions in this awk that are not in POSIX awk.

As a side note, Dan Bornstein has created a Git repository tracking all the versions of BWK awk that he could find. It’s available at http://github.com/danfuzz/one-true-awk.

mawk

Michael Brennan wrote an independent implementation of awk, called mawk. It is available under the GPL, just as gawk is.

The original distribution site for the mawk source code no longer has it. A copy is available at http://www.skeeve.com/gawk/mawk1.3.3.tar.gz.

In 2009, Thomas Dickey took on mawk maintenance. Basic information is available on the project’s web page. The download URL is http://invisible-island.net/datafiles/release/mawk.tar.gz.

Once you have it, gunzip may be used to decompress this file. Installation is similar to gawk’s (see Compiling and Installing gawk on Unix-Like Systems).
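The download-and-build steps just described might look like the following sketch. The DRY_RUN variable and the run and build_mawk helpers are illustrative names invented for this example, not part of mawk. The sketch also assumes that the archive unpacks into a single directory whose name starts with mawk-, and that the package ships a configure script as gawk does. By default the commands are only printed; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run, and build_mawk are hypothetical helpers.
# With DRY_RUN=1 (the default) each command is printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_mawk() {
    run wget http://invisible-island.net/datafiles/release/mawk.tar.gz &&
    run gunzip mawk.tar.gz &&       # leaves mawk.tar behind
    run tar -xvpf mawk.tar &&
    # Assumption: exactly one mawk-* directory exists after extraction.
    run cd mawk-*/ &&
    run ./configure &&
    run make
}

build_mawk
```

Printing the commands first makes it easy to check the URL and the directory name before anything is downloaded or compiled.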

See Common Extensions Summary for a list of extensions in mawk that are not in POSIX awk.

awka

Written by Andrew Sumner, awka translates awk programs into C, compiles them, and links them with a library of functions that provide the core awk functionality. It also has a number of extensions.

The awk translator is released under the GPL, and the library is under the LGPL.

To get awka, go to http://sourceforge.net/projects/awka.

The project seems to be frozen; no new code changes have been made since approximately 2001.

pawk

Nelson H.F. Beebe at the University of Utah has modified BWK awk to provide timing and profiling information. It is different from gawk with the --profile option (see Profiling Your awk Programs) in that it uses CPU-based profiling, not line-count profiling. You may find it at either ftp://ftp.math.utah.edu/pub/pawk/pawk-20030606.tar.gz or http://www.math.utah.edu/pub/pawk/pawk-20030606.tar.gz.
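For comparison, gawk's line-count profiling can be tried with a sketch like the following. The output filename prof.out and the tiny inline program are assumptions made for this example. The block only prints a message if gawk is not found on the search path.

```shell
#!/bin/sh
# Sketch: line-count profiling with gawk's --profile option.
# prof.out is an arbitrary filename chosen for this example.
if command -v gawk >/dev/null 2>&1; then
    # Feed three records to a small program that counts matches of /a/.
    printf 'a\nb\na\n' |
        gawk --profile=prof.out '/a/ { n++ } END { print n }'
    # gawk writes the profile when it exits; show the annotated program.
    cat prof.out
else
    echo "gawk not found; skipping the profiling example"
fi
```

The profile shows how many times each rule and statement ran, which is a different kind of information from the CPU timings that pawk reports.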

BusyBox awk

BusyBox is a GPL-licensed program providing small versions of many applications within a single executable. It is aimed at embedded systems. It includes a full implementation of POSIX awk. When building it, be careful not to do ‘make install’ as it will overwrite copies of other applications in your /usr/local/bin. For more information, see the project’s home page.

The OpenSolaris POSIX awk

The versions of awk in /usr/xpg4/bin and /usr/xpg6/bin on Solaris are more or less POSIX-compliant. They are based on the awk from Mortice Kern Systems for PCs. We were able to make this code compile and work under GNU/Linux with 1–2 hours of work. Making it more generally portable (using GNU Autoconf and/or Automake) would take more work, and this has not been done, at least to our knowledge.

The source code used to be available from the OpenSolaris website. However, that project was ended and the website shut down. Fortunately, the Illumos project makes this implementation available. You can view the files one at a time from https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/awk_xpg4.

jawk

This is an interpreter for awk written in Java. It claims to be a full interpreter, although because it uses Java facilities for I/O and for regexp matching, the language it supports is different from POSIX awk. More information is available on the project’s home page.

Libmawk

This is an embeddable awk interpreter derived from mawk. For more information, see http://repo.hu/projects/libmawk/.

pawk

This is a Python module that claims to bring awk-like features to Python. See https://github.com/alecthomas/pawk for more information. (This is not related to Nelson Beebe’s modified version of BWK awk, described earlier.)

QSE awk

This is an embeddable awk interpreter. For more information, see http://code.google.com/p/qse/ and http://awk.info/?tools/qse.

QTawk

This is an independent implementation of awk distributed under the GPL. It has a large number of extensions over standard awk and may not be 100% syntactically compatible with it. See http://www.quiktrim.org/QTawk.html for more information, including the manual and a download link.

The project may also be frozen; no new code changes have been made since approximately 2008.

Other versions

See also the “Versions and implementations” section of the Wikipedia article on awk for information on additional versions.

Summary

§ The gawk distribution is available from the GNU Project’s main distribution site, ftp.gnu.org. The canonical build recipe is:

wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.2.tar.gz

tar -xvpzf gawk-4.1.2.tar.gz

cd gawk-4.1.2

./configure && make && make check

§ gawk may be built on non-POSIX systems as well. The currently supported systems are MS-Windows using DJGPP, MSYS, MinGW, and Cygwin, and both Vax/VMS and OpenVMS. Instructions for each system are included in this appendix.

§ Bug reports should be sent via email to bug-gawk@gnu.org. Bug reports should be in English and should include the version of gawk, how it was compiled, and a short program and datafile that demonstrate the problem.

§ There are a number of other freely available awk implementations. Many are POSIX-compliant; others are less so.
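The bug-report checklist in the summary above can be turned into a small helper that prints a report skeleton. This is a minimal sketch, not an official tool: the function name bug_report_skeleton and the AWK variable are invented for this example, and using uname to describe the platform is an assumption about what is useful to include.

```shell
#!/bin/sh
# Hypothetical helper: print a plain-text skeleton for a gawk bug report.
# AWK defaults to "gawk"; override it if your binary has another name.
AWK=${AWK:-gawk}

bug_report_skeleton() {
    echo "To: bug-gawk@gnu.org"
    echo
    echo "gawk version:"
    if command -v "$AWK" >/dev/null 2>&1; then
        # The first line of the --version output names the release.
        "$AWK" --version 2>/dev/null | sed -n '1p'
    else
        echo "(could not find $AWK; fill this in by hand)"
    fi
    echo
    echo "How it was compiled (compiler, configure options, platform):"
    uname -a
    echo
    echo "Test program and datafile: attach the shortest pair that shows the problem."
}

bug_report_skeleton
```

The compiler and configure options still have to be filled in by hand; the helper only collects what can be found automatically.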


[107] The IA64 architecture is also known as “Itanium.”