Effective awk Programming (2015)

Part III. Moving Beyond Standard awk with gawk

Part III focuses on features specific to gawk. It contains the following chapters:

§ Chapter 12, Advanced Features of gawk

§ Chapter 13, Internationalization with gawk

§ Chapter 14, Debugging awk Programs

§ Chapter 15, Arithmetic and Arbitrary-Precision Arithmetic with gawk

§ Chapter 16, Writing Extensions for gawk

Chapter 12. Advanced Features of gawk

Write documentation as if whoever reads it is a violent psychopath who knows where you live.

—Steve English, as quoted by Peter Langston

This chapter discusses advanced features in gawk. It’s a bit of a “grab bag” of items that are otherwise unrelated to each other. First, we look at a command-line option that allows gawk to recognize nondecimal numbers in input data, not just in awk programs. Then, gawk’s special features for sorting arrays are presented. Next, two-way I/O, discussed briefly in earlier parts of this book, is described in full detail, along with the basics of TCP/IP networking. Finally, we see how gawk can profile an awk program, making it possible to tune it for performance.

Additional advanced features are discussed in separate chapters of their own:

§ Chapter 13, Internationalization with gawk, discusses how to internationalize your awk programs, so that they can speak multiple national languages.

§ Chapter 14, Debugging awk Programs, describes gawk’s built-in command-line debugger for debugging awk programs.

§ Chapter 15, Arithmetic and Arbitrary-Precision Arithmetic with gawk, describes how you can use gawk to perform arbitrary-precision arithmetic.

§ Chapter 16, Writing Extensions for gawk, discusses the ability to dynamically add new built-in functions to gawk.

Allowing Nondecimal Input Data

If you run gawk with the --non-decimal-data option, you can have nondecimal values in your input data:

$ echo 0123 123 0x123 |

> gawk --non-decimal-data '{ printf "%d, %d, %d\n", $1, $2, $3 }'

83, 123, 291

For this feature to work, write your program so that gawk treats your data as numeric:

$ echo 0123 123 0x123 | gawk '{ print $1, $2, $3 }'

0123 123 0x123

The print statement treats its expressions as strings. Although the fields can act as numbers when necessary, they are still strings, so print does not try to treat them numerically. You need to add zero to a field to force it to be treated as a number. For example:

$ echo 0123 123 0x123 | gawk --non-decimal-data '

> { print $1, $2, $3

> print $1 + 0, $2 + 0, $3 + 0 }'

0123 123 0x123

83 123 291

Because it is common to have decimal data with leading zeros, and because using this facility could lead to surprising results, the default is to leave it disabled. If you want it, you must explicitly request it.

CAUTION

Use of this option is not recommended. It can break old programs very badly. Instead, use the strtonum() function to convert your data (see String-Manipulation Functions). This makes your programs easier to write and easier to read, and leads to less surprising results.

This option may disappear in a future version of gawk.
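
The strtonum() alternative recommended in the caution can be sketched as follows. This is a minimal illustration, not a definitive recipe: the helper name to_num() and the sample values are assumptions made for this example, and only strtonum() itself is a gawk built-in.

```awk
# to_num() is a hypothetical helper name; strtonum() is the gawk built-in.
function to_num(s)
{
    # Appending "" hands strtonum() a plain string, so a leading "0x"
    # is read as hexadecimal and a leading "0" as octal.
    return strtonum(s "")
}

BEGIN {
    n = split("0123 123 0x123", field, " ")
    for (i = 1; i <= n; i++)
        printf "%s -> %d\n", field[i], to_num(field[i])
}
```

Run with plain gawk (no --non-decimal-data), this should print the same three values as the earlier session: 83, 123, and 291. The conversion is visible at the point of use, which is the readability benefit the caution describes.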

Controlling Array Traversal and Array Sorting

gawk lets you control the order in which a ‘for (indx in array)’ loop traverses an array.

In addition, two built-in functions, asort() and asorti(), let you sort arrays based on the array values and indices, respectively. These two functions also provide control over the sorting criteria used to order the elements during sorting.

Controlling Array Traversal

By default, the order in which a ‘for (indx in array)’ loop scans an array is not defined; it is generally based upon the internal implementation of arrays inside awk.

Often, though, it is desirable to be able to loop over the elements in a particular order that you, the programmer, choose. gawk lets you do this.

The section “Using Predefined Array Scanning Orders with gawk” describes how you can assign special, predefined values to PROCINFO["sorted_in"] to control the order in which gawk traverses an array during a for loop.
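
As a brief reminder of how the predefined values behave, here is a minimal sketch. The helper join_keys() and the sample array are assumptions made for illustration; the strings "@ind_num_asc", "@ind_str_asc", and "@val_str_asc" are among gawk's predefined orders.

```awk
# join_keys() is a hypothetical helper: it returns the indices of arr,
# separated by spaces, in the order selected by the "sorted_in" value.
function join_keys(arr, order,    k, out, sep, had, save)
{
    had = ("sorted_in" in PROCINFO)     # remember any earlier setting
    if (had)
        save = PROCINFO["sorted_in"]
    PROCINFO["sorted_in"] = order
    for (k in arr) {
        out = out sep k
        sep = " "
    }
    if (had)
        PROCINFO["sorted_in"] = save
    else
        delete PROCINFO["sorted_in"]
    return out
}

BEGIN {
    data[10] = "ten"
    data[9] = "nine"
    data[100] = "hundred"
    print join_keys(data, "@ind_num_asc")   # indices compared as numbers
    print join_keys(data, "@ind_str_asc")   # indices compared as strings
    print join_keys(data, "@val_str_asc")   # values compared as strings
}
```

The three lines printed should be "9 10 100", "10 100 9", and "100 9 10". The helper restores the previous setting (or removes the element) so that later loops are not affected.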

In addition, the value of PROCINFO["sorted_in"] can be a function name.[78] This lets you traverse an array based on any custom criterion. The array elements are ordered according to the return value of this function. The comparison function should be defined with at least four arguments:

function comp_func(i1, v1, i2, v2)
{
    # compare elements 1 and 2 in some fashion
    # return < 0; 0; or > 0
}

Here, i1 and i2 are the indices, and v1 and v2 are the corresponding values of the two elements being compared. Either v1 or v2, or both, can be arrays if the array being traversed contains subarrays as values. (See Arrays of Arrays for more information about subarrays.) The three possible return values are interpreted as follows:

comp_func(i1, v1, i2, v2) < 0

Index i1 comes before index i2 during loop traversal.

comp_func(i1, v1, i2, v2) == 0

Indices i1 and i2 come together, but the relative order with respect to each other is undefined.

comp_func(i1, v1, i2, v2) > 0

Index i1 comes after index i2 during loop traversal.

Our first comparison function can be used to scan an array in numerical order of the indices:

function cmp_num_idx(i1, v1, i2, v2)
{
    # numerical index comparison, ascending order
    return (i1 - i2)
}

Our second function traverses an array based on the string order of the element values rather than by indices:

function cmp_str_val(i1, v1, i2, v2)
{
    # string value comparison, ascending order
    v1 = v1 ""
    v2 = v2 ""
    if (v1 < v2)
        return -1
    return (v1 != v2)
}

The third comparison function makes all numbers, and numeric strings without any leading or trailing spaces, come out first during loop traversal:

function cmp_num_str_val(i1, v1, i2, v2,   n1, n2)
{
    # numbers before string value comparison, ascending order
    n1 = v1 + 0
    n2 = v2 + 0
    if (n1 == v1)
        return (n2 == v2) ? (n1 - n2) : -1
    else if (n2 == v2)
        return 1
    return (v1 < v2) ? -1 : (v1 != v2)
}

Here is a main program to demonstrate how gawk behaves using each of the previous functions:

BEGIN {
    data["one"] = 10
    data["two"] = 20
    data[10] = "one"
    data[100] = 100
    data[20] = "two"

    f[1] = "cmp_num_idx"
    f[2] = "cmp_str_val"
    f[3] = "cmp_num_str_val"
    for (i = 1; i <= 3; i++) {
        printf("Sort function: %s\n", f[i])
        PROCINFO["sorted_in"] = f[i]
        for (j in data)
            printf("\tdata[%s] = %s\n", j, data[j])
        print ""
    }
}

Here are the results when the program is run:

$ gawk -f compdemo.awk
Sort function: cmp_num_idx      <-- Sort by numeric index
    data[two] = 20
    data[one] = 10              <-- Both strings are numerically zero
    data[10] = one
    data[20] = two
    data[100] = 100

Sort function: cmp_str_val      <-- Sort by element values as strings
    data[one] = 10
    data[100] = 100             <-- String 100 is less than string 20
    data[two] = 20
    data[10] = one
    data[20] = two

Sort function: cmp_num_str_val  <-- Sort all numeric values before all strings
    data[one] = 10
    data[two] = 20
    data[100] = 100
    data[10] = one
    data[20] = two

Consider sorting the entries of a GNU/Linux system password file according to login name. The following program sorts records by a specific field position and can be used for this purpose:

# passwd-sort.awk --- simple program to sort by field position
# field position is specified by the global variable POS

function cmp_field(i1, v1, i2, v2)
{
    # comparison by value, as string, and ascending order
    return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
}

{
    for (i = 1; i <= NF; i++)
        a[NR][i] = $i
}

END {
    PROCINFO["sorted_in"] = "cmp_field"
    if (POS < 1 || POS > NF)
        POS = 1

    for (i in a) {
        for (j = 1; j <= NF; j++)
            printf("%s%c", a[i][j], j < NF ? ":" : "")
        print ""
    }
}

The first field in each entry of the password file is the user’s login name, and the fields are separated by colons. Each record defines a subarray, with each field as an element in the subarray. Running the program produces the following output:

$ gawk -v POS=1 -F: -f passwd-sort.awk /etc/passwd

adm:x:3:4:adm:/var/adm:/sbin/nologin

apache:x:48:48:Apache:/var/www:/sbin/nologin

avahi:x:70:70:Avahi daemon:/:/sbin/nologin

The comparison function should always return the same value when given a specific pair of array elements as its arguments. If it returns inconsistent results, the order is undefined. This behavior can be exploited to introduce random order into otherwise seemingly ordered data:

function cmp_randomize(i1, v1, i2, v2)
{
    # random order (caution: this may never terminate!)
    return (2 - 4 * rand())
}

As already mentioned, the order of the indices is arbitrary if two elements compare equal. This is usually not a problem, but letting the tied elements come out in arbitrary order can be an issue, especially when comparing item values. The partial ordering of the equal elements may change the next time the array is traversed, if other elements are added to or removed from the array. One way to resolve ties when comparing elements with otherwise equal values is to include the indices in the comparison rules. Note that doing this may make the loop traversal less efficient, so consider it only if necessary. The following comparison functions force a deterministic order, and are based on the fact that the (string) indices of two elements are never equal:

function cmp_numeric(i1, v1, i2, v2)
{
    # numerical value (and index) comparison, descending order
    return (v1 != v2) ? (v2 - v1) : (i2 - i1)
}

function cmp_string(i1, v1, i2, v2)
{
    # string value (and index) comparison, descending order
    v1 = v1 i1
    v2 = v2 i2
    return (v1 > v2) ? -1 : (v1 != v2)
}

A custom comparison function can often simplify ordered loop traversal, and the sky is really the limit when it comes to designing such a function.

When string comparisons are made during a sort, either for element values where one or both aren’t numbers, or for element indices handled as strings, the value of IGNORECASE (see Predefined Variables) controls whether the comparisons treat corresponding upper- and lowercase letters as equivalent or distinct.

Another point to keep in mind is that in the case of subarrays, the element values can themselves be arrays; a production comparison function should use the isarray() function (see Getting Type Information) to check for this, and choose a defined sorting order for subarrays.
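
A comparison function that guards against subarrays might look like the following minimal sketch. The function name cmp_scalars_first() and the ordering policy (scalars before subarrays, subarrays ordered by index) are assumptions chosen for illustration; other policies are equally valid.

```awk
# cmp_scalars_first() is a hypothetical comparison function: scalar
# values sort before subarrays; scalars are compared as strings, and
# two subarrays are ordered by the string value of their indices.
function cmp_scalars_first(i1, v1, i2, v2,    a1, a2)
{
    a1 = isarray(v1)
    a2 = isarray(v2)
    if (a1 != a2)
        return a1 ? 1 : -1          # the scalar comes first
    if (a1) {                       # both are subarrays: use the indices
        i1 = i1 ""
        i2 = i2 ""
        return (i1 < i2) ? -1 : (i1 != i2)
    }
    v1 = v1 ""                      # both are scalars: compare as strings
    v2 = v2 ""
    return (v1 < v2) ? -1 : (v1 != v2)
}

BEGIN {
    data["x"] = "pear"
    data["sub"][1] = "nested"
    data["y"] = "apple"
    PROCINFO["sorted_in"] = "cmp_scalars_first"
    for (k in data)
        print k, (isarray(data[k]) ? "(subarray)" : data[k])
}
```

The loop should visit "y" (apple), then "x" (pear), and finally "sub", because the subarray is placed after every scalar value.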

All sorting based on PROCINFO["sorted_in"] is disabled in POSIX mode, because the PROCINFO array is not special in that case.

As a side note, sorting the array indices before traversing the array has been reported to add a 15% to 20% overhead to the execution time of awk programs. For this reason, sorted array traversal is not the default.

Sorting Array Values and Indices with gawk

In most awk implementations, sorting an array requires writing a sort() function. This can be educational for exploring different sorting algorithms, but usually that’s not the point of the program. gawk provides the built-in asort() and asorti() functions (see String-Manipulation Functions) for sorting arrays. For example:

# ... populate the array data ...
n = asort(data)
for (i = 1; i <= n; i++) {
    # do something with data[i]
}

After the call to asort(), the array data is indexed from 1 to some number n, the total number of elements in data. (This count is asort()’s return value.) data[1] ≤ data[2] ≤ data[3], and so on. The default comparison is based on the type of the elements (see Variable Typing and Comparison Expressions). All numeric values come before all string values, which in turn come before all subarrays.

An important side effect of calling asort() is that the array’s original indices are irrevocably lost. As this isn’t always desirable, asort() accepts a second argument:

# ... populate the array source ...
n = asort(source, dest)
for (i = 1; i <= n; i++) {
    # do something with dest[i]
}

In this case, gawk copies the source array into the dest array and then sorts dest, destroying its indices. However, the source array is not affected.
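
The two-argument form and the default ordering can be seen together in a small, self-contained sketch. The array contents are made up for illustration.

```awk
BEGIN {
    source["b"] = "pear"
    source["a"] = 30
    source["c"] = "apple"
    source["d"] = 4

    n = asort(source, dest)         # sort a copy; source keeps its indices

    for (i = 1; i <= n; i++)
        printf "dest[%d] = %s\n", i, dest[i]
    print "source[\"a\"] is still", source["a"]
}
```

With the default comparison, the numeric values come first (4, then 30), followed by the strings (apple, then pear), and the original element source["a"] remains available.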

Often, what’s needed is to sort on the values of the indices instead of the values of the elements. To do that, use the asorti() function. Its interface and behavior are identical to those of asort(), except that the index values are used for sorting and become the values of the result array:

{ source[$0] = some_func($0) }

END {
    n = asorti(source, dest)
    for (i = 1; i <= n; i++) {
        # Work with sorted indices directly:
        # do something with dest[i]

        # Access original array via sorted indices:
        # do something with source[dest[i]]
    }
}
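
A concrete variant of the same pattern, using made-up data in place of input records, might look like this sketch. The array names count and names are assumptions for the example.

```awk
BEGIN {
    count["pear"] = 3
    count["apple"] = 7
    count["fig"] = 1

    n = asorti(count, names)        # names[1..n] holds the sorted indices
    for (i = 1; i <= n; i++)
        printf "%s: %d\n", names[i], count[names[i]]
}
```

This should print the entries in index order: apple, fig, pear. The original array is still needed to look up each value.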

So far, so good. Now it starts to get interesting. Both asort() and asorti() accept a third string argument to control comparison of array elements. When we introduced asort() and asorti() in String-Manipulation Functions, we ignored this third argument; however, now is the time to describe how this argument affects these two functions.

Basically, the third argument specifies how the array is to be sorted. There are two possibilities. As with PROCINFO["sorted_in"], this argument may be one of the predefined names that gawk provides (see Using Predefined Array Scanning Orders with gawk), or it may be the name of a user-defined function (see Controlling Array Traversal).

In the latter case, the function can compare elements in any way it chooses, taking into account just the indices, just the values, or both. This is extremely powerful.

Once the array is sorted, asort() takes the values in their final order and uses them to fill in the result array, whereas asorti() takes the indices in their final order and uses them to fill in the result array.
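
Both kinds of third argument can be sketched briefly. In this example, by_length() and show() are hypothetical helper functions and the word list is made up; "@val_str_desc" is one of gawk's predefined orders.

```awk
# by_length() is a hypothetical comparison function: shorter values
# first, with ties broken by ordinary string comparison.
function by_length(i1, v1, i2, v2)
{
    if (length(v1) != length(v2))
        return length(v1) - length(v2)
    return (v1 < v2) ? -1 : (v1 != v2)
}

# show() is a hypothetical helper that prints arr[1..n] on one line.
function show(arr, n,    i, line, sep)
{
    for (i = 1; i <= n; i++) {
        line = line sep arr[i]
        sep = " "
    }
    print line
}

BEGIN {
    split("fig banana kiwi apple", fruit, " ")

    n = asort(fruit, sorted, "by_length")       # user-defined function
    show(sorted, n)

    n = asort(fruit, sorted, "@val_str_desc")   # predefined order
    show(sorted, n)
}
```

The first line should read "fig kiwi apple banana" and the second "kiwi fig banana apple". Passing the order as an argument leaves PROCINFO["sorted_in"] untouched, so it affects only that one call.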

NOTE

Copying array indices and elements isn’t expensive in terms of memory. Internally, gawk maintains reference counts to data. For example, when asort() copies the first array to the second one, there is only one copy of the original array elements’ data, even though both arrays use the values.

Because IGNORECASE affects string comparisons, the value of IGNORECASE also affects sorting for both asort() and asorti(). Note also that the locale’s sorting order does not come into play; comparisons are based on character values only.[79]
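
The effect of IGNORECASE on asort() can be observed with a short sketch. The word list and the array names plain and folded are assumptions for the example.

```awk
BEGIN {
    split("banana Cherry apple", words, " ")

    n = asort(words, plain)         # IGNORECASE is 0: case matters
    IGNORECASE = 1
    n = asort(words, folded)        # case is ignored while comparing
    IGNORECASE = 0

    for (i = 1; i <= n; i++)
        printf "%-8s %s\n", plain[i], folded[i]
}
```

With comparison by character values, the capitalized word sorts first in the left column (Cherry, apple, banana); with IGNORECASE set, it sorts last in the right column (apple, banana, Cherry).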

Two-Way Communications with Another Process

It is often useful to be able to send data to a separate program for processing and then read the result. This can always be done with temporary files:

# Write the data for processing
tempfile = ("mydata." PROCINFO["pid"])
while (not done with data)
    print data | ("subprogram > " tempfile)
close("subprogram > " tempfile)

# Read the results, remove tempfile when done
while ((getline newdata < tempfile) > 0)
    process newdata appropriately
close(tempfile)
system("rm " tempfile)

This works, but not elegantly. Among other things, it requires that the program be run in a directory that cannot be shared among users; for example, /tmp will not do, as another user might happen to be using a temporary file with the same name.[80]

However, with gawk, it is possible to open a two-way pipe to another process. The second process is termed a coprocess, as it runs in parallel with gawk. The two-way connection is created using the ‘|&’ operator (borrowed from the Korn shell, ksh):[81]

do {
    print data |& "subprogram"
    "subprogram" |& getline results
} while (data left to process)
close("subprogram")

The first time an I/O operation is executed using the ‘|&’ operator, gawk creates a two-way pipeline to a child process that runs the other program. Output created with print or printf is written to the program’s standard input, and output from the program’s standard output can be read by the gawk program using getline. As is the case with processes started by ‘|’, the subprogram can be any program, or pipeline of programs, that can be started by the shell.

There are some cautionary items to be aware of:

§ As the code inside gawk currently stands, the coprocess’s standard error goes to the same place that the parent gawk’s standard error goes. It is not possible to read the child’s standard error separately.

§ I/O buffering may be a problem. gawk automatically flushes all output down the pipe to the coprocess. However, if the coprocess does not flush its output, gawk may hang when doing a getline in order to read the coprocess’s results. This could lead to a situation known as deadlock, where each process is waiting for the other one to do something.

It is possible to close just one end of the two-way pipe to a coprocess, by supplying a second argument to the close() function of either "to" or "from" (see Closing Input and Output Redirections). These strings tell gawk to close the end of the pipe that sends data to the coprocess or the end that reads from it, respectively.

This is particularly necessary in order to use the system sort utility as part of a coprocess; sort must read all of its input data before it can produce any output. The sort program does not receive an end-of-file indication until gawk closes the write end of the pipe.

When you have finished writing data to the sort utility, you can close the "to" end of the pipe, and then start reading sorted data via getline. For example:

BEGIN {
    command = "LC_ALL=C sort"
    n = split("abcdefghijklmnopqrstuvwxyz", a, "")

    for (i = n; i > 0; i--)
        print a[i] |& command
    close(command, "to")

    while ((command |& getline line) > 0)
        print "got", line
    close(command)
}

This program writes the letters of the alphabet in reverse order, one per line, down the two-way pipe to sort. It then closes the write end of the pipe, so that sort receives an end-of-file indication. This causes sort to sort the data and write the sorted data back to the gawk program. Once all of the data has been read, gawk terminates the coprocess and exits.

As a side note, the assignment ‘LC_ALL=C’ in the sort command ensures traditional Unix (ASCII) sorting from sort. This is not strictly necessary here, but it’s good to know how to do this.
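
The same write, half-close, then read sequence can be wrapped in a reusable function. The following is a minimal sketch: the function name sort_numeric() and its calling convention are assumptions made for this example, and it relies on a sort utility that accepts the -n option being available on the system.

```awk
# sort_numeric() is a hypothetical helper: it sends in_arr[1..n] to
# "sort -n" through a two-way pipe, stores the sorted lines in
# out_arr[1..count], and returns count.
function sort_numeric(in_arr, n, out_arr,    cmd, i, count, line)
{
    delete out_arr
    if (n < 1)
        return 0                    # nothing to send; do not start sort
    cmd = "LC_ALL=C sort -n"
    for (i = 1; i <= n; i++)
        print in_arr[i] |& cmd
    close(cmd, "to")                # sort now sees end-of-file
    count = 0
    while ((cmd |& getline line) > 0)
        out_arr[++count] = line
    close(cmd)                      # finish off the coprocess
    return count
}

BEGIN {
    n = split("10 9 100 1", nums, " ")
    m = sort_numeric(nums, n, sorted)
    for (i = 1; i <= m; i++)
        print sorted[i]
}
```

This should print 1, 9, 10, and 100, one per line. The early return for an empty array matters: reading from a coprocess whose "to" end was never closed would leave gawk and sort waiting on each other.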

You may also use pseudo-ttys (ptys) for two-way communication instead of pipes, if your system supports them. This is done on a per-command basis, by setting a special element in the PROCINFO array (see Built-in Variables That Convey Information), like so:

command = "sort -nr" # command, save in convenience variable

PROCINFO[command, "pty"] = 1 # update PROCINFO

print … |& command # start two-way pipe

Using ptys usually avoids the buffer deadlock issues described earlier, at some loss in performance. If your system does not have ptys, or if all the system’s ptys are in use, gawk automatically falls back to using regular pipes.

Using gawk for Network Programming

EMISTERED:
A host is a host from coast to coast,
and nobody talks to a host that’s close,
unless the host that isn’t close
is busy, hung, or dead.

—Mike O’Brien (aka Mr. Protocol)

In addition to being able to open a two-way pipeline to a coprocess on the same system (see Two-Way Communications with Another Process), it is possible to make a two-way connection to another process on another system across an IP network connection.

You can think of this as just a very long two-way pipeline to a coprocess. The way gawk decides that you want to use TCP/IP networking is by recognizing special filenames that begin with one of ‘/inet/’, ‘/inet4/’, or ‘/inet6/’.

The full syntax of the special filename is /net-type/protocol/local-port/remote-host/remote-port. The components are:

net-type

Specifies the kind of Internet connection to make. Use ‘/inet4/’ to force IPv4, and ‘/inet6/’ to force IPv6. Plain ‘/inet/’ (which used to be the only option) uses the system default, most likely IPv4.

protocol

The protocol to use over IP. This must be either ‘tcp’, or ‘udp’, for a TCP or UDP IP connection, respectively. TCP should be used for most applications.

local-port

The local TCP or UDP port number to use. Use a port number of ‘0’ when you want the system to pick a port. This is what you should do when writing a TCP or UDP client. You may also use a well-known service name, such as ‘smtp’ or ‘http’, in which case gawk attempts to determine the predefined port number using the C getaddrinfo() function.

remote-host

The IP address or fully qualified domain name of the Internet host to which you want to connect.

remote-port

The TCP or UDP port number to use on the given remote-host. Again, use ‘0’ if you don’t care, or else a well-known service name.
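
To make the five components concrete, the following sketch assembles special filenames from their parts. The helper inet_name() is hypothetical, it only builds the string and does not open a connection, and the address 192.0.2.1 is a placeholder chosen for illustration.

```awk
# inet_name() is a hypothetical helper that joins the five components
# into a special filename; it performs no I/O.
function inet_name(net_type, protocol, local_port, remote_host, remote_port)
{
    return "/" net_type "/" protocol "/" local_port "/" remote_host "/" remote_port
}

BEGIN {
    # TCP client, system-chosen local port, well-known service name
    print inet_name("inet", "tcp", 0, "localhost", "daytime")

    # UDP over IPv4 to a numeric port (placeholder address)
    print inet_name("inet4", "udp", 0, "192.0.2.1", 53)
}
```

The program prints /inet/tcp/0/localhost/daytime and /inet4/udp/0/192.0.2.1/53. A string of this form is what appears on one side of the ‘|&’ operator.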

NOTE

Failure in opening a two-way socket will result in a nonfatal error being returned to the calling code. The value of ERRNO indicates the error (see Built-in Variables That Convey Information).

Consider the following very simple example:

BEGIN {
    Service = "/inet/tcp/0/localhost/daytime"
    Service |& getline
    print $0
    close(Service)
}

This program reads the current date and time from the local system’s TCP daytime server. It then prints the results and closes the connection.

Because this topic is extensive, the use of gawk for TCP/IP programming is documented separately. See TCP/IP Internetworking with gawk, which comes as part of the gawk distribution, for a much more complete introduction and discussion, as well as extensive examples.

Profiling Your awk Programs

You may produce execution traces of your awk programs. This is done by passing the option --profile to gawk. When gawk has finished running, it creates a profile of your program in a file named awkprof.out. Because it is profiling, it also executes up to 45% slower than gawk normally does.

As shown in the following example, the --profile option can be used to change the name of the file where gawk will write the profile:

gawk --profile=myprog.prof -f myprog.awk data1 data2

In the preceding example, gawk places the profile in myprog.prof instead of in awkprof.out.

Here is a sample session showing a simple awk program, its input data, and the results from running gawk with the --profile option. First, the awk program:

BEGIN { print "First BEGIN rule" }

END { print "First END rule" }

/foo/ {
    print "matched /foo/, gosh"
    for (i = 1; i <= 3; i++)
        sing()
}

{
    if (/foo/)
        print "if is true"
    else
        print "else is true"
}

BEGIN { print "Second BEGIN rule" }

END { print "Second END rule" }

function sing(    dummy)
{
    print "I gotta be me!"
}

Following is the input data:

foo

bar

baz

foo

junk

Here is the awkprof.out that results from running the gawk profiler on this program and data (this example also illustrates that awk programmers sometimes get up very early in the morning to work):

        # gawk profile, created Mon Sep 29 05:16:21 2014

        # BEGIN rule(s)

        BEGIN {
     1          print "First BEGIN rule"
        }

        BEGIN {
     1          print "Second BEGIN rule"
        }

        # Rule(s)

     5  /foo/ { # 2
     2          print "matched /foo/, gosh"
     6          for (i = 1; i <= 3; i++) {
     6                  sing()
                }
        }

     5  {
     5          if (/foo/) { # 2
     2                  print "if is true"
     3          } else {
     3                  print "else is true"
                }
        }

        # END rule(s)

        END {
     1          print "First END rule"
        }

        END {
     1          print "Second END rule"
        }

        # Functions, listed alphabetically

     6  function sing(dummy)
        {
     6          print "I gotta be me!"
        }

This example illustrates many of the basic features of profiling output. They are as follows:

§ The program is printed in the order BEGIN rules, BEGINFILE rules, pattern–action rules, ENDFILE rules, END rules, and functions, listed alphabetically. Multiple BEGIN and END rules retain their separate identities, as do multiple BEGINFILE and ENDFILE rules.

§ Pattern–action rules have two counts. The first count, to the left of the rule, shows how many times the rule’s pattern was tested. The second count, to the right of the rule’s opening left brace in a comment, shows how many times the rule’s action was executed. The difference between the two indicates how many times the rule’s pattern evaluated to false.

§ Similarly, the count for an if-else statement shows how many times the condition was tested. To the right of the opening left brace for the if’s body is a count showing how many times the condition was true. The count for the else indicates how many times the test failed.

§ The count for a loop header (such as for or while) shows how many times the loop test was executed. (Because of this, you can’t just look at the count on the first statement in a rule to determine how many times the rule was executed. If the first statement is a loop, the count is misleading.)

§ For user-defined functions, the count next to the function keyword indicates how many times the function was called. The counts next to the statements in the body show how many times those statements were executed.

§ The layout uses “K&R” style with TABs. Braces are used everywhere, even when the body of an if, else, or loop is only a single statement.

§ Parentheses are used only where needed, as indicated by the structure of the program and the precedence rules. For example, ‘(3 + 5) * 4’ means add three and five, then multiply the total by four. However, ‘3 + 5 * 4’ has no parentheses, and means ‘3 + (5 * 4)’.

§ Parentheses are used around the arguments to print and printf only when the print or printf statement is followed by a redirection. Similarly, if the target of a redirection isn’t a scalar, it gets parenthesized.

§ gawk supplies leading comments in front of the BEGIN and END rules, the BEGINFILE and ENDFILE rules, the pattern–action rules, and the functions.

The profiled version of your program may not look exactly like what you typed when you wrote it. This is because gawk creates the profiled version by “pretty-printing” its internal representation of the program. The advantage to this is that gawk can produce a standard representation. The disadvantage is that all source code comments are lost. Also, things such as:

/foo/

come out as:

/foo/ {

print $0

}

which is correct, but possibly unexpected.

Besides creating profiles when a program has completed, gawk can produce a profile while it is running. This is useful if your awk program goes into an infinite loop and you want to see what has been executed. To use this feature, run gawk with the --profile option in the background:

$ gawk --profile -f myprog &

[1] 13992

The shell prints a job number and process ID number; in this case, 13992. Use the kill command to send the USR1 signal to gawk:

$ kill -USR1 13992

As usual, the profiled version of the program is written to awkprof.out, or to a different file if one was specified with the --profile option.

Along with the regular profile, as shown earlier, the profile file includes a trace of any active functions:

# Function Call Stack:

# 3. baz

# 2. bar

# 1. foo

# -- main --

You may send gawk the USR1 signal as many times as you like. Each time, the profile and function call trace are appended to the output profile file.

If you use the HUP signal instead of the USR1 signal, gawk produces the profile and the function call trace and then exits.

When gawk runs on MS-Windows systems, it uses the INT and QUIT signals for producing the profile, and in the case of the INT signal, gawk exits. This is because these systems don’t support the kill command, so the only signals you can deliver to a program are those generated by the keyboard. The INT signal is generated by the Ctrl-c or Ctrl-BREAK key, while the QUIT signal is generated by the Ctrl-\ key.

Finally, gawk also accepts another option, --pretty-print. When called this way, gawk “pretty-prints” the program into awkprof.out, without any execution counts.

NOTE

The --pretty-print option still runs your program. This will change in the next major release.

Summary

§ The --non-decimal-data option causes gawk to treat octal- and hexadecimal-looking input data as octal and hexadecimal. This option should be used with caution or not at all; use of strtonum() is preferable. Note that this option may disappear in a future version of gawk.

§ You can take over complete control of sorting in ‘for (indx in array)’ array traversal by setting PROCINFO["sorted_in"] to the name of a user-defined function that does the comparison of array elements based on index and value.

§ Similarly, you can supply the name of a user-defined comparison function as the third argument to either asort() or asorti() to control how those functions sort arrays. Or you may provide one of the predefined control strings that work for PROCINFO["sorted_in"].

§ You can use the ‘|&’ operator to create a two-way pipe to a coprocess. You read from the coprocess with getline and write to it with print or printf. Use close() to close off the coprocess completely, or optionally close off one side of the two-way communications.

§ By using special filenames with the ‘|&’ operator, you can open a TCP/IP (or UDP/IP) connection to remote hosts on the Internet. gawk supports both IPv4 and IPv6.

§ You can generate statement count profiles of your program. This can help you determine which parts of your program may be taking the most time and let you tune them more easily. Sending the USR1 signal while profiling causes gawk to dump the profile and keep going, including a function call stack.

§ You can also just “pretty-print” the program. This currently also runs the program, but that will change in the next major release.


[78] This is why the predefined sorting orders start with an ‘@’ character, which cannot be part of an identifier.

[79] This is true because locale-based comparison occurs only when in POSIX-compatibility mode, and because asort() and asorti() are gawk extensions, they are not available in that case.

[80] Michael Brennan suggests the use of rand() to generate unique filenames. This is a valid point; nevertheless, temporary files remain more difficult to use than two-way pipes.

[81] This is very different from the same operator in the C shell and in Bash.