Working with Tables - Beginning Lua Programming (2007)

Chapter 4. Working with Tables

This chapter explores a new data type called a table. It’s a data structure, which means that it lets you combine other values. Because of its flexibility, it is Lua’s only data structure. (It is possible to create other, special-purpose data structures in C.)

In this chapter, you learn how to do the following:

· Create and modify tables

· Loop through the elements of tables

· Use Lua’s built-in table library

· Write programs in an object-oriented style

· Write functions that take variable numbers of arguments

Tables Introduced

The following example creates a table and assigns it to the variable NameToInstr, and then looks around inside the table:

> NameToInstr = {["John"] = "rhythm guitar",

>> ["Paul"] = "bass guitar",

>> ["George"] = "lead guitar",

>> ["Ringo"] = "drumkit"}

> print(NameToInstr["Paul"])

bass guitar

> A = "Ringo"

> print(NameToInstr[A])

drumkit

> print(NameToInstr["Mick"])

nil

A table is a collection of key-value pairs. In this example, the expression that starts and ends with { and } (curly braces) is a table constructor that creates a table that associates the key "John" with the value "rhythm guitar", the key "Paul" with the value "bass guitar", and so on. Each key is surrounded by [ and ] (square brackets) and is separated from its value by an equal sign. The key-value pairs are separated from each other by commas.

After the table is created and assigned to NameToInstr, square brackets are used to retrieve the values for particular keys. When NameToInstr["Paul"] is evaluated, the result is "bass guitar", which is the value associated with the key "Paul" in the NameToInstr table.

The term “value” is used here to mean “the second half of a key-value pair.” Both this sense and the broader sense used in Chapters 2 and 3 are used in this chapter; which sense is intended should be clear from the context. A “key” is a value in the broader (but not the narrower) sense.

As the line with NameToInstr[A] shows, the expression in between the square brackets doesn’t have to be a literal string. Here it is a variable, but it can be any expression. (This also applies to the square brackets inside table constructors. If an expression inside square brackets is a function call, it is adjusted to one value.)

If you ask a table for the value of a key it doesn’t contain, it gives you nil:

> print(NameToInstr["Mick"])

nil

This means that nil cannot be a value in a table. Another way of saying this is that there is no difference between a key not existing in a table, and that key existing but having nil as its value. Keys cannot be nil, although if the value of a nil key is asked for, the result will be nil:

> Tbl = {}

> print(Tbl[nil])

nil
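
Both rules can be seen in a short script. This is only a sketch: the table T and its field names are made up for illustration, and it relies on the built-in pcall function, which calls a function and returns false as its first result if that call raised an error (so the failed assignment doesn't stop the script):

```lua
-- Hypothetical table used only for this illustration.
local T = {Kept = "here", Removed = "temporary"}

T.Removed = nil -- Same as the key never having existed.
print(T.Removed, T.NeverSet) --> nil nil

-- Reading T[nil] is allowed (it gives nil), but assigning to a
-- nil key is an error. pcall reports the failure as false:
local Ok = pcall(function() T[nil] = "value" end)
print(Ok) --> false
```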

Both keys and values can be any type other than nil. For example:

> T = {[print] = "The print function",

>> ["print"] = print,

>> [0.1] = 0.2}

> print(T[print]) -- Function key, string value.

The print function

> print(T["print"]) -- String key, function value.

function: 0x481720

> print(T[0.1]) -- Number key, number value.

0.2

The association between a key and a value is one-way. NameToInstr["Ringo"] is "drumkit", but NameToInstr["drumkit"] is nil. A given value can be associated with multiple keys, but a given key can only have one value at a time. For example:

> T = {["a"] = "duplicate value",

>> ["b"] = "duplicate value",

>> ["duplicate key"] = "y",

>> ["duplicate key"] = "z"}

> print(T["a"])

duplicate value

> print(T["b"])

duplicate value

> print(T["duplicate key"])

z

Keys follow the same equality rules as other values, so in the following example, 1 and "1" are two distinct keys:

> T = {[1] = "number", ["1"] = "string"}

> print(T[1], T["1"])

number string
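
The same rule covers keys of other types. The following sketch (the names KeyA and KeyB are invented for the example) shows that two separately created tables are different keys even when their contents match, while two expressions that produce equal numbers reach the same key:

```lua
local T = {}

-- Two tables created separately are never equal to each other,
-- so they act as two different keys:
local KeyA, KeyB = {}, {}
T[KeyA] = "first table"
T[KeyB] = "second table"
print(T[KeyA], T[KeyB]) --> first table   second table
print(T[{}]) --> nil (a brand-new table equals no existing key)

-- 2 and 1 + 1 are equal numbers, so they are the same key:
T[2] = "two"
print(T[1 + 1]) --> two
```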

A Shorter Way to Write Some Keys

A key is often called an index to a table, and accessing a key’s value (like T[X]) is called indexing the table.

The word “index” has other uses too. For example, getting the nth character of a string is “indexing” that string, and a loop variable, particularly one with an integer value, can be called an index (which is why some of the loop variables in this book are named I).

The value of a particular index of a particular table is often called a field, and the index itself is called a field name. This terminology is used especially when the field name is a valid identifier. If a field name is a valid identifier, you can use it in a table constructor without the square brackets or quotes. The following is another way to write the constructor for NameToInstr:

NameToInstr = {John = "rhythm guitar",

Paul = "bass guitar",

George = "lead guitar",

Ringo = "drumkit"}

That doesn’t work with any of the following tables, because none of the keys are valid identifiers (notice that the error messages are different, because the keys are invalid identifiers for different reasons):

> T = {1st = "test"}

stdin:1: malformed number near '1st'

> T = {two words = "test"}

stdin:1: '}' expected near 'words'

> T = {and = "test"}

stdin:1: unexpected symbol near 'and'
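
Keys like these can still be used; they just need the full square-bracket-and-quotes form. A minimal sketch, with values chosen only to label each case:

```lua
local T = {["1st"] = "starts with a digit",
  ["two words"] = "contains a space",
  ["and"] = "is a keyword"}

print(T["1st"])       --> starts with a digit
print(T["two words"]) --> contains a space
print(T["and"])       --> is a keyword
```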

You can also access fields in an existing table (if the field names are valid identifiers) by using . (a dot) instead of square brackets and quotes, like this:

> print(NameToInstr.George)

lead guitar

You can index a table within a table in one step, as follows:

> Tbl1 = {Tbl2 = {Bool = true}}

> print(Tbl1.Tbl2.Bool)

true

> print(Tbl1["Tbl2"].Bool)

true

> print(Tbl1.Tbl2["Bool"])

true

> print(Tbl1["Tbl2"]["Bool"])

true

This works for tables within tables within tables, as deep as you want to go. If there are enough nested tables, then Tbl.Think.Thank.Thunk.Thenk.Thonk is perfectly valid.

Don’t let the flexibility of tables and the variety of methods for accessing them confuse you. In particular, remember that NameToInstr["John"] and NameToInstr.John both mean “get the value for the "John" key,” and NameToInstr[John] means “get the value for whatever key is in the variable John.” If you find yourself getting a nil when you don’t expect to, make sure you’re not mixing these up.
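
The difference is easiest to see when a variable named John happens to exist. In this sketch the variables John and Unset are invented for the demonstration:

```lua
local NameToInstr = {John = "rhythm guitar", Paul = "bass guitar"}
local John = "Paul" -- A variable that happens to be named John.

print(NameToInstr.John)    --> rhythm guitar
print(NameToInstr["John"]) --> rhythm guitar
print(NameToInstr[John])   --> bass guitar (the key used is "Paul")

-- A variable that holds nil gives the unexpected nil result:
local Unset
print(NameToInstr[Unset])  --> nil
```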

Altering a Table’s Contents

After you create a table, you can modify or remove the values already in it, and you can add new values to it. You do these things with the assignment statement.

Try It Out

Assigning to Table Indexes

Type the following into the Lua interpreter:

> Potluck = {John = "chips", Jane = "lemonade",

>> Jolene = "egg salad"}

> Potluck.Jolene = "fruit salad" -- A change.

> Potluck.Jayesh = "lettuce wraps" -- An addition.

> Potluck.John = nil -- A removal.

> print(Potluck.John, Potluck.Jane, Potluck.Jolene,

>> Potluck.Jayesh)

Here’s the result:

nil lemonade fruit salad lettuce wraps

How It Works

In this exercise, you create a table with three people and their foods. You then use assignment to change one person’s food, to add a new person-food pair to the table, and to remove an existing person-food pair.

Potluck.Jolene = "fruit salad" overwrites the previous value of Potluck.Jolene ("egg salad").

Potluck.Jayesh = "lettuce wraps" adds a new key (and its value) to the table. The value of Potluck.Jayesh before this line would have been nil.

Potluck.John = nil overwrites the previous value of Potluck.John with nil. This is another way of saying that it removes the key "John" from the table (because there’s no difference between a nil-valued key and a nonexistent key).

Notice that, except in the line with the table constructor, the variable Potluck is never assigned to. Rather, individual fields of the table in Potluck are being assigned to. This is called indexing assignment.

The Lua reference manual actually calls table fields a third type of variable (after globals and locals). This usage makes some things clearer, but it isn’t widespread, so it isn’t followed in this book.

Often the most convenient way to populate a table is to start with an empty table and add things to it one at a time. Here’s an example of creating a table of the first five perfect squares and then printing it:

> Squares = {} -- A table constructor can be empty.

> for I = 1, 5 do

>> Squares[I] = I * I

>> end

> for I = 1, 5 do

>> print(I .. " squared is " .. Squares[I])

>> end

1 squared is 1

2 squared is 4

3 squared is 9

4 squared is 16

5 squared is 25

You can assign to nested tables in one step, as in Tbl.Think[Thank].Thunk = true.

Here are a couple of other table constructor tidbits. You can optionally follow the final value in a table constructor by a comma: {A = 1, B = 2, C = 3,}. This is convenient for automatically generated table constructors, and for frequently edited ones (so you don’t have to always make sure to delete a comma if a value becomes the last one). And instead of commas, you can use semicolons (or a mixture of commas and semicolons): {A = 1; B = 2; C = 3}.

Tables as Arrays

It’s common for the keys of a table to be consecutive integers, starting at 1. For example:

> Months = {[1] = "January", [2] = "February", [3] = "March",

>> [4] = "April", [5] = "May", [6] = "June", [7] = "July",

>> [8] = "August", [9] = "September", [10] = "October",

>> [11] = "November", [12] = "December"}

> print(Months[11])

November

A table used in this way is sometimes called an array (or a list). To emphasize that a table is not being used as an array, it can be called an associative table.

You can write table constructors that build arrays in a more concise, less error-prone way that doesn’t require writing out each integer key. For example:

> Months = {"January", "February", "March", "April", "May",

>> "June", "July", "August", "September", "October",

>> "November", "December"}

> print(Months[11])

November

Inside a table constructor, the first value that doesn’t have a key (and an equal sign) in front of it is associated with the key 1. Any subsequent such values are given a key one higher than that given to the previous such value. This rule applies even if key-value pairs with equal signs are intermixed, like this:

> T = {A = "x", "one", B = "y", "two", C = "z", "three"}

> print(T[1], T[2], T[3])

one two three

Usually this sort of mixed table constructor is easier to read if the consecutive-integer values are all together and the other key-value pairs are all together.

If a function call is used as the value of an explicit key ({K = F()}, for example), it’s adjusted to one return value. If it’s used as the value of an implicit integer key, it’s only adjusted to one return value if it’s not the last thing in the table constructor; if it is the last thing, no adjustment is made:

> function ReturnNothing()

>> end

>

> function ReturnThreeVals()

>> return "x", "y", "z"

>> end

>

> TblA = {ReturnThreeVals(), ReturnThreeVals()}

> print(TblA[1], TblA[2], TblA[3], TblA[4])

x x y z

> TblB = {ReturnNothing(), ReturnThreeVals()}

> -- The following nil is the result of adjustment:

> print(TblB[1], TblB[2], TblB[3], TblB[4])

nil x y z

> TblC = {ReturnThreeVals(), ReturnNothing()}

> -- The following three nils are not the result of adjustment;

> -- they’re there because TblC[2] through TblC[4] were not

> -- given values in the constructor:

> print(TblC[1], TblC[2], TblC[3], TblC[4])

x nil nil nil

> TblD = {ReturnNothing(), ReturnNothing()}

> -- The first nil that follows is the result of adjustment; the

> -- second is there because TblD[2] was not given a value

> -- in the constructor:

> print(TblD[1], TblD[2])

nil nil

Array Length

The # (length) operator can be used to measure the length of an array. Normally, this number is also the index of the last element in the array, as in the following example:

> Empty = {}

> One = {"a"}

> Three = {"a", "b", "c"}

> print(#Empty, #One, #Three)

0 1 3

Apart from arrays with gaps (discussed shortly), the length operator gives the same result whether a table got the way it is purely because of its constructor (as previously shown) or because of assignments made to it after it was created, like this:

> Empty = {"Delete me!"}

> Empty[1] = nil

> Three = {"a"}

> Three[2], Three[3] = "b", "c"

> print(#Empty, #Three)

0 3

It also doesn’t matter whether the constructor uses implicit or explicit integer indexing, as shown here:

> print(#{[1] = "a", [2] = "b"})

2

Noninteger indexes (or nonpositive integer indexes, for that matter) do not count—the length operator measures the length of a table as an array as follows:

> Two = {"a", "b", Ignored = true, [0.5] = true}

> print(#Two)

2

An array is said to have a gap if there is a nil somewhere between element 1 and the highest positive integer element with a value that is not nil. Here are some examples:

T1 = {nil, "b", "c"} -- Gap between beginning and

-- element 2.

T2 = {"a", "b", nil, nil, "e"} -- Gap between element 2 and

-- element 5.

T3 = {"a", "b", "c", nil} -- No gap! (Element 3 is the last

-- element in the array.)

Arrays with gaps cause the length operator to behave unpredictably. The only thing you can be sure of is that it will always return the index of a non-nil value that is followed by a nil, or possibly 0 if element 1 of the array is nil. For example:

> T = {"a", "b", "c", nil, "e"}

> print(#T)

5

> -- Equivalent table; different result:

> T = {}

> T[1], T[2], T[3], T[4], T[5] = "a", "b", "c", nil, "e"

> print(#T)

3
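
The one guarantee stated above can be written as a small check. IsBorder is a hypothetical helper, not part of Lua; it returns true if N is an answer that # is allowed to give for T:

```lua
-- Returns true if N is a valid result for #T: T[N] is non-nil
-- and T[N + 1] is nil, or N is 0 and T[1] is nil.
local function IsBorder(T, N)
  if N == 0 then
    return T[1] == nil
  end
  return T[N] ~= nil and T[N + 1] == nil
end

local T = {"a", "b", "c", nil, "e"}
print(IsBorder(T, 3), IsBorder(T, 5)) --> true true
print(IsBorder(T, 4), IsBorder(T, 0)) --> false false
-- Whatever # returns for T, it passes the check:
print(IsBorder(T, #T)) --> true
```

Both 3 and 5 pass, which is why either one can turn up as the length of this table.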

For this reason, it’s generally a good idea to avoid having gaps in your arrays. However, an array is just a table used in a certain way, not a separate datatype, so this warning about avoiding gaps only applies if you’re planning to use a table as an array. This means there’s nothing wrong with the following table:

-- Considered as an array, this would have gaps, but it's

-- obviously intended as an associative table:

AustenEvents = {[1775] = "born",

[1811] = "Sense and Sensibility published",

[1813] = "Pride and Prejudice published",

[1814] = "Mansfield Park published",

[1816] = "Emma published",

[1817] = "died",

[1818] = "Northanger Abbey and Persuasion published"}

The # operator could have been defined to always give either the first element followed by nil or the last, but both of these approaches require searches through the table that are much more time-consuming than the way that # actually works.
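
To make the cost concrete, here is a sketch of the "first element followed by nil" approach. FirstLength is a hypothetical name; the function steps through the array from the start, so its running time grows with the number of elements before the first nil:

```lua
-- Returns the index of the first element that is followed by
-- nil (or 0 if T[1] is nil) by walking the array from the start.
local function FirstLength(T)
  local N = 0
  while T[N + 1] ~= nil do
    N = N + 1
  end
  return N
end

print(FirstLength({"a", "b", "c", nil, "e"})) --> 3
print(FirstLength({nil, "b"}))                --> 0
print(FirstLength({}))                        --> 0
```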

Lua 5.0 dealt with array length differently. In addition to the nonexistence of the # operator, the main differences were that the functions table.getn and table.setn were used to get and set the length of an array, and a table’s "n" field could be used to store its array length.

Looping through Tables

Printing a table gives similar results to printing a function:

> print({})

table: 0x493bc0

This means that to see what’s inside a table, you need to look at each key-value pair in turn. In the previous Squares example, this was done with a for loop hardcoded to run to 5 (the length of Squares). You could improve this by using the # operator, so that if the array’s length is changed, you only need to change it in one place:

> Squares = {}

> for I = 1, 5 do

>> Squares[I] = I * I

>> end

> for I = 1, #Squares do

>> print(I .. " squared is " .. Squares[I])

>> end

1 squared is 1

2 squared is 4

3 squared is 9

4 squared is 16

5 squared is 25

This is better, but there’s an even better way to loop through an array, as you’ll see in the following Try It Out.

Try It Out

Using ipairs to Loop through an Array

Type the following into the interpreter. The first of the two loops is the for loop you already know from Chapter 2. The second is still a for loop, but it looks and works a bit differently.

> Squares = {}

> for I = 1, 5 do

>> Squares[I] = I * I

>> end

> for I, Square in ipairs(Squares) do

>> print(I .. " squared is " .. Square)

>> end

1 squared is 1

2 squared is 4

3 squared is 9

4 squared is 16

5 squared is 25

How It Works

The first for loop in this example loops through a series of numbers. The second one is a new type of for loop that loops (in this case) through an array. It’s called the generic for loop because, as you will see soon, it is able to iterate through anything at all (including tables that aren’t arrays and even things other than tables). The for loop you learned about in Chapter 2 is called the numeric for loop. You can tell the difference between them because the generic for will always include the keyword in.

The thing that makes this example’s generic for treat Squares as an array is the use of the function ipairs in the line:

for I, Square in ipairs(Squares) do

This line means “loop (in order) through each key-value pair in the array Squares, assigning the key and value to (respectively) the loop variables I and Square.” To write your own similar loop, replace Squares with the array you want to loop through, and replace I and Square with the names you want to give to its keys and values. (As you’ll soon see, ipairs can be replaced when you don’t want to treat the thing being looped through as an array.)

Squares itself never needs to be indexed in the body of the loop, because Square is available.

If you use a generic for loop with ipairs to loop through an array that has gaps, it will stop when it reaches the first gap, as follows:

> for Number, Word in ipairs({"one", "two", nil, "four"}) do

>> print(Number, Word)

>> end

1 one

2 two

That means that, if an array has gaps, looping through it with a generic for and ipairs will not necessarily give the same results as looping through it with a numeric for whose end value is the length of the array.
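
The following sketch counts the iterations of each kind of loop over the same gapped array. The end value 4 is written out by hand (as the assumed variable Last) because # is not reliable for an array with gaps:

```lua
local T = {"one", "two", nil, "four"}
local Last = 4 -- Known end index; #T is not reliable here.

local NumericVisits = 0
for I = 1, Last do
  NumericVisits = NumericVisits + 1 -- Runs even when T[I] is nil.
end

local IpairsVisits = 0
for I, Word in ipairs(T) do
  IpairsVisits = IpairsVisits + 1 -- Stops before the first nil.
end

print(NumericVisits, IpairsVisits) --> 4 2
```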

A generic for loop that uses ipairs after the in keyword is often called an ipairs loop for short. If pairs is used instead of ipairs, then all key-value pairs, not just the array ones, are looped through.

A pairs loop has the same form as an ipairs loop:

for Key, Val in ipairs(Tbl) do

for Key, Val in pairs(Tbl) do

Try this pairs loop:

> NameToInstr = {John = "rhythm guitar",

>> Paul = "bass guitar",

>> George = "lead guitar",

>> Ringo = "drumkit"}

> for Name, Instr in pairs(NameToInstr) do

>> print(Name .. " played " .. Instr)

>> end

Ringo played drumkit

George played lead guitar

John played rhythm guitar

Paul played bass guitar

This is similar to an ipairs loop in that on each iteration, the first loop variable (Name) is set to a key in the table given to pairs, and the second loop variable (Instr) is set to that key’s value. One difference is that the first value no longer has to be a positive integer (although it could be, if there happened to be any positive integer keys in NameToInstr).

The second difference is that the key-value pairs are looped through in an arbitrary order. The order in which pairs occur in the table constructor does not matter. Nor is there any significance to the order in which keys are added or removed after a table is constructed. The only guarantee is that each pair will be visited once and only once. Tables in general have no intrinsic order (other than the arbitrary order shown by a pairs loop). Even the order shown by an ipairs loop is only a result of adding 1 to each index to get the next one. pairs often visits integer keys all together and in the correct order, but it’s not guaranteed to do so. For example:

> T = {A = "a", B = "b", C = "c"}

> T[1], T[2], T[3] = "one", "two", "three"

> for K, V in pairs(T) do

>> print(K, V)

>> end

A a

1 one

C c

B b

3 three

2 two
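
Because each pair is visited exactly once whatever the order, work that doesn't depend on order (counting, summing, searching) is safe to do with pairs. A small sketch; CountPairs is a hypothetical helper, not a built-in function:

```lua
-- Counts every key-value pair in T, array part or not.
local function CountPairs(T)
  local Count = 0
  for _ in pairs(T) do
    Count = Count + 1
  end
  return Count
end

local T = {A = "a", B = "b", C = "c"}
T[1], T[2], T[3] = "one", "two", "three"
-- pairs sees all six pairs; # only measures the array part:
print(CountPairs(T), #T) --> 6 3
```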

Both ipairs loops and pairs loops have the property that neither loop variable is ever nil. (This can be deduced from what has been said about nil keys and nil values.)

Like the loop variable in a numeric for, the loop variables in a generic for are local to each iteration. They can be assigned to, although because of their limited scope, there’s seldom a reason to do this:

> T = {Gleep = true, Glarg = false}

> for Fuzzy, Wuzzy in pairs(T) do

>> Fuzzy, Wuzzy = Fuzzy .. "ing", #tostring(Wuzzy)

>> print(Fuzzy, Wuzzy)

>> end

Gleeping 4

Glarging 5

> -- The table itself is unchanged:

> print(T.Gleep, T.Glarg)

true false

> print(T.Gleeping, T.Glarging)

nil nil

Because the assignments are made to the loop variables, and not to fields in the table itself, they do not alter the table’s contents. If you do want to alter the table’s contents, do an indexing assignment on the table itself, like this:

> T = {"apple", "banana", "kiwi"}

> for I, Fruit in ipairs(T) do

>> T[I] = Fruit .. "s"

>> end

> print(T[2])

bananas

Adding a previously nonexistent key to a table while looping over it with pairs has undefined results. If you need to do this, save a list of the changes you need to make in another table and apply them after the loop is over. You can remove a key (by setting its value to nil) and change a key’s value during a pairs loop.
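
One way to follow this advice is sketched below. The Stock table and the "Checked" key suffix are invented for the example: zero-count entries are removed inside the loop (which is allowed), while the new keys are collected in a second table and only added once the loop has finished:

```lua
local Stock = {Apples = 3, Pears = 0, Kiwis = 5}
local ToAdd = {} -- Additions to apply after the loop.

for Fruit, Count in pairs(Stock) do
  if Count == 0 then
    Stock[Fruit] = nil -- Removing an existing key is allowed.
  else
    -- Adding a new key to Stock here would be unsafe, so
    -- record it in ToAdd instead:
    ToAdd[Fruit .. "Checked"] = true
  end
end

-- The loop over Stock is finished, so additions are now safe:
for Key, Val in pairs(ToAdd) do
  Stock[Key] = Val
end

print(Stock.Pears, Stock.ApplesChecked, Stock.KiwisChecked)
--> nil true true
```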

You can use loop variables as upvalues to closures. As shown in the previous chapter (with a numeric for), each iteration means a new upvalue:

> -- A table that maps numbers to their English names:

> Numbers = {"one", "two", "three"}

> -- A table that will contain functions:

> PrependNumber = {}

> for Num, NumName in ipairs(Numbers) do

>> -- Add a function to PrependNumber that prepends NumName

>> -- to its argument:

>> PrependNumber[Num] = function(Str)

>> return NumName .. ": " .. Str

>> end

>> end

> -- Call the second and third functions in PrependNumber:

> print(PrependNumber[2]("is company"))

two: is company

> print(PrependNumber[3]("is a crowd"))

three: is a crowd

In this example, each time the loop iterates, a new function is created that prepends (appends to the front) a spelled-out number name to its argument and returns the result. These functions are placed, by number, into the PrependNumber table, so that when, for example, PrependNumber[2] is called, it prepends "two: " to its argument.

The notes about Lua 5.0's numeric for in the previous chapter also apply to the generic for. Assigning to the first (that is, leftmost) loop variable has undefined results, and the scope of the loop variables extends over the entire loop (not each individual iteration). This means that if you tried the PrependNumber example on Lua 5.0, you would get "attempt to concatenate a nil value" errors because both loop variables are set to nil when the end of the table is reached.

To loop through a table in a way not supported by either ipairs or pairs, use either while or the numeric for (along with some extra bookkeeping), or structure your data differently. An example of the latter is the following loop, which is a rewrite of the earlier pairs loop through NameToInstr that goes in the order specified by the table (it also serves as an example of tables within tables):

> NamesAndInstrs = {

>> {Name = "John", Instr = "rhythm guitar"},

>> {Name = "Paul", Instr = "bass guitar"},

>> {Name = "George", Instr = "lead guitar"},

>> {Name = "Ringo", Instr = "drumkit"}}

> for _, NameInstr in ipairs(NamesAndInstrs) do

>> print(NameInstr.Name .. " played " .. NameInstr.Instr)

>> end

John played rhythm guitar

Paul played bass guitar

George played lead guitar

Ringo played drumkit

Yet another option is to write your own function to use instead of ipairs or pairs. This is covered later in this chapter.

Tables of Functions

Using tables that contain functions is a handy way to organize functions, and Lua keeps many of its built-in functions in tables, indexed by strings. For example, the table found in the global variable table contains functions useful for working with tables. If you assign another value to table, or to one of the other global variables used to store built-in functions, the functions won’t be available anymore unless you put them somewhere else beforehand. If you do this accidentally, just restart the interpreter.
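
You can organize your own functions the same way. In this sketch, Ops is a made-up table of two functions; because the functions are indexed by strings, the one to call can be chosen at run time:

```lua
-- A hypothetical table of functions, indexed by strings:
local Ops = {
  Add = function(A, B) return A + B end,
  Sub = function(A, B) return A - B end,
}

print(Ops.Sub(2, 3)) --> -1

-- The key can come from a variable:
local Name = "Add"
print(Ops[Name](2, 3)) --> 5

-- The built-in libraries are ordinary tables indexed the same way:
print(table.sort == table["sort"]) --> true
```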

The Table Library

The functions contained in table are known collectively as the table library.

table.sort

One function in the table library is table.sort. Here is an example of how you use this function:

> Names = {"Scarlatti", "Telemann", "Corelli", "Purcell",

>> "Vivaldi", "Handel", "Bach"}

> table.sort(Names)

> for I, Name in ipairs(Names) do

>> print(I, Name)

>> end

1 Bach

2 Corelli

3 Handel

4 Purcell

5 Scarlatti

6 Telemann

7 Vivaldi

The table.sort function takes an array and sorts it in place. This means that, rather than returning a new array that’s a sorted version of the one given to it, table.sort uses indexing assignment (a side effect) on the given array itself to move its values to different keys. (See Chapter 3 for an explanation of side effects.)

table.sort uses the < operator to decide whether an element of the array should come before another element. To override this behavior, give a comparison function as a second argument to table.sort. A comparison function takes two arguments and returns a true result if and only if its first argument should come before its second argument.
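
As a brief sketch, here are two comparison functions applied to a shortened list of names: one that reverses the default order and one that orders by string length.

```lua
local Names = {"Scarlatti", "Bach", "Vivaldi", "Corelli"}

-- Reverse alphabetical order: A comes first if it is greater.
table.sort(Names, function(A, B) return A > B end)
print(Names[1], Names[4]) --> Vivaldi Bach

-- Shortest name first. (Vivaldi and Corelli are the same length,
-- so their order relative to each other is not specified.)
table.sort(Names, function(A, B) return #A < #B end)
print(Names[1], Names[4]) --> Bach Scarlatti
```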

table.sort only looks at a table as an array. It ignores any noninteger keys and any integer keys less than 1 or greater than the table’s array length. To sort a table that isn’t an array, you need to put its contents into an array and sort that array. The following Try It Out demonstrates this, as well as the use of a comparison function.

Try It Out

Sorting the Contents of an Associative Table

1. Save the following as sortednametoinstr.lua:

-- A demonstration of sorting an associative table.

NameToInstr = {John = "rhythm guitar",

Paul = "bass guitar",

George = "lead guitar",

Ringo = "drumkit"}

-- Transfer the associative table NameToInstr to the

-- array Sorted:

Sorted = {}

for Name, Instr in pairs(NameToInstr) do

table.insert(Sorted, {Name = Name, Instr = Instr})

end

-- The comparison function sorts by Name:

table.sort(Sorted, function(A, B) return A.Name < B.Name end)

-- Output:

for _, NameInstr in ipairs(Sorted) do

print(NameInstr.Name .. " played " .. NameInstr.Instr)

end

2. Run sortednametoinstr.lua by typing lua sortednametoinstr.lua into your shell.

The output is as follows (in alphabetical order by the player’s name):

George played lead guitar

John played rhythm guitar

Paul played bass guitar

Ringo played drumkit

How It Works

The contents of NameToInstr are transferred, one-by-one, into Sorted using the table.insert function. This function, like table.sort, works by side-effecting the table given to it rather than by returning a value. Specifically, it puts its second argument at the end of the array given as the first argument. For example, if the first argument is a fifteen-element array, it will be given a sixteenth element (the second argument). Take a look at the following call:

table.insert(Arr, Val)

This has the same effect as the following:

Arr[#Arr + 1] = Val

Both table.sort and table.insert could be rewritten to have no side effect on the tables they are given, but they would then need to spend time making independent copies of those tables to return.

After Sorted has been populated, it can be passed to table.sort, but because each of its elements is itself a table, a comparison function needs to be given as well (otherwise table.sort would use < to compare the subtables, which would cause an error). The comparison function is quite simple. It just asks whether the Name element of its first argument is less than that of its second argument. It would be very easy to change it to sort by Instr instead, or (by using >) to have it sort in reverse order.

The comparison function accepted by table.sort is an example of a callback. A callback is a function that you write to be called by a library function. It gets its name from the fact that it allows a library to call back into code you have written (reversing the normal situation, in which you call a function in the library).

For efficiency, table.sort performs an unstable sort, which means that two elements that are considered equal by the comparison function may end up in a different order than they started in.

If you need a stable sort, one solution is to record all the elements’ original positions and have the comparison function use that as a tiebreaker.
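
A sketch of that tiebreaker technique follows. The People data and the Pos field name are assumptions made for the example; any field that isn't otherwise in use would do:

```lua
local People = {
  {Name = "Ann", Age = 30},
  {Name = "Bob", Age = 25},
  {Name = "Cat", Age = 30},
  {Name = "Dan", Age = 25},
}

-- Record each element's original position:
for I, Person in ipairs(People) do
  Person.Pos = I
end

table.sort(People, function(A, B)
  if A.Age ~= B.Age then
    return A.Age < B.Age
  end
  return A.Pos < B.Pos -- Tiebreaker: keep the original order.
end)

for _, Person in ipairs(People) do
  print(Person.Name, Person.Age)
end
-- Prints Bob, Dan, Ann, Cat (one per line, each with an age):
-- people of the same age stay in their original order.
```

Because no two elements share a position, the comparison function never considers two different elements equal, so the instability of table.sort has nothing to act on.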

If you’re using table.sort with a comparison function, and you’re getting errors that you can’t make sense of within your comparison function or within table.sort itself, your comparison function may be at fault. table.sort relies on a comparison function having consistent results—it should always return false for things that it considers equal, it should never say that A is less than B if it’s already said that B is less than A, it should say that A is less than C if it’s already said that A is less than B and B is less than C, and so on.

In the following example, the comparison function returns inconsistent results. It says that, for sorting purposes, 5 is less than 5. This confuses table.sort, hence the following error:

> T = {5, 5, 10, 15}

> table.sort(T,

>> function(A, B)

>> return not (A < B) -- BAD COMPARISON FUNCTION!

>> end)

stdin:3: attempt to compare nil with number

stack traceback:

stdin:3: in function <stdin:2>

[C]: in function 'sort'

stdin:1: in main chunk

[C]: ?

The desired effect of not (A < B) was presumably to sort in reverse order. Either A > B or B < A would have had that effect.

table.concat

The function table.concat takes an array of strings (or numbers) and concatenates them all into one string, as follows:

> print(table.concat({"a", "bc", "d"}))

abcd

If given a second argument, it puts it in between the elements of the array like this:

> -- Returns a string showing an array's elements separated by

> -- commas (and spaces):

> function CommaSeparate(Arr)

>> return table.concat(Arr, ", ")

>> end

> print(CommaSeparate({"a", "bc", "d"}))

a, bc, d

Normally, all elements from the first to the last will be concatenated. To start concatenating at a different element, give its index as the third argument of table.concat; to stop concatenating at a different element, give its index as the fourth argument of table.concat:

> Tbl = {"a", "b", "c", "d"}

> -- Concatenate the second through last elements:

> print(table.concat(Tbl, "", 2))

bcd

> -- Concatenate the second through third elements:

> print(table.concat(Tbl, "", 2, 3))

bc

If any of the second through fourth arguments are nil, the defaults of (respectively) the empty string, 1, and the length of the array are used, as follows:

> print(table.concat(Tbl, nil, nil, 3))

abc

If the third argument is greater than the fourth argument, the empty string is returned, like this:

> print(table.concat(Tbl, "", 4, 1) == "")

true

table.remove

The table.insert function (seen in the most recent Try It Out) has a counterpart that removes elements from an array, table.remove. By default, both work on the last element of the array (or the top element when viewing the array as a stack). table.remove works by side effect like table.insert does, but it also returns a useful value—the element it just removed—as follows:

The following examples use the CommaSeparate function defined in the previous example.

> T = {}

> table.insert(T, "a")

> table.insert(T, "b")

> table.insert(T, "c")

> print(CommaSeparate(T))

a, b, c

> print(table.remove(T))

c

> print(CommaSeparate(T))

a, b

> print(table.remove(T))

b

> print(CommaSeparate(T))

a

> print(table.remove(T))

a

> -- T is now empty again:

> print(#T)

0

Both of these functions take an optional position argument that specifies where to insert or remove an element. (In the case of table.insert, this means that the thing to be inserted is either the second or the third argument, depending on whether a position argument is given.) Any elements above the one inserted or removed are shifted up or down to compensate, like this:

> T = {"a", "b", "c"}

> table.insert(T, 2, "X")

> -- C is now the fourth element:

> print(CommaSeparate(T))

a, X, b, c

> print(table.remove(T, 2))

X

> -- C is the third element again:

> print(CommaSeparate(T))

a, b, c
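
Taken together, the default and positional forms are enough to use an array as either a stack or a queue. This is a minimal sketch with made-up contents:

```lua
-- Stack: insert and remove at the end (last in, first out).
local Stack = {}
table.insert(Stack, "first")
table.insert(Stack, "second")
print(table.remove(Stack)) --> second

-- Queue: insert at the end, remove from position 1
-- (first in, first out).
local Queue = {}
table.insert(Queue, "first")
table.insert(Queue, "second")
print(table.remove(Queue, 1)) --> first
print(Queue[1], #Queue) --> second 1
```

Removing from position 1 shifts every remaining element down by one, so for long queues it costs more than removing from the end.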

table.maxn

The function table.maxn looks at every single key-value pair in a table and returns the highest positive number used as a key, or 0 if there are no positive numbers used as keys. For example:

> print(table.maxn({"a", nil, nil, "c"}))

4

> print(table.maxn({[1.5] = true}))

1.5

> print(table.maxn({["1.5"] = true}))

0

One possible use for table.maxn is to find the length of arrays with gaps, but keep in mind both that it is time-consuming in proportion to the size of the table (including nonnumeric keys) and that it considers fractional keys as well as integers.
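
To see why table.maxn takes time in proportion to the size of the whole table, it may help to look at a pure-Lua function that computes the same result. MaxN is a hypothetical stand-in written for this illustration, not the library's own code:

```lua
-- Returns the highest positive number used as a key in Tbl,
-- or 0 if there is none. Every key-value pair is visited.
function MaxN(Tbl)
  local Max = 0
  for Key in pairs(Tbl) do
    if type(Key) == "number" and Key > Max then
      Max = Key
    end
  end
  return Max
end

print(MaxN({"a", nil, nil, "c"})) -- 4
print(MaxN({[1.5] = true}))       -- 1.5
print(MaxN({["1.5"] = true}))     -- 0
```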

That covers all the functions in Lua’s built-in table library. Among Lua’s other built-in libraries are the string library (whose functions are found in the string table), the mathematical library (in the math table), the input/output library (in the io table), and the basic or base library (functions like print, tostring, and pairs). You’ll learn about these and other built-in libraries throughout the book. In particular, the next chapter will discuss the string library in detail.

Object-Oriented Programming with Tables

Another use for tables is in what is known as object-oriented programming. In this style of programming, functions that deal with a particular type of value are themselves part of that value. Such a value is called an object, and its functions are called methods.

The term “object” is also sometimes used in a more general sense, to mean a value (such as a table or function) that is not equal to any other value created at a different time.

It’s quite easy to rewrite the MakeGetAndInc example from Chapter 3 to return a two-method object rather than two functions. Here’s how:

-- Returns a table of two functions: a function that gets

-- N's value, and a function that increments N by its

-- argument.

function MakeGetAndInc(N)

-- Returns N:

local function Get()

return N

end

-- Increments N by M:

local function Inc(M)

N = N + M

end

return {Get = Get, Inc = Inc}

end

An object is created and used like so:

> -- Create an object:

> A = MakeGetAndInc(50)

> -- Try out its methods:

> print(A.Get())

50

> A.Inc(2)

> print(A.Get())

52

This is an improvement on the previous technique in that only the newly created object needs to be given a name (rather than both functions), and in that the functions are bundled up into an object (so that the whole object can be passed around the program as a unit).

Both of these advantages are greater the more methods there are, and this is an acceptable way of implementing objects. But it has two disadvantages: each time an object is created (or instantiated), a closure needs to be created for each method, and an object’s state is stored in multiple places (as an upvalue in each method) rather than in one place.

The creation of a closure for each method is really only a disadvantage for efficiency reasons. In a program that instantiates several new multiple-method objects a second, creating all those closures could have a noticeable speed impact.

The second point, about state being stored as an upvalue within each method, means that you can use a method apart from its object, as shown here:

> A = MakeGetAndInc(50)

> Inc, Get = A.Inc, A.Get

> A = nil

> -- The methods are still usable even though A is no longer

> -- accessible:

> Inc(2)

> print(Get())

52

This might occasionally be convenient, but it’s usually just confusing.

A technique that avoids these problems is to store the object’s state in the object (table) itself, and have the methods be, rather than closures, just regular functions that take the object as an argument:

-- Returns Obj.N:

function Get(Obj)

return Obj.N

end

-- Increments Obj.N by M:

function Inc(Obj, M)

Obj.N = Obj.N + M

end

-- Creates an object:

function MakeGetAndInc(N)

return {N = N}

end

The Inc method of an object A would then be called like Inc(A, 5), which means you’d need to keep track of which methods go with which objects. You wouldn’t need to keep track of this if the methods were (as in an earlier example) fields of their objects, but you’d still need to type the object’s name twice: A.Inc(A, 5).

To get around this problem, Lua offers a bit of syntactic sugar. Syntax just means grammar—the rules of how operators, variable names, parentheses, curly braces, and so on can fit together to make a valid Lua program. And syntactic sugar just means an extension to Lua’s syntax that doesn’t give Lua any new powers, but does make programs easier to type or read. For example, the equivalence between a function statement and an assignment statement with a function expression as a value (which you learned about in the previous chapter) is due to the former being syntactic sugar for the latter.

Similarly, when Lua sees something that looks like A:Inc(5) (note the colon), it treats it as though it were A.Inc(A, 5). A is used both as the source for the Inc function and as the first argument to that function. Because the methods in the previous example are written to expect their first argument to be the object, the only change you need to make in order to use colon syntax is to include Get and Inc in the object that MakeGetAndInc returns. (Get and Inc are also made local below this, because they no longer need to be used anywhere but inside MakeGetAndInc.) Now the methods are called right from the object, just as in the example at the beginning of this section, but with a colon substituted for the dot:

> do -- Local scope for Get and Inc.

>> -- Returns Obj.N:

>> local function Get(Obj)

>> return Obj.N

>> end

>> -- Increments Obj.N by M:

>> local function Inc(Obj, M)

>> Obj.N = Obj.N + M

>> end

>> -- Creates an object:

>> function MakeGetAndInc(N)

>> return {N = N, Get = Get, Inc = Inc}

>> end

>> end

>

> -- Create an object:

> A = MakeGetAndInc(50)

> -- Try out its methods:

> print(A:Get())

50

> A:Inc(2)

> print(A:Get())

52
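
The equivalence between the colon call and the dot call can be checked directly. In this small sketch (the objects A and B are made up for the illustration), the same Inc function is called both ways, and the dot form is also used to hand Inc a different object than the one it was looked up in:

```lua
local function Inc(Obj, M)
  Obj.N = Obj.N + M
end

A = {N = 50, Inc = Inc}
A:Inc(2)        -- Colon call: A is passed implicitly.
A.Inc(A, 3)     -- The same kind of call written out in full.

B = {N = 0}
A.Inc(B, 1)     -- Dot call: Inc is found in A but works on B.
print(A.N, B.N) -- 55 1
```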

There’s also syntactic sugar for defining methods: function T:F(X) is equivalent to function T.F(self, X), which itself is equivalent to T.F = function(self, X). You can rewrite the preceding example to use this if you make a table in which you can put Get and Inc, and if you have them use self instead of Obj as a name for their (now implicit) first argument. Here’s how:

do -- Local scope for T.

-- A table in which to define Get and Inc:

local T = {}

-- Returns self.N:

function T:Get()

return self.N

end

-- Increments self.N by M:

function T:Inc(M)

self.N = self.N + M

end

-- Creates an object:

function MakeGetAndInc(N)

return {N = N, Get = T.Get, Inc = T.Inc}

end

end

Note the following about this example:

· If the colon syntax is used to define a function, Lua itself will take care of inserting the formal self argument. If you forget this and try to do it yourself by typing function T:Get(self), then Lua will treat that as though it were function T.Get(self, self), which is not what you want.

· Get and Inc are neither local nor global—they are fields of a (local) table. local function T:Get() would be wrong for the same reason that local T.Get = function(self) would be wrong—the local keyword is for creating new local variables, but T.Get is not a variable name (it’s the name of a table field).

· Because the colon syntaxes for function calls and function definitions are just syntactic sugar, you can mix and match them. You can use the dot syntax to call a function defined with the colon syntax, and you can use the colon syntax to call a function defined with the dot syntax (assuming, of course, that the actual arguments correspond with the formal arguments after translating from colon syntax to dot syntax).

· T is only used as a container for Get and Inc up to the point they’re put into a real object. If there were something else that all objects needed to have in common (for instance, a default value for N), T would be a good place to put it.
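
The point about mixing the two syntaxes can be tried out in a few lines. This is a minimal sketch, with Obj as a made-up object: one method is defined with a colon and called with a dot, and the other is defined with a dot and called with a colon:

```lua
local T = {}

-- Defined with colon syntax; self is implicit:
function T:Get()
  return self.N
end

-- Defined with dot syntax; self is written out:
function T.Inc(self, M)
  self.N = self.N + M
end

Obj = {N = 10, Get = T.Get, Inc = T.Inc}
Obj:Inc(5)          -- Colon call to a dot-defined function.
print(Obj.Get(Obj)) -- Dot call to a colon-defined function: 15
```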

Later in this chapter, you’ll see an extended example that uses the colon syntax for something more interesting than incrementing numbers.

Functions with Variable Numbers of Arguments

Functions that accept variable numbers of arguments are called vararg functions and, as promised in the previous chapter, you’ll now learn how to write them.

Defining Vararg Functions

The Average function returns the average of all its arguments. It also introduces the built-in function assert, which does nothing if its first argument is true and triggers an error if it’s false. (The second argument is used as the error message.) Here’s an example of how you use the Average function:

> -- Returns the average of all its arguments:

> function Average(...)

>> local Ret, Count = 0, 0

>> for _, Num in ipairs({...}) do

>> Ret = Ret + Num

>> Count = Count + 1

>> end

>> assert(Count > 0, "Attempted to average zero numbers")

>> return Ret / Count

>> end

>

> print(Average(1))

1

> print(Average(41, 43))

42

> print(Average(31, -41, 59, -26, 53))

15.2

> print(Average())

stdin:7: Attempted to average zero numbers

stack traceback:

[C]: in function 'assert'

stdin:7: in function 'Average'

stdin:1: in main chunk

[C]: ?

The Average function’s formal argument list consists only of ... (three dots), which tells Lua that Average is a vararg function. Within a vararg function, three dots can be used as an expression, which is called a vararg expression. A vararg expression, like a function call, can evaluate to zero or more values. The vararg expression in Average is inside a table constructor. When Average is called with 1 as an argument, it is as though the table constructor looked like {1}. When it’s called with 41 and 43 as arguments, it’s as though the table constructor looked like {41, 43}. When it’s called with 31, -41, 59, -26, and 53 as arguments, it’s as though the table constructor looked like {31, -41, 59, -26, 53}. And when it’s called with no arguments, it’s as though the table constructor looked like {}.

You can use a vararg expression anywhere any other expression can be used. It follows exactly the same adjustment rules as a function call. For example, the vararg expression in the following assignment would be adjusted to two values:

Var1, Var2 = ...

Both vararg expressions in the following return statement would be adjusted to one value:

return ..., (...)

And the one in the following function call would not be adjusted at all—all of its zero or more values would be passed along to print:

print("args here:", ...)
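
These adjustment rules can be observed in one place. In the sketch below, Adjust is a hypothetical function that uses a vararg expression in three positions and reports what each one produced:

```lua
function Adjust(...)
  local Var1, Var2 = ...      -- Adjusted to exactly two values.
  local Packed = {..., "end"} -- Not last in the list: one value.
  local All = {...}           -- Last in the list: all values.
  return Var1, Var2, #Packed, #All
end

print(Adjust("a", "b", "c")) -- a b 2 3
print(Adjust("a"))           -- a nil 2 1
```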

A vararg expression includes any nil passed to the function, as follows:

> function F(...)

>> print(...)

>> end

>

> F(nil, "b", nil, nil)

nil b nil nil

A vararg function can also have regular (named) formal arguments, in which case the three dots come last and catch any actual arguments that are left over after the leftmost ones are assigned to the named formal arguments. Here’s an example that makes that clearer:

> function F(Arg1, Arg2, ...)

>> print("Arg1 and Arg2:", Arg1, Arg2)

>> print("The rest:", ...)

>> end

>

> F()

Arg1 and Arg2: nil nil

The rest:

> F("a")

Arg1 and Arg2: a nil

The rest:

> F("a", "b")

Arg1 and Arg2: a b

The rest:

> -- Now there will be arguments left over after Arg1 and

> -- Arg2 have been taken care of:

> F("a", "b", "c")

Arg1 and Arg2: a b

The rest: c

> F("a", "b", "c", "d")

Arg1 and Arg2: a b

The rest: c d

A vararg expression cannot be used as an upvalue. Again, this will make more sense with an example. Let’s say you want to write a MakePrinter function. MakePrinter will return a function that takes no arguments and prints all the arguments given to MakePrinter. The obvious way to write this is like this:

function MakePrinter(...)

return function()

print(...) -- THIS DOESN'T WORK!

end

end

But if you type that in, Lua will complain partway through:

> function MakePrinter(...)

>> return function()

>> print(...) -- THIS DOESN'T WORK!

stdin:3: cannot use '...' outside a vararg function near '...'

The anonymous function is not a vararg function. The vararg expression used in it is local to MakePrinter, which makes it an upvalue in the anonymous function, and because vararg expressions can’t be used as upvalues, another way needs to be found to make the MakePrinter arguments available inside the anonymous function. That part is actually quite easy: just put the vararg expression inside a table constructor, and use the variable holding that table as the upvalue. The remaining step is calling print with each of the table’s values as arguments. That can be done with the unpack function, which takes an array as its first argument and returns, in order, all of the elements of that array (up to the array’s length). For example:

> function MakePrinter(...)

>> local Args = {...}

>> return function()

>> print(unpack(Args))

>> end

>> end

>

> Printer = MakePrinter("a", "b", "c")

> Printer()

a b c

Because unpack uses its argument’s length, it may not act right with an array that has gaps, which Args will if MakePrinter is given any nil arguments. The fix for this involves extra arguments to unpack, as well as a new built-in function, select.

The first select argument is a positive integer. If it’s 1, select will return all its additional arguments; if it’s 2, select will return all its additional arguments except for the first; and so on:

> print(select(1, "a", "b", "c"))

a b c

> print(select(2, "a", "b", "c"))

b c

> print(select(3, "a", "b", "c"))

c

> -- This returns nothing:

> print(select(4, "a", "b", "c"))

>

As a special case, if the first select argument is the string "#", then it returns how many additional arguments it received, as follows:

> print(select("#"))

0

> print(select("#", "a"))

1

> print(select("#", "a", "b"))

2

> print(select("#", "a", "b", "c"))

3

It’s this "#" usage that lets you find out how many values (including nils) are in a vararg expression (or in any expression that can have multiple values, for that matter):

> function F(...)

>> print(select("#", ...))

>> end

>

> F(nil, "b", nil, nil)

4

You might think that # ... would get the length of a vararg expression, but all it really does is get the length of the first element of the vararg expression (which, being used as an operand, is not eligible for multiple-value treatment and so is adjusted to one value).
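
The difference is easy to see side by side. Lengths is a made-up function for this sketch. It returns the result of applying # to the vararg expression, followed by the real argument count:

```lua
function Lengths(...)
  -- #... measures only the first value (here, a string);
  -- select("#", ...) counts every value.
  return #..., select("#", ...)
end

print(Lengths("hello", "x")) -- 5 2
```

If the first value has no length (nil or a number, for example), applying # to it is an error, which is one more reason to count arguments with select.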

unpack takes optional second and third arguments that specify where it starts and stops getting values out of the table given to it. If these arguments are not given, they default to 1 and the length of the table:

> -- Get elements 2 through 4 (inclusive):

> print(unpack({"a", "b", "c", "d", "e"}, 2, 4))

b c d

Here’s the rewritten version of MakePrinter that handles nils properly. It uses select("#", ...) to count MakePrinter’s arguments, and when it calls unpack, it unpacks all elements from the first up to however many arguments it counted:

> function MakePrinter(...)

>> local Args = {...}

>> local ArgCount = select("#", ...)

>> return function()

>> print(unpack(Args, 1, ArgCount))

>> end

>> end

>

> Printer = MakePrinter(nil, "b", nil, nil)

> Printer()

nil b nil nil
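
The same counting trick can be wrapped up for reuse. The following is a sketch under a few assumptions: Pack and Unpack are hypothetical helpers, the count is stored in a field named n, and the first line is there only because later Lua versions provide unpack as table.unpack:

```lua
-- Use table.unpack where it exists, the global unpack otherwise.
local unpack = table.unpack or unpack

-- Stores all arguments, including nils, along with their count.
function Pack(...)
  return {n = select("#", ...), ...}
end

-- Returns every stored value, nils included.
function Unpack(Packed)
  return unpack(Packed, 1, Packed.n)
end

local Args = Pack(nil, "b", nil, nil)
print(Args.n)                    -- 4
print(select("#", Unpack(Args))) -- 4
print(Unpack(Args))              -- nil b nil nil
```

Keeping the count next to the values means that a function receiving the table does not need to rely on the table's length, which is unreliable when there are gaps.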

If a vararg function doesn’t contain any vararg expressions, then a local arg variable is created and initialized to a table of all the extra arguments. arg.n is the number of extra arguments. It is as though the first lines of the function were as follows:

local arg = {...}

arg.n = select("#", ...)

This is done so that vararg functions written for Lua 5.0 will run on Lua 5.1. Lua 5.0 had no vararg expression, so vararg arguments were always put in such an arg table.

In addition to the lack of the vararg expression and use of arg, Lua 5.0 did not have the select function, and its unpack function took only one argument.

Scripts as Vararg Functions

You already know that chunks are functions. In this section, you’ll see that they are vararg functions. In particular, Lua scripts are vararg functions and they can be given arguments on the command line.

Try It Out

Creating Command-Line Arguments

1. Save the following as cmdlineargs.lua:

-- This script lists (by number) all arguments given to it

-- on the command line.

local Count = select("#", ...)

if Count > 0 then

print("Command-line arguments:")

for I = 1, Count do

print(I, (select(I, ...))) -- The parentheses adjust

-- select to one value.

end

else

print("No command-line arguments given.")

end

2. Run it by typing the following into your shell:

lua cmdlineargs.lua this is a test

The output should be as follows:

Command-line arguments:

1 this

2 is

3 a

4 test

How It Works

When you type the name of a program (lua) into the shell, the words that come after it are called command-line arguments. The shell passes these arguments along to the program. In this case, lua treats the first command-line argument, cmdlineargs.lua, as the name of a program. It compiles that program into a function, and calls the function with the remaining command-line arguments (the strings "this", "is", "a", and "test").

This example also shows how you can use select to access arguments without putting them into a table first.
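
You can see that a chunk is a vararg function without leaving Lua, by compiling a string into a function and calling it with arguments. This sketch assumes loadstring (the Lua 5.1 function for compiling a string) and falls back to load for later Lua versions. CountArgs is a made-up name:

```lua
-- loadstring compiles a string into a function in Lua 5.1;
-- later versions use load for this, hence the fallback.
local Compile = loadstring or load

-- The chunk's body uses ... just as a script file would:
CountArgs = Compile('return select("#", ...)')

print(CountArgs("this", "is", "a", "test")) -- 4
print(CountArgs())                          -- 0
```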

The shell gives special meaning to some characters. For example, it treats spaces as argument separators. If you want to include a special character in an argument, you need to escape it or quote it. The exact rules for escaping or quoting characters vary from shell to shell, but something like the following:

lua cmdlineargs.lua "this is a test" "" "<*>"

generally results in this:

Command-line arguments:

1 this is a test

2

3 <*>

The second argument is the empty string.

Command-line arguments are always strings, which means you don’t have to worry about a gap caused by a nil.

lua treats some of its command-line arguments specially. These are called options, and they all start with a hyphen. The following table lists the lua options:

[Table 4-t1: lua command-line options (table not reproduced in this text)]

Here are a couple examples. Starting lua like this

lua -e "print('Hello')"

prints "Hello", but does not enter interactive mode (the Lua interpreter). Starting it like this:

lua -i sortednametoinstr.lua

runs sortednametoinstr.lua and then enters interactive mode, where you have access to any global variables it created (which in this case are NameToInstr and Sorted):

Lua 5.1.1 Copyright (C) 1994-2006 Lua.org, PUC-Rio

George played lead guitar

John played rhythm guitar

Paul played bass guitar

Ringo played drumkit

> print(NameToInstr.John)

rhythm guitar

> print (Sorted[4].Name)

Ringo

The options -e and -l can be combined with the command-line arguments immediately following them. The following example of -e does exactly the same thing as the one given earlier:

lua "-eprint('Hello')"

A script’s command-line arguments are also available in the global table arg, even if a vararg expression is used in the script. This is done to give access to the script’s name (found at arg[0]) and the interpreter’s name and any options (found at negative indexes). If lua is started with the following:

lua -i cmdlineargs.lua this is a test

then arg is the following table:

{[-2] = "lua",

[-1] = "-i",

[0] = "cmdlineargs.lua",

[1] = "this",

[2] = "is",

[3] = "a",

[4] = "test"}

Notice that, unlike the arg described in the previous section, this one doesn’t have its length in arg.n. The length is easy enough to find out, though (for instance, with #arg, or with a loop if you want to count the nonpositive indexes).
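
The loop mentioned above might look like the following sketch. ArgRange is a hypothetical helper, and because the real arg table depends on how lua was started, the example runs it on a hand-built stand-in with the same shape as the table shown earlier:

```lua
-- Returns the lowest and highest integer indexes of an
-- arg-style table, assuming index 0 (the script name) is set.
function ArgRange(Args)
  local Low = 0
  while Args[Low - 1] ~= nil do
    Low = Low - 1
  end
  return Low, #Args
end

-- A stand-in for the arg table shown above:
local Example = {[-2] = "lua", [-1] = "-i",
  [0] = "cmdlineargs.lua", "this", "is", "a", "test"}
print(ArgRange(Example)) -- -2 4
```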

Keyword Arguments

In the previous chapter, you saw that a function call whose sole argument is a literal string doesn’t require parentheses. The same applies to table constructors. For example:

> print {}

table: 0x493760

> print{}

table: 0x493978

This can be used to simulate keyword arguments—arguments that are identified not by their position, but by being associated with an identifier. In all three of the following examples, a function called Sort is being called with its Arr keyword argument set to T and its CompFnc keyword argument set to F (the last example reveals that all that’s really going on is that an associative table is being passed to Sort):

Sort{Arr = T, CompFnc = F}

Sort{CompFnc = F, Arr = T}

Sort({Arr = T, CompFnc = F})

There is no special syntax for defining functions with keyword arguments—they’re just defined to take a single table as an argument. For example, you could define Sort as follows:

-- A wrapper for table.sort that takes keyword arguments:

function Sort(KeyArgs)

local Arr = KeyArgs.Arr -- The array to be sorted.

local CompFnc = KeyArgs.CompFnc -- Comparison function.

or function(A, B) return A < B end -- Default.

if KeyArgs.Reverse then

-- Reverse the sense of the comparison function:

local OrigCompFnc = CompFnc

CompFnc = function(A, B)

return OrigCompFnc(B, A)

end

end

table.sort(Arr, CompFnc)

return Arr

end

The Reverse argument reverses the sense of the comparison function. When no CompFnc is given, but Reverse is set to true, the sense of the default comparison function is reversed, which sorts the table in reverse order:

> Letters = {"a", "b", "c"}

> Sort{Reverse = true, Arr = Letters}

> print(table.concat(Letters))

cba

The usual reasons for writing a function to take keyword arguments are that it has a lot of optional arguments, or that it has so many arguments that it’s hard to remember what order they go in.
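
As a second illustration of optional keyword arguments, here is a sketch of a wrapper around table.concat. Join and its argument names (Arr, Sep, First, and Last) are invented for this example. Only Arr is required, and the others fall back to defaults with the or idiom:

```lua
-- Joins the elements of KeyArgs.Arr. Sep, First, and Last
-- are optional keyword arguments.
function Join(KeyArgs)
  local Arr = KeyArgs.Arr
  assert(Arr, "Join needs an Arr argument")
  return table.concat(Arr,
    KeyArgs.Sep or ", ",   -- Default separator.
    KeyArgs.First or 1,    -- Default start.
    KeyArgs.Last or #Arr)  -- Default stop.
end

print(Join{Arr = {"a", "b", "c"}})                      -- a, b, c
print(Join{Last = 2, Arr = {"a", "b", "c"}, Sep = "-"}) -- a-b
```

Because the arguments are identified by name, the second call can give them in any order and leave out First entirely.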

Different but the Same

A common problem in understanding how Lua works comes from the fact that tables are mutable, which means they can be changed. Side-effecting a table (changing it using indexing assignment) is called mutating the table. There’s no way to mutate a string (or a number, Boolean, or nil); it can only be replaced with a different value. (Strings, numbers, Booleans, and nil are therefore said to be immutable.) But a table can be mutated, and afterward it will have different content, but it will still be the same table.

Table Equality

Because tables are mutable, there needs to be a way to tell whether two tables are really the same table or not (so that you can tell if a mutation of one will be visible in the other). You do this with the == (equality) operator. When two tables are tested for equality, their contents are not looked at. Rather, they are considered equal if and only if they were created by the same table constructor at the same time (and are therefore the same table). In the following example, A and B are equal because they were created by the same table constructor at the same time:

> A = {}

> B = A

> print(A == B)

true

In the next example, C and D are unequal because they were created by two different table constructors, and E and F are unequal because they were created by the same table constructor at different times:

> C, D = {}, {}

> print(C == D)

false

>

> function CreateTbl()

>> return {}

>> end

>

> E, F = CreateTbl(), CreateTbl()

> print(E == F)

false

Functions follow the same equality rule as tables: Two functions are equal if and only if they were created by the same function expression (or function statement) at the same time. This is because both table constructors and function expressions create new objects (using the term “object” in the broad sense).
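
If what you want to know is whether two different tables currently hold the same contents, you have to compare them yourself. The following is a minimal sketch of such a comparison. SameContents is a hypothetical helper, and it compares values with ==, so it does not look inside subtables:

```lua
-- Returns true if A and B have the same keys with == values.
function SameContents(A, B)
  -- Every pair in A must be matched in B:
  for Key, Val in pairs(A) do
    if B[Key] ~= Val then
      return false
    end
  end
  -- B must not have any keys that A lacks:
  for Key in pairs(B) do
    if A[Key] == nil then
      return false
    end
  end
  return true
end

local C, D = {1, 2, X = "x"}, {1, 2, X = "x"}
print(C == D)             -- false
print(SameContents(C, D)) -- true
```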

Avoiding Bugs by Understanding Mutability

Among other things, you can use mutability to model things in the real world, most of which are mutable. For example, this book would still be the same book if you “mutated” it by writing your name in it. But mutability can also be a source of bugs, if you don’t keep track of what’s what. In the real world, you would never confuse having both hands on the same book with having each hand on a different book. But in Programming Land, it’s not too tough to forget that two variables (or two table fields, or a variable and a table field) both contain the same table.

For example, imagine the following variant of table.sort, which still sorts its first argument in place but also returns it:

function TableSort(Arr, CompFnc)

table.sort(Arr, CompFnc)

return Arr

end

This would be convenient in some cases, letting you sort an array and pass it to another function in one statement like this:

SomeFnc(TableSort(SomeArr))

instead of in two statements like this:

table.sort(SomeArr)

SomeFnc(SomeArr)

But it might also mislead a reader into thinking that Sorted is sorted and SomeArr is still unsorted after the following line, when in fact both variables hold the same (now sorted) table:

local Sorted = TableSort(SomeArr)

If you write a function that side-effects a table given to it, make sure that’s clear in any documentation you write for the function. If you’re using a function that someone else wrote, make sure you know if the function causes side effects in any tables given to it.
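
A quick check with the equality operator makes the situation concrete. This sketch repeats the TableSort definition from above so that it runs on its own. The contents of SomeArr are made up:

```lua
function TableSort(Arr, CompFnc)
  table.sort(Arr, CompFnc)
  return Arr
end

SomeArr = {"c", "a", "b"}
Sorted = TableSort(SomeArr)
print(Sorted == SomeArr)     -- true: one table, two names.
print(table.concat(SomeArr)) -- abc: sorted in place.
```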

Variables and Mutable Values

In Chapter 2, you saw an illustration of the cubbyhole model (shown in Figure 4-1) and the arrow model (shown in Figure 4-2) of the association between variables and their values.

Figure 4-1

Figure 4-2

Both of these models are accurate for immutable values (ignoring memory usage). But the cubbyhole model doesn’t work for mutable values. Consider the following code:

> A, B = {}, {}

> C = B

> -- Before

> B.Test = "test"

> -- After

> print(C.Test)

test

An arrow diagram of the variables as of the Before comment (shown in Figure 4-3) reflects the fact that B and C are the same table.

Figure 4-3

It’s an easy step from there to an accurate arrow diagram of the variables as of the After comment, as shown in Figure 4-4.

Figure 4-4

A cubbyhole diagram as of the Before comment (see Figure 4-5) does not show that B and C are the same table.

Figure 4-5

It thus could lead to an incorrect diagram as of the After comment, as shown in Figure 4-6:

Figure 4-6

Tables and Functions

You saw earlier that functions follow the same equality rule as tables. Another thing that functions have in common with tables is mutability. Closure functions can be mutated by calling them, as is done with Counter in the following example:

> do

>> local Count = 0

>>

>> function Counter()

>> Count = Count + 1

>> return Count

>> end

>> end

> print(Counter())

1

> print(Counter())

2

> print(Counter())

3

A difference between tables and functions is that tables do not have upvalues. A local variable inside a function is evaluated every time the function is called, but a local variable inside a table constructor is evaluated only once, while the table is being constructed. That’s why, in the following code, Tbl.Str is still "before" even after Str has been set to "after":

> do

>> local Str = "before"

>> Fnc = function() return Str end

>> Tbl = {Str = Str}

>> Str = "after"

>> end

> print(Fnc())

after

> print(Tbl.Str)

before

It’s easy to get an upvalue-like effect by assigning to a table field instead of a local, like this:

> do

>> local Str = "before"

>> Tbl = {Str = Str}

>> Tbl.Str = "after"

>> end

> print(Tbl.Str)

after

If you want multiple tables to share state, have them share a subtable, as Tbl1 and Tbl2 share SubTbl here:

> do

>> local SubTbl = {Str = "before"}

>> Tbl1 = {SubTbl = SubTbl}

>> Tbl2 = {SubTbl = SubTbl}

>> end

> Tbl1.SubTbl.Str = "after"

> print(Tbl2.SubTbl.Str)

after

If you’re familiar with the distinction between pass by value and pass by reference, you may think that Lua passes immutable values by value and mutable values by reference, but that isn’t true—arguments are always passed by value. A function’s caller can tell whether the function did an indexing assignment to a table the caller gave it, but not whether the function did a regular assignment to one of its arguments.

If you absolutely needed to use the language of values versus references to describe Lua’s treatment of mutable values, you could say that mutable values themselves are references.
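
The difference between the two kinds of assignment shows up clearly in a small experiment. In this sketch, Mutate and Reassign are made-up functions: the first does an indexing assignment to the table it is given, and the second does a regular assignment to its argument:

```lua
function Mutate(Tbl)
  Tbl.Changed = true     -- Indexing assignment: visible to the caller.
end

function Reassign(Tbl)
  Tbl = {Changed = true} -- Regular assignment: only the local changes.
end

T1, T2 = {}, {}
Mutate(T1)
Reassign(T2)
print(T1.Changed, T2.Changed) -- true nil
```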

Copying Tables

Sometimes you need to make a copy of a table. For example, if you want to sort a table without altering the unsorted table, you need to make a copy and sort that. The simplest way (which is all you need in many circumstances) is to make a shallow copy:

-- Makes a shallow copy of a table:

function ShallowCopy(Src)

local Dest = {}

for Key, Val in pairs(Src) do

Dest[Key] = Val

end

return Dest

end

In this example, ShallowCopy creates a fresh table (Dest) and then loops through all key-value pairs in Src, putting each value into Dest at the appropriate key. (Src and Dest stand for “source” and “destination.”) This is called a shallow copy because it doesn’t burrow deep into Src: if any of Src’s values or keys are tables, those very tables will be put into Dest, not copies of them. Copying subtables as well as the top level of a table is called making a deep copy. You can change ShallowCopy to make a deep copy by adding the following two lines (and changing the name, of course):

-- Makes a deep copy of a table. Doesn't properly handle

-- duplicate subtables.

function DeepCopy(Src)

local Dest = {}

for Key, Val in pairs(Src) do

Key = type(Key) == "table" and DeepCopy(Key) or Key

Val = type(Val) == "table" and DeepCopy(Val) or Val

Dest[Key] = Val

end

return Dest

end

The new lines test whether a key or value is a table. If it is, they call DeepCopy recursively to make a deep copy of it. Unlike a copy made by ShallowCopy, a copy made by DeepCopy will never have any subtables in common with the original, which means that the copy can be mutated without affecting the original. For example:

> Body1 = {Head = {"Eyes", "Nose", "Mouth", "Ears"},

>> Arms = {Hands = {"Fingers"}},

>> Legs = {Feet = {"Toes"}}}

> Body2 = DeepCopy(Body1)

> print(Body1.Legs.Feet[1], Body2.Legs.Feet[1])

Toes Toes

> Body2.Legs.Feet[1] = "Piggies"

> -- If ShallowCopy had been used, this would print

> -- Piggies Piggies:

> print(Body1.Legs.Feet[1], Body2.Legs.Feet[1])

Toes Piggies

>

There are two problems with this version of DeepCopy. One is that it treats functions the same as it treats anything else that isn’t a table—it doesn’t make copies of them. There are ways to copy functions, but none of them is completely general, unless you use an add-on library such as Pluto. (A general solution needs to treat upvalues correctly, including upvalues shared between functions.) Copying functions is seldom necessary, though, so you can ignore this problem here.

Pluto is a persistence library, which means that it allows arbitrary Lua values to be saved to disk and reloaded later, even after Lua has been restarted. It’s available at luaforge.net.

The other problem is more serious. If a table appears more than once within the table being copied, it shows up as different tables in the copy. For example:

> SubTbl = {}

> Orig = {SubTbl, SubTbl, SubTbl}

> Copy = DeepCopy(Orig)

> -- Orig contains the same table three times:

> for I, SubTbl in ipairs(Orig) do

>> print(I, SubTbl)

>> end

1 table: 0x4a0538

2 table: 0x4a0538

3 table: 0x4a0538

> -- Copy contains three different tables:

> for I, SubTbl in ipairs(Copy) do

>> print(I, SubTbl)

>> end

1 table: 0x4a0a08

2 table: 0x4a0a48

3 table: 0x4a0a98

Something even more interesting happens when you pass DeepCopy a table that has a cycle. A table is said to have a cycle if it contains any table (including itself) that directly or indirectly contains itself. In the following example, T is such a table; to copy it, DeepCopy first needs to copy T.T, but to copy that, it needs to copy T.T.T, and so on. Because these are all the same table, DeepCopy keeps recursing until it runs out of stack space or you interrupt it:

> T = {}

> T.T = T

> -- The same table, within itself:

> print(T, T.T.T.T.T.T.T)

table: 0x495478 table: 0x495478

> T2 = DeepCopy(T)

stdin:3: stack overflow

stack traceback:

stdin:3: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

...

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:5: in function 'DeepCopy'

stdin:1: in main chunk

[C]: ?

There is a general solution to this problem, and it involves keeping track of what tables have already been copied. You use it in the following Try It Out.

Try It Out

Copying Subtables Correctly

1. Enter this version of DeepCopy into an interpreter session:

-- Makes a deep copy of a table. This version of DeepCopy

-- properly handles duplicate subtables, including cycles.

-- (The Seen argument is only for recursive calls.)

function DeepCopy(Src, Seen)

local Dest

if Seen then

-- This will only set Dest if Src has been seen before:

Dest = Seen[Src]

else

-- Top-level call; create the Seen table:

Seen = {}

end

-- If Src is new, copy it into Dest:

if not Dest then

-- Make a fresh table and record it as seen:

Dest = {}

Seen[Src] = Dest

for Key, Val in pairs(Src) do

Key = type(Key) == "table" and DeepCopy(Key, Seen) or Key

Val = type(Val) == "table" and DeepCopy(Val, Seen) or Val

Dest[Key] = Val

end

end

return Dest

end

2. Now test it with a particularly hairy case—a table that contains itself as both a key and a value:

> T = {}

> T[T] = T

> T2 = DeepCopy(T)

> -- T2 really is cyclical:

> print(T2[T2][T2][T2][T2][T2][T2])

table: 0x703198

> -- And a side effect to it isn't visible in the source

> -- table:

> T2.Test = "test"

> print(T2[T2][T2][T2].Test)

test

> print(T[T][T][T].Test)

nil

How It Works

This version of DeepCopy works by storing every copy it makes as a value in an associative table, the key being that copy’s source table. That lets it avoid making more than one copy of a given table. This mapping between already seen source and destination tables is in the second argument of DeepCopy, which is Seen.

When you called DeepCopy, you gave it only one argument. It saw that Seen was nil and initialized it to an empty table. It then saw that Dest was nil (because it hadn’t been found in Seen), so it created a fresh destination table, assigned it to Dest, and established an association between the source table and the destination table in Seen. Then it looped through Src. This part of DeepCopy is almost the same as the previous version. The only difference is that recursive calls pass the Seen argument. When the first recursive call (for Key) was made, that call saw that Seen was set, so it assigned Seen[Src] to Dest. If Src had not been seen yet, that assignment would have done nothing, and the loop inside the next if statement would have been entered. But in this case, a table got assigned to Dest, so the loop was skipped. The same happened with the second recursive call (for Val).
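As a quick check that this version also fixes the earlier duplicate-subtable problem, the following sketch repeats the SubTbl experiment. DeepCopy is reproduced from the Try It Out only so that the example can be run on its own; Orig, Copy, and SubTbl are the names used in the earlier example.

```lua
-- The Seen-based DeepCopy from this Try It Out, repeated so
-- that this example is self-contained:
function DeepCopy(Src, Seen)
  local Dest
  if Seen then
    Dest = Seen[Src]
  else
    Seen = {}
  end
  if not Dest then
    Dest = {}
    Seen[Src] = Dest
    for Key, Val in pairs(Src) do
      Key = type(Key) == "table" and DeepCopy(Key, Seen) or Key
      Val = type(Val) == "table" and DeepCopy(Val, Seen) or Val
      Dest[Key] = Val
    end
  end
  return Dest
end

SubTbl = {}
Orig = {SubTbl, SubTbl, SubTbl}
Copy = DeepCopy(Orig)

-- The copy holds one new subtable three times, not three
-- different ones:
print(Copy[1] == Copy[2], Copy[2] == Copy[3]) -- true true

-- And that subtable is a copy, not the original:
print(Copy[1] == SubTbl) -- false
```

Because the first copy of SubTbl is recorded in Seen, the second and third visits find it there and reuse it.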

Building Other Data Structures from Tables

In Lua, tables serve the same purposes as what other languages call tables, dictionaries, associative arrays, or hash tables, such as the following:

Potluck = {John = "chips", Jane = "lemonade",

Jolene = "egg salad"}

and what other languages call arrays or vectors, such as these:

Days = {"Monday", "Tuesday", "Wednesday", "Thursday",

"Friday", "Saturday", "Sunday"}

You can build other data structures out of tables as well. For example, you can use table.insert and table.remove to treat a table as a stack, and use tables within tables to represent tree-structured data—data that branches out like a tree, as shown in Figure 4-7.

Figure 4-7

The diagram in the figure could be represented as follows:

{Person = {

Living = {"Roberto Ierusalimschy", "Gary Larson"},

Dead = {"Jane Austen", "Archimedes"}},

Place = {"Rio de Janeiro", "The North Pole"}}
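Tree-structured tables like this one are usually processed with a recursive function. The following is a minimal sketch, assuming the tree is stored in a variable called Tree; the function name Leaves and its Acc accumulator argument are made up for this illustration.

```lua
Tree = {
  Person = {
    Living = {"Roberto Ierusalimschy", "Gary Larson"},
    Dead = {"Jane Austen", "Archimedes"}},
  Place = {"Rio de Janeiro", "The North Pole"}}

-- Returns an array of all the non-table values (the leaves)
-- found anywhere inside Node, recursing into each subtable.
-- (The Acc argument is only for recursive calls.)
function Leaves(Node, Acc)
  Acc = Acc or {}
  for _, Child in pairs(Node) do
    if type(Child) == "table" then
      Leaves(Child, Acc)
    else
      table.insert(Acc, Child)
    end
  end
  return Acc
end

Names = Leaves(Tree)
-- pairs visits keys in an arbitrary order, so sort the result
-- to get predictable output:
table.sort(Names)
print(#Names, Names[1]) -- 6 Archimedes
```

Like the first version of DeepCopy, this sketch would recurse forever on a table that contains a cycle.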

Special-purpose data structures like these can be accessed and manipulated like ordinary tables, but if they behave differently enough from tables, you can write special-purpose functions that work with them. It may be convenient to use the colon syntax to attach such functions to the data structures themselves. The following Try It Out is an example of this. It’s an implementation of a ring, a data structure that is something like a stack, except the top (referred to in the exercise as the current element) can be moved, and the top and bottom act like they’re hooked onto each other.

Try It Out

Using a Table as a Ring

1. Save the following file as ring.lua:

-- A ring data structure:

-- Returns X mod Y, but one-based: the return value will

-- never be less than 1 or greater than Y. (Y is assumed to

-- be positive.)

local function OneMod(X, Y)

return (X - 1) % Y + 1

end

-- A table in which to create the methods:

local Methods = {}

-- Inserts a new element into self:

function Methods:Push(Elem)

table.insert(self, self.Pos, Elem)

end

-- Removes the current element from self; returns nil if

-- self is empty:

function Methods:Pop()

local Ret

if #self > 0 then

Ret = table.remove(self, self.Pos)

-- Keep self.Pos from pointing outside the array by

-- wrapping it around:

if self.Pos > #self then

self.Pos = 1

end

end

return Ret

end

-- Rotates self to the left:

function Methods:RotateL()

if #self > 0 then

self.Pos = OneMod(self.Pos + 1, #self)

end

end

-- Rotates self to the right:

function Methods:RotateR()

if #self > 0 then

self.Pos = OneMod(self.Pos - 1, #self)

end

end

-- Returns the ring's size:

function Methods:Size()

return #self

end

-- Returns a string representation of self:

function Methods:ToString()

-- Convert the parts of self to the left and to the right

-- of self.Pos to strings:

local LeftPart = table.concat(self, ", ", 1, self.Pos - 1)

local RightPart = table.concat(self, ", ", self.Pos, #self)

-- Only put a separator between them if neither is the

-- empty string:

local Sep

if LeftPart == "" or RightPart == "" then

Sep = ""

else

Sep = ", "

end

-- RightPart's first element is self.Pos, so put it first:

return RightPart .. Sep .. LeftPart

end

-- Instantiates a ring:

function MakeRing(Ring)

-- Make an empty ring if an array of initial ring values

-- wasn't passed in:

Ring = Ring or {}

-- Ring.Pos is the position of the current element of the

-- ring; initialize it to 1 (all methods that expect

-- there to be a current element first make sure the ring

-- isn't empty):

Ring.Pos = 1

-- Give the ring methods and return it:

for Name, Fnc in pairs(Methods) do

Ring[Name] = Fnc

end

return Ring

end

2. Start lua as follows (this will run ring.lua and then enter interactive mode):

lua -i ring.lua

3. Within interactive mode, use the function MakeRing to create a ring, and use that ring’s methods to manipulate it:

> R = MakeRing{"the", "time", "has", "come"} -- Another use

> -- for the syntax from the "Keyword Arguments" section.

> print(R:ToString())

the, time, has, come

> print(R:Pop())

the

> R:Push("today")

> print(R:ToString())

today, time, has, come

> R:RotateL()

> print(R:ToString())

time, has, come, today

> print(R:Pop(), R:Pop(), R:Pop())

time has come

> R:Push("here")

> print(R:ToString(), R:Size())

here, today 2

> R:Push("tomorrow")

> R:Push("gone")

> print(R:ToString())

gone, tomorrow, here, today

> R:RotateR()

> print(R:ToString())

today, gone, tomorrow, here

> R:RotateR()

> print(R:ToString())

here, today, gone, tomorrow

How It Works

MakeRing instantiates a new ring. If you call it with no argument, it makes an empty ring, but if you call it with an array, it uses that array’s elements as the elements of the ring (the first element is the initial current element). A ring has the following six methods:

· Push: Adds a new element to the ring.

· Pop: Removes the current element from the ring and returns it.

· RotateL: Rotates the ring left by one element.

· RotateR: Rotates the ring right by one element.

· Size: Returns the size of the ring.

· ToString: Returns a string listing all elements of the ring, with the current one first.

When used in the context of stacks and related structures, Push and Pop mean insert and remove.

A ring is represented as an array with a Pos field that points to its current element. For example, take a look at the ring in Figure 4-8.

Figure 4-8

This ring could be represented as any of the following (the methods are left out for clarity):

{"the", "time", "has", "come", Pos = 1}

{"come", "the", "time", "has", Pos = 2}

{"has", "come", "the", "time", Pos = 3}

{"time", "has", "come", "the", Pos = 4}

Rotating a ring takes the same amount of time no matter how big the ring is. But pushing or popping an element can take an amount of time proportional to the size of the ring. More specifically, it takes an amount of time proportional to the number of elements from Pos to the end of the array. That’s because when table.insert and table.remove insert or remove an item into or from the middle of an array, they need to go through every element between there and the end of the array and shift them up or down to compensate. This implementation of rings was written that way because it’s simple and easy to understand, and the time that it takes to push or pop is not even noticeable in most circumstances.

If pushing and popping do consume a problematic amount of time, either because of the sheer size of a ring or because many pushes or pops are being done in a time-critical section of code, then you could optimize the rings by reimplementing them more efficiently with a different representation. One simple optimization would be to arrange for Pos to go up on pushes and down on pops. This way, once Pos reaches the sweet spot at the end of the array, it stays at the end unless the ring is rotated.

Another optimization would be to represent each element as a table with a Val field (that element’s value), and Left and Right fields (the tables of the elements counterclockwise and clockwise from that element). Doing it that way has the advantage of making pushes and pops take the same amount of time no matter how big the ring is. It’s more complicated, though, because pushes and pops have to do the correct relinking of the Left and Right fields of the element in question, and those of its two neighbors. Additionally, each push creates a new table, which takes more time than simply inserting a value sufficiently close to the end of an array, so for rings with fewer than 50 or 60 elements, this approach is actually slower than the worst case of the version given in ring.lua.
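The linked representation just described might look something like the following sketch. It is not part of ring.lua: the names MakeLinkedRing, Cur, and Count are invented here, and the sketch assumes that an element's Right neighbor is the one that follows it in ToString order. Only the methods needed to show the relinking are included.

```lua
-- A hypothetical linked ring. Each element is a node table
-- with Val, Left, and Right fields:
function MakeLinkedRing()
  local Ring = {Cur = nil, Count = 0}

  -- Inserts Val as the new current element; the old current
  -- element ends up immediately to its right:
  function Ring:Push(Val)
    local Node = {Val = Val}
    local Cur = self.Cur
    if Cur then
      Node.Left, Node.Right = Cur.Left, Cur
      Cur.Left.Right = Node
      Cur.Left = Node
    else
      -- A lone node is its own neighbor on both sides:
      Node.Left, Node.Right = Node, Node
    end
    self.Cur = Node
    self.Count = self.Count + 1
  end

  -- Removes and returns the current element's value (nil if
  -- the ring is empty); its right neighbor becomes current:
  function Ring:Pop()
    local Node = self.Cur
    if not Node then
      return nil
    end
    if self.Count == 1 then
      self.Cur = nil
    else
      Node.Left.Right = Node.Right
      Node.Right.Left = Node.Left
      self.Cur = Node.Right
    end
    self.Count = self.Count - 1
    return Node.Val
  end

  function Ring:RotateL()
    if self.Cur then
      self.Cur = self.Cur.Right
    end
  end

  function Ring:Size()
    return self.Count
  end

  return Ring
end

LR = MakeLinkedRing()
for _, Word in ipairs({"come", "has", "time", "the"}) do
  LR:Push(Word)
end
print(LR:Pop()) -- the
LR:Push("today")
LR:RotateL()
print(LR:Pop()) -- time
print(LR:Size()) -- 3
```

Neither Push nor Pop loops over the elements, so their cost does not depend on the size of the ring. The price is the extra node table created by every push.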

A good rule of thumb is to first write something in as simple and clear a way as possible, test it to make sure it’s correct, and then don’t optimize it—unless it’s slowing the whole program down enough to detract from the program’s usability.

Here are a few more comments:

· Other than ToString, which expects everything in the ring to be a string or a number, ring.lua’s rings can hold any value except nil. There are (at least) a couple of ways to fix this. One is to replace table.insert and table.remove with code that can handle a nil, and use a field in self to keep track of the size instead of #. The other is to create a value that won’t be equal to any other value that might be pushed into the ring, and use that value to represent nil. So Push, when given a nil, would push that value instead, and Pop, when popping that value, would return nil. You can use a do-nothing function or an empty table for the value.

· The size of the ring is measurable with the # operator, but that might no longer be true if the implementation were changed. That’s why there’s a Size method: it hides the details of how the size is kept track of.

· The only global variable set by ring.lua is MakeRing because it’s the only thing needed to create a ring.

· The local function OneMod is there to make the rotation methods easier to read by abstracting away a bit of arithmetic.

· You can use lua -i filename to write, test, and debug code. If you make a change in the file, and you don’t want to exit the interpreter just to reload it, call dofile with the filename, such as dofile("ring.lua"). If you want to test a local function (such as OneMod), you can make it global for long enough to test it and then relocalize it.

· The guts of this implementation are not hidden, which means that goofy fiddling like R.Pos = -1 can be done. This is fine in a prototype, but you want to prevent it in code intended for serious use. One way would be to put nothing but methods in the tables returned by MakeRing. There would be an upvalue containing a table whose keys would be the tables returned by MakeRing, and whose values would be tables with Pos and the contents of the corresponding ring. Only MakeRing and the methods would have access to that upvalue. When a method wanted to get at the contents or position of its ring, it would index the upvalue with self. Another way of protecting an object’s guts from fiddling is described in Chapter 11.
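The upvalue technique from the last comment can be sketched as follows. This is only an outline under stated assumptions: the names MakeHiddenRing, Private, State, Elems, and the Current method are invented for the illustration, and only a few methods are shown.

```lua
do -- Local scope for Private and Methods.
  -- Maps each ring object to its hidden state. Only code
  -- inside this do block can see this table:
  local Private = {}
  local Methods = {}

  function Methods:Push(Elem)
    local State = Private[self]
    table.insert(State.Elems, State.Pos, Elem)
  end

  -- Returns the current element (hypothetical helper):
  function Methods:Current()
    local State = Private[self]
    return State.Elems[State.Pos]
  end

  function Methods:Size()
    return #Private[self].Elems
  end

  function MakeHiddenRing(Init)
    local Ring, Elems = {}, {}
    -- Copy the initial values so the caller's array can't be
    -- used to reach the hidden contents:
    for I, Elem in ipairs(Init or {}) do
      Elems[I] = Elem
    end
    Private[Ring] = {Pos = 1, Elems = Elems}
    for Name, Fnc in pairs(Methods) do
      Ring[Name] = Fnc
    end
    return Ring
  end
end

HR = MakeHiddenRing{"the", "time"}
HR.Pos = -1 -- Only creates an unrelated field; the real Pos is untouched.
HR:Push("now")
print(HR:Current(), HR:Size()) -- now 3
```

One drawback of this sketch is that Private keeps a reference to every ring ever made, so rings are never garbage-collected; serious code would need to deal with that.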

Because of tables’ flexibility, you often don’t need a customized data structure. Just ask yourself how you most often want to access your data—usually an associative table or an array will do the job. For instance, the task of finding a user’s information based on his or her username is obviously suited to an associative table whose keys are usernames and whose values are tables of information about each user. The task of displaying all users in alphabetical order by username is suited to a sorted array. It’s common to create ad hoc tables to do something that the main table you’re using can’t do. If you were working with an associative table like the one described previously, keyed by usernames, but you wanted to do something that grouped users by last name, you could create a table like the following, with last names as keys and arrays of users as values:

local LastUsers = {} -- Keys: last names; vals: arrays of

-- UserInfos.

for Username, UserInfo in pairs(Users) do

-- If this last name hasn't been seen yet, make an empty array:

LastUsers[UserInfo.LastName] = LastUsers[UserInfo.LastName] or {}

table.insert(LastUsers[UserInfo.LastName], UserInfo)

end
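To see the grouping at work, here is the same loop run against some made-up data. The usernames and the FirstName field are assumptions for the illustration; only LastName is used by the loop.

```lua
-- Hypothetical sample data, keyed by username:
Users = {
  jdoe = {FirstName = "Jane", LastName = "Doe"},
  rroe = {FirstName = "Richard", LastName = "Roe"},
  jdoe2 = {FirstName = "John", LastName = "Doe"}}

LastUsers = {} -- Keys: last names; vals: arrays of UserInfos.
for Username, UserInfo in pairs(Users) do
  -- If this last name hasn't been seen yet, make an empty array:
  LastUsers[UserInfo.LastName] = LastUsers[UserInfo.LastName] or {}
  table.insert(LastUsers[UserInfo.LastName], UserInfo)
end

print(#LastUsers.Doe, #LastUsers.Roe) -- 2 1
print(LastUsers.Roe[1].FirstName) -- Richard
```

The arrays hold the same UserInfo tables that Users does, not copies, so a change made through one table is visible through the other.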

Custom-Made Loops

Most times when you want to loop through a table, pairs or ipairs is appropriate, but if neither is, you can instead write and use your own function. Here, for example, is a function similar to ipairs, but it goes through the array given to it in reverse order:

-- An iterator factory -- like ipairs, but goes through the

-- array in reverse order:

function ReverseIpairs(Arr)

local I = #Arr

local function Iter()

local Ret1, Ret2

if I > 0 then

Ret1, Ret2 = I, Arr[I]

I = I - 1

end

return Ret1, Ret2

end

return Iter

end

for I, Str in ReverseIpairs({"one", "two", "three"}) do

print(I, Str)

end

The output is as follows:

3 three

2 two

1 one

The Iter function is what is known in the Lua world as an iterator. In the simplest terms, an iterator is a function that, each time you call it, returns the next element or elements from the thing you’re looping through. The generic for expects to find an iterator to the right of the keyword in. ReverseIpairs is not an iterator—it, like ipairs and pairs, is an iterator factory—a function that returns an iterator.

(This book follows that terminological distinction, but elsewhere, you may see iterator factories referred to as iterators, when the context makes it clear what’s being talked about.) In this example, the for does not find ReverseIpairs to the right of the in, but Iter, because that’s what the call to ReverseIpairs returns. The for then calls Iter, puts its results into the newly created local I and Str variables, and executes the body of the loop. It keeps on doing this until Iter’s first return value is nil, at which point the loop is exited.

That last sentence is a rule about all iterators. That is, when an iterator returns nil as its first value (or when it returns nothing, which for adjusts to nil), the loop is ended. For this reason, the leftmost loop variable (which receives the iterator’s first return value) is called the loop’s control variable.

This implementation of ReverseIpairs returns what is called a stateful iterator because it includes (as the upvalues Arr and I) the current state of the iteration. You can also write stateless iterators, which depend on for to keep track of the current state of the iteration for them.

Try this stateless version of ReverseIpairs:

do -- Local scope for Iter.

-- ReverseIpairs's iterator; Arr is the "invariant state",

-- and I is the control variable's previous value:

local function Iter(Arr, I)

if I > 1 then

I = I - 1

return I, Arr[I] -- Violates structured programming

-- (not a severe misdeed in such a small function).

end

end

-- An iterator factory -- like ipairs, but goes through

-- the array in reverse order:

function ReverseIpairs(Arr)

return Iter, Arr, #Arr + 1

end

end

for I, Str in ReverseIpairs({"one", "two", "three"}) do

print(I, Str)

end

It prints the same thing as the stateful version:

3 three

2 two

1 one

The generic for actually expects up to three values to the right of the in:

· The iterator itself

· An invariant state (nil in a stateful iterator, and usually the table being looped through in a stateless iterator)

· A seed value for the loop’s control variable (nil in a stateful iterator and in some stateless iterators)

Every time for calls the iterator, it passes it two arguments: the invariant state and the value of the control variable from the previous iteration. That’s why it needs the seed value—to have something to pass the iterator before the first iteration.

In the example, ReverseIpairs returns Iter, the array it was given (the invariant state), and the length of the array plus one (the seed value for the control variable). for then calls Iter with the array and the seed value. Iter doesn’t know that it’s being called on the first iteration. It just sees that its second argument is 4 and, in effect, thinks to itself: “If the last value of the control variable was 4, then it’s time for me to return 3 and the 3rd element of my first argument.” for assigns these values to I and Str, and executes the body of the loop. Then it calls Iter again, with the array and 3 as arguments. This process continues until the last time Iter is called. Because its second argument is 1, it returns nothing. for looks at Iter’s first return value and, seeing it to be (after adjustment) nil, ends the loop.
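The steps just described can be written out by hand as an ordinary while loop. The following sketch is only an approximation of what the generic for does internally; the names Fnc, State, Control, and Lines are invented for the illustration.

```lua
-- The stateless iterator and factory from the example above:
local function Iter(Arr, I)
  if I > 1 then
    I = I - 1
    return I, Arr[I]
  end
end

local function ReverseIpairs(Arr)
  return Iter, Arr, #Arr + 1
end

-- Roughly what "for I, Str in ReverseIpairs(...) do" does:
Lines = {}
do
  -- The iterator, the invariant state, and the seed value:
  local Fnc, State, Control = ReverseIpairs({"one", "two", "three"})
  while true do
    local I, Str = Fnc(State, Control)
    if I == nil then
      break -- A nil first value ends the loop.
    end
    Control = I -- Remembered for the next call.
    -- The loop body:
    table.insert(Lines, I .. " " .. Str)
  end
end
print(table.concat(Lines, ", ")) -- 3 three, 2 two, 1 one
```

Notice that Iter itself stores nothing between calls; everything it needs arrives as arguments.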

The built-in function next is a stateless iterator. It takes a table and a key in that table and returns the “next” key-value pair in the table, like this:

> NameToInstr = {John = "rhythm guitar",

>> Paul = "bass guitar",

>> George = "lead guitar",

>> Ringo = "drumkit"}

> print(next(NameToInstr, "Ringo"))

George lead guitar

> print(next(NameToInstr, "George"))

John rhythm guitar

> print(next(NameToInstr, "John"))

Paul bass guitar

If given the seed value nil, it returns the “first” key-value pair in the table, and if given the “last” key in the table, it returns nil, like this:

> print(next(NameToInstr))

Ringo drumkit

> print(next(NameToInstr, "Paul"))

nil

“Next,” “first,” and “last” are in quotes here because the order of the table is arbitrary. But it’s the same arbitrary order in which pairs loops through a table. In fact, all pairs does is return next, the table given to it, and nil, so that if you write one of the following:

for Key, Val in next, Tbl, nil do

for Key, Val in next, Tbl do

it’s the same as writing this:

for Key, Val in pairs(Tbl) do

pairs has parentheses after it and next doesn’t because pairs is an iterator factory and needs to be called, but next is an iterator and needs to be given straight to for.

If you check it out, you’ll see that next and the iterator returned by pairs are actually different functions, but that’s just a quirk of the Lua implementation. Both are just wrappers around the same function written in C.
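Here is a small sketch of next used both ways: handed straight to for, and called directly. The function names CountPairs and IsEmpty are made up for this illustration.

```lua
-- Counts the key-value pairs in Tbl by giving next straight
-- to for, just as pairs would:
function CountPairs(Tbl)
  local Count = 0
  for Key in next, Tbl do
    Count = Count + 1
  end
  return Count
end

-- Returns true if Tbl has no key-value pairs at all. With a
-- nil seed, next returns the "first" key, or nil if there
-- isn't one:
function IsEmpty(Tbl)
  return next(Tbl) == nil
end

print(CountPairs({John = "chips", Jane = "lemonade"})) -- 2
print(IsEmpty({}), IsEmpty({false})) -- true false
```

The second IsEmpty call returns false because the table has a key (1), even though the value stored there is false.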

ipairs is similar. It returns a stateless iterator, the table given to it, and a seed value of 0. The iterator (called IpairsIter in the following example) works by adding one to its second argument and returning that key-value pair of its first argument:

> IpairsIter, Arr, Seed = ipairs({"one", "two", "three"})

> print(IpairsIter, Arr, Seed)

function: 0x480c68 table: 0x496230 0

> print(IpairsIter(Arr, 0))

1 one

> print(IpairsIter(Arr, 1))

2 two

> print(IpairsIter(Arr, 2))

3 three

> print(IpairsIter(Arr, 3)) -- This will return nothing.

IpairsIter is not a built-in function, but it’s easy to write. Here’s how:

function IpairsIter(Arr, PrevI)

local CurI = PrevI + 1

local Val = Arr[CurI]

if Val ~= nil then

return CurI, Val

end

end

The behavior of the generic for is complex, but it allows for to be both flexible and efficient—flexible because an iterator can be written for anything you want to loop through, and efficient because a stateless iterator factory can return the same iterator every time, rather than creating a new closure each time it’s called. If you’re writing an iterator, and you can easily tell just by looking at the control variable’s previous value what the next value should be, then write a stateless iterator; otherwise write a stateful one.

All of the iterator factories you’ve seen so far have only taken one argument, and all their iterators have returned two values after each iteration, but that’s only because they’re all for iterating through key-value pairs of tables. The following Subseqs iterator factory takes an array and a number, Len, and returns an iterator that loops through the array’s subsequences, each subsequence being Len elements long:

> -- Returns an iterator that goes through all Len-long

> -- subsequences of Arr:

> function Subseqs(Arr, Len)

>> local Pos = 0

>>

>> return function()

>> Pos = Pos + 1

>> if Pos + Len - 1 <= #Arr then

>> return unpack(Arr, Pos, Pos + Len - 1)

>> end

>> end

>> end

>

> Nums = {"one", "two", "three", "four", "five", "six"}

> for Val1, Val2, Val3, Val4 in Subseqs(Nums, 4) do

>> print(Val1, Val2, Val3, Val4)

>> end

one two three four

two three four five

three four five six

Here’s an iterator (meant to be part of ring.lua) that only returns one value after each iteration—it loops through all values in the ring, starting with the current one:

-- Returns an iterator that iterates through all self's

-- elements.

function Methods:Elems()

local IterPos -- The position of the element the iterator

-- needs to return

return function()

local Ret

if IterPos then

if IterPos ~= self.Pos then

Ret = self[IterPos]

else

-- Back at the beginning; do nothing (which ends the

-- loop by returning nil).

end

else

-- At the beginning: initialize IterPos:

IterPos = self.Pos

Ret = self[IterPos] -- If the ring is empty, this'll

-- end the loop by returning nil.

end

IterPos = OneMod(IterPos + 1, #self)

return Ret

end

end

Here it is in use:

> dofile("ring.lua")

> Days = MakeRing{"Monday", "Tuesday", "Wednesday",

>> "Thursday", "Friday", "Saturday", "Sunday"}

> Days:RotateR()

> for Day in Days:Elems() do

>> print(Day)

>> end

Sunday

Monday

Tuesday

Wednesday

Thursday

Friday

Saturday

The normal rules for adjustment apply to iterators and loop variables:

> Letters = {"a", "b", "c"}

> for I in ipairs(Letters) do

>> print(I)

>> end

1

2

3

> for I, Letter, Junk1, Junk2 in ipairs(Letters) do

>> print(I, Letter, Junk1, Junk2)

>> end

1 a nil nil

2 b nil nil

3 c nil nil

You can write stateless iterators that ignore their second argument (and keep track of where they are by side-effecting the table given as the first argument), but this technique is seldom used.
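For completeness, here is a minimal sketch of that seldom-used technique. The iterator ignores the control variable entirely and keeps its place by removing elements from the table it is given. The names PopIter, Drain, Todo, and Done are invented for the illustration.

```lua
-- A stateless iterator that ignores its second argument. Each
-- call removes and returns the last element of Stack, so the
-- state of the iteration lives in Stack itself:
local function PopIter(Stack)
  return table.remove(Stack)
end

-- An iterator factory -- loops through Stack from the top
-- down, emptying it along the way:
function Drain(Stack)
  return PopIter, Stack
end

Todo = {"wash", "dry", "fold"}
Done = {}
for Task in Drain(Todo) do
  table.insert(Done, Task)
end
print(table.concat(Done, " "), #Todo) -- fold dry wash 0
```

The loop ends when the table is empty, because table.remove then has nothing to return and for sees nil.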

You’ll learn one more method for writing iterators in Chapter 12.

Global Variable Environments

You may have noticed similarities between global variables and table keys, such as their lack of distinction between nil and nonexistence. These similarities are not coincidental. Global variables are actually stored in a table. This table can be found in the global variable _G, as you can see in the following example:

> print(print, _G.print)

function: 0x481720 function: 0x481720

> MyGlobal = "Hello!"

> print(MyGlobal, _G.MyGlobal)

Hello! Hello!

> print(_G, _G._G._G._G)

table: 0x4806e8 table: 0x4806e8

You can tell by looking at the numbers that print and _G.print are the same value, as are _G itself and _G._G._G._G, because _G, being both a global variable and a table containing all the global variables, contains itself.

This means that you can access global variables whose names are strings built at run time (without resorting to loadstring), like this:

> Abcd = "test 1"

> print(_G["Ab" .. "cd"])

test 1

> _G["Wx" .. "yz"] = "test 2"

> print(Wxyz)

test 2

For more on loadstring, see Chapter 3.
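The same idea extends to names with dots in them. The following sketch looks up a global such as string.upper from a string built at run time by indexing _G one name at a time. GetGlobal is a made-up name, and the sketch relies on string.gmatch, a standard library function that this chapter has not introduced.

```lua
-- Looks up a global by a dotted name such as "string.upper",
-- indexing one part at a time, starting from _G. Returns nil
-- if any part along the way is missing:
function GetGlobal(Name)
  local Val = _G
  -- "[^.]+" matches each run of characters between dots:
  for Part in string.gmatch(Name, "[^.]+") do
    if type(Val) ~= "table" then
      return nil
    end
    Val = Val[Part]
  end
  return Val
end

print(GetGlobal("string.upper")("abc")) -- ABC
print(GetGlobal("no.such.thing")) -- nil
```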

You can also loop through all global variables. Give the following code a try:

print already puts a tab character between the things it prints. The extra "\t" in the following example just makes the output easier to read.

for Name, Val in pairs(_G) do

print(Name, "\t", Val)

end

A table used to store global variables is called an environment, and every function has one. Usually they’re all the same environment, but you can give a function its own environment with the function setfenv (short for set function environment). In the following example, you create a Greet function and give it an empty environment. This function tries to call print, but the "print" key in its environment is not set, so there’s an error:

> function Greet(Name)

>> print("Hello, " .. Name .. ".")

>> end

> setfenv(Greet, {})

> Greet("Syd")

stdin:2: attempt to call global 'print' (a nil value)

stack traceback:

stdin:2: in function 'Greet'

stdin:1: in main chunk

[C]: ?

When you give it an environment whose "print" key is set to the print function, it works as desired:

> setfenv(Greet, {print = print})

> Greet("Syd")

Hello, Syd.

Of course, the function could be called print but could do something different than the real print, as in the following example:

> Env = {print = function(Str)

>> print("<<<" .. Str .. ">>>")

>> end}

> setfenv(Greet, Env)

> Greet("Syd")

<<<Hello, Syd.>>>

Earlier it was said that some closures can be side-effected by calling them. Now you see that you can give side effects to all functions by calling setfenv on them.

A function’s environment can be retrieved with the getfenv ("get function environment") function. For example:

> PrintEnv, GreetEnv = getfenv(print), getfenv(Greet)

> -- print's environment is different than Greet's environment:

> print(PrintEnv, GreetEnv)

table: 0x4806e8 table: 0x493958

> -- print's environment is the one in _G, Greet's isn't:

> print(PrintEnv == _G, GreetEnv == _G)

true false

Environment tables are just ordinary tables, and behave accordingly. In the previous example, because PrintEnv is the same as _G, any changes you make to PrintEnv will show up in _G:

> PrintEnv.Test = "test"

> print(_G.Test)

test

> print(Test)

test

Function environments can be used to modularize code by making global variables not so global. An example of this (the module function) is given in Chapter 7.

Another common use is sandboxing, which means running code with a limited or otherwise specialized set of global variables. The environment you gave to Greet that only had print in it was a sandbox—Greet had no way of accessing any of Lua’s other built-in functions. This sort of thing is useful if your program needs to run code supplied by a user, but you want to limit the power of the user to mess things up (either by assigning to global variables, or by calling functions that he or she shouldn’t be calling). One way to make a sandbox is to make a table with all the globals that the sandboxed function is supposed to have access to—this is the technique you used in the Greet example. (It may be easier to use a function like DeepCopy to copy _G, and then remove anything you don’t want it to contain.) Another, more flexible technique is explained in Chapter 11.
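A whitelist-style sandbox for a string of user-supplied code might be sketched like this. RunSandboxed and Sandbox are made-up names. The first branch uses loadstring and setfenv as described in this chapter; the else branch is an assumption added only so the sketch also runs on later Lua versions, which dropped setfenv and pass the environment to load instead.

```lua
-- Runs the Lua source string Src with Env as its environment
-- and returns whatever the chunk returns (hypothetical helper):
function RunSandboxed(Src, Env)
  if setfenv then
    -- Lua 5.1, the version this chapter describes:
    local Fnc = assert(loadstring(Src))
    setfenv(Fnc, Env)
    return Fnc()
  else
    -- Later versions: load takes the environment as its
    -- fourth argument.
    return assert(load(Src, "sandbox", "t", Env))()
  end
end

-- Only the globals listed here are visible to sandboxed code:
Sandbox = {tostring = tostring, type = type}

-- os is not in the sandbox, so the code sees it as nil:
print(RunSandboxed("return type(os)", Sandbox)) -- nil

-- Global assignments land in the sandbox, not in _G:
RunSandboxed("Secret = tostring(42)", Sandbox)
print(Sandbox.Secret, Secret) -- 42 nil
```

This is a sketch of the idea, not a complete security measure. For example, sandboxed code can still loop forever.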

The first argument to setfenv or getfenv can be a number instead of a function. The number is treated as a stack level, and the function at that level of the stack has its environment set or gotten. Lower numbers are closer to the top of the stack: 1 is the current function (the one calling setfenv or getfenv); 2 is the function that called the current function (and hence is right below it on the stack); 3 is the function that called the function that called the current function; and so on. (Calling getfenv with no argument is the same as calling it with an argument of 1; it returns the current function’s environment.) For example:

> -- Gives its caller an empty environment:

> function PwnMe()

>> setfenv(2, {})

>> end

>

> do

>> PwnMe()

>> -- Global variables are missing now:

>> print("test")

>> end

stdin:4: attempt to call global 'print' (a nil value)

stack traceback:

stdin:4: in main chunk

[C]: ?

> -- Since this is a new chunk now, things are back to

> -- normal. That wouldn't be the case if this were a

> -- script, and therefore all one chunk.

> print("test")

test

For more on chunks, see Chapter 3.

This means that if you’re sandboxing a function to prevent it from messing with its caller’s global variables, you should not include either setfenv or getfenv in the sandbox.

When you create a function, it inherits the environment of its source function. That’s why DoNothing1 and DoNothing2 have different environments in the following code, and they’ll keep these environments until and unless you use setfenv on them:

> -- Returns a do-nothing function:

> function MakeDoNothing()

>> return function() end

>> end

>

> EmptyTbl1, EmptyTbl2 = {}, {}

> setfenv(MakeDoNothing, EmptyTbl1)

> DoNothing1 = MakeDoNothing()

> setfenv(MakeDoNothing, EmptyTbl2)

> DoNothing2 = MakeDoNothing()

> -- DoNothings 1 and 2 have the environments that

> -- MakeDoNothing had when it created them:

> print(getfenv(DoNothing1) == EmptyTbl1)

true

> print(getfenv(DoNothing2) == EmptyTbl2)

true

You cannot alter an environment of a function not written in Lua, as shown here:

> -- This causes an error because print is written in C:

> setfenv(print, {})

stdin:1: 'setfenv' cannot change environment of given object

stack traceback:

[C]: in function 'setfenv'

stdin:1: in main chunk

[C]: ?

This restriction is to protect functions written in low-level languages like C (which have much more power to mess things up) from unauthorized meddling by Lua functions.

Like many restrictions in Lua, this one can be bypassed with the debug library, described in Chapter 10.

There is a way, though, for you to set the environment used by the bottom frame of the stack. You’ll learn how to do this in a moment, but first, here’s a bit of background. The bottom frame of the stack is always in C (which is why stack tracebacks always end with “[C]: ?”). Its environment is called the global environment. (This term is a bit confusing because all references to global variables are resolved by looking in an environment, but even if various functions/stack frames have different environments, only one of them is the global environment.)

If you try to change the global environment by figuring out what stack level it’s at, you’ll get the same error given previously when you tried to give setfenv the print function:

> setfenv(2, {})

stdin:1: 'setfenv' cannot change environment of given object

stack traceback:

[C]: in function 'setfenv'

stdin:1: in main chunk

[C]: ?

The way to work around this is to use the magic number 0 as the first argument to setfenv. Chunks typed into the interpreter inherit the global environment. In the following example, you set the global environment to a table with nothing in it but _G (the original global environment). After you do this, every new chunk inherits that environment and thus can’t access the standard global variables under their usual names:

> setfenv(0, {_G = _G})

> -- Error message won't have tracebacks, as explained below.

> print("This won't work.")

stdin:1: attempt to call global 'print' (a nil value)

> _G.print("This won't either, since print calls tostring.")

attempt to call a nil value

> tostring = _G.tostring

> _G.print("Now that tostring is in place, this will work.")

Now that tostring is in place, this will work.

> -- Undo the damage by putting the original global

> -- environment back in place:

> _G.setfenv(0, _G)

> print("This works again!")

This works again!

Note the following in this example:

· Although you can’t access it with setfenv or getfenv, print has its own environment. That environment is unaffected by changing the global environment, because it is the original global environment, inherited by print when it was created. If print looked for tostring in its own environment, it would find it, but it looks instead in the current global environment, where tostring is nil. (There’s no hidden meaning to print looking in the global environment rather than its own; that’s just how it was written.)

· The error message has no stack traceback. That’s because the stack traceback is created by calling the function debug.traceback. When the interpreter sees that this function is missing, it doesn’t even try to call it.

setfenv returns the function whose environment it just set, or nothing if it’s used to set the global environment, as shown here:

> DoNothing = function() end

> print(setfenv(DoNothing, _G) == DoNothing)

true

> print(select("#", setfenv(0, _G)))

0

In Lua 5.0, setfenv never returns a value.

If getfenv is passed a non-Lua function, it returns the global environment (not that function’s real environment).
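
A short sketch of this behavior, assuming Lua 5.1 and a global environment that has not been changed with setfenv(0, ...):

```lua
-- Assumes Lua 5.1 with an unmodified global environment.
-- print is written in C, so getfenv reports the global environment
-- rather than print's real environment:
print(getfenv(print) == _G)               --> true
-- The same holds for other C functions, such as string.len:
print(getfenv(string.len) == getfenv(0))  --> true
```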

A function returned by loadstring (or by one of its cousins load and loadfile) inherits the current global environment.
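
The difference from an ordinary function, which inherits the environment of the function that creates it, can be sketched as follows. This assumes Lua 5.1 and an unmodified global environment. Loader, Chunk, Inner, and X are hypothetical names used only for this illustration:

```lua
-- Assumes Lua 5.1 with an unmodified global environment.
X = "global"

local function Loader()
  -- A compiled chunk gets the current global environment:
  local Chunk = loadstring("return X")
  -- An ordinary function inherits Loader's environment:
  local Inner = function() return X end
  return Chunk, Inner
end

-- Give Loader its own environment (it still needs loadstring):
setfenv(Loader, {X = "sandboxed", loadstring = loadstring})

local Chunk, Inner = Loader()
print(Chunk())               --> global
print(Inner())               --> sandboxed
print(getfenv(Chunk) == _G)  --> true
```

Even though loadstring is called from inside a function with a different environment, the chunk it returns looks up X in the global environment.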

_G is just a regular global variable, with no magic behavior. When Lua starts up, it puts the global environment into _G, but if the global environment is changed to a different table, _G is not updated, and if you put a different table into _G, no environments are changed. _G is purely a convenience, and in the previous examples, it saved you from having to type a lot of getfenv(0).
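
The following sketch illustrates that assigning to _G changes only the variable. It assumes Lua 5.1 and an unmodified global environment at the start. RealGlobals is a name chosen for this illustration:

```lua
-- Assumes Lua 5.1 with an unmodified global environment.
local RealGlobals = getfenv(0)
print(_G == RealGlobals)          --> true

-- Putting a different table into _G changes no environment:
_G = {}
print(getfenv(0) == RealGlobals)  --> true
print(type(print))                --> function

-- Put the usual value back so that later code can rely on _G:
_G = RealGlobals
```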

Summary

In this chapter, you examined tables and some other things that are easier to understand in relation to tables. What you learned included the following:

· Tables are collections of key-value pairs, and they are Lua’s only data structure.

· Curly braces are used to create tables; square brackets and dots are used to read and assign to table fields.

· There is no difference between a key not existing in a table, and that key existing but being associated with the value nil.

· You can use tables as arrays by using consecutive integer keys (starting at one).

· You can use pairs and ipairs with the generic for to loop through tables.

· The table global variable contains a table, which in turn contains functions useful for dealing with tables (as in table.sort).

· The colon syntax makes writing in an object-oriented style more convenient.

· You can access the extra arguments of a vararg function with ... (the vararg expression). You can use the select and unpack functions when dealing with vararg functions.

· All chunks are compiled into functions. For example, scripts are vararg functions, which allows access to their command-line arguments.

· Tables are mutable, so that when you alter a table’s contents, it is still the same table.

· pairs and ipairs are iterator factories. You can write iterator factories of your own.

· Global variables are stored in tables called environments. You use the getfenv and setfenv functions to manipulate these environments.

Numeric, string, Boolean, and nil values; operators; expressions; statements; control structures; functions; tables: these are the building blocks of Lua and, now that you know them, you know Lua.

You may have noticed that through this and the previous two chapters, an increasing proportion of the examples look like they could be useful in real programs. This trend continues in the next chapter, which has lots of code relevant to the real world: code for searching, matching, substituting, and otherwise manipulating strings and their characters and substrings. First, though, try the exercises for this chapter (answers are in the appendix).

Exercises

1. In your head, figure out what the following prints:

A = {}

B = "C"

C = "B"

D = {

  [A] = {B = C},

  [B] = {[C] = B},

  [C] = {[A] = A}}

print(D.C["B"])

2. By default, table.sort uses < to compare array elements, so it can only sort arrays of numbers or arrays of strings. Write a comparison function that allows table.sort to sort arrays of mixed types. In the sorted array, all values of a given type should be grouped together. Within each such group, numbers and strings should be sorted as usual, and other types should be sorted in some arbitrary but consistent way.

Test the function out on an array like this:

{{}, {}, {}, "", "a", "b", "c", 1, 2, 3, -100, 1.1,

function() end, function() end, false, false, true}

3. The print function converts all its arguments to strings, separates them with tab characters, and outputs them, along with a trailing newline. Write a function that, instead of giving that as output, returns it as a string, so that the following:

Sprint("Hi", {}, nil)

returns this:

"Hi\ttable: 0x484048\tnil\n"

which, if printed, would look like this:

Hi table: 0x484048 nil

4. In ring.lua, the RotateL method only rotates its object one element to the left, as shown here:

-- Rotates self to the left:

function Methods:RotateL()

  if #self > 0 then

    self.Pos = OneMod(self.Pos + 1, #self)

  end

end

Rewrite it to take an optional numeric argument (defaulting to 1) and rotate the object by that many elements. (This requires neither a loop nor recursion.)

5. Write a stateful iterator generator, SortedPairs, that behaves just like pairs, except that it goes through key-value pairs in order by key. Use the CompAll function from exercise 2 to sort the keys.