Completing a Program - The Basics - Programming: Principles and Practice Using C++ (2014)

Programming: Principles and Practice Using C++ (2014)

Part I: The Basics

7. Completing a Program

“It ain’t over till the fat lady sings.”

—Opera proverb

Writing a program involves gradually refining your ideas of what you want to do and how you want to express it. In Chapter 6, we produced the initial working version of a calculator program. Here, we’ll refine it. Completing the program — that is, making it fit for users and maintainers — involves improving the user interface, doing some serious work on error handling, adding a few useful features, and restructuring the code for ease of understanding and modification.


7.1 Introduction

7.2 Input and output

7.3 Error handling

7.4 Negative numbers

7.5 Remainder: %

7.6 Cleaning up the code

7.6.1 Symbolic constants

7.6.2 Use of functions

7.6.3 Code layout

7.6.4 Commenting

7.7 Recovering from errors

7.8 Variables

7.8.1 Variables and definitions

7.8.2 Introducing names

7.8.3 Predefined names

7.8.4 Are we there yet?


7.1 Introduction

Image

When your program first starts running “reasonably,” you’re probably about halfway finished. For a large program or a program that could do harm if it misbehaved, you will be nowhere near halfway finished. Once the program “basically works,” the real fun begins! That’s when we have enough working code to experiment with ideas.

In this chapter, we will guide you through the considerations a professional programmer might have trying to improve the calculator from Chapter 6. Note that the questions asked about the program and the issues considered here are far more interesting than the calculator itself. What we do is to give an example of how real programs evolve under the pressure of requirements and constraints and of how a programmer can gradually improve code.

7.2 Input and output

If you look back to the beginning of Chapter 6, you’ll find that we decided to prompt the user with

Expression:

and to report back answers with

Result:

In the heat of getting the program to run, we forgot all about that. That’s pretty typical. We can’t think of everything all the time, so when we stop to reflect, we find that we have forgotten something.

For some programming tasks, the initial requirements cannot be changed. That’s usually too rigid a policy and leads to programs that are unnecessarily poor solutions to the problems that they are written to solve. So, let’s consider what we would do, assuming that we can change the specification of what exactly the program should do. Do we really want the program to write Expression: and Result:? How would we know? Just “thinking” rarely helps. We have to try and see what works best.

2+3; 5*7; 2+9;

currently gives

= 5
= 35
= 11

If we used Expression: and Result:, we’d get

Expression: 2+3; 5*7; 2+9;
Result : 5
Expression: Result: 35
Expression: Result: 11
Expression:

We are sure that some people will like one style and others will like the other. In such cases, we can consider giving people a choice, but for this simple calculator that would be overkill, so we must decide. We think that writing Expression: and Result: is a bit too “heavy” and distracting. Using those, the actual expressions and results are only a minor part of what appears on the screen, and since expressions and results are what matters, nothing should distract from them. On the other hand, unless we somehow separate what the user types from what the computer outputs, the result can be confusing. During initial debugging, we added = as a result indicator. We would also like a short “prompt” to indicate that the program wants input. The > character is often used as a prompt:

> 2+3;
= 5
> 5*7;
= 35
>

This looks much better, and we can get it by a minor change to the main loop of main():

double val = 0;
while (cin) {
cout << "> " ; // print prompt
Token t = ts.get();
if (t.kind == 'q') break;
if (t.kind == ';')
cout << "= " << val << '\n'; // print result
else
ts.putback(t);
val = expression();
}

Unfortunately, the result of putting several expressions on a line is still messy:

> 2+3; 5*7; 2+9;
= 5
> = 35
> = 11
>

The basic problem is that we didn’t think of multiple expressions on a line when we started out (at least we pretended not to). What we want is

> 2+3; 5*7; 2+9;
= 5
= 35
= 11
>

This looks right, but unfortunately there is no really obvious way of achieving it. We first looked at main(). Is there a way to write out > only if it is not immediately followed by a =? We cannot know! We need to write > before the get(), but we do not know if get() actually reads new characters or simply gives us a Token from characters that it had already read from the keyboard. In other words, we would have to mess with Token_stream to make this final improvement.

For now, we decide that what we have is good enough. If we find that we have to modify Token_stream, we’ll revisit this decision. However, it is unwise to make major structural changes to gain a minor advantage, and we haven’t yet thoroughly tested the calculator.

7.3 Error handling

Image

The first thing to do once you have a program that “basically works” is to try to break it; that is, we try to feed it input in the hope of getting it to misbehave. We say “hope” because the challenge here is to find as many errors as possible, so that you can fix them before anybody else finds them. If you go into this exercise with the attitude that “my program works and I don’t make errors!” you won’t find many bugs and you’ll feel bad when you do find one. You’d be playing head games with yourself! The right attitude when testing is “I’ll break it! I’m smarter than any program — even my own!” So, we feed the calculator a mix of correct and incorrect expressions. For example:

1+2+3+4+5+6+7+8
1-2-3-4
!+2
;;;
(1+3;
(1+);
1*2/3%4+5-6;
();
1+;
+1
1++;
1/0
1/0;
1++2;
-2;
-2;;;;
1234567890123456;
'a';
q
1+q
1+2; q


Image Try This

Feed a few such “problematic” expressions to the calculator and try to figure out in how many ways you can get it to misbehave. Can you get it to crash, that is, to get it past our error handling and give a machine error? We don’t think you can. Can you get it to exit without a useful error message? You can.


Technically, this is known as testing. There are people who do this — break programs — for a living. Testing is a very important part of software development and can actually be fun. Chapter 26 examines testing in some detail. One big question is: “Can we test the program systematically, so that we find all of the errors?” There is no general answer to this question; that is, there is no answer that holds for all programs. However, you can do rather well for many programs when you approach testing seriously. You try to create test cases systematically, and just in case your strategy for selecting tests isn’t complete, you do some “unreasonable” tests, such as

Mary had a little lamb
srtvrqtiewcbet7rewaewre-wqcntrretewru754389652743nvcqnwq;
!@#$%^&*()~:;

Image

Once, when testing compilers, I got into the habit of feeding email that reported compiler errors straight to the compiler — mail headers, user’s explanation, and all. That wasn’t “sensible” because “nobody would do that.” However, a program ideally catches all errors, not just the sensible ones, and soon that compiler was very resilient against “strange input.”

The first really annoying thing we noticed when testing the calculator was that the window closed immediately after inputs such as

+1;
()
!+2

A little thought (or some tracing of the program’s execution) shows that the problem is that the window is closed immediately after the error message has been written. This happens because our mechanism for keeping a window alive was to wait for you to enter a character. However, in all three cases above, the program detected an error before it had read all of the characters, so that there was a character left on the input line. The program can’t tell such “leftover” characters from a character entered in response to the Enter a character to close window prompt. That “leftover” character then closed the window.

We could deal with that by modifying main() (see §5.6.3):

catch (runtime_error& e) {
cerr << e.what() << '\n';
// keep_window_open():
cout << "Please enter the character ~ to close the window\n";
for (char ch; cin >> ch; ) // keep reading until we find a ~
if (ch=='~') return 1;
return 1;
}

Basically, we replaced keep_window_open() with our own code. Note that we still have our problem if a ~ happens to be a character to be read after an error, but that’s rather unlikely.

When we encountered this problem we wrote a version of keep_window_open() that takes a string as its argument and closes the window only when you enter that string after getting the prompt, so a simpler solution is

catch (runtime_error& e) {
cerr << e.what() << '\n';
keep_window_open("~~");
return 1;
}

Now examples such as

+1
!1~~
()

will cause the calculator to give the proper error messages, then say

Please enter ~~ to exit

and not exit until you enter the string ~~.

The calculator takes input from the keyboard. That makes testing tedious: each time we make an improvement, we have to type in a lot of test cases (yet again!) to make sure we haven’t broken anything. It would be much better if we could store our test cases somewhere and run them with a single command. Some operating systems (notably Unix) make it trivial to get cin to read from a file without modifying the program, and similarly to divert the output from cout to a file. If that’s not convenient, we must modify the program to use a file (see Chapter 10).

Now consider:

1+2; q

and

1+2 q

We would like both to print the result (3) and then exit the program. Curiously enough,

1+2 q

does that, but the apparently cleaner

1+2; q

elicits a Primary expected error. Where would we look for this error? In main() where ; and q are handled, of course. We added those “print” and “quit” commands rather quickly to get the calculator to work (§6.7). Now we are paying for that haste. Consider again:

double val = 0;
while (cin) {
cout << "> ";
Token t = ts.get();
if (t.kind == 'q') break;
if (t.kind == ';')
cout << "= " << val << '\n';
else
ts.putback(t);
val = expression();
}

If we find a semicolon, we straightaway proceed to call expression() without checking for q. The first thing that expression() does is to call term(), which first calls primary(), which finds q. The letter q isn’t a Primary so we get our error message. So, we should test for q after testing for a semicolon. While we were at it, we felt the need to simplify the logic a bit, so the complete main() reads

int main()
try
{
while (cin) {
cout << "> ";
Token t = ts.get();
while (t.kind == ';') t=ts.get(); // eat ‘;’
if (t.kind == 'q') {
keep_window_open();
return 0;
}
ts.putback(t);
cout << "= " << expression() << '\n';
}
keep_window_open();
return 0;
}
catch (exception& e) {
cerr << e.what() << '\n';
keep_window_open("~~");
return 1;
}
catch (...) {
cerr << "exception \n";
keep_window_open("~~");
return 2;
}

This makes for reasonably robust error handling. So we can start considering what else we can do to improve the calculator.

7.4 Negative numbers

If you tested the calculator, you found that it couldn’t handle negative numbers elegantly. For example, this is an error:

-1/2

We have to write

(0-1)/2

That’s not acceptable.

Image

Finding such problems during late debugging and testing is common. Only now do we have the opportunity to see what our design really does and get the feedback that allows us to refine our ideas. When planning a project, it is wise to try to preserve time and flexibility to benefit from the lessons we learn here. All too often, “release 1.0” is shipped without needed refinements because a tight schedule or a rigid project management strategy prevents “late” changes to the specification; “late” addition of “features” is especially dreaded. In reality, when a program is good enough for simple use by its designers but not yet ready to ship, it isn’t “late” in the development sequence; it’s the earliest time when we can benefit from solid experience with the program. A realistic schedule takes that into account.

In this case, we basically need to modify the grammar to allow unary minus. The simplest change seems to be in Primary. We have

Primary:
Number
"(" Expression ")"

and we need something like

Primary:
Number
"(" Expression ")"
"-" Primary
"+" Primary

We added unary plus because that’s what C++ does. When we have unary minus, someone always tries unary plus and it’s easier just to implement that than to explain why it is useless. The code that implements Primary becomes

double primary()
{
Token t = ts.get();
switch (t.kind) {
case '(': // handle ‘(’ expression ‘)’
{
double d = expression();
t = ts.get();
if (t.kind != ')') error("')' expected");
return d;
}
case '8': // we use ‘8’ to represent a number
return t.value; // return the number’s value
case '-':
return - primary();
case '+':
return primary();
default:
error("primary expected");
}
}

That’s so simple that it actually worked the first time.

7.5 Remainder: %

When we first analyzed the ideals for a calculator, we wanted the remainder (modulo) operator: %. However, % is not defined for floating-point numbers, so we backed off. Now we can consider it again. It should be simple:

1. We add % as a Token.

2. We define a meaning for %.

We know the meaning of % for integer operands. For example:

> 2%3;
= 2
> 3%2;
= 1
> 5%3;
= 2

But how should we handle operands that are not integers? Consider:

> 6.7%3.3;

What should be the resulting value? There is no perfect technical answer. However, modulo is often defined for floating-point operands. In particular, x%y can be defined as x-y=x-y*int(x/y), so that 6.7%3.3==6.7-3.3*int(6.7/3.3), that is, 0.1. This is easily done using the standard library function fmod() (floating-point modulo) from <cmath> (§24.8). We modify term() to include

case '%':
{ double d = primary();
if (d == 0) error("divide by zero");
left = fmod(left,d);
t = ts.get();
break;
}

The <cmath> library is where we find all of the standard mathematical functions, such as sqrt(x) (square root of x), abs(x) (absolute value of x), log(x) (natural logarithm of x), and pow(x,e) (x to the power of y).

Alternatively, we can prohibit the use of % on a floating-point argument. We check if the floating-point operands have fractional parts and give an error message if they do. The problem of ensuring int operands for % is a variant of the narrowing problem (§3.9.2 and §5.6.4), so we could solve it using narrow_cast:

case '%':
{ int i1 = narrow_cast<int>(left);
int i2 = narrow_cast<int>(primary());
if (i2 == 0) error("%: divide by zero");
left = i1%i2;
t = ts.get();
break;
}

For a simple calculator, either solution will do.

7.6 Cleaning up the code

Image

We have made several changes to the code. They are, we think, all improvements, but the code is beginning to look a bit messy. Now is a good time to review the code to see if we can make it clearer and shorter, add and improve comments, etc. In other words, we are not finished with the program until we have it in a state suitable for someone else to take over maintenance. Except for the almost total absence of comments, the calculator code really isn’t that bad, but let’s do a bit of cleanup.

7.6.1 Symbolic constants

Looking back, we find the use of '8' to indicate a Token containing a numeric value odd. It doesn’t really matter what value is used to indicate a number Token as long as the value is distinct from all other values indicating different kinds of Tokens. However, the code looks a bit odd and we had to keep reminding ourselves in comments:

case '8': // we use '8' to represent a number
return t.value; // return the number’s value
case '-':
return - primary();

Image

To be honest, we also made a few mistakes, typing '0' rather than '8', because we forgot which value we had chosen to use. In other words, using '8' directly in the code manipulating Tokens was sloppy, hard to remember, and error-prone; '8' is one of those “magic constants” we warned against in §4.3.1. What we should have done was to introduce a symbolic name for the constant we used to represent a number:

const char number = '8'; // t.kind==number means that t is a number Token

The const modifier simply tells the compiler that we are defining an object that is not supposed to change: for example, an assignment number='0' would cause the compiler to give an error message. Given that definition of number, we don’t have to use '8' explicitly anymore. The code fragment from primary above now becomes

case number:
return t.value; // return the number’s value
case '-':
return - primary();

Image

This requires no comment. We should not say in comments what can be clearly and directly said in code. Repeated comments explaining something are often an indication that the code should be improved.

Similarly, the code in Token_stream::get() that recognizes numbers becomes

case '.':
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
{ cin.putback(ch); // put digit back into the input stream
double val;
cin >> val; // read a floating-point number
return Token(number,val);
}

We could consider symbolic names for all tokens, but that seems overkill. After all, '(' and '+' are about as obvious a notation for ( and + as anyone could come up with. Looking through the tokens, only ';' for “print” (or “terminate expression”) and 'q' for “quit” seem arbitrary. Why not 'p' and'e'? In a larger program, it is only a matter of time before such obscure and arbitrary notation becomes a cause of a problem, so we introduce

const char quit = 'q'; // t.kind==quit means that t is a quit Token
const char print = ';'; // t.kind==print means that t is a print Token

Now we can write main()’s loop like this:

while (cin) {
cout << "> ";
Token t = ts.get();
while (t.kind == print) t=ts.get();
if (t.kind == quit) {
keep_window_open();
return 0;
}
ts.putback(t);
cout << "= " << expression() << '\n';
}

Introducing symbolic names for “print” and “quit” makes the code easier to read. In addition, it doesn’t encourage someone reading main() to make assumptions about how “print” and “quit” are represented on input. For example, it should come as no surprise if we decide to change the representation of “quit” to 'e' (for “exit”). That would now require no change in main().

Now the strings "> " and "= " stand out. Why do we have these “magical” literals in the code? How would a new programmer reading main() guess their purpose? Maybe we should add a comment? Adding a comment might be a good idea, but introducing a symbolic name is more effective:

const string prompt = "> ";
const string result = "= "; // used to indicate that what follows is a result

Should we want to change the prompt or the result indicator, we can just modify those consts. The loop now reads

while (cin) {
cout << prompt;
Token t = ts.get();
while (t.kind ==print) t=ts.get();
if (t.kind == quit) {
keep_window_open();
return 0;
}
ts.putback(t);
cout << result << expression() << '\n';
}

7.6.2 Use of functions

The functions we use should reflect the structure of our program, and the names of the functions should identify the logically separate parts of our code. Basically, our program so far is rather good in this respect: expression(), term(), and primary() directly reflect our understanding of the expression grammar, and get() handles the input and token recognition. Looking at main(), though, we notice that it does two logically separate things:

1. main() provides general “scaffolding”: start the program, end the program, and handle “fatal” errors.

2. main() handles the calculation loop.

Image

Ideally, a function performs a single logical action (§4.5.1). Having main() perform both of these actions obscures the structure of the program. The obvious solution is to make the calculation loop into a separate function calculate():

void calculate() // expression evaluation loop
{
while (cin) {
cout << prompt;
Token t = ts.get();
while (t.kind == print) t=ts.get(); // first discard all “prints”
if (t.kind == quit) return;
ts.putback(t);
cout << result << expression() << '\n';
}
}
int main()
try {
calculate();
keep_window_open(); // cope with Windows console mode
return 0;
}
catch (runtime_error& e) {
cerr << e.what() << '\n';
keep_window_open("~~");
return 1;
}
catch (. . .) {
cerr << "exception \n";
keep_window_open("~~");
return 2;
}

This reflects the structure much more directly and is therefore easier to understand.

7.6.3 Code layout

Looking through the code for ugly code, we find

switch (ch) {
case 'q': case ';': case '%': case '(': case ')': case '+': case'-': case '*': case '/':
return Token{ch}; // let each character represent itself

This wasn’t too bad before we added 'q', ';', and '%', but now it’s beginning to become obscure. Code that is hard to read is where bugs can more easily hide. And yes, a potential bug lurks here! Using one line per case and adding a couple of comments help. So, Token_stream’s get() becomes

Token Token_stream::get()
// read characters from cin and compose a Token
{
if (full) { // check if we already have a Token ready
full = false;
return buffer;
}
char ch;
cin >> ch; // note that >> skips whitespace (space, newline, tab, etc.)

switch (ch) {
case quit:
case print:
case '(':
case ')':
case '+':
case '-':
case '*':
case '/':
case '%':
return Token{ch}; // let each character represent itself
case '.': // a floating-point-literal can start with a dot
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9': // numeric literal
{ cin.putback(ch); // put digit back into the input stream
double val;
cin >> val; // read a floating-point number
return Token{number,val};
}
default:
error("Bad token");
}
}

We could of course have put each digit case on a separate line also, but that didn’t seem to buy us any clarity. Also, doing so would prevent get() from being viewed in its entirety on a screen at once. Our ideal is for each function to fit on the screen; one obvious place for a bug to hide is in the code that we can’t see because it’s off the screen horizontally or vertically. Code layout matters.

Note also that we changed the plain 'q' to the symbolic name quit. This improves readability and also guarantees a compile-time error if we should make the mistake of choosing a value for quit that clashes with another token name.

Image

When we clean up code, we might accidentally introduce errors. Always retest the program after cleanup. Better still, do a bit of testing after each set of minor improvements so that if something went wrong you can still remember exactly what you did. Remember: Test early and often.

7.6.4 Commenting

Image

We added a few comments as we went along. Good comments are an important part of writing code. We tend to forget about comments in the heat of programming. When you go back to the code to clean it up is an excellent time to look at each part of the program to see if the comments you originally wrote are

1. Still valid (you might have changed the code since you wrote the comment)

2. Adequate for a reader (they usually are not)

3. Not so verbose that they distract from the code

Image

To emphasize that last concern: what is best said in code should be said in code. Avoid comments that explain something that’s perfectly clear to someone who knows the programming language. For example:

x = b+c; // add b and c and assign the result to x

You’ll find such comments in this book, but only when we are trying to explain the use of a language feature that might not yet be familiar to you.

Comments are for things that code expresses poorly. An example is intent: code says what it does, not what it was intended to do (§5.9.1). Look at the calculator code. There is something missing: the functions show how we process expressions and tokens, but there is no indication (except the code) of what we meant expressions and tokens to be. The grammar is a good candidate for something to put in comments or into some documentation of the calculator.

/*
Simple calculator

Revision history:

Revised by Bjarne Stroustrup November 2013
Revised by Bjarne Stroustrup May 2007
Revised by Bjarne Stroustrup August 2006
Revised by Bjarne Stroustrup August 2004
Originally written by Bjarne Stroustrup
(bs@cs.tamu.edu) Spring 2004.

This program implements a basic expression calculator.
Input from cin; output to cout.
The grammar for input is:

Statement:
Expression
Print
Quit
Print:
;

Quit:
q

Expression:
Term
Expression + Term
Expression - Term
Term:
Primary
Term * Primary
Term / Primary
Term % Primary
Primary:
Number
( Expression )
- Primary
+ Primary
Number:
floating-point-literal


Input comes from cin through the Token_stream called ts.
*/

Here we used the block comment, which starts with a /* and continues until a */. In a real program, the revision history would contain indications of what corrections and improvements were made.

Note that the comments are not the code. In fact, this grammar simplifies a bit: compare the rule for Statement with what really happens (e.g., have a peek at the code in the following section). The comment fails to explain the loop in calculate() that allows us to do several calculations in a single run of the program. We’ll return to that problem in §7.8.1.

7.7 Recovering from errors

Why do we exit when we find an error? That seemed simple and obvious at the time, but why? Couldn’t we just write an error message and carry on? After all, we often make little typing errors and such an error doesn’t mean that we have decided not to do a calculation. So let’s try to recover from an error. That basically means that we have to catch exceptions and continue after we have cleaned up any messes that were left behind.

Until now, all errors have been represented as exceptions and handled by main(). If we want to recover from errors, calculate() must catch exceptions and try to clean up the mess before trying to evaluate the next expression:

void calculate()
{
while (cin)
try {
cout << prompt;
Token t = ts.get();
while (t.kind == print) t=ts.get(); // first discard all “prints”
if (t.kind == quit) return;
ts.putback(t);
cout << result << expression() << '\n';
}
catch (exception& e) {
cerr << e.what() << '\n'; // write error message
clean_up_mess();
}
}

We simply made the while-loop’s block into a try-block that writes an error message and cleans up the mess. Once that’s done, we carry on as always.

What would “clean up the mess” entail? Basically, getting ready to compute again after an error has been handled means making sure that all our data is in a good and predictable state. In the calculator, the only data we keep outside an individual function is the Token_stream. So what we need to do is to ensure that we don’t have tokens related to the aborted calculation sitting around to confuse the next calculation. For example,

1++2*3; 4+5;

will cause an error, and 2*3; 4+5 will be left in the Token_stream’s and cin’s buffers after the second + has triggered an exception. We have two choices:

1. Purge all tokens from the Token_stream.

2. Purge all tokens from the current calculation from the Token_stream.

The first choice discards all (including 4+5;), whereas the second choice just discards 2*3;, leaving 4+5 to be evaluated. Either could be a reasonable choice and either could surprise a user. As it happens, both are about equally simple to implement. We chose the second alternative because it simplifies testing.

So we need to read input until we find a semicolon. This seems simple. We have get() to do our reading for us so we can write a clean_up_mess() like this:

void clean_up_mess() // naive
{
while (true) { // skip until we find a print
Token t = ts.get();
if (t.kind == print) return;
}
}

Unfortunately, that doesn’t work all that well. Why not? Consider this input:

1@z; 1+3;

The @ gets us into the catch-clause for the while-loop. Then, we call clean_up_mess() to find the next semicolon. Then, clean_up_mess() calls get() and reads the z. That gives another error (because z is not a token) and we find ourselves in main()’s catch(...) handler, and the program exits. Oops! We don’t get a chance to evaluate 1+3. Back to the drawing board!

We could try more elaborate trys and catches, but basically we are heading into an even bigger mess. Errors are hard to handle, and errors during error handling are even worse than other errors. So, let’s try to devise some way to flush characters out of a Token_stream that couldn’t possibly throw an exception. The only way of getting input into our calculator is get(), and that can — as we just discovered the hard way — throw an exception. So we need a new operation. The obvious place to put that is in Token_stream:

class Token_stream {
public:
Token get(); // get a Token
void putback(Token t); // put a Token back
void ignore(char c); // discard characters up to and including a c
private:
bool full {false}; // is there a Token in the buffer?
Token buffer; // here is where we keep a Token put back using
// putback()
};

This ignore() function needs to be a member of Token_stream because it needs to look at Token_stream’s buffer. We chose to make “the thing to look for” an argument to ignore() — after all, the Token_stream doesn’t have to know what the calculator considers a good character to use for error recovery. We decided that argument should be a character because we don’t want to risk composing Tokens — we saw what happened when we tried that. So we get

void Token_stream::ignore(char c)
// c represents the kind of Token
{
// first look in buffer:
if (full && c==buffer.kind) {
full = false;
return;
}
full = false;

// now search input:
char ch = 0;
while (cin>>ch)
if (ch==c) return;
}

This code first looks at the buffer. If there is a c there, we are finished after discarding that c; otherwise, we need to read characters from cin until we find a c.

We can now write clean_up_mess() rather simply:

void clean_up_mess()
{
ts.ignore(print);
}

Dealing with errors is always tricky. It requires much experimentation and testing because it is extremely hard to imagine what errors can occur. Trying to make a program foolproof is always a very technical activity; amateurs typically don’t care. Quality error handling is one mark of a professional.

7.8 Variables

Having worked on style and error handling, we can return to looking for improvements in the calculator functionality. We now have a program that works quite well; how can we improve it? The first wish list for the calculator included variables. Having variables gives us better ways of expressing longer calculations. Similarly, for scientific calculations, we’d like built-in named values, such as pi and e, just as we have on scientific calculators.

Adding variables and constants is a major extension to the calculator. It will touch most parts of the code. This is the kind of extension that we should not embark on without good reason and sufficient time. Here, we add variables and constants because it gives us a chance to look over the code again and try out some more programming techniques.

7.8.1 Variables and definitions

Obviously, the key to both variables and built-in constants is for the calculator program to keep (name,value) pairs so that we can access the value given the name. We can define a Variable like this:

class Variable {
public:
string name;
double value;
};

We will use the name member to identify a Variable and the value member to store the value corresponding to that name.

How can we store Variables so that we can search for a Variable with a given name string to find its value or to give it a new value? Looking back over the programming tools we have encountered so far, we find only one good answer: a vector of Variables:

vector<Variable> var_table;

We can put as many Variables as we like into the vector var_table and search for a given name by looking at the vector elements one after another. We can write a get_value() function that looks for a given name string and returns its corresponding value:

double get_value(string s)
// return the value of the Variable named s
{
for (const Variable& v : var_table)
if (v.name == s) return v.value;
error("get: undefined variable ", s);
}

The code really is quite simple: go through every Variable in var_table (starting with the first element and continuing until the last) and see if its name matches the argument string s. If that is the case, return its value.

Similarly, we can define a set_value() function to give a Variable a new value:

void set_value(string s, double d)
// set the Variable named s to d
{
for (Variable& v : var_table)
if (v.name == s) {
v.value = d;
return;
}
error("set: undefined variable ", s);
}

We can now read and write “variables” represented as Variables in var_table. How do we get a new Variable into var_table? What does a user of our calculator have to write to define a new variable and later to get its value? We could consider C++’s notation

double var = 7.2;

That would work, but all variables in this calculator hold double values, so saying “double” would be redundant. Could we make do with

var = 7.2;

Possibly, but then we would be unable to tell the difference between the declaration of a new variable and a spelling mistake:

var1 = 7.2; // define a new variable called var1
var1 = 3.2; // define a new variable called var2

Oops! Clearly, we meant var2 = 3.2; but we didn’t say so (except in the comment). We could live with this, but we’ll follow the tradition in languages, such as C++, that distinguish declarations (with initializations) from assignments. We could use double, but for a calculator we’d like something short, so — drawing on another old tradition — we choose the keyword let:

let var = 7.2;

The grammar would be

Calculation:
Statement
Print
Quit
Calculation Statement

Statement:
Declaration
Expression

Declaration:
"let" Name "=" Expression

Calculation is the new top production (rule) of the grammar. It expresses the loop (in calculate()) that allows us to do several calculations in a run of the calculator program. It relies on the Statement production to handle expressions and declarations. We can handle a statement like this:

double statement()
{
Token t = ts.get();
switch (t.kind) {
case let:
return declaration();
default:
ts.putback(t);
return expression();
}
}

We can now use statement() instead of expression() in calculate():

void calculate()
{
while (cin)
try {
cout << prompt;
Token t = ts.get();
while (t.kind == print) t=ts.get(); // first discard all “prints”
if (t.kind == quit) return; // quit
ts.putback(t);
cout << result << statement() << '\n';
}
catch (exception& e) {
cerr << e.what() << '\n'; // write error message
clean_up_mess();
}
}

We now have to write declaration(). What should it do? It should make sure that what comes after a let is a Name followed by a = followed by an Expression. That’s what our grammar says. What should it do with the name? We should add a Variable with that name string and the value of the expression to our vector<Variable> called var_table. Once that’s done we can retrieve the value using get_value() and change it using set_value(). However, before writing this, we have to decide what should happen if we define a variable twice. For example:

let v1 = 7;
let v1 = 8;

We chose to consider such a redefinition an error. Typically, it is simply a spelling mistake. Instead of what we wrote, we probably meant

let v1 = 7;
let v2 = 8;

There are logically two parts to defining a Variable with the name var with the value val:

1. Check whether there already is a Variable called var in var_table.

2. Add (var,val) to var_table.

We have no use for uninitialized variables. We defined the functions is_declared() and define_name() to represent those two logically separate operations:

bool is_declared(string var)
// is var already in var_table?
{
for (const Variable& v : var_table)
if (v.name == var) return true;
return false;
}
double define_name(string var, double val)
// add (var,val) to var_table
{
if (is_declared(var)) error(var," declared twice");
var_table.push_back(Variable(var,val));
return val;
}

Adding a new Variable to a vector<Variable> is easy; that’s what vector’s push_back() member function does:

var_table.push_back(Variable(var,val));

The Variable(var,val) makes the appropriate Variable and push_back(), then adds that Variable to the end of var_table. Given that, and assuming that we can handle let and name tokens, declaration() is straightforward to write:

double declaration()
// assume we have seen "let”
// handle: name = expression
// declare a variable called "name” with the initial value "expression”
{
Token t = ts.get();
if (t.kind != name) error ("name expected in declaration");
string var_name = t.name;

Token t2 = ts.get();
if (t2.kind != '=') error("= missing in declaration of ", var_name);

double d = expression();
define_name(var_name,d);
return d;
}

Note that we returned the value stored in the new variable. That’s useful when the initializing expression is nontrivial. For example:

let v = d/(t2-t1);

This declaration will define v and also print its value. Additionally, printing the value of a declared variable simplifies the code in calculate() because every statement() returns a value. General rules tend to keep code simple, whereas special cases tend to lead to complications.

This mechanism for keeping track of Variables is what is often called a symbol table and could be radically simplified by the use of a standard library map; see §21.6.1.

7.8.2 Introducing names

This is all very good, but unfortunately, it doesn’t quite work. By now, that shouldn’t come as a surprise. Our first cut never — well, hardly ever — works. Here, we haven’t even finished the program — it doesn’t yet compile. We have no '=' token, but that’s easily handled by adding a case toToken_stream::get() (§7.6.3). But how do we represent let and name as tokens? Obviously, we need to modify get() to recognize these tokens. How? Here is one way:

const char name = 'a'; // name token
const char let = 'L'; // declaration token
const string declkey = "let"; // declaration keyword

Token Token_stream::get()
{
if (full) {
full = false;
return buffer;
}
char ch;
cin >> ch;
switch (ch) {
// as before
default:
if (isalpha(ch)) {
cin.putback(ch);
string s;
cin>>s;
if (s == declkey) return Token(let); // declaration keyword
return Token{name,s};
}
error("Bad token");
}
}

Note first of all the call isalpha(ch). This call answers the question “Is ch a letter?”; isalpha() is part of the standard library that we get from std_lib_facilities.h. For more character classification functions, see §11.6. The logic for recognizing names is the same as that for recognizing numbers: find a first character of the right kind (here, a letter), then put it back using putback() and read in the whole name using >>.

Unfortunately, this doesn’t compile; we have no Token that can hold a string, so the compiler rejects Token{name,s}. To handle that, we must modify the definition of Token to hold either a string or a double, and handle three forms of initializers, such as

• Just a kind; for example, Token{'*'}

• A kind and a number; for example, Token{number,4.321}

• A kind and a name; for example, Token{name,"pi"}

We handle that by introducing three initialization functions, known as constructors because they construct objects:

class Token {
public:
char kind;
double value;
string name;
Token(char ch) :kind{ch} { } // initialize kind with ch
Token(char ch, double val) :kind{ch}, value{val} { } // initialize kind
// and value
Token(char ch, string n) :kind{ch}, name{n} { } // initialize kind
// and name
};

Constructors add an important degree of control and flexibility to initialization. We will examine constructors in detail in Chapter 99.4.2, §9.7).

We chose 'L' as the representation of the let token and the string let as our keyword. Obviously, it would be trivial to change that keyword to double, var, #, or whatever by changing the string declkey that we compare s to.

Now we try the program again. If you type this, you’ll see that it all works:

let x = 3.4;
let y = 2;
x + y * 2;

However, this doesn’t work:

let x = 3.4;
let y = 2;
x+y*2;

What’s the difference between those two examples? Have a look to see what happens.

The problem is that we were sloppy with our definition of Name. We even “forgot” to define our Name production in the grammar (§7.8.1). What characters can be part of a name? Letters? Certainly. Digits? Certainly, as long as they are not the starting character. Underscores? Eh? The +character? Well? Eh? Look at the code again. After the initial letter we read into a string using >>. That accepts every character until it sees whitespace. So, for example, x+y*2; is a single name — even the trailing semicolon is read as part of the name. That’s unintended and unacceptable.

What must we do instead? First we must specify precisely what we want a name to be, and then we must modify get() to do that. Here is a workable specification of a name: a sequence of letters and digits starting with a letter. Given this definition,

a
ab
a1
Z12
asdsddsfdfdasfdsa434RTHTD12345dfdsa8fsd888fadsf

are names and

1a
as_s
#
as*
a car

are not. Except for leaving out the underscore, this is C++’s rule. We can implement that in the default case of get():

default:
if (isalpha(ch)) {
string s;
s += ch;
while (cin.get(ch) && (isalpha(ch) || isdigit(ch))) s+=ch;
cin.putback(ch);
if (s == declkey) return Token{let}; // declaration keyword
return Token{name,s};
}
error("Bad token");

Instead of reading directly into the string s, we read characters and put those into s as long as they are letters or digits. The s+=ch statement adds (appends) the character ch to the end of the string s. The curious statement

while (cin.get(ch) && (isalpha(ch) || isdigit(ch))) s+=ch;

reads a character into ch (using cin’s member function get()) and checks if it is a letter or a digit. If so, it adds ch to s and reads again. The get() member function works just like >> except that it doesn’t by default skip whitespace.

7.8.3 Predefined names

Now that we have names, we can easily predefine a few common ones. For example, if we imagine that our calculator will be used for scientific calculations, we’d want pi and e. Where in the code would we define those? In main() before the call of calculate() or in calculate() before the loop. We’ll put them in main() because those definitions really aren’t part of any calculation:

int main()
try {
// predefine names:
define_name("pi",3.1415926535);
define_name("e",2.7182818284);

calculate();

keep_window_open(); // cope with Windows console mode
return 0;
}
catch (exception& e) {
cerr << e.what() << '\n';
keep_window_open("~~");
return 1;
}
catch (...) {
cerr << "exception \n";
keep_window_open("~~");
return 2;
}

7.8.4 Are we there yet?

Not really. We have made so many changes that we need to test everything again, clean up the code, and review the comments. Also, we could do more definitions. For example, we “forgot” to provide an assignment operator (see exercise 2), and if we have an assignment we might want to distinguish between variables and constants (exercise 3).

Initially, we backed off from having named variables in our calculator. Looking back over the code that implements them, we may have two possible reactions:

1. Implementing variables wasn’t all that bad; it took only about three dozen lines of code.

2. Implementing variables was a major extension. It touched just about every function and added a completely new concept to the calculator. It increased the size of the calculator by 45% and we haven’t even implemented assignment!

Image

In the context of a first program of significant complexity, the second reaction is the correct one. More generally, it’s the right reaction to any suggestion that adds something like 50% to a program in terms of both size and complexity. When that has to be done, it is more like writing a new program based on a previous one than anything else, and it should be treated that way. In particular, if you can build a program in stages as we did with the calculator, and test it at each stage, you are far better off doing so than trying to do the whole program all at once.

Image Drill

1. Starting from the file calculator08buggy.cpp, get the calculator to compile.

2. Go through the entire program and add appropriate comments.

3. As you commented, you found errors (deviously inserted especially for you to find). Fix them; they are not in the text of the book.

4. Testing: prepare a set of inputs and use them to test the calculator. Is your list pretty complete? What should you look for? Include negative values, 0, very small, very large, and “silly” inputs.

5. Do the testing and fix any bugs that you missed when you commented.

6. Add a predefined name k meaning 1000.

7. Give the user a square root function sqrt(), for example, sqrt(2+6.7). Naturally, the value of sqrt(x) is the square root of x; for example, sqrt(9) is 3. Use the standard library sqrt() function that is available through the header std_lib_facilities.h. Remember to update the comments, including the grammar.

8. Catch attempts to take the square root of a negative number and print an appropriate error message.

9. Allow the user to use pow(x,i) to mean “Multiply x with itself i times”; for example, pow(2.5,3) is 2.5*2.5*2.5. Require i to be an integer using the technique we used for %.

10. Change the “declaration keyword” from let to #.

11. Change the “quit keyword” from quit to exit. That will involve defining a string for quit just as we did for let in §7.8.2.

Review

1. What is the purpose of working on the program after the first version works? Give a list of reasons.

2. Why does 1+2; q typed into the calculator not quit after it receives an error?

3. Why did we choose to make a constant character called number?

4. We split main() into two separate functions. What does the new function do and why did we split main()?

5. Why do we split code into multiple functions? State principles.

6. What is the purpose of commenting and how should it be done?

7. What does narrow_cast do?

8. What is the use of symbolic constants?

9. Why do we care about code layout?

10. How do we handle % (remainder) of floating-point numbers?

11. What does is_declared() do and how does it work?

12. The input representation for let is more than one character. How is it accepted as a single token in the modified code?

13. What are the rules for what names can and cannot be in the calculator program?

14. Why is it a good idea to build a program incrementally?

15. When do you start to test?

16. When do you retest?

17. How do you decide what should be a separate function?

18. How do you choose names for variables and functions? List possible reasons.

19. Why do you add comments?

20. What should be in comments and what should not?

21. When do we consider a program finished?

Terms

code layout

commenting

error handling

feature creep

maintenance

recovery

revision history

scaffolding

symbolic constant

testing

Exercises

1. Allow underscores in the calculator’s variable names.

2. Provide an assignment operator, =, so that you can change the value of a variable after you introduce it using let. Discuss why that can be useful and how it can be a source of problems.

3. Provide named constants that you really can’t change the value of. Hint: You have to add a member to Variable that distinguishes between constants and variables and check for it in set_value(). If you want to let the user define constants (rather than just having pi and e defined as constants), you’ll have to add a notation to let the user express that, for example, const pi = 3.14;.

4. The get_value(), set_value(), is_declared(), and define_name() functions all operate on the variable var_table. Define a class called Symbol_table with a member var_table of type vector<Variable> and member functions get(), set(), is_declared(), and declare(). Rewrite the calculator to use a variable of type Symbol_table.

5. Modify Token_stream::get() to return Token(print) when it sees a newline. This implies looking for whitespace characters and treating newline ('\n') specially. You might find the standard library function isspace(ch), which returns true if ch is a whitespace character, useful.

6. Part of what every program should do is to provide some way of helping its user. Have the calculator print out some instructions for how to use the calculator if the user presses the H key (both upper- and lowercase).

7. Change the q and h commands to be quit and help, respectively.

8. The grammar in §7.6.4 is incomplete (we did warn you against overreliance on comments); it does not define sequences of statements, such as 4+4; 5-6;, and it does not incorporate the grammar changes outlined in §7.8. Fix that grammar. Also add whatever you feel is needed to that comment as the first comment of the calculator program and its overall comment.

9. Suggest three improvements (not mentioned in this chapter) to the calculator. Implement one of them.

10. Modify the calculator to operate on ints (only); give errors for overflow and underflow. Hint: Use narrow_cast (§7.5).

11. Revisit two programs you wrote for the exercises in Chapter 4 or 5. Clean up that code according to the rules outlined in this chapter. See if you find any bugs in the process.

Postscript

As it happens, we have now seen a simple example of how a compiler works. The calculator analyzes input broken down into tokens and understood according to a grammar. That’s exactly what a compiler does. After analyzing its input, a compiler then produces a representation (object code) that we can later execute. The calculator immediately executes the expressions it has analyzed; programs that do this are called interpreters rather than compilers.