A Mathematical Expression Parser - 10 Lessons About C++ (2015)

10 Lessons About C++(2015)

Chapter 6: A Mathematical Expression Parser

When implementing your own calculator it is necessary to be able to transform mathematical expressions into a format that is suitable for evaluation by computers. Expressions like ( 1 + 8 ) - ( ( 3 * 4 ) / 2 ) (known as “infix notation“) appear simple and intuitive to humans, but are less straightforward to implement in a programming language.

The shunting-yard algorithm is a method for parsing mathematical expressions written in infix notation into a format known as Reverse Polish Notation (RPN). RPN notation is different to infix notation in that every operator (+, -, * etc) comes after the operands (numbers) and there are no parentheses (brackets). This structure is more suited to evaluation by computers.

The infix expression 3 * 4 for example becomes 3 4 *.

Pseudocode for the shunting-yard algorithm to convert an infix expression into Reverse Polish Notation:

While ( tokens exist )

Read token.

If ( token is a number}

add token to output queue

If ( token is a function )

push token onto stack

If ( token is a function argument separator eg a comma)

While ( token at the top of the stack is NOT a left parenthesis )

pop operators off stack onto output queue.

If ( no left parentheses were encountered )

Error - invalid input

If ( token is an operator, o1)

While ( operator token, o2, exists at the top of the stack AND ( o1 is left-associative and its precedence <= o2 OR o1 is right associative and its precedence < o2 )

pop o2 off the stack onto the output queue

push o1 onto the stack

If ( token is a left parenthesis )

push token onto the stack.

If ( token is right parenthesis )

While( token at the top of the stack is NOT a left parenthesis )

pop operators off the stack onto the output queue

Pop left parenthesis from the stack, but not onto output queue.

If ( token at the top of the stack is a function token )

pop token onto the output queue.

If ( stack has run out without finding a left parenthesis )

Error - mismatched parentheses.

While ( operator tokens remain in the stack )

If ( operator token on the top of the stack is a parenthesis )

Error - mismatched parentheses.

Pop the operator onto the output queue.

When the input expression becomes available in Reverse Polish Notation, it is possible to employ another stack-based algorithm to evaluate the RPN expression and hence the arithmetic result.

The pseudocode is as follows:

For each token

If ( token is a number )

Push value onto stack

If ( token is an operator )

Pop the 2 x top values from the stack

Evaluate the operator using popped values ar arguments

Push result on to the stack

The single value remaining on the stack is the evaluation.

Full code implementation as follows:

ExpressionParser.h

#include <algorithm>

#include <string>

#include <vector>

#include <list>

#include <iostream>

#include <fstream>

#include <iterator>

#include <queue>

#include <stack>

#include <sstream>

#include <locale>

#include <stdlib.h>

#include <math.h>

class ExpressionParser

{

public:

ExpressionParser( const std::string& input );

bool MatchingParetheses();

bool Evaluate( const std::vector<std::string>& rpn, std::string& result );

bool InfixToRPN( std::vector<std::string>& inputs );

private:

void ReplaceAll( std::string& str,

const std::string& from,

const std::string& to );

void Tokenize( std::list<std::string>& tokens,

const std::string& delimiter );

private:

std::string m_strInput;

};

ExpressionParser.cpp

#include "ExpressionParser.h"

const std::string charSet[] = { "(", ")", "%", "+", "-", "*", "/", "^", "," };

const double pi = 3.1415927;

const double e = 2.71828182846;

template<class BidirIt>

BidirIt Prev(BidirIt it,

typename std::iterator_traits<BidirIt>::difference_type n = 1)

{

std::advance(it, -n);

return it;

}

int Modulo(int num, int div)

{

int mod = num % div;

return ( num >= 0 || mod == 0 ) ? mod : div + mod;

}

unsigned int OpArgCount( const std::string& s )

{

unsigned int val = 1;

if ( s == "*" || s == "/" || s == "%" ||

s == "+" || s == "-" || s == "=" ||

s == "^" || s == "POW" )

{

val = 2;

}

else if ( s == "!" )

{

val = 1;

}

return val;

}

// Return operator precedence

int OpPrecedence(const std::string& s)

{

int precedence = 1;

if ( s == "!" )

precedence = 4;

else if ( s == "*" || s == "/" || s == "%" )

precedence = 3;

else if ( s == "+" || s == "-" )

precedence = 2;

else if ( s == "=" )

precedence = 1;

return precedence;

}

// Return true if left associative; false otherwise

bool OpLeftAssoc( const std::string& s )

{

bool opLeftAssoc = false;

if ( s == "*" || s == "/" ||

s == "%" || s == "+" || s == "-" )

{

opLeftAssoc = true;

}

else if ( s == "=" || s == "!" )

{

opLeftAssoc = false;

}

return opLeftAssoc;

}

// Is token an operator

bool IsOperator( const std::string& s )

{

return s == "+" || s == "-" ||

s == "/" || s == "*" ||

s == "!" || s == "%" ||

s == "=";

}

// Is token a function argument separator eg comma

bool IsComma( const std::string& s )

{

return s == ",";

}

// Convert string into all uppercase

std::string UpperCase( std::string input )

{

for ( std::string::iterator it = input.begin();

it != input.end();

++it )

*it = toupper(*it);

return input;

}

// Is token PI

bool IsPi( const std::string& s )

{

if ( UpperCase( s ) == "PI" )

return true;

return false;

}

// Is token Euler's constant

bool IsE( const std::string& s )

{

if ( UpperCase( s ) == "E" )

{

return true;

}

return false;

}

// Is the token a function

bool IsFunction( const std::string& s )

{

std::string str = UpperCase( s );

bool isFunction = false;

if ( str.find( "^" ) != std::string::npos ||

str.find( "SIN" ) != std::string::npos ||

str.find( "COS" ) != std::string::npos ||

str.find( "TAN" ) != std::string::npos ||

str.find( "LN" ) != std::string::npos ||

str.find( "LOG" ) != std::string::npos ||

str.find( "EXP" ) != std::string::npos ||

str.find( "POW" ) != std::string::npos ||

str.find( "SQRT" ) != std::string::npos )

{

isFunction = true;

}

return isFunction;

}

// Is the number a float

bool IsFloat( const std::string& s )

{

std::istringstream iss( s );

float f;

iss >> std::noskipws >> f;

return iss.eof() && !iss.fail();

}

// Is the string a number

bool IsNumber( const std::string& s )

{

std::string::const_iterator it = s.begin();

while ( it != s.end() && std::isdigit(*it, std::locale() ) )

{

++it;

}

return !s.empty() && it == s.end();

}

ExpressionParser::ExpressionParser( const std::string& input )

{

m_strInput = input;

}

// Determine if matching number of left and right parentheses

bool ExpressionParser::MatchingParetheses()

{

const size_t nLeft = std::count( m_strInput.begin(),

m_strInput.end(), '(');

const size_t nRight = std::count( m_strInput.begin(),

m_strInput.end(), ')');

return nLeft == nRight && !m_strInput.empty();

}

// Split selected text into delimited vector array of strings

void ExpressionParser::Tokenize( std::list<std::string>& tokens,

const std::string& delimiter )

{

// Insert whitepaces before and after each special characters

size_t size = sizeof( charSet ) / sizeof( std::string );

for ( int i = 0; i < static_cast<int>( size ); i++ ) {

std::string s = charSet[ i ];

ReplaceAll( m_strInput, s, " " + s + " " );

}

size_t next_pos = 0;

size_t init_pos = m_strInput.find_first_not_of( delimiter, next_pos );

while ( next_pos != std::string::npos &&

init_pos != std::string::npos ) {

next_pos = m_strInput.find( delimiter, init_pos );

std::string token = m_strInput.substr( init_pos, next_pos - init_pos );

tokens.push_back( token );

init_pos = m_strInput.find_first_not_of( delimiter, next_pos );

}

std::string firstToken = tokens.front();

if ( firstToken == "-" ) {

std::list<std::string>::iterator it = tokens.begin();

it++;

if ( it == tokens.end() ) return;

std::string nextToken = *( it );

if ( IsNumber( nextToken ) || IsFloat( nextToken ) ) {

tokens.pop_front();

tokens.front() = firstToken + nextToken;

}

else if ( nextToken == "(" || IsFunction( nextToken ) ) {

tokens.front() = firstToken + "1";

tokens.insert( it, "*" );

}

// minus minus is a plus

else if ( nextToken == "-" && firstToken == "-" ) {

std::list<std::string>::iterator nit = it;

std::advance ( nit, -1 );

tokens.erase( it );

tokens.erase( nit );

}

}

// Deal with minus sign after opening parenthesis or operator

typedef std::list<std::string>::iterator t_iter;

std::string prevToken = "";

for ( t_iter it = tokens.begin(); it != Prev( tokens.end() ); it++ ) {

std::string token = *it;

std::list<std::string>::iterator nit = it;

std::advance ( nit, 1 );

if ( nit == tokens.end() ) break;

std::string ntoken = *nit;

if ( token == "-" && prevToken == "(" ) {

if ( IsNumber( ntoken ) || IsFloat( ntoken ) ) {

tokens.erase( nit );

*it = "-" + ntoken;

token = *it;

}

}

else if ( token == "-" &&

( IsOperator( prevToken ) || prevToken == "^" || prevToken == "%" ) ) {

// Minus minus becomes a plus

if ( token == "-" && prevToken == "-" ) {

std::list<std::string>::iterator nit = it;

std::list<std::string>::iterator nnit = nit;

nnit++;

std::advance ( nit, -1 );

tokens.erase( it );

*nit = "+";

std::list<std::string>::iterator pnit = nit;

std::advance ( pnit, -1 );

if ( IsOperator( *pnit ) || *pnit == "(" )

tokens.erase( nit );

token = *nnit; it = nnit;

if ( it == Prev( tokens.end() ) ) break;

}

else if ( IsNumber( ntoken ) ||

IsFloat( ntoken ) ||

IsFunction( ntoken ) ) {

bool exit = false;

if ( nit == Prev( tokens.end() ) ) exit = true;

tokens.erase( nit );

*it = "-" + ntoken;

token = *it;

if ( exit ) break;

}

else if ( ntoken == "(" ) {

*it = "-1";

token = *it;

tokens.insert( nit, "*" );

}

}

prevToken = token;

}

// Deal with minus sign before opening parenthesis

prevToken = ""; t_iter prevIt;

for ( t_iter it = tokens.begin(); it != tokens.end(); it++ ) {

std::string token = *it;

if ( token == "(" && prevToken == "-" ) {

tokens.insert( it, "1" );

tokens.insert( it, "*" );

}

prevToken = token;

prevIt = it;

}

}

// Replace all instances of selected string with replacement string

void ExpressionParser::ReplaceAll( std::string& str, const std::string& from,

const std::string& to )

{

size_t start_pos = 0;

while( (start_pos = str.find(from, start_pos)) != std::string::npos)

{

str.replace(start_pos, from.length(), to);

start_pos += to.length();

}

}

// Deduce the numerical result from the RPN string passed to it

bool ExpressionParser::Evaluate(

const std::vector<std::string>& rpn,

std::string& result )

{

typedef std::vector<std::string>::const_iterator rpn_iter;

std::stack<std::string> stack;

for ( rpn_iter it = rpn.begin(); it != rpn.end(); it++ )

{

std::string token = *it;

if( IsNumber( token ) || IsFloat( token ) ||

IsPi( token ) || IsE( token ) )

{

if ( IsPi( token ) )

{

std::stringstream s;

s << pi;

token = s.str();

}

else if ( IsE( token ) )

{

std::stringstream s;

s << e;

token = s.str();

}

stack.push( token );

}

else if( IsOperator( token ) || IsFunction( token ) )

{

double result = 0.0;

unsigned int nargs = OpArgCount( UpperCase( token ) );

bool isUnary = false;

unsigned int stackArgs = stack.size();

std::vector<double> args;

if( stackArgs < nargs)

{

// For dealing with unary '-' or unary '+'

if ( stackArgs == 1 &&

nargs == 2 && ( token == "+" || token == "-" ) )

{

std::string value = stack.top();

result = strtod( value.c_str(), NULL );

stack.pop(); isUnary = true;

}

else { // (Error) Insufficient values in the expression.

return false;

}

}

else

{

// Else, Pop the top n values from the stack.

while ( nargs > 0 )

{

std::string value = stack.top();

double d = strtod( value.c_str(), NULL );

args.push_back( d );

stack.pop(); nargs--;

}

}

// Evaluate the operator, with the values as arguments.

if ( IsOperator( token ) && !isUnary )

{

double d2 = args[ 0 ];

double d1 = args[ 1 ];

if ( token == "+" ) {

result = d1 + d2;

}

else if ( token == "-" ) {

result = d1 - d2;

}

else if ( token == "*" ) {

result = d1 * d2;

}

else if ( token == "/" ) {

result = d1 / d2;

}

else if ( token == "%" ) {

int i2 = (int) args[ 0 ];

int i1 = (int) args[ 1 ];

double iresult = Modulo( i1, i2 );

result = iresult;

}

}

else if ( IsFunction( token ) )

{

double d0 = args[ 0 ];

std::string capToken = UpperCase( token );

double mult = token.find( "-" ) != std::string::npos ? -1 : 1;

if ( capToken.find( "SIN" ) != std::string::npos ) {

result = sin( d0 );

}

else if ( capToken.find( "COS" ) != std::string::npos ) {

result = cos( d0 );

}

else if ( capToken.find( "TAN" ) != std::string::npos ) {

result = tan( d0 );

}

else if ( capToken.find( "LN" ) != std::string::npos ) {

result = log( d0 );

}

else if ( capToken.find( "LOG" ) != std::string::npos ) {

result = log10( d0 );

}

else if ( capToken.find( "EXP" ) != std::string::npos ) {

result = exp( d0 );

}

else if ( capToken.find( "^" ) != std::string::npos ) {

double d2 = args[ 0 ]; double d1 = args[ 1 ];

if ( d1 < 0 ) mult = -1.0;

result = pow( (double) d1, d2);

}

else if ( capToken.find( "POW" ) != std::string::npos ) {

double d2 = args[ 0 ]; double d1 = args[ 1 ];

result = pow( d1, d2);

}

else if ( capToken.find( "SQRT" ) != std::string::npos ) {

result = sqrt( d0 );

}

result *= mult;

}

// Push the returned results, if any, back onto the stack

if ( result == (long) result )

{

result = long( result );

}

std::stringstream s; s << result;

stack.push( s.str() );

}

}

if ( stack.size() == 1 )

{

result = stack.top();

double res = strtod( result.c_str(), NULL );

if ( res == (long) res )

{

long lres = (long) res;

std::stringstream s;

s << lres;

result = s.str();

}

return true;

}

return false; // (Error) The user input has too many values

}

// Convert infix expression format into reverse Polish notation

bool ExpressionParser::InfixToRPN( std::vector<std::string>& inputs )

{

typedef std::list<std::string>::const_iterator tok_iter;

std::list<std::string> infix;

std::stack<std::string> stack;

std::queue<std::string> outputQueue;

Tokenize( infix, " " );

bool success = true;

for ( tok_iter it = infix.begin(); it != infix.end(); it++ )

{

std::string token = *it;

if ( IsNumber( token ) ||

IsFloat( token ) ||

IsPi( token ) ||

IsE( token ) )

{

outputQueue.push( token );

}

else if ( IsFunction( token ) )

{

stack.push( token );

}

else if ( IsComma( token ) )

{

std::string stackToken = stack.top();

while ( stackToken != "(" )

{

outputQueue.push( stackToken );

stack.pop(); stackToken = stack.top();

}

if ( stackToken == "(" )

{

success = true;

}

else

{

success = false;

}

}

else if ( IsOperator( token ) )

{

while( !stack.empty() &&

IsOperator( stack.top() ) &&

( ( OpLeftAssoc( token ) && OpPrecedence( token ) == OpPrecedence( stack.top() ) ) ||

( OpPrecedence( token ) < OpPrecedence( stack.top() ) ) ) ) {

std::string stackToken = stack.top();

stack.pop(); outputQueue.push( stackToken );

}

stack.push( token );

}

else if ( token == "(" )

{

stack.push(token);

}

else if ( token == ")" )

{

while ( !stack.empty() && stack.top() != "(" )

{

outputQueue.push( stack.top() );

stack.pop();

}

if ( !stack.empty() )

{

std::string stackToken = stack.top();

if ( stackToken != "(" )

success = false;

}

else

{

return false;

}

stack.pop();

if ( !stack.empty() )

{

std::string stackToken = stack.top();

if ( IsFunction( stackToken ) )

{

outputQueue.push( stackToken );

stack.pop();

}

}

}

}

while ( !stack.empty() )

{

outputQueue.push( stack.top() );

stack.pop();

}

while ( !outputQueue.empty() ) {

std::string token = outputQueue.front();

inputs.push_back( token );

outputQueue.pop();

}

return success;

}

main.cpp

#include "ExpressionParser.h"

#include <assert.h>

#include <sstream>

// Print iterators in a generic way

template<typename T, typename InputIterator>

void Print( const std::string& message,

const InputIterator& itbegin,

const InputIterator& itend,

const std::string& delimiter)

{

std::cout << message << std::endl;

std::copy(itbegin, itend,

std::ostream_iterator<T>(std::cout, delimiter.c_str()));

std::cout << std::endl;

}

std::pair<std::string, double> tests[] =

{

std::make_pair<std::string, double>( "((2*(6-1))/2)*4", 20.0 ),

std::make_pair<std::string, double>( "-11 ^ -7", 5.13158e-08 ),

std::make_pair<std::string, double>( "-1*- sin( Pi / -2)", -1.0 ),

std::make_pair<std::string, double>( "-8 + 5", -3.0 ),

std::make_pair<std::string, double>( "2--2", 4.0 ),

std::make_pair<std::string, double>( "34.5*(23+1.5)/2", 422.625 ),

std::make_pair<std::string, double>( "3/2 + 4*(12+3)", 61.5 ),

std::make_pair<std::string, double>( "pow(2, 3)", 8 ),

std::make_pair<std::string, double>( "((2*(6-1))/2)*4", 20 ),

std::make_pair<std::string, double>( "ln(2)+3^5", 243.693147181 ),

std::make_pair<std::string, double>( "-8+3", -5 ),

std::make_pair<std::string, double>( "2.5^3", 15.625 ),

std::make_pair<std::string, double>( "SQRT(4 )", 2 ),

std::make_pair<std::string, double>( "5 + (-1 + 2 )", 6 ),

std::make_pair<std::string, double>( "-(2+3)*(4*-10-1)+100", 305 ),

std::make_pair<std::string, double>( "(2+3)*-(4*10-1)+100", -95 ),

std::make_pair<std::string, double>( "1 - (-2^2) - 1", 4 ),

std::make_pair<std::string, double>( "1 - (--2^2) -- 1", -2 ),

std::make_pair<std::string, double>( "(4*-10-1)", -41 ),

std::make_pair<std::string, double>( "-(2+3)*(4*-10-1)", 205 ),

std::make_pair<std::string, double>( "-3/2 + 4*-( 12+3)", -61.5 ),

std::make_pair<std::string, double>( "-3/2 + 4*(-12-3)", -61.5 ),

std::make_pair<std::string, double>( "-3/2 + -4*(-12-3)", 58.5 ),

std::make_pair<std::string, double>( "1--cos(PI)", 0.0 ),

};

double string_to_double( const std::string& s )

{

std::istringstream i(s);

double x;

if (!(i >> x))

return 0;

return x;

}

int main(int argc, char** argv)

{

std::string result;

size_t test_size = sizeof( tests ) / sizeof( tests[ 0 ] );

for ( int i = 0; i < test_size; ++i )

{

std::string originalInput = tests[ i ].first;

double expected_result = tests[ i ].second;

Print<char, std::string::iterator>(

"Input expression:",

originalInput.begin(),

originalInput.end(), "" );

ExpressionParser parser( originalInput );

if ( !parser.MatchingParetheses() )

{

std::cout << "Error: mismatched parentheses or empty input"

<< std::endl;

return 1;

}

std::vector<std::string> RPN;

if ( parser.InfixToRPN( RPN ) )

{

Print<std::string, std::vector<std::string>::const_iterator>(

"RPN tokens: ", RPN.begin(),

RPN.end(),

" " );

std::string str_result;

if ( parser.Evaluate( RPN, str_result ) )

{

std::cout << "Result = " << str_result << std::endl;

std::cout << std::endl;

double result = string_to_double( str_result );

// Compare expected vs actual to within a precision: 1

assert( fabs(result-expected_result) < 0.001 );

}

}

else

{

std::cout << "Error: mismatched parentheses" << std::endl;

}

}

return 0;

}

The result of running through each of the example test inputs and then comparing their expected output with the actual output is here:

Chapter Summary

As a result of completing this lesson, you now have:

·A working example of a mathematical expression parser based on the shunting-yard algorithm.

·A better understanding of how to convert pseudocode (code written for human understanding, not a computer) into working C++ code.