Advanced Topics and Pitfalls in JavaScript - Functional Programming in JavaScript (2015)

Functional Programming in JavaScript (2015)

Chapter 6. Advanced Topics and Pitfalls in JavaScript

JavaScript has been called the "assembly language of the web". The analogy (it isn't perfect, but which analogy is?) draws from the fact that JavaScipt is often a target for compilation, namely from Clojure and CoffeeScript, but also from many other sources such as pyjamas (python to JS) and Google Web Kit (Java to JS).

But the analogy also references the foolish idea that JavaScript is as expressive and low-level as x86 assembly. Perhaps this notion stems from the fact that JavaScript has been bashed for its design flaws and oversights ever since it was first shipped with Netscape back in 1995. It was developed and released in a hurry, before it could be fully developed. And because of that, some questionable design choices made its way into JavaScript, the language that soon became the de-facto scripting language of the web. Semicolons were a big mistake. So were its ambiguous methods for defining functions. Is it var foo = function(); or function foo();?

Functional programming is an excellent way to side-step some of these mistakes. By focusing on the fact that JavaScript is truly a functional language, it becomes clear that, in the preceding example about the different ways to declare a function, it's best to declare functions as variables. And that semicolons are mostly just syntactic sugar to make JavaScript appear more C-like.

But always remember the language you are working with. JavaScript, like any other language, has its pitfalls. And, when programming in a style that often skirts the bleeding edge of what's possible, those minor stumbles can become non-recoverable gotchas. Some of these gotchas include:

· Recursion

· Variable scope and closures

· Function declarations vs. function expressions

However, these issues can be overcome with a little attention.

Recursion

Recursion is very important to functional programming in any language. Many functional languages go so far as to require recursion for iteration by not providing for and while loop statements; this is only possible when tail-call elimination is guaranteed by the language, which is not the case for JavaScript. A quick primer on recursion was given in Chapter 2, Fundamentals of Functional Programming. But in this section, we'll dig deeper into exactly how recursion works in JavaScript.

Tail recursion

JavaScript's routine for handling recursion is known as tail recursion, a stack-based implementation of recursion. This means that, for every recursive call, there is a new frame in the stack.

To illustrate the problems that can arise from this method, let's use the classic recursive algorithm for factorials.

var factorial = function(n) {

if (n == 0) {

// base case

return 1;

}

else {

// recursive case

return n * factorial(n-1);

}

}

The algorithm will call itself n times to get the answer. It's literally computing (1 x 1 x 2 x 3 x … x N). That means the time complexity is O(n).

Note

O(n), pronounced "big oh to the n," means that the complexity of the algorithm will grow at a rate of n as the size of the input grows, which is leaner growth. O(n2) is exponential growth, O(log(n)) is logarithmic growth, and so on. This notation can be used for time complexity as well as space complexity.

But, because a new frame in the memory stack is allocated for each iteration, the space complexity is also O(n). This is a problem. This means that memory will be consumed at such a rate the memory limit will be exceeded far too easily. On my laptop, factorial(23456) returns Uncaught Error: RangeError: Maximum call stack size exceeded.

While calculating the factorial of 23,456 is a frivolous endeavor, you can be assured that many problems that are solved with recursion will grow to that size without too much trouble. Consider the case of data trees. The tree could be anything: search applications, file systems, routing tables, and so on. Below is a very simple implementation of the tree traversal function:

var traverse = function(node) {

node.doSomething(); // whatever work needs to be done

node.childern.forEach(traverse); // many recursive calls

}

With just two children per node, both time complexity and space complexity, (in the worst case, where the entire tree must be traversed to find the answer), would be O(n2) because there would be two recursive calls each. With many children per node, the complexity would be O(nm) where mis the number of children. And recursion is the preferred algorithm for tree traversal; a while loop would be much more complex and would require the maintenance of a stack.

Exponential growth like this would mean that it would not take a very large tree to throw a RangeError exception. There must be a better way.

The Tail-call elimination

We need a way to eliminate the allocation of new stack frames for every recursive call. This is known as tail-call elimination.

With tail-call elimination, when a function returns the result of calling itself, the language doesn't actually perform another function call. It turns the whole thing into a loop for you.

OK, so how do we do this? With lazy evaluation. If we could rewrite it to fold over a lazy sequence, such that the function returns a value or it returns the result of calling another function without doing anything with that result, then new stack frames don't need to be allocated.

To put it in "tail recursion form", the factorial function would have to be rewritten such that the inner procedure fact calls itself last in the control flow, as shown in the following code snippet:

var factorial = function(n) {

var _fact = function(x, n) {

if (n == 0) {

// base case

return x;

}

else {

// recursive case

return _fact(n*x, n-1);

}

}

return fact(1, n);

}

Note

Instead of having the result produced by the first function in the recursion tail (like in n * factorial(n-1)), the result is computed going down the recursion tail (with the call to _fact(r*n, n-1)) and is produced by the last function in this tail (with return r;). The computation goes only one way down, not on its way up. It's relatively easy to process it as an iteration for the interpreter.

However, tail-call elimination does not work in JavaScript. Put the above code into your favorite JavaScript engine and factorial(24567) still returns Uncaught Error: RangeError: Maximum call stack size exceeded exception. Tail-call elimination is listed as a new feature to be included in the next release of ECMAScript, but it will be some time before all browsers implement it.

JavaScript cannot optimize functions that are put into tail recursion form. It's a feature of the language specification and runtime interpreter, plain and simple. It has to do with how the interpreter acquires resources for stack frames. Some languages will reuse the same stack frame when it doesn't need to remember anything new, like in the preceding function. This is how tail-call elimination reduces both time and space complexity.

Unfortunately, JavaScript does not do this. But if it did, it would reorganize the stack frames from this:

call factorial (3)

call fact (3 1)

call fact (2 3)

call fact (1 6)

call fact (0 6)

return 6

return 6

return 6

return 6

return 6

into the following:

call factorial (3)

call fact (3 1)

call fact (2 3)

call fact (1 6)

call fact (0 6)

return 6

return 6

Trampolining

The solution? A process known as trampolining. It's a way to "hack" the concept of tail-call elimination into a program by using thunks.

Note

Thunks are, for this purpose, expressions with arguments that wrap anonymous functions with no arguments of their own. For example: function(str){return function(){console.log(str)}}. This prevents the expression from being evaluated until a receiving function calls the anonymous function.

A trampoline is a function that takes a function as input and repeatedly executes its returned value until something other than a function is returned. A simple implementation is shown in the following code snippet:

var trampoline = function(f) {

while (f && f instanceof Function) {

f = f.apply(f.context, f.args);

}

return f;

}

To actually implement tail-call elimination, we need to use thunks. For this, we can use the bind() function that allows us to apply a method to one object with the this keyword assigned to another. Internally, it's the same as the call keyword, but it's chained to the method and returns a new bound function. The bind() function actually does partial application, though in a very limited way.

var factorial = function(n) {

var _fact = function(x, n) {

if (n == 0) {

// base case

return x;

}

else {

// recursive case

return _fact.bind(null, n*x, n-1);

}

}

return trampoline(_fact.bind(null, 1, n));

}

But writing the fact.bind(null, ...) method is cumbersome and would confuse anybody reading the code. Instead, let's write our own function for creating thunks. There are a few things the thunk() function must do:

· thunk() function must emulate the _fact.bind(null, n*x, n-1) method that returns a non-evaluated function

· The thunk() function should enclose two more functions:

o For processing the give function, and

o For processing the function arguments that will be used when the given function is invoked

With that, we're ready to write the function. We only need a few lines of code to write it.

var thunk = function (fn) {

return function() {

var args = Array.prototype.slice.apply(arguments);

return function() { return fn.apply(this, args); };

};

};

Now we can use the thunk() function in our factorial algorithm like this:

var factorial = function(n) {

var fact = function(x, n) {

if (n == 0) {

return x;

}

else {

return thunk(fact)(n * x, n - 1);

}

}

return trampoline(thunk(fact)(1, n));

}

But again, we can simplify it just a bit further by defining the _fact() function as a thunk() function. By defining the inner function as a thunk() function, we're relieved of having to use the thunk() function both inside the inner function definition and in the return statement.

var factorial = function(n) {

var _fact = thunk(function(x, n) {

if (n == 0) {

// base case

return x;

}

else {

// recursive case

return _fact(n * x, n - 1);

}

});

return trampoline(_fact(1, n));

}

The result is beautiful. What seems like the function _fact() being recursively called for a tail-free recursion is almost transparently processed as an iteration!

Finally, let's see how the trampoline() and thunk() functions work with our more meaningful example of tree traversal. The following is a crude example of how a data tree could be traversed using trampolining and thunks:

var treeTraverse = function(trunk) {

var _traverse = thunk(function(node) {

node.doSomething();

node.children.forEach(_traverse);

}

trampoline(_traverse(trunk));

}

We've solved the issue of tail recursion. But is there an even better way? What if we could simply convert the recursive function to a non-recursive function? Up next, we'll look at how to do just that.

The Y-combinator

The Y-combinator is one of those things in computer science that amaze even the deftest of programming masterminds. Its ability to automatically convert recursive functions to non-recursive functions is why Douglas Crockford calls it "one of the most strange and wonderful artifacts of computer science", and Sussman and Steele once said, "That this manages to work is truly remarkable".

So a truly-remarkable, wonderfully strange artifact of computer science that brings recursive functions to their knees must be massive and complex, right? No, not exactly. Its implementation in JavaScript is only nine, very odd, lines of code. They are as follows:

var Y = function(F) {

return (function (f) {

return f(f);

} (function (f) {

return F(function (x) {

return f(f)(x);

});

}));

}

Here's how it works: it finds the "fixed point" of the function passed in as an argument. Fixed points offer another way to think about functions rather than recursion and iteration in the theory of computer programming. And it does this with only the use of anonymous function expressions, function applications, and variable references. Note that Y does not reference itself. In fact, all those functions are anonymous.

As you might have guessed, the Y-combinator came out of lambda calculus. It's actually derived with the help of another combinator called the U-combinator. Combinators are special higher-order functions that only use function application and earlier defined combinators to define a result from its input.

To demonstrate the Y-combinator, we'll again turn to the factorial problem, but we need to define the factorial function a little differently. Instead of writing a recursive function, we write a function that returns a function that is the mathematical definition of factorials. Then we can pass this into the Y-combinator.

var FactorialGen = function(factorial) {

return (function(n) {

if (n == 0) {

// base case

return 1;

}

else {

// recursive case

return n * factorial(n – 1);

}

});

};

Factorial = Y(FactorialGen);

Factorial(10); // 3628800

However, when we give it a significantly large number, the stack overflows just as if tail recursion without trampolining was used.

Factorial(23456); // RangeError: Maximum call stack size exceeded

But we can use trampolining with the Y-combinator as in the following:

var FactorialGen2 = function (factorial) {

return function(n) {

var factorial = thunk(function (x, n) {

if (n == 0) {

return x;

}

else {

return factorial(n * x, n - 1);

}

});

return trampoline(factorial(1, n));

}

};

var Factorial2 = Y(FactorialGen2)

Factorial2(10); // 3628800

Factorial2(23456); // Infinity

We can also rearrange the Y-combinator to perform something called memoization.

Memoization

Memoization is the technique of storing the result of expensive function calls. When the function is later called with the same arguments, the stored result is returned rather than computing the result again.

Although the Y-combinator is much faster than recursion, it is still relatively slow. To speed it up, we can create a memoizing fixed-point combinator: a Y-like combinator that caches the results of intermediate function calls.

var Ymem = function(F, cache) {

if (!cache) {

cache = {} ; // Create a new cache.

}

return function(arg) {

if (cache[arg]) {

// Answer in cache

return cache[arg] ;

}

// else compute the answer

var answer = (F(function(n){

return (Ymem(F,cache))(n);

}))(arg); // Compute the answer.

cache[arg] = answer; // Cache the answer.

return answer;

};

}

So how much faster is it? By using http://jsperf.com/, we can compare the performance.

The following results are with random numbers between 1 and 100. We can see that the memoizing Y-combinator is much, much faster. And adding trampolining to it does not slow it down by much. You can view the results and run the tests yourself at this URL: http://jsperf.com/memoizing-y-combinator-vs-tail-call-optimization/7.

Memoization

The bottom line is: the most efficient and safest method of performing recursion in JavaScript is to use the memoizing Y-combinator with tail-call elimination via trampolining and thunks.

Variable scope

The scope of variables in JavaScript is not natural. In fact, sometimes it's downright counter-intuitive. They say that JavaScript programmers can be judged by how well they understand scope.

Scope resolutions

First, let's go over the different scope resolutions in JavaScript.

JavaScript uses scope chains to establish the scope of variables. When resolving a variable, it starts at the innermost scope and searches outwards.

Global scope

Variables, functions, and objects defined at this level are available to any code in the entire program. This is the outermost scope.

var x = 'hi';

function a() {

console.log(x);

}

a(); // 'hi'

Local scope

Each function described has its own local scope. Any function defined within another function has a nested local scope that is linked to the outer function. Almost always, it's the position in the source that defines the scope.

var x = 'hi';

function a() {

console.log(x);

}

function b() {

var x = 'hello';

console.log(x);

}

b(); // hello

a(); // hi

Local scope is only for functions and not for any expression statements (if, for, while, and so on), which is different from how most languages treat scope.

function c() {

var y = 'greetings';

if (true) {

var y = 'guten tag';

}

console.log(y);

}

function d() {

var y = 'greetings';

function e() {

var y = 'guten tag';

}

console.log(y)

}

c(); // 'guten tag'

d(); // 'greetings'

In functional programming, this isn't as much of a concern because functions are used more often and expression statements less often. For example:

function e(){

var z = 'namaste';

[1,2,3].foreach(function(n) {

var z = 'aloha';

}

isTrue(function(){

var z = 'good morning';

});

console.log(z);

}

e(); // 'namaste'

Object properties

Object properties have their own scope chains as well.

var x = 'hi';

var obj = function(){

this.x = 'hola';

};

var foo = new obj();

console.log(foo.x); // 'hola'

foo.x = 'bonjour';

console.log(foo.x); // 'bonjour'

The object's prototype is further down the scope chain.

obj.prototype.x = 'greetings';

obj.prototype.y = 'konnichi ha';

var bar = new obj();

console.log(bar.x); // still prints 'hola'

console.log(bar.y); // 'konnichi ha'

This isn't even close to being comprehensive, but these three types of scope are enough to get started.

Closures

One problem with this scope structure is that it leaves no room for private variables. Consider the following code snippet:

var name = 'Ford Focus';

var year = '2006';

var millage = 123456;

function getMillage(){

return millage;

}

function updateMillage(n) {

millage = n;

}

These variables and functions are global, which means it would be too easy for code later down the program to accidentally overwrite them. One solution would be to encapsulate them into a function and call that function immediately after defining it.

var car = function(){

var name = 'Ford Focus';

var year = '2006';

var millage = 123456;

function getMillage(){

return Millage;

}

function updateMillage(n) {

millage = n;

}

}();

Nothing is happening outside the function, so we ought to discard the function name by making it anonymous.

(function(){

var name = 'Ford Focus';

var year = '2006';

var millage = 123456;

function getMillage(){

return millage;

}

function updateMillage(n) {

millage = n;

}

})();

To make the functions getValue() and updateMillage() available outside the anonymous function, we'll need to return them in an object literal as shown in the following code snippet:

var car = function(){

var name = 'Ford Focus';

var year = '2006';

var millage = 123456;

return {

getMillage: function(){

return millage;

},

updateMillage: function(n) {

millage = n;

}

}

}();

console.log( car.getMillage() ); // works

console.log( car.updateMillage(n) ); // also works

console.log( car.millage ); // undefined

This gives us pseudo-private variables, but the problems don't stop there. The following section explores more issues with variable scope in JavaScript.

Gotchas

Many variable scope nuances can be found throughout JavaScript. The following is by no means a comprehensive list, but it covers the most common cases:

· The following will output 4, not 'undefined' as one would expect:

for (var n = 4; false; ) { } console.log(n);

This is due to the fact that, in JavaScript, variable definition happens at the beginning of the corresponding scope, not just when it is declared.

· If you define a variable in the outer scope, and then have an if statement define a variable inside the function with the same name, even if that if branch isn't reached, it is redefined. An example:

· var x = 1;

· function foo() {

· if (false) {

· var x = 2;

· }

· return x;

· }

· foo(); // Return value: 'undefined', expected return value:

2

Again, this is caused by moving the variable definition at the beginning of the scope with the undefined value.

· In the browser, global variables are really stored in the window object.

· window.a = 19;

console.log(a); // Output: 19

a in the global scope means a as an attribute of the current context, so a===this.a and window object in a browser act as an equivalent of the this keyword in the global scope.

The first two examples are a result of a feature of JavaScript known as hoisting, which will be a critical concept in the next section about writing functions.

Function declarations versus function expressions versus the function constructor

What is the difference between these three statements?

function foo(n){ return n; }

var foo = function(n){ return n; };

var foo = new Function('n', 'return n');

At first glance, they're merely different ways to write the same function. But there's a little more going on here. And if we're to take full advantage of functions in JavaScript in order to manipulate them into a functional programming style, then we'd better be able to get this right. If there is a better way to do something in computer programming, then that one way should be the only way.

Function declarations

Function declarations, sometimes called function statements, define a function by using the function keyword.

function foo(n) {

return n;

}

Functions that are declared with this syntax are hoisted to the top of the current scope. What this actually means is that, even if the function is defined several lines down, JavaScript knows about it and can use it earlier in the scope. For example, the following will correctly print 6 to the console:

foo(2,3);

function foo(n, m) {

console.log(n*m);

}

Function expressions

Named functions can also be defined as an expression by defining an anonymous function and assigning it to a variable.

var bar = function(n, m) {

console.log(n*m);

};

They are not hoisted like function declarations are. This is because, while function declarations are hoisted, variable declarations are not. For example, this will not work and will throw an error:

bar(2,3);

var bar = function(n, m) {

console.log(n*m);

};

In functional programming, we're going to want to use function expressions so we can treat the functions like variables, making them available to be used as callbacks and arguments to higher-order functions such as map() functions. Defining functions as expressions makes it more obvious that they're variables assigned to a function. Also, if we're going to write functions in one style, we should write all functions in that style for the sake of consistency and clarity.

The function constructor

JavaScript actually has a third way to create functions: with the Function() constructor. Just like function expressions, functions defined with the Function() constructor are not hoisted.

var func = new Function('n','m','return n+m');

func(2,3); // returns 5

But the Function() constructor is not only confusing, it is also highly dangerous. No syntax correction can happen, no optimization is possible. It's far easier, safer, and less confusing to write the same function as follows:

var func = function(n,m){return n+m};

func(2,3); // returns 5

Unpredictable behavior

So the difference is that function declarations are hoisted while function expressions are not. This can cause unexpected things to happen. Consider the following:

function foo() {

return 'hi';

}

console.log(foo());

function foo() {

return 'hello';

}

What's actually printed to the console is hello. This is due to the fact that the second definition of the foo() function is hoisted to the top, making it the definition that is actually used by the JavaScript interpreter.

While at first this may not seem like a critical difference, in functional programming this can cause mayhem. Consider the following code snippet:

if (true) {

function foo(){console.log('one')};

}

else {

function foo(){console.log('two')};

}

foo();

When the foo() function is called, two is printed to the console, not one!

Finally, there is a way to combine both function expressions and declarations. It works as follows:

var foo = function bar(){ console.log('hi'); };

foo(); // 'hi'

bar(); // Error: bar is not defined

It makes very little sense to use this method because the name used in the declaration (the bar() function in the preceding example) is not available outside the function and causes confusion. It would only be appropriate for recursion, for example:

var foo = function factorial(n) {

if (n == 0) {

return 1;

}

else {

return n * factorial(n-1);

}

};

foo(5);


Summary

JavaScript has been called the "assembly language of the web," because it's as ubiquitous and unavoidable as x86 assembly. It's the only language that runs on all browsers. It's also flawed, yet referring to it as a low-level language is missing the mark.

Instead, think of JavaScript as the raw coffee beans of the web. Sure, some of the beans are damaged and a few are rotten. But if the good ones are selected, roasted, and brewed by a skilled barista, the beans can be transformed into a brilliant jamocha that cannot be had just once and forgotten. It's consumption becomes a daily custom, life without it would be static, harder to perform, and much less exciting. Some even prefer to enhance the brew with plug-ins and add-ons such as cream, sugar, and cocoa, which complement it very well.

One of JavaScript's biggest critics, Douglas Crawford, was quoted as saying "There are certainly a lot of people who refuse to consider the possibility that JavaScript got anything right. I used to be one of those guys. But now I continue to be amazed by the brilliance that is in there".

JavaScript turned out to be pretty awesome.