Pure Functions, Referential Transparency, and Immutability - Functional PHP (2017)

Chapter 2. Pure Functions, Referential Transparency, and Immutability

Those who have read the appendix about functional programming will have seen that it revolves around pure functions, or in other words, functions that only use their input to produce a result.

It might seem easy to determine whether a function is pure or not. It's just about checking that you don't access any global state, right? Sadly, it's not that simple. There are multiple ways a function can produce side effects. Some of them are pretty easy to spot; others are more difficult.

This chapter will not cover the benefits of using functional programming. If you are interested in the benefits, I suggest you read the appendix which tackles the subject in depth. However, we will discuss the advantages offered by immutability and referential transparency, as they are quite specific and are glossed over in the appendix.

In this chapter, we will cover the following topics:

· Hidden input and output

· Function purity

· Immutability

· Referential transparency

Two sets of input and output

Let's start with a simple function:

<?php

function add(int $a, int $b): int

{

return $a + $b;

}

The input and output of this function are pretty obvious to spot. We have two parameters and one return value. We can say without doubt that this function is pure.

Parameters and return values are the first set of input and output a function can have. But there's a second set, which is usually more difficult to spot. Have a look at the following two functions:

<?php

function nextMessage(): string

{

return array_pop($_SESSION['message']);

}

// A simple score updating method for a game

function updateScore(Player $player, int $points)

{

$score = $player->getScore();

$player->setScore($score + $points);

}

The first function has no obvious input. However, it's pretty clear that we get some data from the $_SESSION variable to create the output value, so we have one hidden input. We also have a hidden side effect on the session, because the array_pop function removes the message we just got from the list of messages.

The second method has no obvious output. But, updating the score of the player is clearly a side effect. Besides, the $score that we get from the player might be considered as a second input to the function.

In such simple code examples, the hidden inputs and outputs are pretty easy to spot. However, it quickly becomes more difficult, especially in an object-oriented codebase. And make no mistake, anything hidden like that, even in the most obvious way, can have consequences such as:

· Increasing the cognitive burden. Now you have to think about what happens in the Session or Player classes.

· Making test results vary for identical input parameters, because some other state of the software has changed, which leads to behaviors that are difficult to understand.

· The function signature, or API, is not clear about what you can expect from the functions, making it necessary to read the code or documentation for them.

The problem with those two simple-looking functions is that they need to read and update the existing state of your program. It is not yet the topic of this chapter to show you better ways to write them; we will look at that in Chapter 6, Real-life monads.

For readers accustomed to Dependency Injection, the first function is using a static call and it can be avoided by injecting an instance of the Session variable. Doing so will solve the hidden input issue, but the fact that we modify the state of the $_SESSION variable remains as a side-effect.

The remainder of this chapter will try to teach you how to spot impure functions and why spotting them is important, both for functional programming and for code quality in general.

For the rest of this book, we will use the terms side cause for hidden inputs, and side effects for hidden output. This dichotomy is not always used, but I think it helps with being able to describe with more accuracy when we are speaking about a hidden dependency or a hidden output of the code we will discuss.

Although a broader concept, available functional literature might use the term free variable to refer to side causes. A Wikipedia article about the topic states the following:

In computer programming, the term free variable refers to variables used in a function that are not local variables nor parameters of that function. The term non-local variable is often a synonym in this context.

Given this definition, variables passed to a PHP closure using the use keyword could be called free variables; this is why I prefer using the term side cause to clearly separate the two.
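
As a small illustration of that distinction, here is a minimal sketch; the names $factor and $triple are made up for this example. The closure imports $factor with the use keyword, so it is a free variable in the Wikipedia sense, yet PHP copies its value when the closure is created and a later reassignment is not seen by the closure:

```php
<?php

// Hypothetical example: $factor is a free variable of the closure.
$factor = 3;

// 'use' copies the current value of $factor into the closure.
$triple = function (int $n) use ($factor): int {
    return $n * $factor;
};

// Reassigning the outer variable does not affect the captured copy.
$factor = 10;

echo $triple(2), "\n";
// display '6'
```

Note that this only holds for variables imported by value; importing by reference (use (&$factor)) would bring the hidden dependency back.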

Pure functions

Let's say you have the function signature function getCurrentTvProgram (Channel $channel). Without any indication of the purity of the function, you have no idea of the complexity that may be hidden behind such a function.

You will probably get the program that is actually playing on the given channel. What you don't know is whether the function checked if you are logged into the system. Maybe there's some kind of database update for analytic purposes. Maybe the function will throw an exception because the log file is in a read-only state. You cannot know for sure, and all of those are side causes or side effects.

Regarding all the complexity associated with those hidden dependencies, you are faced with three options:

· Dive deep down into the documentation or code to understand all that is happening

· Make the dependencies apparent

· Do nothing and pray for the best

The last option is clearly better in the really short term, but you might get bitten really hard. The first option might seem better, but what about your colleague who also needs to use this function somewhere else in the application? Will they need to follow the same path as you?

The second option is probably the most difficult one, and it will require tremendous effort in the beginning because we are not at all accustomed to doing it this way. But the benefits will start to pile up as soon as you have finished. And it gets a lot easier with experience.

What about encapsulation?

Encapsulation is about hiding implementation detail. Purity is about making hidden dependencies known. Both are useful, good practices and they aren't in any kind of conflict. You can achieve both if you are careful enough and this is usually what functional programmers strive for. They like clean, simple solutions.

To explain this in simple terms:

· Encapsulation is about hiding internal implementation

· Avoiding side causes is about making external inputs known

· Avoiding side effects is about making external changes known

Spotting side causes

Let's get back to our getCurrentTvProgram function. The implementation that follows isn't pure. Can you spot why?

To help you a bit, I will tell you that what we've learned so far about pure functions implies that they always return the same result when called with the same arguments:

<?php

function getCurrentTvProgram(Channel $channel): string

{

// let's assume that getProgramAt is a pure method.

return $channel->getProgramAt(time());

}

Got it? Our suspect is the call to the time() function. Because of it, if you call the function at a different time, you will get different results. Let's fix this:

<?php

function getTvProgram(Channel $channel, int $when): string

{

return $channel->getProgramAt($when);

}

Not only is our function now pure, which is clearly an achievement in itself, we have just gained two benefits:

· We can now get the program for any time of the day as implied by the name change

· The function can now be tested without having to use some kind of magic trick to change the current time
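
To make the testing benefit concrete, here is a minimal sketch. The Channel class is never shown in this chapter, so the implementation below, with its constructor and schedule array, is purely an assumption made to get something runnable:

```php
<?php

// Hypothetical stand-in for the Channel class, which the text never shows.
// The schedule maps a start timestamp to a program title, in ascending order.
class Channel
{
    private $schedule;

    public function __construct(array $schedule)
    {
        $this->schedule = $schedule;
    }

    public function getProgramAt(int $when): string
    {
        $current = 'Off air';
        foreach ($this->schedule as $start => $title) {
            if ($start <= $when) {
                $current = $title;
            }
        }

        return $current;
    }
}

function getTvProgram(Channel $channel, int $when): string
{
    return $channel->getProgramAt($when);
}

$channel = new Channel([0 => 'Morning news', 3600 => 'Cartoons']);

// The same arguments always give the same answer, whatever the real time is.
echo getTvProgram($channel, 1800), "\n";
// display 'Morning news'

echo getTvProgram($channel, 4000), "\n";
// display 'Cartoons'
```

No clock mocking is needed: the test simply passes whichever timestamp it wants to check.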

Let's quickly look at some other examples of side causes. Try to spot the issue yourself as an exercise before reading:

<?php

$counter = 0;

function increment()

{

global $counter;

return ++$counter;

}

function increment2()

{

static $counter = 0;

return ++$counter;

}

function get_administrators(EntityManager $em)

{

// Let's assume $em is a Doctrine EntityManager allowing

// to perform DB queries

return $em->createQueryBuilder()

->select('u')

->from('User', 'u')

->where('u.admin = 1')

->getQuery()->getArrayResult();

}

function get_roles(User $u)

{

return array_merge($u->getRoles(), $u->getGroup()->getRoles());

}

The use of the global keyword makes it pretty obvious that the first function uses some variable from the global scope, thus making the function impure. The key takeaway from this example is that PHP scoping rules work to our advantage. Any function where you can spot this keyword is most probably impure.

The static keyword in the second example is a good indicator that we might try to store a state between function calls. In this example, it is a counter that is incremented at each run. The function is clearly impure. However, contrary to the global variable, the use of the static keyword might only be a way to cache data between calls, so you will have to check why it is used before drawing a conclusion.
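
As an illustration of the caching case, the following sketch (the fibonacci function is an example chosen here, not taken from the chapter) uses a static variable only to remember results that were already computed. From the caller's point of view, the same argument always yields the same result:

```php
<?php

// Hypothetical example: the static variable is only a cache.
function fibonacci(int $n): int
{
    static $cache = [];

    if ($n < 2) {
        return $n;
    }

    if (!isset($cache[$n])) {
        $cache[$n] = fibonacci($n - 1) + fibonacci($n - 2);
    }

    return $cache[$n];
}

echo fibonacci(30), "\n";
// display '832040'

echo fibonacci(30), "\n";
// display '832040' again, this time read from the cache
```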

The third function is without a doubt impure because it accesses the database. You might ask yourself how to get data from a database or the user if you are only allowed pure functions. Chapter 6 digs deeper into this subject if you want to write purely functional code. If you can't or won't be functional all the way, I suggest you group your impure calls together as much as possible and call only pure functions from there, to limit the places where you have side causes and side effects.
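
One possible way to apply this advice is sketched below. The greeting and greetNow functions are invented for this example: the first is pure and holds all the logic, while the second is a thin impure wrapper that reads the clock and prints the result:

```php
<?php

// Pure: the result depends only on the arguments.
function greeting(string $name, int $hour): string
{
    $salutation = $hour < 12 ? 'Good morning' : 'Good afternoon';

    return $salutation . ', ' . $name . '!';
}

// Impure: reads the clock (side cause) and prints (side effect).
// All the impurity of this small program lives here.
function greetNow(string $name)
{
    echo greeting($name, (int) date('G')), "\n";
}

echo greeting('Ada', 9), "\n";
// display 'Good morning, Ada!'

echo greeting('Ada', 15), "\n";
// display 'Good afternoon, Ada!'

greetNow('Ada');
// the output depends on the time of day
```

Only greetNow needs special care when testing; greeting can be checked with plain input and output values.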

Concerning the fourth function, you cannot tell if it is pure just by looking at it. You will have to look at the code of the methods that are called. This is what you will encounter in most cases, a function calling other functions and methods, which you will also have to read in order to determine purity.

Spotting side effects

Usually, spotting side effects is a bit easier than spotting side causes. Anytime you change a value in a way that is visible from the outside, or call another function that does so, you are creating a side effect.

If we go back to our two increment functions previously defined, what would you say about them? Consider the following code:

<?php

$counter = 0;

function increment()

{

global $counter;

return ++$counter;

}

function increment2()

{

static $counter = 0;

return ++$counter;

}

The first one clearly has a side effect on the global variable. But what about the second version? The variable itself is not accessible from the outside, so could we consider that the function is free of side-effects? The answer is no. Since the change implies that a following call to the function will return another value, this also qualifies as a side effect.

Let's look at some functions to see if you can spot the side effects:

<?php

function set_administrator(EntityManager $em, User $u)

{

$em->createQueryBuilder()

->update('models\User', 'u')

->set('u.admin', 1)

->where('u.id = ?1')

->setParameter(1, $u->id)

->getQuery()->execute();

}

function log_message($message)

{

echo $message."\n";

}

function updatePlayers(Player $winner, Player $loser, int $score)

{

$winner->updateScore($score);

$loser->updateScore(-$score);

}

The first function obviously has a side effect because we update a value in the database.

The second method prints something to the screen. Usually this is considered a side effect because the function has an effect on something outside its scope, in our case, the screen.

Finally, the last function probably has side effects. This is a good, educated guess based on the name of the methods. But we can't say for sure until we've seen the code of the methods to verify it. As when spotting side causes, you will often have to dig a little deeper than just the one function in order to ascertain if it causes side effects or not.

What about object methods?

In a purely functional language, as soon as you need to change a value inside an object, an array or any kind of collection, you will in fact return a copy with the new value. This means any method, such as the updateScore method, for example, will not modify an inner property of the object, but will return a new instance with the new score.

This may not seem practical at all, and given the possibilities offered by PHP out of the box, I agree with you. However, we will see that there are some functional techniques that really help to manage this.
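
To give an idea of what this looks like in PHP, here is a minimal sketch of a Player whose updateScore method returns a new instance. The class is hypothetical and only models the score:

```php
<?php

// Hypothetical, minimal Player: only the score is modeled.
class Player
{
    private $score;

    public function __construct(int $score = 0)
    {
        $this->score = $score;
    }

    public function getScore(): int
    {
        return $this->score;
    }

    // Returns a new instance instead of changing $this.
    public function updateScore(int $points): Player
    {
        return new Player($this->score + $points);
    }
}

$before = new Player(10);
$after = $before->updateScore(5);

echo $before->getScore(), "\n";
// display '10'

echo $after->getScore(), "\n";
// display '15'
```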

Another option would be to decide that the instance is not the same value after a change. In a way, this is already the case with PHP. Consider the following example:

<?php

class Test

{

private $value;

public function __construct($v)

{

$this->set($v);

}

public function set($v) {

$this->value = $v;

}

}

function compare($a, $b)

{

echo ($a == $b ? 'identical' : 'different')."\n";

}

$a = new Test(2);

$b = new Test(2);

compare($a, $b);

// identical

$b->set(10);

compare($a, $b);

// different

$c = clone $a;

$c->set(5);

compare($a, $c);

// different

When doing a simple equality comparison between two objects, PHP compares their inner values and not the instances themselves. It's important to note that as soon as you use a strict comparison (that is, the === operator), PHP verifies that both variables hold the same instance; with it, our compare function would print the 'different' string in all three cases.
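
The difference between the two operators can be checked with a short sketch. The Test class is condensed from the example above, and the $alias variable is introduced here only for illustration:

```php
<?php

class Test
{
    private $value;

    public function __construct($v)
    {
        $this->value = $v;
    }
}

$a = new Test(2);
$b = new Test(2);
$alias = $a;

var_dump($a == $b);
// bool(true): same class, same property values

var_dump($a === $b);
// bool(false): two distinct instances

var_dump($a === $alias);
// bool(true): both variables hold the same instance
```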

However, this is incompatible with the idea of referential transparency, which we will discuss later in this chapter.

Closing words

As we tried to show in the preceding examples, determining whether a function is pure can be tricky in the beginning. But as you start to get a feel for it, you will become a lot quicker and more comfortable.

The best course of action to check whether a function is pure is to verify the following:

· The use of the global keyword is a dead giveaway

· Check if you use any value that is not a parameter of the function itself

· Verify that all functions called by yours are also pure

· Any access to outside storage is impure (database and files)

· Pay special attention to functions whose return value depends on an outside state (time, random)

Now that you know how to spot those impure functions, you might be wondering how to make them pure. There is no easy answer to this request sadly. The following chapters will try to give recipes and patterns to avoid impurity.

Immutability

We say a variable is immutable if, once it has been assigned a value, you cannot change its content. After function purity, this is the second most important thing about functional programming.

In some academic languages such as Haskell, you cannot declare variables at all. Everything has to be a function. Since all those functions are also pure, this means you have immutability for free. Some of these languages offer some kind of syntactic sugar that resembles variable declaration to avoid the potential tediousness of always declaring functions.

Most functional languages only let you declare immutable variables or constructs that serve the same purpose. This means you have a way of storing values, but it is impossible to change the value after the initial assignment. There are also languages that let you choose what you want for each variable. In Scala, for example, you have the var keyword to declare traditional mutable variables and the val keyword to declare immutable variables.

Most languages however, as is the case for PHP, have no notion of immutability for variables.

Why does immutability matter?

First of all, it helps to reduce cognitive burden. It's already quite hard to keep in mind all the variables involved in an algorithm. Without immutability you also need to remember all value changes. It's a lot easier for the human mind to associate a value to a particular label (that is the variable name). If you can be sure that the value won't change, it will be a lot easier to reason about what is happening.

Also, if you have some global state you can't get rid of, as long as it is immutable, you can just note the values on a piece of paper near you and keep it for reference. Whatever happens during execution, what is written will always be the current state of the program, meaning you don't have to fire up a debugger or echo the variable to ensure that the value has not changed.

Imagine that you pass an object to a function. You don't know whether the function is pure or not, meaning the object properties could be changed. This introduces worry in your mind and distracts you from your line of thought. The fact that you have to ask yourself if an internal state has changed reduces your ability to reason about your code. If your object is immutable, you can be 100% assured that it is exactly as it was before, speeding up your understanding of what is happening.
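
The following sketch shows the kind of surprise this paragraph describes. The Cart class and the countItems function are hypothetical; the point is only that a function receiving an object can change it without the signature saying so:

```php
<?php

// Hypothetical class and function, for illustration only.
class Cart
{
    public $items = [];
}

// Looks like a harmless query, but it modifies its argument.
function countItems(Cart $cart): int
{
    $cart->items[] = 'gift wrap';

    return count($cart->items);
}

$cart = new Cart();
$cart->items[] = 'book';

echo countItems($cart), "\n";
// display '2'

echo count($cart->items), "\n";
// display '2': the caller's object was changed behind its back
```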

You also have advantages linked to thread safety and parallelization. If all your state is immutable, it is much easier to ensure your program will be able to run on multiple cores or computers at the same time. Most concurrency issues happen because some thread modified a value without correctly synchronizing with other threads. This leads to inconsistency between them and, often, errors in computations. If your variables are immutable, as long as all threads were sent the same data, this scenario is a lot less likely to happen. This is, however, of limited use in PHP, which is primarily used in non-threaded scenarios.

Data sharing

Another benefit of immutability is that when it is enforced by the language itself, the compiler can perform an optimization called data sharing. Since PHP does not support this yet, I will only present it in a few words.

Data sharing is the process of sharing a common memory location for multiple variables containing the same data. This allows for smaller memory footprints, and "copying" data from one variable to another with almost no cost at all.

For example, imagine the following piece of code:

<?php

//let's assume we have some big array of data

$array = ['one', 'two', 'three', '...'];

$filtered = array_filter($array, function($i) { /* [...] */ });

$beginning = array_slice($array, 0, 10);

$final = array_map(function($i) { /* [...] */ }, $array);

In PHP, each new variable will be a new copy of the data, meaning we have a memory and time cost that can become problematic as the array grows.

A functional language might, using clever techniques, store the data only once in memory and then describe by some other means which part of the data each variable contains. This will still require some computation, but with big structures you will save a lot of memory and time.

Such optimizations are also implementable in non-immutable languages. But it's often not done because you have to keep track of each write access to each variable to ensure data coherence. The implied complexity for the compiler is thought to outweigh the benefits of such an approach.

However, the time and memory penalty is not big enough in PHP to warrant avoiding using immutability. PHP has a pretty good garbage collector, meaning the memory is cleaned up pretty efficiently when an object is not used anymore. Also we often work with relatively small data structures, meaning the creation of nearly identical data is quite fast.

Using constants

You could use constants and class constants to have some kind of immutability, but they work only for scalar values. You currently have no way to use them for objects or more complex data structures. Since it's the only available option out-of-the-box, let's have a look anyway.

You can declare globally available constants containing any scalar value. Beginning with PHP 5.6, you can also store an array of scalar values inside constants when using the const keyword and, since PHP 7, it also works with the define syntax.

Constant names must start with a letter or an underscore, not a number. Usually, constants are in full caps so they can be easily spotted. It is also discouraged to begin with an underscore as it may collide with any constant defined by the PHP core:

<?php

define('FOO', 'something');

const BAR=42;

//this only works since PHP 5.6

const BAZ = ['one', 'two', 'three'];

// the 'define' syntax for arrays works since PHP 7

define('BAZ7', ['one', 'two', 'three']);

// names starting and ending with underscores are discouraged

define('__FOO__', 'possible clash');

You can use the result of a function to populate the constant. This is possible only when using the define syntax, however. If you use the const keyword, you must use a scalar value directly:

<?php

define('UPPERCASE', strtoupper('Hello World !'));

If you try to access a constant that does not exist, PHP 7 will assume that you are in fact trying to use the name as a string (since PHP 8.0, this throws an Error instead):

<?php

echo UPPERCASE;

//display 'HELLO WORLD !'

echo I_DONT_EXISTS;

// PHP Notice: Use of undefined constant I_DONT_EXISTS

// - assumed 'I_DONT_EXISTS'

// display 'I_DONT_EXISTS' anyway

This can be pretty misleading, as the assumed string will evaluate to true, potentially breaking your code if you expected your constant to hold a false value.

If you want to avoid this pitfall, you can use the defined or constant function. Sadly, this will add a lot of verbosity:

<?php

echo constant('UPPERCASE');

// display 'HELLO WORLD !'

echo defined('UPPERCASE') ? 'true' : 'false';

// display 'true'

echo constant('I_DONT_EXISTS');

// PHP Warning: constant(): Couldn't find constant I_DONT_EXISTS

// displays nothing as 'constant' returns 'null' in this case

echo defined('I_DONT_EXISTS') ? 'true' : 'false';

// display 'false'

PHP also allows you to declare constants inside of classes:

<?php

class A

{

const FOO='some value';

public static function bar()

{

echo self::FOO;

}

}

echo A::FOO;

// display 'some value'

echo constant('A::FOO');

// display 'some value'

echo defined('A::FOO') ? 'true' : 'false';

// display 'true'

A::bar();

// display 'some value'

Sadly, you can only use scalar values directly when doing so; there is no way to use the return value of a function, as you can with the define function:

<?php

class A

{

const FOO=strtoupper('Hello World !');

}

// This will generate an error when parsing the file:

// PHP Fatal error: Constant expression contains invalid operations

However, beginning with PHP 5.6, you can use any scalar expression or previously declared constants with the const keyword:

<?php

const FOO=6;

class B

{

const BAR=FOO*7;

const BAZ="The answer is " . self::BAR;

}

There is also one other fundamental difference between constants and variables besides their immutability. The usual scoping rule does not apply. You can use a constant anywhere in your code as soon as it is declared:

<?php

const FOO='foo';

$bar='bar';

function test()

{

// here FOO is accessible

echo FOO;

// however, if you want to access $bar, you have to use

// the 'global' keyword.

global $bar;

echo $bar;

}

At the time of writing, PHP 7.1 is still in the beta phase. The release is planned at the end of fall 2016. This new version will introduce class constants visibility modifiers:

<?php

class A

{

public const FOO='public const';

protected const BAR='protected const';

private const BAZ='private const';

}

// public constants are accessible as always

echo A::FOO;

// this will however generate an error

echo A::BAZ;

// PHP Fatal error: Uncaught Error: Cannot access private const A::BAZ

A final word of warning. Although they are immutable, constants are global, and this makes them a state of your application. Any function using a constant is de facto impure, so you should use them with caution.
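
One way to limit the issue is to read the constant at the edge of the program and pass its value along as a regular parameter. In the sketch below, the ITEMS_PER_PAGE constant and both functions are hypothetical names chosen for the example:

```php
<?php

const ITEMS_PER_PAGE = 2;

// Impure: silently depends on the global constant.
function firstPage(array $items): array
{
    return array_slice($items, 0, ITEMS_PER_PAGE);
}

// Pure: the page size is an explicit input.
function page(array $items, int $number, int $perPage): array
{
    return array_slice($items, ($number - 1) * $perPage, $perPage);
}

$items = ['a', 'b', 'c', 'd', 'e'];

// The constant is read once, by the calling code.
echo implode(',', page($items, 2, ITEMS_PER_PAGE)), "\n";
// display 'c,d'
```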

An RFC is on its way

As we just saw, constants are, at best, a wooden leg when it comes to immutability. They're quite alright for storing simple information like the number of items we want displayed per page. But as soon as you want to have some complex data structures, you will be stuck.

Fortunately, members of the PHP core team are well aware that immutability is important and there is currently some work being done on an RFC to include it at the language level (https://wiki.php.net/rfc/immutability).

For those not privy to the process involved for new PHP features, a Request for Comment (RFC) is a proposition from one of the core team members to add something new to PHP. The proposition first goes through a draft phase, where it is written and some example implementation is done. Afterwards, there is a discussion phase where other people can give advice and recommendations. Finally, a vote occurs to decide whether the feature will be included in the next PHP version.

At the time of writing, the Immutable classes and properties RFC is still in draft phase. There was no real argument either for or against it. Only time will tell if it is accepted or not.

Value objects

From https://en.wikipedia.org/wiki/Value_object:

In computer science, a value object is a small object that represents a simple entity whose equality is not based on identity: i.e. two value objects are equal when they have the same value, not necessarily being the same object.

[...]

Value objects should be immutable: this is required for the implicit contract that two value objects created equal, should remain equal. It is also useful for value objects to be immutable, as client code cannot put the value object in an invalid state or introduce buggy behavior after instantiation.

Since there is no means of obtaining real immutability in PHP, it is often achieved by having private properties and no setters on the class, thus forcing the developer to create a new object when they want to modify a value. The class can also provide utility methods to ease the creation of new objects. Let's look at a short example:

<?php

class Message

{

private $message;

private $status;

public function __construct(string $message, string $status)

{

$this->status = $status;

$this->message = $message;

}

public function getMessage()

{

return $this->message;

}

public function getStatus()

{

return $this->status;

}

public function equals($m)

{

return $m->status === $this->status &&

$m->message === $this->message;

}

public function withStatus($status): Message

{

$new = clone $this;

$new->status = $status;

return $new;

}

}

This kind of pattern can be used to create data entities that are immutable from the point of view of the data consumer. However, you will have to take special care to guarantee that all the methods on the class do not break the immutability; otherwise all your efforts will be moot.
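
Here is a short usage sketch of this pattern. The class is a condensed copy of the Message class above so that the example runs on its own, and the 'draft' and 'sent' status values are made up for the illustration:

```php
<?php

// Condensed copy of the Message class above, so the sketch runs on its own.
class Message
{
    private $message;
    private $status;

    public function __construct(string $message, string $status)
    {
        $this->message = $message;
        $this->status = $status;
    }

    public function getStatus()
    {
        return $this->status;
    }

    public function equals($m)
    {
        return $m->status === $this->status &&
            $m->message === $this->message;
    }

    public function withStatus($status): Message
    {
        $new = clone $this;
        $new->status = $status;

        return $new;
    }
}

$draft = new Message('Hello', 'draft');
$sent = $draft->withStatus('sent');

echo $draft->getStatus(), "\n";
// display 'draft': the original is untouched

echo $sent->getStatus(), "\n";
// display 'sent'

var_dump($draft->equals($sent));
// bool(false)

var_dump($draft->equals(new Message('Hello', 'draft')));
// bool(true)
```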

Besides immutability, using value objects has other benefits as well. You can add some business or domain logic inside the object, thus keeping everything related in the same place. Also, if you use them instead of arrays, you can:

· Use them as type hint instead of simply array

· Avoid any possible error due to a misspelled array key

· Enforce the presence or format of some items

· Provide methods that format your values for different context
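
The points in this list can be sketched with a small value object. The EmailAddress class, its validation rule and the domainOf function are assumptions made for this example, not something defined elsewhere in the book:

```php
<?php

// Hypothetical value object: the class name and its rules are assumptions.
class EmailAddress
{
    private $address;

    public function __construct(string $address)
    {
        // Enforce the format once, at construction time.
        if (filter_var($address, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException('Invalid email: ' . $address);
        }

        $this->address = strtolower($address);
    }

    public function getAddress(): string
    {
        return $this->address;
    }

    // Formatting for a specific context lives next to the data.
    public function getDomain(): string
    {
        return substr($this->address, strpos($this->address, '@') + 1);
    }
}

// The type hint guarantees that only validated addresses get here.
function domainOf(EmailAddress $email): string
{
    return $email->getDomain();
}

echo domainOf(new EmailAddress('Jane@Example.com')), "\n";
// display 'example.com'

try {
    new EmailAddress('not an email');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
    // display 'Invalid email: not an email'
}
```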

A common use of value objects is to store and manipulate money-related data. You can have a look at http://money.rtfd.org, which is a great example of how to use them efficiently.

Another use of value objects for a really important piece of code is the PSR-7: "HTTP message interfaces". This standard introduced and formalized a way for frameworks and applications to manage HTTP requests and responses in an inter-operable way. All major frameworks either have core support or plugins available. I invite you to read their full rationale as to why you should use immutability for such an important part of the PHP ecosystem: http://www.php-fig.org/psr/psr-7/meta/#why-value-objects.

In essence, modeling HTTP messages as value objects ensures the integrity of the message state, and prevents the need for bi-directional dependencies, which can often go out of sync or lead to debugging or performance issues.

All in all, value objects are a good way to obtain some kind of immutability in PHP. You don't get all the benefits, especially those related to performance, but most of the cognitive burden is removed. Going further on this topic is out of the scope of this book; if you want to learn more, there is a dedicated website: http://www.phpvalueobjects.info/.

Libraries for immutable collections

If you want to go further down the path of immutability, there are at least two libraries that offer immutable collections: Laravel Collections and immutable.php.

Both these libraries harmonize the discrepancies regarding the parameter order of array-related PHP functions such as array_map and array_filter. They also make it easy to work with any kind of Iterable or Traversable, contrary to most PHP functions, which require a real array.

This chapter will only present the libraries quickly. Examples of usage will be given in Chapter 3, Functional Basis in PHP, so that they can be shown alongside other libraries that perform the same tasks. Also, we haven't yet covered techniques such as mapping or folding in detail, so examples might not be as clear as possible right now.

Laravel Collection

The Laravel framework contains a class called Collection to supersede PHP arrays. This class uses a simple array internally, but it can be created from any collection-like variable using the collect helper function. It then offers a lot of really useful methods to work with the data, mostly in a functional way. This is also a central part of Laravel, as Eloquent, the ORM, returns database entities as Collection instances.

If you are not using Laravel, but still want to benefit from this great library, you can use https://github.com/tightenco/collect, which is only the Collection part separated from the rest of the Laravel support package in order to remain small. You can also refer to the official documentation of the Laravel collection (https://laravel.com/docs/5.3/collections).

Immutable.php

This library defines the ImmArray class, which implements an immutable array-like collection.

The ImmArray class is a wrapper around the SplFixedArray class that fixes some of the shortcomings of its API by providing methods for the operations that you usually want to perform on collections. The advantage of using the SplFixedArray class behind the scenes is that the implementation is written in C and is really performant and memory efficient. You can refer to the GitHub repository for more insight on Immutable.php at https://github.com/jkoudys/immutable.php.

Referential transparency

An expression is said to be referentially transparent if you can substitute it by its output at any time without changing the behavior of your program. In order to do that for all expressions of your code base, all your functions have to be pure and all your variables have to be immutable.
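
A quick sketch of the definition: in the code below, the call add(2, 3) can be replaced by 5 without changing anything, while a call to increment cannot be replaced by a previous result. Both functions are small examples chosen to mirror the ones used earlier in this chapter:

```php
<?php

// Pure: add(2, 3) can be replaced by 5 anywhere.
function add(int $a, int $b): int
{
    return $a + $b;
}

// Impure: each call returns a different value.
function increment(): int
{
    static $counter = 0;

    return ++$counter;
}

var_dump(add(2, 3) * 2 === 5 * 2);
// bool(true): substituting the value changes nothing

$first = increment();

var_dump(increment() === $first);
// bool(false): the second call cannot be replaced by the first result
```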

What do we gain from referential transparency? Once again, it helps a lot with reducing cognitive burden. Let's imagine we have the following functions and data:

<?php

// The Player implementation is deliberately simple for brevity.

// Obviously you would use immutable.php in a real project.

class Player

{

public $hp;

public $x;

public $y;

public function __construct(int $x, int $y, int $hp) {

$this->x = $x;

$this->y = $y;

$this->hp = $hp;

}

}

function isCloseEnough(Player $one, Player $two): bool

{

return abs($one->x - $two->x) < 2 &&

abs($one->y - $two->y) < 2;

}

function loseHitpoint(Player $p): Player

{

return new Player($p->x, $p->y, $p->hp - 1);

}

function hit(Player $p, Player $target): Player

{

return isCloseEnough($p, $target) ?

loseHitpoint($target) :

$target;

}

Now let's simulate a really simple brawl between two people:

<?php

$john = new Player(8, 8, 10);

$ted = new Player(7, 9, 10);

$ted = hit($john, $ted);

All functions defined above are pure, and since we don't have mutable data structures, they are also referentially transparent by extension. Now, in order to better understand our piece of code, we can use a technique called equational reasoning. The idea is pretty simple: you substitute equals for equals to reason about code. In a way, it is like evaluating the code manually.

Let's start by inlining our isCloseEnough function. Doing so, our hit function can be transformed as such:

<?php

return abs($p->x - $target->x) < 2 && abs($p->y - $target->y) < 2 ?

loseHitpoint($target) :

$target;

Our data being immutable, we can now simply use the values as the following:

<?php

return abs(8 - 7) < 2 && abs(8 - 9) < 2 ?

loseHitpoint($target) :

$target;

Let's do some math:

<?php

return 1 < 2 && 1 < 2 ?

loseHitpoint($target) :

$target;

The condition clearly evaluates to true, so we can keep only the first branch:

<?php

return loseHitpoint($target);

Let's keep at it with the remaining function call:

<?php

return new Player($target->x, $target->y, $target->hp - 1);

Once again, we replace the values:

<?php

return new Player(7, 9, 10 - 1);

Finally, our initial function call becomes:

<?php

$ted = new Player(7, 9, 9);

By using the fact that you can replace a referentially transparent expression with its resulting value, we were able to reduce a relatively lengthy piece of code with multiple function calls to a simple object creation.
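If you want to check a manual reduction like this one, you can run the original functions and compare the result with the reduced expression. The sketch below repeats the definitions so that it is self-contained, with $john at position (8, 8) and $ted at position (7, 9), and relies on PHP's == operator, which compares two objects of the same class property by property.

```php
<?php

class Player
{
    public $hp;
    public $x;
    public $y;

    public function __construct(int $x, int $y, int $hp)
    {
        $this->x = $x;
        $this->y = $y;
        $this->hp = $hp;
    }
}

function isCloseEnough(Player $one, Player $two): bool
{
    return abs($one->x - $two->x) < 2 &&
        abs($one->y - $two->y) < 2;
}

function loseHitpoint(Player $p): Player
{
    return new Player($p->x, $p->y, $p->hp - 1);
}

function hit(Player $p, Player $target): Player
{
    return isCloseEnough($p, $target) ?
        loseHitpoint($target) :
        $target;
}

$john = new Player(8, 8, 10);
$ted = new Player(7, 9, 10);

// The full computation and the manually reduced expression.
$computed = hit($john, $ted);
$reduced = new Player(7, 9, 9);

var_dump($computed == $reduced); // bool(true)
```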

This ability applied to refactoring or understanding code is very useful. If you have trouble understanding some code and you know some part of it is pure, you can simply replace it with the result while you are trying to understand it. This will probably help you get to the heart of the matter.

Non-strictness or lazy evaluation

One of the great benefits of referential transparency is the possibility for a compiler or interpreter to evaluate values lazily. For example, Haskell allows you to have an infinite list defined by a mathematical function. The lazy nature of the language ensures that the values of the list will be computed only when you need them.

In the glossary, we defined non-strict languages as languages where evaluation happens lazily. In fact, there's a slight difference between laziness and non-strictness. If you are interested in the details, you can head to https://wiki.haskell.org/Lazy_vs._non-strict and read about it. For the purpose of this book, we will use those terms interchangeably.

You might ask yourself how this can be useful. Let's go over a few use cases.

Performance

By using lazy evaluation, you ensure that only the values that are needed are effectively computed. Let's have a look at a short and naive example to illustrate this benefit:

<?php

function wait(int $value): int

{

// let's imagine this is a function taking a while

// to compute a value

sleep(10);

return $value;

}

function do_something(bool $a, int $b, int $c): int

{

if($a) {

return $b;

} else {

return $c;

}

}

do_something(true, wait(10), wait(8));

Since PHP does not evaluate function arguments lazily, when calling do_something you first have to wait 10 seconds twice, once per argument, before the function even starts executing. If PHP were a non-strict language, only the value we need would have been computed, cutting the time needed in half. It gets even better: since the return value isn't even stored in a variable, it might be possible to not execute the function at all.
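You can emulate lazy parameters by hand by passing closures (often called thunks) instead of values and calling only the one you need. The sketch below is one possible way to do it; the delayed and do_something_lazy names are hypothetical, and a log is used in place of sleep so that you can see which computation actually ran.

```php
<?php

// Hypothetical helper: wraps a value in a closure and records,
// for demonstration purposes only, when that closure is evaluated.
function delayed(int $value, ArrayObject $log): Closure
{
    return function () use ($value, $log): int {
        $log->append($value);

        return $value;
    };
}

// Same logic as do_something, but the two candidates are closures
// and only the selected one is ever called.
function do_something_lazy(bool $a, callable $b, callable $c): int
{
    return $a ? $b() : $c();
}

$log = new ArrayObject();
$result = do_something_lazy(true, delayed(10, $log), delayed(8, $log));

// Only the first closure has been evaluated.
$evaluated = $log->getArrayCopy();
```

The price is that the caller has to wrap each argument explicitly, which is exactly the bookkeeping a non-strict language would do for you.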

There is one case where PHP performs a kind of lazy evaluation: Boolean operator short-circuits. When you have a series of Boolean operations, as soon as PHP can be certain of the outcome, it will stop the execution:

<?php

// 'wait' will never get called as those operators are short-circuited

$a = (false && wait(10));

$b = (true || wait(10));

$c = (false and wait(10));

$d = (true or wait(10));

We could rewrite our previous example to take advantage of that, but as the following snippet shows, it comes at the expense of readability. Also, our example was really simple, not something you would encounter in real-life application code. Imagine doing the same for a complex function with multiple possible branches. The rewrite looks like this:

<?php

($a && wait(10)) || wait(8);

There are also two bigger issues with the previous code:

· If, for any reason, the first call to wait returns a falsy value, the second call will also be executed

· The return value of your functions will automatically be cast to Boolean
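Both issues can be reproduced with a few lines. In this sketch, zero and eight are hypothetical stand-ins for the two computations; the first one deliberately returns a falsy value.

```php
<?php

function zero(): int
{
    return 0;
}

function eight(): int
{
    return 8;
}

$a = true;

// zero() is falsy, so the || falls through and eight() is evaluated too.
// The overall result is a Boolean, not one of the integers.
$shortCircuit = ($a && zero()) || eight();

// The ternary operator keeps the selected value and its type.
$ternary = $a ? zero() : eight();

var_dump($shortCircuit); // bool(true)
var_dump($ternary);      // int(0)
```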

Code readability

When your variable and function evaluation is lazy, you can spend less time considering the best order of declaration, or even whether the data you are computing will be used at all. Instead, you can concentrate on writing readable code. Imagine a blogging application with lots of posts, organized by tags and categories and archived by year. Would you rather write custom queries for each page, or rely on lazy evaluation, as in the following example?

<?php

// let's imagine $posts is a lazily evaluated collection

// containing all the blog posts of your application ordered by date

$posts = [ /* ... */ ];

// last 10 posts for the homepage

return $posts->reverse()->take(10);

// posts with tag 'functional-php'

return $posts->filter(function($b) {

return $b->tags->contains('functional-php');

})->all();

// title of the first post from 2014 in the category 'life'

return $posts->filter(function($b) {

return $b->year == 2014;

})->filter(function($b) {

return $b->category == 'life';

})->pluck('title')->first();

To be clear, this code would probably work just fine if we loaded all posts into $posts, but the performance would be pretty bad. However, if we had lazy evaluation and an ORM powerful enough, the database queries could be delayed to the last moment. At that time, we would know exactly the data we need and the SQL will be tailored for this exact page automatically, leaving us with easy to read code and great performance.
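Delaying database queries is out of reach here, but the in-memory part of the idea can be sketched with generators. The lazyFilter and first helpers below are hypothetical names, and the posts are plain arrays invented for the example; the counter only exists to show how many posts were actually inspected.

```php
<?php

// Hypothetical helper: yields matching items one at a time, on demand.
function lazyFilter(iterable $items, callable $predicate): Generator
{
    foreach ($items as $item) {
        if ($predicate($item)) {
            yield $item;
        }
    }
}

// Hypothetical helper: pulls a single item and stops.
function first(iterable $items)
{
    foreach ($items as $item) {
        return $item;
    }

    return null;
}

$posts = [
    ['title' => 'Hello', 'year' => 2013, 'category' => 'life'],
    ['title' => 'Moving out', 'year' => 2014, 'category' => 'life'],
    ['title' => 'Never inspected', 'year' => 2014, 'category' => 'life'],
];

$inspected = 0;

$from2014 = lazyFilter($posts, function (array $p) use (&$inspected): bool {
    $inspected++;

    return $p['year'] === 2014;
});

$inLife = lazyFilter($from2014, function (array $p): bool {
    return $p['category'] === 'life';
});

// Nothing has been filtered yet; first() drives the whole pipeline.
$post = first($inLife);
$title = $post['title'] ?? null;
```

Only the first two posts are examined before a match is found, so the third one is never touched. This is laziness over data that is already loaded, not the query tailoring imagined above.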

As far as I can tell, this idea is purely hypothetical. I am not currently aware of any ORM powerful enough, even in the most functional languages, to attain this level of laziness. But wouldn't it be great if there were one?

If you are wondering about the syntax used in the example, it is inspired by the API of the Laravel Collection class we were discussing earlier.

Infinite lists or streams

Lazy evaluation allows you to create infinite lists. In Haskell, to get the list of all positive integers, you can simply write [1..]. Then, if you want the first ten numbers, you can write take 10 [1..]. I admit this example isn't very exciting, but more complicated ones are more difficult to understand.

PHP supports generators since version 5.5. You can achieve something akin to infinite lists by using them. For example, our list of all positive integers is as follows:

<?php

function integers()

{

$i = 1;

while(true) yield $i++;

}

However, there is at least one notable difference between the lazy infinite list and our generator. The Haskell version is an ordinary list, so you can pass it to any of the usual list functions (map, filter, zip, takeWhile, and so on) and still get lazy results. Our generator, on the other hand, is an Iterator, and if you try to use, say, iterator_to_array on it, your PHP process will hang until it runs out of memory.

How can you work with an infinite list at all, you ask? It is in fact pretty simple: Haskell only computes list values until it has enough of them to perform its computation. For example, takeWhile (< 10) [1..] stops as soon as it reaches 10. The same principle would let a lazy equivalent of the PHP condition count($list) < 10 stop counting after ten items, because the answer to the comparison is known at that point. Be aware, though, that operations which must visit every element, such as the standard length function or a sort, never terminate on an infinite list.
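In PHP, this "compute only what is demanded" behavior has to be written by hand. The following sketch assumes an integers generator that starts at 1; take and hasFewerThan are hypothetical helpers that each stop pulling values as soon as they have what they need.

```php
<?php

function integers(): Generator
{
    $i = 1;
    while (true) {
        yield $i++;
    }
}

// Hypothetical helper: yields at most $n items, then stops pulling.
function take(iterable $items, int $n): Generator
{
    if ($n <= 0) {
        return;
    }

    foreach ($items as $item) {
        yield $item;
        if (--$n === 0) {
            return;
        }
    }
}

// Hypothetical helper: answers "are there fewer than $limit items?"
// by consuming at most $limit of them.
function hasFewerThan(iterable $items, int $limit): bool
{
    if ($limit <= 0) {
        return false;
    }

    $seen = 0;
    foreach ($items as $_) {
        if (++$seen >= $limit) {
            return false;
        }
    }

    return true;
}

// Safe: take() bounds the generator before it is turned into an array.
$firstTen = iterator_to_array(take(integers(), 10), false);

// Terminates even though the source is infinite.
$short = hasFewerThan(integers(), 10);
```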

Code optimization

Have a look at the next piece of code and try deciding which is faster:

<?php

$array = [1, 2, 3, 4, 5, 6 /* ... */];

// version 1

for($i = 0; $i < count($array); ++$i) {

// do something with the array values

}

// version 2

$length = count($array);

for($i = 0; $i < $length; ++$i) {

// do something with the array values

}

Version 2 should be faster, because it calls count only once, whereas in version 1 PHP has to call it each time it checks the condition of the for loop. This example is pretty simple, but there are some cases where such a pattern is harder to spot. If you have referential transparency, this does not matter. The compiler can perform this optimization on its own. Any referentially transparent computation can be moved around without changing the result of the program. This is possible because we have the guarantee that the execution of each function does not depend on a global state. Thus, moving computation around to achieve better performance is possible without changing the outcome.
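To see why an engine cannot blindly perform this transformation when data is mutable, consider a loop body that appends to the array it iterates over. In this sketch, which uses made-up data, the two versions no longer produce the same result.

```php
<?php

// Version 1 style: the bound is re-read on every iteration,
// so an item appended inside the loop is processed too.
$queue = [1, 2, 3];
$processed = [];
for ($i = 0; $i < count($queue); ++$i) {
    $processed[] = $queue[$i];
    if ($queue[$i] === 1) {
        $queue[] = 4; // mutation inside the loop
    }
}

// Version 2 style: the bound is fixed before the loop starts,
// so the appended item is never reached.
$queue = [1, 2, 3];
$hoisted = [];
$length = count($queue);
for ($i = 0; $i < $length; ++$i) {
    $hoisted[] = $queue[$i];
    if ($queue[$i] === 1) {
        $queue[] = 4;
    }
}
```

With immutable data, the loop body could not change the array, both versions would be equivalent, and moving the call to count out of the loop would always be safe.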

Another possible improvement is common subexpression elimination (CSE). Not only can the compiler move parts of the code more freely, it can also transform operations that share a common computation so that they use an intermediary value instead. Imagine the following code:

<?php

$a = $foo * $bar + $u;

$b = $foo * $bar * $v;

If computing $foo * $bar has a big cost, the compiler could decide to transform it by using an intermediary value:

<?php

$tmp = $foo * $bar;

$a = $tmp + $u;

$b = $tmp * $v;

Again, this is quite a simple example. This kind of optimization could be performed across the whole code base.

Memoization

Memoization is a technique where you cache the results of a function for a given set of parameters so that you don't have to perform it again on the next call. We will see this in detail in Chapter 8, Performance Efficiency. For now, let me just say that if your language only possesses referentially transparent expressions, it can perform memoization automatically when needed.

This means it can decide, based on the frequency of calls and various other parameters, whether it's worth memoizing a function automatically without any intervention or hint from the developer.
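Until the language does it for you, memoization can be added by hand with a small higher-order function. The memoize helper below is a hypothetical, minimal sketch: it assumes the wrapped function is pure and that its arguments can be passed to serialize, and the call counter exists only to show how often the body runs.

```php
<?php

// Hypothetical helper: wraps a pure function with a result cache.
function memoize(callable $fn): Closure
{
    $cache = [];

    return function (...$args) use ($fn, &$cache) {
        // serialize() turns the argument list into a usable array key.
        $key = serialize($args);

        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $fn(...$args);
        }

        return $cache[$key];
    };
}

$calls = 0;

$square = memoize(function (int $n) use (&$calls): int {
    $calls++; // only here to show how often the body runs

    return $n * $n;
});

$first = $square(4);  // computed
$second = $square(4); // served from the cache
```

This is only safe because the wrapped function is referentially transparent: returning a cached value for an impure function would silently skip its side effects.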

PHP in all that?

Why bother with pure functions, immutability and, ultimately, referential transparency if PHP developers can only benefit from a small number of its advantages?

First of all, as with the RFC for immutability, things are going in the right direction. Eventually, the PHP engine may start to incorporate some of those advanced compiler techniques. If that happens and your codebase already uses those functional techniques, you could see a significant performance boost.

Secondly, in my opinion, the major benefit of all that is the reduced cognitive burden. Sure, it takes some time to get used to this new style of programming. But once you have practiced a bit, you will quickly discover that your code is easier to read and reason about. The corollary is that your application will contain fewer bugs.

Finally, if you are willing to use some external libraries, or if you can cope with syntax that is not always well polished, you can already benefit from other improvements right now. Obviously, we won't be able to change the core of PHP to add the compiler optimizations we were talking about earlier, but we will see in the following chapters how some of the benefits of referential transparency can be emulated.

Summary

This chapter contained a lot of theory. I hope you didn't mind too much. It was necessary to lay the foundation that will allow us to share a common vocabulary and also explain why the concepts are important. You are now well aware of what purity and immutability are, and you have learned some tricks to spot impure functions. We also discussed how those two properties lead to something called referential transparency and what its benefits are.

We also learned that, sadly, PHP does not support most of the benefits natively. However, the key takeaway is that using a functional approach reduces the cognitive burden of understanding your code, thus making it easier to read. The net benefit is that your code will be easier to maintain and refactor, and you can find and fix bugs quickly. Usually, pure functions are also easier to test, which also results in fewer bugs.

Now that we have the theoretical basis well discussed, the next chapter will focus on techniques that will help us achieve purity and immutability in our software.