Performance Efficiency - Functional PHP (2017)


Chapter 9. Performance Efficiency

Now that we have covered various techniques related to functional programming, it is time to analyze how it impacts performance in a language such as PHP, which is still imperative at its core even if there are more and more functional features introduced with each version.

We will also discuss why performance often matters less than it seems in the end, and how we can alleviate the cost in some cases.

Finally, we will explore two optimization techniques enabled by referential transparency. The first one is memoization, which is a type of caching. We will also speak about running long computations in parallel and how you can take advantage of this in PHP.

In this chapter, we will cover the following topics:

· Performance impact of functional programming

· Memoization

· Parallelization of computation

Performance impact

Since there is no core support for features such as currying and function composition, they need to be emulated using anonymous wrapper functions. Obviously, this comes with a performance cost. Also, as we have already discussed in the part about tail-call recursion in Chapter 7, Functional Techniques and Topics, using a trampoline is also slower. But how much execution time do you lose compared to a more traditional approach?
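To make the cost concrete, here is a hypothetical, minimal wrapper-based compose (not the implementation of any particular library; real helpers do more work, such as handling multiple functions and variable arguments, which adds further overhead):

```php
<?php
// Hypothetical minimal sketch of the kind of wrapper such helpers create:
// every call to the composed function goes through an extra closure,
// which is where the overhead discussed below comes from.
function naiveCompose(callable $f, callable $g): callable
{
    return function ($x) use ($f, $g) {
        return $f($g($x)); // one extra function call per composition layer
    };
}

$inc = function ($x) { return $x + 1; };
$double = function ($x) { return $x * 2; };

$incThenDouble = naiveCompose($double, $inc);
echo $incThenDouble(5); // 12
```

Each layer of composition stacks one more closure invocation on top of the underlying function call, so the cost grows with the number of composed functions.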

Let's create a few functions that will serve as a benchmark and test the various speeds we can achieve. The function will execute a really simple task, adding two numbers, in order to ensure we measure the overhead as effectively as possible:

<?php

use Functional as f;

function add($a, $b)
{
    return $a + $b;
}

function manualCurryAdd($a, $b = null) {
    $func = function($b) use($a) {
        return $a + $b;
    };
    return func_num_args() > 1 ? $func($b) : $func;
}

$curryiedAdd = f\curry('add');

function add2($b)
{
    return $b + 2;
}

function add4($b)
{
    return $b + 4;
}

$composedAdd4 = f\compose('add2', 'add2');
$composerCurryedAdd = f\compose($curryiedAdd(2), $curryiedAdd(2));

We created the first function, add, and curried it; this will be our first benchmark. We will then compare a specialized function adding 4 to a value against two different compositions. The first is the composition of two specialized functions and the second the composition of two curried versions of the add function.

We will use the following code to benchmark our functions. It is pretty basic but it should suffice to demonstrate any meaningful differences:

<?php

use Oefenweb\Statistics\Statistics;

function benchmark($function, $params, $expected)
{
    $iteration = 10;
    $computation = 2000000;

    $times = array_map(function() use($computation, $function, $params, $expected) {
        $start = microtime(true);
        array_reduce(range(0, $computation), function($expected) use ($function, $params) {
            if(($res = call_user_func_array($function, $params)) !== $expected) {
                throw new RuntimeException("Faulty computation");
            }
            return $expected;
        }, $expected);
        return microtime(true) - $start;
    }, range(0, $iteration));

    echo sprintf("mean: %02.3f seconds\n", Statistics::mean($times));
    echo sprintf("std: %02.3f seconds\n", Statistics::standardDeviation($times));
}

The statistics methods are from the oefenweb/statistics package available via composer. We also check that the returned value is the one we expect as an extra precaution. We will run each function 2 million times, repeat that measurement 10 times in a row, and display the mean time for the 2 million runs.

Let's run the benchmark for currying first. The displayed results are for PHP 7.0.12. When trying this with PHP 5.6, all benchmarks are slower but they exhibit the same differences between the various functions:

<?php

benchmark('add', [21, 33], 54);
// mean: 0.447 seconds
// std: 0.015 seconds

benchmark('manualCurryAdd', [21, 33], 54);
// mean: 1.210 seconds
// std: 0.016 seconds

benchmark($curryiedAdd, [21, 33], 54);
// mean: 1.476 seconds
// std: 0.007 seconds

Obviously, the results will vary depending on the system the test is run on, but the relative difference should stay roughly the same.

First, if we look at the standard deviation, we can see that each of the 10 runs took roughly the same time, which indicates that we can trust our numbers to be a good indicator of performance.

We can see the curried version is definitely slower. Manual currying is a bit more efficient, but both curried versions were roughly three times slower than the simple function version. Before drawing conclusions, let's see the results for the composed functions:

<?php

benchmark('add4', [10], 14);
// mean: 0.434 seconds
// std: 0.001 seconds

benchmark($composedAdd4, [10], 14);
// mean: 1.362 seconds
// std: 0.005 seconds

benchmark($composerCurryedAdd, [10], 14);
// mean: 3.555 seconds
// std: 0.018 seconds

Again, the standard deviation is small enough so that we can consider the numbers valid.

Concerning the values themselves, we can see that the composition is also about three times slower, and the composition of the curried functions is, without much surprise, nine times slower.

Now if we take our worst case at 3.555 seconds against our best case at 0.434 seconds, this means we have an overhead of about 3 seconds when using composition and currying. Does it matter? Does it seem like a lot of lost time? Let's try to imagine these numbers in the context of a web application.

Does the overhead matter?

We performed two million executions of our methods and they accounted for three seconds of overhead. A recent project I worked on, an e-commerce application for a luxury brand available in 26 countries and more than 10 languages, written entirely from scratch without the help of any framework, needed more or less 25,000 function calls to render one page.

Even if we assume that all of those calls are made to composed functions that were curried beforehand, the overhead is now around 40 milliseconds in the worst-case scenario. The application in question took roughly 180 milliseconds to display a page, so we are speaking of a 20-25% decrease in performance.

It is still a lot, but far from the three-times-slower figure we were seeing before. The overhead linked to the functional techniques grows linearly with each function call. In the benchmark it looks dramatic because the computation performed is trivial. In a real-life application, you have external bottlenecks such as databases, third-party APIs, or filesystems. You also have functions performing complex computations taking more time than a simple addition. In such cases, the introduced overhead is a smaller part of the total execution time of your application.

This is also a worst-case scenario where we assume everything is composed and curryied. In a real-world application, you might use a traditional framework containing functions and methods without overhead. You will also be able to identify hot paths in your code and manually optimize them using explicit currying and composition instead of the helper functions. It is also not necessary to curry everything; you will have functions with only one argument that won't need it and some functions for which it makes no sense to use currying.
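Manually optimizing a hot path can be as simple as replacing the generic helper with an explicit closure; a hypothetical sketch:

```php
<?php
// Hot-path sketch: the explicit closure calls $add2 directly,
// avoiding the generic wrapper machinery of a compose() helper.
$add2 = function ($b) {
    return $b + 2;
};

$add4 = function ($x) use ($add2) {
    return $add2($add2($x)); // direct calls, no compose() indirection
};

echo $add4(10); // 14
```

The behavior is identical to the composed version benchmarked above, but without the per-call indirection of a generic compose implementation.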

Also, those are numbers to be considered for an application with a cold cache. Any mechanism you already have in place to reduce the rendering time of your pages will continue to work the same. For example, if you have a Varnish instance running, your pages will probably continue to be served at the same speed.

Let's not forget

We compared a really small function to composition and currying. A modern PHP codebase would use classes both for holding business logic and values. Let's simulate this using the following implementation of our add function:

<?php

class Integer {
    private $value;
    public function __construct($v) { $this->value = $v; }
    public function get() { return $this->value; }
}

class Adder {
    public function add(Integer $a, Integer $b) {
        return $a->get() + $b->get();
    }
}

The time taken by the traditional approach would increase:

<?php

benchmark([new Adder, 'add'], [new Integer(21), new Integer(33)], 54);
// mean: 0.767 seconds
// std: 0.019 seconds

Just by wrapping everything in a class and using a getter, the execution time nearly doubled. Suddenly, the functional approach is only 1.5 times slower in the benchmark, and the overhead in our sample application drops to 10-15%, which is already a lot better.

Can we do something?

Sadly, there is not much we can do ourselves. We could shave off a little time with a more efficient implementation of the curry and compose methods, as we demonstrated with the manually curried version of the add method, but this won't amount to much.

An implementation of both those techniques as a core part of PHP would, however, bring a lot of benefits, probably getting them on a par with traditional functions and methods, or really close. But, as far as I know, there is no plan to do so in the near future.

It could also be possible to create a C language extension for PHP to implement those two functions in a more efficient way. This would, however, be impractical, as most PHP-hosting companies do not let people install custom extensions.

Closing words

As we just saw, using techniques such as currying and function composition has an impact on performance that is rather hard to mitigate on your own. In my opinion, the benefits outweigh this cost but it is important to switch to functional programming knowingly.

Also, most web applications nowadays have some kind of caching mechanism in front of the PHP application. So the only cost would be when populating this cache. If you are in such a situation, I see no reason to avoid using the techniques we learned.

Memoization

Memoization is an optimization technique where the result of an expensive function is stored so that it can be returned directly in any subsequent call with the same parameters. It is a specific case of data caching.

Although it can be used on non-pure functions, with the same invalidation issues as any other cache mechanism, it is mostly used in functional languages where all functions are pure, which greatly simplifies its usage.

The idea is to trade computation time for storage space. The first time you call a function for a given input, the result is stored and the next time the same function is called with the same arguments, the already computed result can be returned immediately. This can be achieved fairly easily in PHP using the static keyword inside your function:

<?php

function long_computation($n)
{
    static $cache = [];
    $key = md5(serialize($n));

    if(! isset($cache[$key])) {
        // your computation comes here, the rest is boilerplate
        sleep(2);
        $cache[$key] = $n;
    }

    return $cache[$key];
}

There are obviously a dozen different ways to do something similar, but this is simple enough to get an idea of how it works. One can also imagine implementing an expiration mechanism, or, since we use memory space instead of computation time, some kind of data structure where values get erased when not used to make room for newer results.
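One such alternative, sketched here with a hypothetical `memoize` helper, wraps any pure callable instead of modifying the function body:

```php
<?php
// Hypothetical higher-order memoize: wraps any pure callable and
// caches results keyed on the serialized argument list.
function memoize(callable $f): callable
{
    $cache = [];
    return function (...$args) use ($f, &$cache) {
        $key = md5(serialize($args));
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $f(...$args);
        }
        return $cache[$key];
    };
}

$slowSquare = function ($n) {
    usleep(100000); // simulate an expensive computation
    return $n * $n;
};

$fastSquare = memoize($slowSquare);
echo $fastSquare(9); // slow first call: 81
echo $fastSquare(9); // instant second call: 81
```

The advantage of this approach is that the caching concern stays out of the function itself, so any pure function can be memoized after the fact.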

Another option would be to store the information to disk, to keep the values saved between multiple runs of the same script, for example. There exists at least one library in PHP (https://github.com/koktut/php-memoize) doing just that.

The library, however, does not work well with recursive calls out-of-the-box, as the function itself is not modified and thus the value will only be saved for the first call, not the recursive ones. The article (http://eddmann.com/posts/implementing-and-using-memoization-in-php/) linked in the library readme discusses this issue in more detail and proposes a solution.

It is interesting to note that Hack has an attribute that will automatically memoize the results of a function with arguments of a certain type (https://docs.hhvm.com/hack/attributes/special#__memoize). If you are using Hack and want to use the annotation, I recommend you read the Gotchas section first as it might not always do what you want.

Note

Hack is a language adding new features on top of PHP and running on the PHP virtual machine written by Facebook, the HipHop Virtual Machine (HHVM). Any PHP code is compatible with Hack, but Hack adds some new syntax, making the code incompatible with the vanilla PHP interpreter. For more information, you can visit http://hacklang.org/.

Haskell, Scala, and memoization

Neither Haskell nor Scala performs memoization automatically, and neither has anything in its core to do so, although you can find multiple libraries offering this feature.

There is a misconception that Haskell memoizes all functions by default, which is due to the fact that the language is lazy. What really happens is that Haskell tries to delay the computation of a function call as much as possible and, once it does so, it uses the referential transparency property to replace other similar calls with the computed values.

There are, however, multiple cases where this replacement cannot happen automatically, leaving no other choice than to compute the value again. If you are interested in the topic, this Stack Overflow question is a good start with all the right keywords at http://stackoverflow.com/questions/3951012/when-is-memoization-automatic-in-ghc-haskell.

We'll leave the discussion here, as this book is about PHP.

Closing words

This was a really quick presentation of memoization as the technique is fairly simple to implement and there is nothing much to really say about it. I just wanted to present it so you are aware of the term.

If you have some long-running computation that gets called multiple times with the same arguments, I recommend you use the technique, as it can really speed things up and requires nothing from the caller. It is completely transparent to use.

Beware that it is, however, not a silver bullet. Depending on the data structure of your return values, it can eat up memory pretty quickly. If you encounter this issue, you can use some mechanism to clean up older, or less used, values from the cache.
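A minimal sketch of such a clean-up, assuming a simple bounded cache that evicts the oldest entry first (a full LRU would also track accesses, which is left out here for brevity):

```php
<?php
// Hypothetical bounded memoization: once the cache holds $limit
// entries, the oldest one is evicted to make room for the new result.
function memoizeBounded(callable $f, int $limit = 100): callable
{
    $cache = [];
    return function (...$args) use ($f, &$cache, $limit) {
        $key = md5(serialize($args));
        if (!array_key_exists($key, $cache)) {
            if (count($cache) >= $limit) {
                array_shift($cache); // drop the oldest cached entry
            }
            $cache[$key] = $f(...$args);
        }
        return $cache[$key];
    };
}

$double = memoizeBounded(function ($n) { return $n * 2; }, 2);
echo $double(1); // 2
echo $double(2); // 4
echo $double(3); // 6; the cached result for 1 has now been evicted
```

This keeps memory usage under a fixed bound at the price of occasionally recomputing values that were evicted.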

Parallelization of computation

Another nice benefit of having pure functions is that you can divide a computation into multiple small parts, distribute the workload, and assemble the result. It is possible to do so for any mapping, filtering, and folding operations. The function used for folding needs to be monoidal as we will see. Functions for mapping and filtering have no particular constraint besides purity.

Mapping has no particular constraint besides using a pure function. Say you have four cores, or computers; you only need to follow these steps:

1. Split the array into four parts.

2. Send a part to each core to do the mapping.

3. Merge the results.

In this particular case, it might be slower than doing it on a single core as the merging operation adds an overhead. However, as soon as the computation takes longer, you are able to use more of the computing power at your disposal, and thus gain time.
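Within a single process, the split/map/merge steps above can be sketched like this (the inner `array_map` stands in for a real dispatch to separate cores or machines):

```php
<?php
// Sequential sketch of a distributed map: split the collection,
// map each chunk independently, then merge the partial results.
function chunkedMap(callable $f, array $collection, int $parts = 4): array
{
    // Step 1: split the array into $parts chunks.
    $chunks = array_chunk($collection, (int) ceil(count($collection) / $parts));

    // Step 2: in a real setup, each chunk would be sent to a separate core.
    $mapped = array_map(function ($chunk) use ($f) {
        return array_map($f, $chunk);
    }, $chunks);

    // Step 3: merge the results back into one array.
    return array_merge(...$mapped);
}

$squares = chunkedMap(function ($x) { return $x * $x; }, range(1, 10));
print_r($squares); // [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Because the mapping function is pure, the chunks can be processed in any order without affecting the merged result.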

Filtering operates in exactly the same way as mapping, except you send a predicate instead of a function.

Folding can only happen if you have a monoidal operation, because each split needs to start with the empty value; otherwise, it might skew the results:

1. Split the array into four parts.

2. Send a part to each core to do the folding with the empty value as initial value.

3. Put all results in a new array.

4. Perform the same fold operation on the new array.

If your collection is really big, you can again divide the final fold into multiple parts.
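Sequentially, the four steps above can be sketched as follows; the monoid's empty value (0 for addition) is the initial value for every chunk:

```php
<?php
// Sequential sketch of a distributed fold over a monoidal operation:
// fold each chunk from the empty value, then fold the partial results.
function chunkedFold(callable $op, array $collection, $empty, int $parts = 4)
{
    // Step 1: split the array into $parts chunks.
    $chunks = array_chunk($collection, (int) ceil(count($collection) / $parts));

    // Step 2: each chunk is folded independently, starting from $empty.
    $partials = array_map(function ($chunk) use ($op, $empty) {
        return array_reduce($chunk, $op, $empty);
    }, $chunks);

    // Steps 3 and 4: fold the partial results with the same operation.
    return array_reduce($partials, $op, $empty);
}

$sum = chunkedFold(function ($a, $b) { return $a + $b; }, range(1, 10), 0);
echo $sum; // 55
```

If the operation were not monoidal, say, subtraction, the partial folds would combine differently depending on how the collection was split, which is exactly the skew the constraint prevents.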

Parallel tasks in PHP

PHP was created at a time when computers had only one core, and the traditional way of using it since then has been to serve each request with a single thread. You can have multiple workers declared in your web server to serve different requests using different processes, but one request will usually use only one thread, and thus only one core.

Although a thread-safe version of the PHP binary exists, Linux distributions usually ship the non-thread-safe one for this reason. This does not mean it is impossible to parallelize tasks in PHP, but it sure makes it a lot harder.

The pthreads extension

PHP 7 has seen the release of a new version of the pthreads extension, which allows you to run multiple tasks in parallel using a newly designed object-oriented API. It is really great and there is even a polyfill to perform the tasks sequentially if the extension is not available.

Note

The term polyfill originated in JavaScript development. It is a small piece of code that replaces a feature that is not implemented in the user's browser. Another term that is sometimes used is shim. In our case, the pthreads-polyfill provides an API identical to the extension's, but runs the tasks sequentially.

Sadly, using the extension is kind of a challenge. First of all, you need a thread-safe PHP binary, also called a ZTS binary (for Zend Thread Safety). As we just saw, distributions usually don't ship this version; as far as I know, there are currently no distributions whose official PHP packages have ZTS enabled. Google is usually helpful when trying to find instructions to create your own ZTS binary for your Linux distribution.

Windows and Mac OS users are in a better place: you can download ZTS binaries for Windows from http://www.php.net, and on Mac OS you can enable the option when installing PHP with the Homebrew package manager.

The other limitation is that the extension will refuse to load in a CGI context. This means you will only be able to use it on the command line. If you are interested in the reason the maintainer of the pthreads extension chose to put this constraint in place, I suggest you read this blog post he wrote, at http://blog.krakjoe.ninja/2015/09/the-worth-of-advice.html.

Now, if we assume you are able to have a ZTS version of PHP and you are only writing a CLI application, let's see how we could perform a parallel fold using the pthreads extension. The extension is hosted on GitHub at https://github.com/krakjoe/pthreads, and installation instructions can be found in the official PHP documentation at http://docs.php.net/manual/en/book.pthreads.php.

There are obviously multiple ways we could go about implementing folding using threads. We will try to go for a generic method. In some cases, a more specialized version might be quicker but this should cover a whole range of use cases already:

<?php

class Folder extends Thread {
    private $collection;
    private $callable;
    private $initial;

    private $results;

    private function __construct($callable, $collection, $initial)
    {
        $this->callable = $callable;
        $this->collection = $collection;
        $this->initial = $initial;
    }

    public function run()
    {
        $this->results = array_reduce($this->collection, $this->callable, $this->initial);
    }

    public static function fold($callable, array $collection, $initial, $threads = 4)
    {
        $chunks = array_chunk($collection, ceil(count($collection) / $threads));

        $threads = array_map(function($i) use ($chunks, $callable, $initial) {
            $t = new static($callable, $chunks[$i], $initial);
            $t->start();
            return $t;
        }, range(0, $threads - 1));

        $results = array_map(function(Thread $t) {
            $t->join();
            return $t->results;
        }, $threads);

        return array_reduce($results, $callable, $initial);
    }
}

The implementation is pretty simple; we have a simple Thread performing the reduction of each chunk, and we combine the results at the end using a simple array_reduce call. We could have opted to use a Pool instance to manage the various threads but, in such a simple case, it would have complicated the implementation.

Another possibility would have been to recurse until the resulting arrays contain at most $threads elements; this way, we would have used the full computational power at our disposal until the end. But again, this would have complicated the implementation.

How do you use it? Just call the static method:

<?php

$add = function($a, $b) {
    return $a + $b;
};

$collection = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

echo Folder::fold($add, $collection, 0);
// 55

If you want to play with this idea a bit, a little library implements all three higher-order functions in a parallel way (https://github.com/functional-php/parallel). You can install it using composer:

composer require functional-php/parallel

Messaging queues

Another option in PHP to parallelize tasks is to use a message queue. Message queues provide an asynchronous communication protocol: a server holds the messages until one or multiple clients retrieve them.

We can implement parallel computation by having our application send X messages to the server, one for each distributed task. A certain number of workers will then retrieve the messages and perform the computation, sending back the result to the application as a new message.

There are a lot of different message queue implementations that you can use. Usually, the queue itself is not implemented in PHP, but most of them have a client implementation that you can use. We will use RabbitMQ and the php-amqplib client.

Explaining how to install the server is out of scope for this book, but you have a lot of tutorials available on the Internet. We will also not explain all the details about the implementation, only what is related to our topic. You can install the PHP library using composer:

composer require php-amqplib/php-amqplib

We will need both an implementation for our workers and the application. Let's first create a file containing the common parts, which we will call 09-rabbitmq.php:

<?php

require_once './vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

list($queue, ,) = $channel->queue_declare($queue_name, false, false, false, false);

$fold_function = function($a, $b) {
    return $a + $b;
};

Now we create the worker:

<?php

use PhpAmqpLib\Message\AMQPMessage;

$queue_name = 'fold_queue';
require_once('09-rabbitmq.php');

function callback($r) {
    global $fold_function;

    $data = unserialize($r->body);
    $result = array_reduce($data['collection'], $fold_function, $data['initial']);

    $msg = new AMQPMessage(serialize($result));
    $r->delivery_info['channel']->basic_publish($msg, '', $r->get('reply_to'));
    $r->delivery_info['channel']->basic_ack($r->delivery_info['delivery_tag']);
}

$channel->basic_qos(null, 1, null);
$channel->basic_consume('fold_queue', '', false, false, false, false, 'callback');

while(count($channel->callbacks)) {
    $channel->wait();
}

$channel->close();
$connection->close();

And now we create the application itself:

<?php

use PhpAmqpLib\Message\AMQPMessage;

$queue_name = '';
require_once('09-rabbitmq.php');

function send($channel, $queue, $chunk, $initial)
{
    $data = [
        'collection' => $chunk,
        'initial' => $initial
    ];
    $msg = new AMQPMessage(serialize($data), array('reply_to' => $queue));
    $channel->basic_publish($msg, '', 'fold_queue');
}

class Results {
    private $results = [];
    private $channel;

    public function register($channel, $queue)
    {
        $this->channel = $channel;
        $channel->basic_consume($queue, '', false, false, false, false, [$this, 'process']);
    }

    public function process($rep)
    {
        $this->results[] = unserialize($rep->body);
    }

    public function get($expected)
    {
        while(count($this->results) < $expected) {
            $this->channel->wait();
        }
        return $this->results;
    }
}

$results = new Results();
$results->register($channel, $queue);

$initial = 0;

send($channel, $queue, [1, 2, 3], 0);
send($channel, $queue, [4, 5, 6], 0);
send($channel, $queue, [7, 8, 9], 0);
send($channel, $queue, [10], 0);

echo array_reduce($results->get(4), $fold_function, $initial);
// 55

Obviously, this is a really naive implementation. Requiring files like this has been bad practice since at least PHP 5, and the code is really brittle, but it serves the purpose of demonstrating the possibilities offered by a message queue.

When you launch the worker, it registers itself as a consumer of the fold_queue queue. When a message is received, it applies the folding function declared in the common part to the data and sends the result back on the queue named in the reply_to property. The loop ensures we wait for incoming messages; given the code, the worker should never exit by itself.

The application has a send function that sends messages on the fold_queue queue. The Results class instance registers itself as a consumer of the default queue in order to receive the results of each worker. Then four messages are sent, and we ask the Results instance to wait for them. Finally, we reduce the received data to get the final result.

If you launch only one worker, the results will be sent sequentially; however, if you launch multiple workers, each one of them will retrieve a message from the RabbitMQ server and process it, thus enabling parallelization.

Compared to using threads, a message queue has multiple benefits:

· The workers can be on multiple computers

· The worker can be implemented in any other language having a client for the chosen queue

· The queue server provides redundancy and failover mechanisms

· The queue server can perform load-balancing between the workers

Using the pthreads extension when available is probably a bit easier if you plan on only distributing your workload across the cores of a single computer, but if you want more flexibility, message queues are the way to go.

Other options

There are other ways to start parallelizing computations in PHP, but usually they make retrieving the values more difficult than what we just saw.

One option is to use the curl_multi_exec function to execute multiple HTTP requests asynchronously. The general structure would be similar to what we used in the message queue example. However, the possibilities are also limited compared to the full power of a complete messaging system.

You can also create other PHP processes using one of several related functions. In this case, the difficulty is often to pass and retrieve the data without loss, as the way to do so depends on a number of environment-related factors. If you want to go this way, the popen, exec, or passthru functions are probably your best bet.
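As a rough sketch, chunks of a collection could be handed to child PHP processes through popen, assuming the CLI binary is reachable via the PHP_BINARY constant; data is passed as command-line arguments and the partial result read back from the child's standard output:

```php
<?php
// Sketch: fold chunks in concurrent child PHP processes via popen().
// Each child sums its chunk and prints the partial result on stdout.
function processFold(array $collection, int $parts = 4): int
{
    $code = 'echo array_sum(array_map("intval", array_slice($argv, 1)));';
    $chunks = array_chunk($collection, (int) ceil(count($collection) / $parts));

    // Start all children first so they can run concurrently...
    $handles = array_map(function ($chunk) use ($code) {
        $cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($code)
             . ' ' . implode(' ', array_map('intval', $chunk));
        return popen($cmd, 'r');
    }, $chunks);

    // ...then collect and combine the partial results.
    $partials = array_map(function ($h) {
        $out = (int) stream_get_contents($h);
        pclose($h);
        return $out;
    }, $handles);

    return array_sum($partials);
}

echo processFold(range(1, 10)); // 55
```

This only works for data that survives a round-trip through shell arguments and stdout; for anything richer, you would need to serialize carefully, which is exactly the loss issue mentioned above.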

If you don't want to do all the grunt work, you can also use the Parallel.php library, which abstracts most of the complexity away. You can install it using composer:

composer require kzykhys/parallel

The documentation is available on GitHub at https://github.com/kzykhys/Parallel.php. As the library uses Unix sockets, most of the issues related to data loss are gone. However, you won't be able to use it on Windows.

Closing words

As we saw, working with multiple threads or processes in PHP is not the easiest thing, especially in the context of a web request. It can, however, be done, and it can greatly speed up long computations.

With the rewrite of pthreads for PHP 7, we can hope that more Linux distributions and hosting companies will start providing a ZTS version.

If this is the case, and parallel computation starts becoming a real thing in PHP, it might be possible to do some light big data processing without having to resort to external libraries in other languages such as the Hadoop framework.

I want to finish with a few words about message queues. Even if you don't use them in a functional way to process data and get results back, they are a great way to perform lengthy operations in the context of a web request. For example, if you give your users a way to upload a bunch of images and you need to process them, you can enqueue the operation and return immediately to the user. The queued message will be processed in due time and your user won't have to wait.

Summary

In this chapter, we discovered that sadly there is a cost to pay when doing functional programming. Since PHP has no core support for features such as currying and function composition, there is an overhead linked to the wrapper functions when using them. This can obviously be an issue in some cases, but caching can often alleviate this cost.

We talked about memoization, a caching technique that, coupled with pure functions, allows you to speed up subsequent calls to a given function without having to invalidate the stored results.

Finally, we discussed parallelizing computations in PHP by leveraging the fact that any pure operation performed on a collection can be distributed across multiple nodes without having any concerns about shared state.

The next chapter will be dedicated to developers using a framework as we will discover how we can leverage the techniques we've learned so far in the context of the most common frameworks currently in use in the PHP world.