Functional PHP (2017)

Chapter 8. Testing

We already asserted multiple times throughout the book that pure functions are easier to test; it is time we prove it. In this chapter, we will first present a small glossary about the topic in order to ensure we speak a common language. We will then continue with how a functional approach helps with traditional testing. Finally, we will learn about a different way to test code, called property-based testing.

None of the subjects of this chapter are strictly confined to functional programming; you will be able to use anything in any legacy codebase. Also, this is not a book about testing, so we will not go into every detail. It is also assumed that you have some prior knowledge about testing code in PHP.

In this chapter, we will cover the following topics:

· Small testing glossary

· Testing pure functions

· Test parallelization as a speed-up technique

· Property-based testing

Testing vocabulary

I won't claim to give you a complete glossary of all testing-related terms, nor will I explain the subtle differences and interpretations that could be made for each of them. The idea of this section is simply to lay some common ground.

The glossary won't be in alphabetical order, but rather terms will be grouped by categories. Also, it must by no means be considered a complete glossary. There are a lot more terms and techniques that pertain to testing than what will be presented here, especially if you include all testing methods related to performance, security, and usability:

· Unit testing: Tests conducted against each individual component separately. What is considered a unit varies: a function/method, a whole class, or a whole module. Usually, dependencies on other units are mocked to cleanly isolate each part.

· Functional testing: Tests the software as a black box to ensure that it meets the specifications. External dependencies are usually mocked.

· Integration testing: Tests conducted against the whole application and its dependencies, including external ones, to ensure that everything integrates correctly.

· Acceptance testing: Tests conducted by the final customer / end user against a set of agreed-upon criteria.

· Regression testing: Repeats a test after some change is made to ensure no issues were introduced in the process.

· Fuzz testing / Fuzzing: Tests conducted by feeding the application massive amounts of (semi) random data in order to make it crash. This helps discover coding errors or security issues.

· Ad-hoc testing: Tests performed with no formal framework or plan.

· Component testing: See unit testing.

· Blackbox testing: See functional testing.

· Behavioral testing: See functional testing.

· User Acceptance testing (UAT): See acceptance testing.

· Alpha version: Usually, the first version that is tested as a black box. It can be unstable and cause data loss.

· Beta version: Usually, the first version that is feature-complete and in a state good enough to be released to external people. It can still have serious issues and should not be used in a production environment.

· Release Candidate (RC): A version that is deemed stable enough to be released to the public for a final test. Usually the last RC is "promoted" as the released version.

· Mocking (mock): Creating components that imitate other parts of the software or an external service to test only the matter at hand.

· Stubbing (stub): See mocking.

· Code coverage: The percentage of the application code or features that is covered by the tests. It can have different granularity: by lines, by functions, by components, and so on.

· Instrumentation: The process of adding code to the application in order to test and monitor behavior or coverage. It can be done manually or by a tool either in the source, in a compiled form, or in-memory.

· Peer review: A process where one or multiple colleagues examine the produced work such as code, documentation, or anything pertaining to the release.

· Static analysis: Analyzing the application without running it, usually with a tool. It can provide information about coverage, complexity, coding style, or even find issues.

· Static testing: All of the testing and reviews performed without executing the application. See peer review and static analysis.

· Smoke tests: Superficially testing the main parts of an application to ensure the core features work.

· Technical review: See peer review.

· Decision point: A statement in the code where a change in the control flow can happen, typically an if condition.

· Path: The sequence of statements executed from the beginning of the function to the end. A function can have multiple paths depending on its decision points.

· Cyclomatic complexity: A measure of the complexity of a piece of code. There are various algorithms to compute it; one is "number of decision points + 1".

· Defect, failure, issue, or bug: Anything that does not work as expected in the application.

· False-positive: A test result seen as a defect when in fact everything works fine.

· False-negative: A test result seen as a success when in fact there is a defect.

· Test-driven Development (TDD): A development methodology where you start by writing a test and then the minimum amount of code to make it pass before repeating the process.

· Behavior-driven Development (BDD): A development methodology based on TDD where you describe behavior using a domain-specific language instead of writing traditional tests.

· Type-driven Development: A running joke in the functional world where you replace tests with a strong type system. Depending on whom you ask, the idea might be taken more or less seriously.

· X-driven Development: There is a new best development methodology created every week; the website http://devdriven.by/ tries to reference them all.

Testing pure functions

As we just saw in the glossary, there are a lot of potential ways to test an application. In this section, we will, however, limit ourselves to tests at the function level; or in other words, we will do unit testing.

So, what makes pure functions so much easier to test? There are multiple reasons; let's start by enumerating them and we will then see why with real test cases:

· Mocking is simplified as you only need to provide input arguments. No external state to create, no singletons to stub.

· Repeated calls will yield exactly the same result for a given arguments list, whatever the time of day or previously run tests. There is no need to put the application in a certain state.

· Functional programming encourages smaller functions doing exactly one thing. This usually entails test cases that are easier to write and understand.

· Referential transparency usually means you need fewer tests to gain the same level of trust in your code.

· The absence of side-effects guarantees that your test will have no consequences on any other subsequent tests. This means you can run them in any order you want without worrying about resetting the state between each test or running them in isolation.

Some of these claims may seem a bit bold to you, or maybe you are unsure why I made them. Let's take some time to verify why they are true with examples. We will separate our examples into four different parts to make things easier to follow.

All inputs are explicit

As we discovered earlier, a pure function needs to have all of its inputs as arguments. You cannot rely on some static method from a singleton, generate random numbers, or get any kind of data that can change from an external source.

The corollary is that you can run your test at any time during the day, on any environment, and for any given list of arguments, and the output will stay the same. This simple fact makes both writing and reading tests a lot easier.

Imagine you have to test the following function:

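<?php

function greet()
{
    // 'G' is the 24-hour format (0-23); 'g' would only yield 1-12.
    $hour = (int) date('G');

    // Hours before 5 belong to the night, not to the afternoon.
    if ($hour < 5 || $hour >= 22) {
        return "Good night!";
    } elseif ($hour < 12) {
        return "Good morning!";
    } elseif ($hour < 18) {
        return "Good afternoon!";
    }

    return "Good evening!";
}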

The problem is that, when you call the function, you need to know what time it is so you can check whether the return value is correct. This fact leads to some issues:

· You basically have to re-implement the function logic inside the test, thus possibly having the same bug in both the test and the function.

· There is a slight chance that, between the moment you compute the expected value and the moment the function reads the time again to return a result, the clock ticks over to the next hour, thus changing the function result. Those kinds of false positives are a real headache to debug.

· You cannot test all possible outputs without somehow manipulating the system clock.

· Since the dependency on the current time is hidden, the person reading the test can only infer what the function is doing.

By simply turning the $hour variable into a parameter, we solve all the previously mentioned issues.

Also, if you use a test runner that allows you to create a data provider for your tests, such as PHPUnit or atoum, testing the function becomes as simple as creating a provider that creates a list of hours associated with the expected return and simply feeds the time to the function and checks the result. This test is a lot simpler to write, understand, and expand than anything else you would have needed to write earlier.
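As a minimal sketch of that idea, the function below takes the hour as an argument and is checked against a list of (hour, expected greeting) pairs. The provider is a plain array with a hand-written loop so that the example runs without any test runner; with PHPUnit, the same list would typically be returned by a data provider method. The greetProvider and checkGreet names are made up for illustration, and the sketch assumes that hours are in the 0-23 range and that hours before 5 count as night.

```php
<?php

// Pure variant of greet(): the hour (0-23) is an explicit argument.
function greet(int $hour): string
{
    if ($hour < 5 || $hour >= 22) {
        return "Good night!";
    } elseif ($hour < 12) {
        return "Good morning!";
    } elseif ($hour < 18) {
        return "Good afternoon!";
    }

    return "Good evening!";
}

// Hypothetical provider: each entry is [hour, expected greeting].
// The boundaries of every time slot are listed explicitly.
function greetProvider(): array
{
    return [
        [0, "Good night!"],
        [4, "Good night!"],
        [5, "Good morning!"],
        [11, "Good morning!"],
        [12, "Good afternoon!"],
        [17, "Good afternoon!"],
        [18, "Good evening!"],
        [21, "Good evening!"],
        [22, "Good night!"],
        [23, "Good night!"],
    ];
}

// Returns the hours for which greet() disagrees with the provider.
function checkGreet(): array
{
    $failures = [];
    foreach (greetProvider() as [$hour, $expected]) {
        if (greet($hour) !== $expected) {
            $failures[] = $hour;
        }
    }

    return $failures;
}

var_dump(checkGreet() === []); // bool(true): every case passes
```

Adding a new case is a one-line change to the provider, and the test no longer needs to know what time it is.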

Referential transparency and no side-effects

Referential transparency ensures that you can replace a function call (with certain arguments) with the result of the computation anywhere in your code. This is also an interesting property for testing as it mostly means you will need to test less to gain the same amount of trust. Let me explain.

Usually, when you do unit testing, you try to choose the smallest unit possible that satisfies the trust you want to place in your code. Usually, you will test either at the module, class, or method level. Obviously, when doing functional programming, you will test at the function level.

Your functions will obviously call other functions. In a traditional testing setup, you would try to mock as many of those as possible in order to ensure that you test only the functionality of the current unit and that you are not impacted by possible bugs in other functions.

Although not impossible, it's cumbersome to mock functions in PHP, so this becomes a bit difficult in our case. This is especially true for composed functions such as $title = compose('strip_tags', 'trim', 'capitalize'); due to the way composition is implemented in PHP using closures.

So what do we do? Pretty much nothing. The goal of unit testing is to gain confidence in the fact that your code works in the expected way. In a traditional imperative approach, you mock as many dependencies as possible for the following reasons:

· Each dependency can depend on some state you need to provide, making your job tougher. Even worse, dependencies can have dependencies of their own that also require some state, and so on.

· Imperative code can have side effects, which could lead to your function or some dependencies having issues. This means that without mocks, you are not only testing your function, but all other dependencies and the interaction between them; in other words, you are doing integration testing.

· Control structures introduce decision points, which can make reasoning about a function complex; this means that, if you reduce the number of moving pieces to the strict minimum, your function is easier to test. Mocking other function calls reduces this complexity.

When doing functional programming, the first issue is moot as there is no global state. Everything your dependencies will ever need is either already in the arguments to your tested function or will be computed along the way. So mocking dependencies will make you do more work instead of less.

Since our functions are pure and referentially transparent, there is no risk of side effects having any consequences on the computation result, meaning that even if we have dependencies, we are not doing integration testing. Sure, a bug in one of the called functions will result in an error, but hopefully it will also have been caught earlier by another test, making it clear what is happening.

Concerning the complexity, if we go back to our composed function, $title = compose('strip_tags', 'trim', 'capitalize');, I posit it is really easy for anyone to understand what is happening. If all three functions are already tested, there is nothing much that can go wrong, even if we were to rewrite this without the compose command:

<?php

function title(string $string): string
{
    $stripped = strip_tags($string);
    $trimmed = trim($stripped);

    return capitalize($trimmed);
}

There is not much to test here. Obviously, we would have to write some tests to ensure that we pass the right temporary value to each function and that the plumbing works as expected, but if we have confidence in all three called functions, we can have a lot of confidence that this function will work also.

This line of reasoning is only possible because we know due to the properties of referential transparency that none of the three functions will have any impact on any of the others in some subtle way, meaning that their own unit tests give us trust enough in the fact that they will not break.

The result of all this is that usually you will write fewer tests for functional code because you will gain trust quicker. However, it does not mean that the title function does not need to be tested, because you could have made a small mistake somewhere. Each component should still be tested, but probably with a bit less care in correctly isolating everything.

Obviously, we are not talking about database access, or third-party APIs, or services here; those should always be mocked for the same reasons as in any test suite.
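To illustrate testing a composed function without any mocks, here is a small sketch. The compose helper below is a hypothetical stand-in that applies its functions from left to right, matching the order of the title function above; it is not necessarily the implementation used elsewhere in the book. The capitalize function is likewise an assumption, reduced here to a call to ucfirst.

```php
<?php

// Hypothetical stand-in for the compose helper used in the text:
// applies the given functions from left to right.
function compose(callable ...$functions): callable
{
    return function ($value) use ($functions) {
        return array_reduce(
            $functions,
            function ($carry, callable $function) {
                return $function($carry);
            },
            $value
        );
    };
}

// Assumed implementation; the text does not show capitalize().
function capitalize(string $string): string
{
    return ucfirst($string);
}

$title = compose('strip_tags', 'trim', 'capitalize');

// Input => expected output, exercised through the real dependencies.
$cases = [
    '<h1> hello world </h1>' => 'Hello world',
    '  already clean  '      => 'Already clean',
    ''                       => '',
];

$failures = [];
foreach ($cases as $input => $expected) {
    if ($title($input) !== $expected) {
        $failures[] = $input;
    }
}

var_dump($failures === []); // bool(true)
```

The test only checks the plumbing: each of the three functions is called for real, and because none of them has side effects, a failure here can only come from the composition itself or from a bug that the function's own unit tests should also reveal.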

Simplified mocking

This might already be clear, but I really want to stress the point that any mocking you will have to do will be greatly simplified.

First of all, you will only need to create the input arguments of the function under test. In some cases, this represents creating some pretty big data structures or instantiating complex classes, but at least you don't have to mock external states or a whole lot of services that are injected in your dependencies.

Also, this might not be true in all cases, but usually your functions operate on a smaller scale because they are a small part of something bigger, meaning that any one function will only take some really precise and concise parameters.

Obviously, there will be exceptions, but not that many. And, as we discussed earlier, since all of the parts making up the big picture will already be tested, your degree of confidence should already be higher than is usually the case in a more imperative application.

Building blocks

Functional programming encourages the creation of small building blocks that get reused as part of bigger functions. Those small functions do usually only one thing. This makes them easier to understand, but also easier to test.

The more decision points a function has, the more difficult it is to come up with a way to test each possible execution path. A small specialized function has usually at most two of those decision points, making it fairly easy to test.

Bigger functions usually don't perform any kind of control flow; they are just composed of our smaller blocks in a straightforward way. Since this means there is only one possible execution path, they are also easy to test.

Closing words

Of course, I am not saying that you won't encounter some pure functions that are difficult to test. It's just that, in general, you will have less trouble writing your tests and you will also gain trust in your code quicker.

With the industry moving ever closer to methodologies such as TDD, this means that functional programming is really a good fit for a modern application. This is especially true once you realize that most advice you'll find in order to write "testable code" is already enforced by using only functional programming techniques.

Speeding up using parallelization

If you have ever searched for a solution to speed up your test suites, chances are that you found something about test parallelization. Usually, users of PHPUnit will find the ParaTest utility, for example.

The main idea is to run multiple PHP processes simultaneously in order to leverage all the processing power of the computer. This approach works for mostly two reasons:

· A single test run has bottlenecks such as disk speed for source file parsing or database access.

· PHP being single-threaded, a multi-core CPU, like nearly all computers have nowadays, is not used to its full potential by a single test run.

By running multiple tests in parallel, both of those issues can be mitigated. The ability to do this, however, depends on each test being independent from the others, a property that is already enforced by referential transparency in a functional codebase.

This means that, if the functions under test follow the functional principles, you can run all your tests in parallel without having to make any adaptation. In some cases, this could divide by ten the time taken for your whole test suite, greatly improving the feedback loop when you develop in the process.
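The independence requirement can be made concrete with a small sketch. It contrasts a function that hides state in a static variable with a pure equivalent, then runs the same tests twice, the second time in reverse order. All names (nextIdImpure, nextId, runAll) are hypothetical and only serve the illustration.

```php
<?php

// Impure: the result depends on how many times the function was called.
function nextIdImpure(): int
{
    static $last = 0;

    return ++$last;
}

// Pure: the previous identifier is an explicit argument.
function nextId(int $last): int
{
    return $last + 1;
}

// Each test is a closure returning true on success.
$tests = [
    'pure: 0 -> 1'           => function () { return nextId(0) === 1; },
    'pure: 41 -> 42'         => function () { return nextId(41) === 42; },
    'impure: first id is 1'  => function () { return nextIdImpure() === 1; },
    'impure: second id is 2' => function () { return nextIdImpure() === 2; },
];

// Runs the tests in the given order and returns the names of the failures.
function runAll(array $tests): array
{
    $failed = [];
    foreach ($tests as $name => $test) {
        if (!$test()) {
            $failed[] = $name;
        }
    }

    return $failed;
}

$firstRun  = runAll($tests);                      // declared order
$secondRun = runAll(array_reverse($tests, true)); // reversed order

var_dump($firstRun);  // empty array: everything passes
var_dump($secondRun); // only the two "impure" tests fail
```

The tests of the pure function pass regardless of ordering or of what ran before them, which is exactly what a parallel runner needs; the tests of the impure function only pass once, in one specific order.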

If you are using PHPUnit, the aforementioned ParaTest utility is one of the easiest ways to get started. You can find it on GitHub at https://github.com/brianium/paratest. I advise you to use the --functional command-line parameter so that each test function can be run simultaneously instead of just the test cases.

There is also a brand-new utility for PHPUnit users called PHPChunkIt. I haven't had the opportunity to test it, but I hear it is interesting. You can find it on GitHub at https://github.com/jwage/phpchunkit.

Another more flexible option is using Fastest, available at https://github.com/liuggio/fastest. The examples shown in the tool documentation are for PHPUnit, but in theory it is able to run anything in parallel.

If you are using the atoum utility instead, by default your tests are already in what they call concurrent mode, which means they run in parallel. You can modify this behavior for each test using annotations as stated in the execution engine documentation at https://atoum-en.rtfd.org/en/latest/engine.html.

Users of the Behat framework can use the Parallel Runner extension, also available on GitHub at https://github.com/shvetsgroup/ParallelRunner. If you are using the Codeception framework, parallelization is sadly a bit more difficult to achieve; the documentation (http://codeception.com/docs/12-ParallelExecution), however, offers multiple possible solutions.

I strongly suggest you look into parallelizing your tests as it will be time well spent. Even if you are only able to save a few seconds on each run, this gain quickly accumulates. Faster tests mean you will run them more often, and this is usually a good way to improve code quality.

Property-based testing

Tired of spending time tediously writing test cases, John Hughes and Koen Claessen decided it was time for a change. A little more than 15 years ago, they wrote and published a paper about a new tool they called QuickCheck.

The main idea is that, instead of defining a list of possible input values and then asserting that the result is what we expect, you define a list of properties that characterize your function. The tool then generates as many test cases as wanted automatically and verifies that the property holds.

The default operating mode is for QuickCheck to generate random values and feed them to your functions. The result is then checked against the properties. If a failure is detected, the tool will then try to reduce the inputs to the minimal set of inputs generating the issue.

Having a tool generating as many testing values as you want is invaluable to find edge cases it would have taken you hours to think about. The fact that the test case is then reduced to its minimal form is also great to easily determine what is going wrong and how to fix it. It so happens that random values are not always the best way to test something. This is why you can also provide generators that will be used instead.

Also, thinking of your tests as a set of properties that need to hold true is a great way to focus more clearly on what the system is supposed to do instead of focusing on finding test values. This is especially helpful when doing TDD as your tests will be more akin to a specification.

If you want to learn more about this approach, the original paper is available online at http://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf. The authors use Haskell in their paper, but the content is nonetheless fairly easy to read and understand.

What exactly is a property?

A property is a rule that your function must respect in order to be considered correct. It can be something really simple, such as requiring that the result of a function adding two integers also be an integer, or something more complex, such as verifying the monad laws.

You usually want to create properties that are not already enforced otherwise, be it by another property or by the language. For example, if we use the scalar type declarations introduced by PHP 7, our preceding integer example is not needed.

As an example, we will take something from the paper. Say we just wrote a function that reverses the order of elements in an array. The authors propose that this function should have the following properties:

· reverse([x]) == [x]: Reversing an array with a single element should yield the exact same array.

· reverse(reverse(x)) == x: Reversing an array twice should yield the exact same array.

· reverse(array_merge(x, y)) == array_merge(reverse(y), reverse(x)): Reversing two merged arrays should yield the same result as merging the reversed second array with the reversed first one.

The first two properties guarantee that our function does not mess with the values. If we were to have only those two properties, a function doing absolutely nothing besides returning its parameter would pass the test with flying colors. This is where the third property comes into play. The way it is written ensures that our function does what we expect of it because there is no other way the property will hold.

What is interesting about those properties is that at no time do they perform any kind of computation. They are simple to implement and understand, meaning it is nearly impossible to introduce bugs in them. If you were to test your functions by somehow re-implementing the computation they are doing, it would kind of defeat the whole point.

Although pretty simple, this example shows perfectly that it is not easy to find valuable properties that are both meaningful and simple enough to ensure they will have no bugs. If you have trouble finding good properties, I encourage you to take a step back and think of your function in terms of the business logic you are trying to implement. Do not think in terms of inputs and outputs, but try to see the broader picture.
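Before turning to a dedicated library, the three properties can be checked with a few lines of plain PHP. The sketch below is only an approximation of what a QuickCheck-style tool does: it generates random inputs with mt_rand but performs no shrinking of failing cases. The randomIntArray and propertiesHoldFor helpers are invented for this example.

```php
<?php

// Hypothetical generator: a list of 0 to $maxLength random integers.
function randomIntArray(int $maxLength = 5): array
{
    $values = [];
    $length = mt_rand(0, $maxLength);
    for ($i = 0; $i < $length; $i++) {
        $values[] = mt_rand(-100, 100);
    }

    return $values;
}

// Checks the three properties against $runs random inputs and reports,
// for each property, whether it held on every run.
function propertiesHoldFor(callable $reverse, int $runs = 200): array
{
    $results = ['single' => true, 'inverse' => true, 'merge' => true];

    for ($i = 0; $i < $runs; $i++) {
        $n = mt_rand(-100, 100);
        $x = randomIntArray();
        $y = randomIntArray();

        $results['single'] = $results['single']
            && $reverse([$n]) === [$n];
        $results['inverse'] = $results['inverse']
            && $reverse($reverse($x)) === $x;
        $results['merge'] = $results['merge']
            && $reverse(array_merge($x, $y))
                === array_merge($reverse($y), $reverse($x));
    }

    return $results;
}

// A "do nothing" implementation that simply returns its parameter.
$identity = function (array $array): array {
    return $array;
};

var_dump(propertiesHoldFor('array_reverse')); // all three properties hold
var_dump(propertiesHoldFor($identity));       // "merge" is expected to fail
```

The identity function satisfies the first two properties on every input, so only the third one is able to tell it apart from a real implementation.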

The PhpQuickCheck testing library

Having seen the theoretical aspects of property-based testing in general, we can now shift our attention to a PHP-specific implementation: the PhpQuickCheck library. The source code is available on GitHub at https://github.com/steos/php-quickcheck and the library can be installed using the composer command:

composer require steos/php-quickcheck:@dev

You might need to change your minimum-stability setting to dev in your composer.json file, or add the dependency manually as explained on the GitHub page, because there is currently no stable release of the library.

The project was started in September 2014 and most of its development took place until November of the same year. Since then, not many new features have been added, mostly improvement of the coding styles and some minor improvements.

Although we can't say the project is really alive today, it is one of the first serious attempts to have a QuickCheck library in PHP, and it has some features that are not yet available in its main contender, which we will discuss later.

But let's not get ahead of ourselves; let's get back to our first example, the reverse function. Imagine we wrote the array_reverse function available in PHP and we needed to test it. This is how it would look with the PhpQuickCheck library:

<?php

use QCheck\Generator;
use QCheck\Quick;

$singleElement = Quick::check(1000, Generator::forAll(
    [Generator::ints()],
    function($i) {
        return array_reverse([$i]) == [$i];
    }
), ['echo' => true]);

$inverse = Quick::check(1000, Generator::forAll(
    [Generator::ints()->intoArrays()],
    function($array) {
        return array_reverse(array_reverse($array)) == $array;
    }
), ['echo' => true]);

$merge = Quick::check(1000, Generator::forAll(
    [Generator::ints()->intoArrays(), Generator::ints()->intoArrays()],
    function($x, $y) {
        return
            array_reverse(array_merge($x, $y)) ==
            array_merge(array_reverse($y), array_reverse($x));
    }
), ['echo' => true]);

The check static method accepts the amount of test data it needs to generate as the first argument. The second argument is a Generator instance; usually, you will create it using Generator::forAll, as in the example. The last argument is an array of options, in which you can pass the seed for the random generator, the max_size of the generated data (the meaning of this value depends on the generator used), or the echo option, which displays a dot (.) for each passed test.

The forAll method accepts an array of generators representing the arguments to your test, followed by the test itself. In our case, for the first test, we generate random integers and, for the other two, random integer arrays. The test must return a Boolean value: true for passed, false otherwise.

If you were to run our little example, it would display a dot for each random data generated, because we passed the echo option. The resulting variable contains information about the test results themselves. In our case, if you displayed $merge, it would show:

array(3) {

["result"]=> bool(true)

["num_tests"]=> int(1000)

["seed"]=> int(1478161013564)

}

The seed value will be different on each run unless you pass one as an option. Reusing a seed allows you to generate the exact same test data again. This can be useful to check whether a particular edge case is correctly fixed after being discovered.
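The principle behind seeding can be shown with PHP's own random number generator. This sketch does not use PhpQuickCheck at all; it only assumes that the library's generators are, like mt_rand, deterministic for a given seed. The generateInts helper is made up for the illustration.

```php
<?php

// Hypothetical helper: generates $count pseudo-random integers
// after seeding PHP's Mersenne Twister with $seed.
function generateInts(int $seed, int $count): array
{
    mt_srand($seed);

    $values = [];
    for ($i = 0; $i < $count; $i++) {
        $values[] = mt_rand(-1000, 1000);
    }

    return $values;
}

$first  = generateInts(1478161013, 5);
$second = generateInts(1478161013, 5);

// Same seed, same data: a failing input can be replayed at will.
var_dump($first === $second); // bool(true)
```

This is why both libraries presented in this chapter report the seed they used: it is all you need to replay a failing run.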

An interesting feature is automatically determining which generator to use based on type annotations. You can do so using methods on the Annotation class:

<?php

// Assumes the class lives in the QCheck namespace, like Generator and Quick.
use QCheck\Annotation;

/**
 * @param string $s
 * @return bool
 */
function my_function($s) {
    return is_string($s);
}

Annotation::check('my_function');

This feature can, however, only work with annotations right now; type hints will be ignored.

As you can see with those small examples, the PhpQuickCheck library relies heavily on static functions. The codebase in itself is also sometimes a bit hard to understand and the library lacks good documentation and an active community.

All in all, I don't think I would recommend using this over the option we'll see next. I just wanted to present the library to you as a possible alternative and, who knows, its status might change in the future.

Eris

Eris development started out in November 2014, roughly at the time the PhpQuickCheck library got its last big feature introduced. As we will see, the coding style is definitely more modern. Everything is cleanly organized in namespaces, and helpers take the form of functions instead of static methods.

As usual, you can get Eris using the composer command:

composer require giorgiosironi/eris

The documentation is available online at http://eris.rtfd.org/ and it is quite complete. The only gripe I have with it is that all the examples target people using PHPUnit to run their test suites. It should be doable to use it with other test runners, but this is something that isn't documented for now.

If we wanted to use Eris to test the properties we defined for array_reverse, our test case would look something like this:

<?php

use Eris\Generator;

class ArrayReverseTest extends \PHPUnit_Framework_TestCase
{
    use Eris\TestTrait;

    public function testSingleElement()
    {
        $this->forAll(Generator\vector(1, Generator\nat()))
            ->then(function ($x) {
                $this->assertEquals($x, array_reverse($x));
            });
    }

    public function testInverse()
    {
        $this->forAll(Generator\seq(Generator\nat()))
            ->then(function ($x) {
                $this->assertEquals($x, array_reverse(array_reverse($x)));
            });
    }

    public function testMerge()
    {
        $this->forAll(
            Generator\seq(Generator\nat()),
            Generator\seq(Generator\nat())
        )
            ->then(function ($x, $y) {
                $this->assertEquals(
                    array_reverse(array_merge($x, $y)),
                    array_merge(array_reverse($y), array_reverse($x))
                );
            });
    }
}

The code is somewhat similar to what we wrote for the PhpQuickCheck library, but it leverages methods added to our test case by the provided trait, and generator functions instead of static methods. The forAll method accepts a list of generators representing the arguments to our test function. You can subsequently use the then method to define the test itself. Inside it, you have access to all the assertions provided by PHPUnit.

The documentation explains in detail how you can configure various aspects of the library, such as the amount of generated test data, limiting the execution time, and so on. Each generator is also detailed at length with various examples and use cases.

Let's see what happens when we have a failing test case. Imagine we want to prove that no string is also a numerical value; we could write the following test:

<?php

use Eris\Generator;

class StringAreNotNumbersTest extends \PHPUnit_Framework_TestCase
{
    use Eris\TestTrait;

    public function testStrings()
    {
        $this->limitTo(1000)
            ->forAll(Generator\string())
            ->then(function ($s) {
                $this->assertFalse(is_numeric($s), "'$s' is a numeric value.");
            });
    }
}

You can see how we raised the number of iterations from the default of 100 to 1,000 using the limitTo method. This is because most strings are in fact not numerical values; without this increase, I was only able to get a failure in one run out of three. Even with this higher limit, it is still possible that all the generated data will occasionally pass the test without failures.

This is the kind of output you would get:

PHPUnit 5.6.2 by Sebastian Bergmann and contributors.

F 1 / 1 (100%)

Reproduce with:

ERIS_SEED=1478176692904359 vendor/bin/phpunit --filter StringAreNotNumbersTest::testStrings

Time: 42 ms, Memory: 4.00MB

There was 1 failure:

1) StringAreNotNumbersTest::testStrings

'9' is a numeric value.

Failed asserting that true is false.

./src/test.php:55

./src/Quantifier/Evaluation.php:51

./src/Quantifier/ForAll.php:154

./src/Quantifier/ForAll.php:180

./src/test.php:57

FAILURES!

Tests: 1, Assertions: 160, Failures: 1.

The test failed after 160 iterations with the string "9". Eris also gives you the command to run if you want to reproduce exactly this failing test by seeding the random generator manually:

ERIS_SEED=1478176692904359 vendor/bin/phpunit --filter StringAreNotNumbersTest::testStrings
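To see why the property is bound to fail sooner or later, it helps to look at which strings is_numeric accepts. The sketch below is independent of Eris; the numericStrings helper and the list of candidates are chosen for illustration only.

```php
<?php

// Hypothetical helper: keeps only the candidates PHP considers numeric.
function numericStrings(array $candidates): array
{
    return array_values(array_filter($candidates, 'is_numeric'));
}

$candidates = ['9', '-3', '.5', '1e5', 'abc', '', '0x1A', '12abc'];

// Integers, decimals and exponent notation are all numeric strings;
// hexadecimal notation has not been accepted since PHP 7.
var_dump(numericStrings($candidates));
```

Only a tiny fraction of random strings fall into these categories, which explains why a run of 100 iterations often finds no counterexample.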

As you can see, the library is fairly easy to use when your tests are written for PHPUnit. Otherwise, you might need to do some adaptation but I think it is worth your time.

Closing words

The QuickCheck library is easier to use in strictly typed functional programming languages because it is sufficient to declare generators for certain types and some properties for your functions; nearly everything else can be done automatically. The PhpQuickCheck library tries to emulate this behavior, but the result is a bit tedious to use.

However, this doesn't mean you can't use property-based testing effectively in PHP! Once you have created your generators, the framework will use them to generate as much test data as you let it, possibly uncovering edge cases you would never have thought of. For example, there is a bug in PHP's DateTime implementation that arises on leap years and could easily be overlooked when creating test data manually. See the Testing the language part at http://www.giorgiosironi.com/2015/06/property-based-testing-primer.html (by the creator of Eris) for more details on the issue.

Writing properties can be challenging, especially in the beginning. But more often than not, it helps you reason about the feature you are implementing and will probably lead to better code because you took the time to think about it from a different angle.

Summary

In this chapter, we had a quick look at what can be done on the testing front when you use a more functional approach to programming. As we saw, functional code is often easier to test because it enforces what is considered best practice for testing when doing imperative coding.

By having no side-effects and explicit dependencies, you can avoid most of the issues you usually encounter when writing tests. This results in less time spent testing and more time to concentrate on your application.

We also discovered property-based testing, which is a great way to discover issues related to edge cases. It also allows you to take a step back and think about the properties you want to enforce for your functions, which is akin to creating a specification for them. This approach is particularly effective when doing TDD as it forces you to think about what you want instead of how to do it.

Now that we have discussed testing to ensure our functions do what they should, we will learn about code optimization in order to improve application performance in the next chapter. A well-tested codebase will help you do the necessary refactoring to achieve better speed.