Typed PHP: Stronger Types For Cleaner Code (2014)

Introduction

Why Write This Book

I decided to write this book after spending a great deal of time working on libraries that simplify how we work with strings, numbers, arrays and so on. Strings, numbers and booleans are called Scalar Types; arrays are technically a compound type, but PHP treats all of them differently to objects. They have no properties, no methods.

PHP has a rich history and a dominant place on the web. It has achieved much despite language inconsistencies and difficulties, particularly when it comes to these scalar types. Bjarne Stroustrup once said: “There are only two kinds of languages: the ones people complain about and the ones nobody uses”. PHP is one of those languages that everybody uses, yet that’s often seen as a good reason to ignore the bad parts and just get stuff done.

I’m all for getting stuff done, and to that end I have used Plain Ol’ PHP for many years. It’s always bugged me how procedural PHP is, in an ecosystem of OOP libraries and frameworks. So I decided to take a deeper look at building a stronger type system on top of PHP.

That’s the goal of this book. We look at using standard PHP libraries. We look at using user-land libraries. We look at using extensions and cross-compilers. All of this works towards creating a set of reusable tools which unify and ease the scalar types of PHP.

Who This Book Is For

This book assumes you have working knowledge of PHP. That means you understand the basics of programming, and have already employed them to write PHP code.

You don’t need to know how to set up a PHP stack - we will cover how that can easily be done (using VirtualBox, Vagrant and Phansible).

You also need to have an open mind. Many of the concepts covered by this book are experimental, and hardly any of them are commonplace. That’s not to say that you can’t use these techniques in production applications, but it’s up to you to decide if your architecture will benefit from third-party extensions (in addition to PHP core).

Finally, you should have access to a decent Internet connection. The examples in this book work best inside a Vagrant virtual machine. If you’re unfamiliar with the term, don’t worry. We’ll talk about it later, but it’s basically software that installs development environments. That installation happens often and benefits from a fast connection.

Procedural / Object Oriented

Procedural and Object Oriented are different programming styles which approach program execution in different ways. It’s good to understand how they work, and how they define the state of scalar types.

Procedural Programming

Procedural Programming describes a top-down approach to program execution. That is, Procedural programs consist of a list of steps for the interpreter to take (from top to bottom).

In pseudo-code, a Procedural image resize program may resemble:

1. start execution

2. then store a file reference returned by an open_file function

3. then store a modified image returned by a resize_image function

4. then close the open file

5. then store a second file reference returned by an open_file function

6. then write the modified image data to the second open file

7. then close the open file

8. then empty the modified image variable

9. end execution

Procedural programs can call on functions (as described in the example) and can define functions. The currently executing line position can be modified by things like looping constructs and goto-like constructs, but for the most part the program is framed in a set of instructions.
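
The steps above could be sketched in PHP roughly as follows. This is only an illustration under a few assumptions: to keep it runnable without an image extension, the "image" is a small grid of text characters stored in a temporary file, and resize_text_image is a hypothetical stand-in for the resize_image step that keeps every second row and column.

```php
<?php

// Hypothetical stand-in for resize_image: halves a text-grid "image"
// by keeping every second row and every second column.
function resize_text_image($handle)
{
    $rows = [];
    $y = 0;

    while (($line = fgets($handle)) !== false) {
        if ($y % 2 === 0) {
            $line = rtrim($line, "\n");
            $row = '';

            for ($x = 0; $x < strlen($line); $x += 2) {
                $row .= $line[$x];
            }

            $rows[] = $row;
        }

        $y++;
    }

    return implode("\n", $rows);
}

// Set-up for the sketch: a 4x4 source "image" and an empty target file.
$sourcePath = tempnam(sys_get_temp_dir(), 'src');
$targetPath = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($sourcePath, "AABB\nAABB\nCCDD\nCCDD\n");

$input = fopen($sourcePath, 'r');      // 2. store a file reference
$resized = resize_text_image($input);  // 3. store the modified image
fclose($input);                        // 4. close the open file

$output = fopen($targetPath, 'w');     // 5. store a second file reference
fwrite($output, $resized);             // 6. write the modified image data
fclose($output);                       // 7. close the open file

unset($resized);                       // 8. empty the modified image variable

// Read the result back (only so we can inspect it), then clean up.
$written = file_get_contents($targetPath);
unlink($sourcePath);
unlink($targetPath);
```

Notice how the whole program reads as one list of instructions: each line does something to a value and hands the result to the next line.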

Object Oriented Programming

Object Oriented Programming describes program execution as the interaction and description of entities (or objects). These objects can have any combination of properties (another name for owned variables) and methods (another name for owned functions).

The code within these owned functions is still executed from top to bottom, but the main idea is that the program depends on inter-object communication for its coherence.

In pseudo-code, an Object-Oriented image resize program may resemble:

1. start execution

2. then create a file object

3. then create an image resize object

4. then pass the file object to the image resize object

5. then create another file object

6. then write the result of the image resize object’s resize method to the second file object

7. then close the second file object

8. then destroy the second file object

9. then destroy the image resize object

10. then destroy the first file object

11. end execution
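
Here is one way the Object-Oriented steps might look in PHP. Treat it as a sketch rather than a recommended design: GridFile and GridResizer are hypothetical classes invented for this example, and the "image" is again a small grid of text characters so that the code runs without an image extension.

```php
<?php

// Hypothetical file object; it owns its handle and knows how to release it.
class GridFile
{
    private $handle;

    public function __construct($path, $mode)
    {
        $this->handle = fopen($path, $mode);
    }

    // Returns the file's lines without trailing newlines.
    public function lines()
    {
        $lines = [];

        while (($line = fgets($this->handle)) !== false) {
            $lines[] = rtrim($line, "\n");
        }

        return $lines;
    }

    public function write($data)
    {
        fwrite($this->handle, $data);
    }

    public function close()
    {
        if ($this->handle !== null) {
            fclose($this->handle);
            $this->handle = null;
        }
    }

    // Destroying the object also closes the underlying file.
    public function __destruct()
    {
        $this->close();
    }
}

// Hypothetical resize object; it asks the file object for its data.
class GridResizer
{
    private $source;

    public function setSource(GridFile $source)
    {
        $this->source = $source;
    }

    // Keeps every second row and column of the text-grid "image".
    public function resize()
    {
        $rows = [];

        foreach ($this->source->lines() as $y => $line) {
            if ($y % 2 === 0) {
                $row = '';

                for ($x = 0; $x < strlen($line); $x += 2) {
                    $row .= $line[$x];
                }

                $rows[] = $row;
            }
        }

        return implode("\n", $rows);
    }
}

// Set-up for the sketch: a 4x4 source "image" and an empty target file.
$sourcePath = tempnam(sys_get_temp_dir(), 'src');
$targetPath = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($sourcePath, "AABB\nAABB\nCCDD\nCCDD\n");

$first = new GridFile($sourcePath, 'r');   // 2. create a file object
$resizer = new GridResizer();              // 3. create an image resize object
$resizer->setSource($first);               // 4. pass the file object to it
$second = new GridFile($targetPath, 'w');  // 5. create another file object
$second->write($resizer->resize());        // 6. write the resize result
$second->close();                          // 7. close the second file object
unset($second);                            // 8. destroy the second file object
unset($resizer);                           // 9. destroy the image resize object
unset($first);                             // 10. destroy the first file object

// Read the result back (only so we can inspect it), then clean up.
$written = file_get_contents($targetPath);
unlink($sourcePath);
unlink($targetPath);
```

There is clearly more code here than a purely procedural version needs, but each object is responsible for its own part of the work.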

Which is the best?

…is the wrong question to ask. Both have their strengths and weaknesses. Object-Oriented Programming often leads to more code than Procedural Programming, but it lends itself to better separation of concerns. Stated another way - it’s easier to think of objects and their behaviours than to think of the whole flow of a program.

Ultimately, Procedural code can be as clean as Object-Oriented code.

“Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function.” - John Carmack

Which is PHP?

PHP is very procedural. That’s how Rasmus Lerdorf built it (back in the proverbial day) and that’s how it has mostly stayed. This is apparent in how scalar variables are handled and manipulated. The PHP scalar types are not objects. They don’t have methods or properties. If you want to do something to a PHP scalar variable (string, int, float, bool…), you pass it to a function.

Before PHP 5.3, there were no namespaces. As a result, all of these type-specific functions were added to the global namespace. They are available everywhere, and (as we’ll see) they are inconsistent.

This makes scalar type code ugly code.
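
To make the difference concrete, here is a small sketch. The first half is ordinary PHP. The second half uses TextValue, a hypothetical wrapper class made up for this example (it is not part of PHP or of any particular library), to show what the same operations might look like if strings had methods.

```php
<?php

$name = '  typed php  ';

// Procedural: the value is passed into global functions,
// and nested calls have to be read from the inside out.
$shouted = strtoupper(trim($name));

// Hypothetical wrapper class, assumed only for illustration.
final class TextValue
{
    private $value;

    public function __construct($value)
    {
        $this->value = (string) $value;
    }

    public function trim()
    {
        return new self(trim($this->value));
    }

    public function upper()
    {
        return new self(strtoupper($this->value));
    }

    public function __toString()
    {
        return $this->value;
    }
}

// Object-oriented: the value owns its behaviour,
// and the calls read from left to right.
$text = new TextValue($name);
$wrapped = (string) $text->trim()->upper();
```

Both approaches produce the same string. The wrapper simply moves the function calls behind methods, which is the kind of abstraction this book works towards.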

Native Function Inconsistencies

PHP is often decried because of the inconsistencies in the native functions. PHP is often praised because of the quality of the documentation. These are related! The documentation has evolved so well because native PHP functions are inconsistent.

This section is going to make me sound like a PHP-hater. That couldn’t be further from the truth! I love PHP and I’m committed to using it and helping others use it. To know what we’re building, we have to know what we’re trying to avoid building. That’s the point of what’s to follow.

Sporadic Underscores

· parse_str

· printf

· str_pad

· strcmp

· strip_tags

· stripslashes

These functions are described at http://php.net/manual/en/ref.strings.php

The full list contains 98 functions, 30 of which use one or more underscores. Sometimes functions clearly composed of multiple full words (like setlocale) don’t have underscores. Sometimes closely related functions (strlen vs. str_word_count) are named differently.

Sporadic Abbreviation

· addslashes

· chr

· htmlentities

· lcfirst

· number_format

· strcoll

Most of the string functions use abbreviations of some kind. Applied consistently, this wouldn’t be a problem. However, the inclusion of a few non-abbreviated functions makes the string API difficult to memorise, which often means a round-trip to the documentation.

Inconsistent Argument Order

· array_key_exists($needle, $haystack)

· stripos($haystack, $needle)

The natural order of arguments differs depending on whether you are working with array functions or string functions. Rasmus explains this as a result of keeping as close to the underlying C libraries as possible. The problem with this explanation is that it means nothing to developers who have never worked with C, and just want to work with PHP.

It’s helpful to remember that the array search functions are needle/haystack and the string search functions are haystack/needle.
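
A short sketch shows the two orders side by side. The variable names and sample values are only placeholders chosen for this example.

```php
<?php

$settings = ['debug' => true, 'mode' => 'fast'];
$title = 'Typed PHP';

// Array functions: the thing we are looking for (the needle) comes first.
$hasKey = array_key_exists('debug', $settings);
$hasValue = in_array('fast', $settings, true);

// String functions: the thing we are searching in (the haystack) comes first.
$position = stripos($title, 'php');

// Swapping the string arguments is not an error. The call just quietly
// searches for the long string inside the short one and finds nothing.
$swapped = stripos('php', $title);
```

The last call is the dangerous one: PHP will not complain about the wrong order, so the mistake only shows up as a result that is unexpectedly false.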

Regular Expression / Strings

· preg_filter

· str_replace

· preg_match

· strstr

· preg_split

· explode

In PHP, Regular Expressions are represented as strings but handled with a completely different set of functions. That’s just strange. While I can accept that Regular Expressions have a similar representation to strings, and that the functions they are given to (in the C libraries) expect to work with strings, developers shouldn’t need to account for this.

The preg functions treat a string as a Regular Expression, and expect it to start and end with recognisable delimiters. You can learn more about that at http://www.php.net/manual/en/regexp.reference.delimiters.php.

Ideally, Regular Expressions would have their own representation which clearly separates them from strings. An example of a language which already has this is JavaScript:

"abc".replace("b", "123"); // "abc" becomes "a123c"
"def".replace(/[e]/, "456"); // "def" becomes "d456f"

Languages like JavaScript make clear the distinction between strings and Regular Expressions.
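
For comparison, here is a minimal PHP sketch of the same two replacements, along with the matching pair of splitting functions. The sample strings are placeholders; the point is that the pattern is just another string, and only the function name tells PHP how to treat it.

```php
<?php

// Plain string replacement: "b" is taken literally.
$plain = str_replace('b', '123', 'abc');

// Regular Expression replacement: the pattern is still a string,
// wrapped in "/" delimiters so the preg functions can parse it.
$regex = preg_replace('/[e]/', '456', 'def');

// The same split exists twice, once per kind of "string".
$parts = explode(',', 'a,b,c');
$split = preg_split('/\s*,\s*/', 'a , b,c');
```

Nothing in the type of '/[e]/' marks it as a pattern. Passing it to str_replace by mistake would simply search for those five literal characters.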

Nouns / Verbs

· echo

· htmlentities

· lcfirst

· md5

· parse_str

· soundex

Some of the native functions are verbs (like echo and parse_str) while others are nouns (like htmlentities and soundex). This makes it tricky to reason about what a function is doing.

Strange Return Values

Many of the native functions return multiple types. In the case of strstr, part of the string is returned if the needle is found; otherwise the function returns false.

In contrast to this, the preg_match function returns 1 if the pattern is found, 0 if it is not and false if an error occurred. This makes for a slew of type checking before any of the return values can be used for their intended purpose.
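
The following sketch shows the kind of checking this leads to. The describe_match helper is a hypothetical function written for this example, and the error case assumes that an invalid pattern makes preg_match raise a warning (suppressed here with @) and return false.

```php
<?php

// strstr: either a string or false.
$domain = strstr('user@example.com', '@');
$missing = strstr('user.example.com', '@');

if ($missing === false) {
    $missing = false; // nothing to work with; a real program would branch here
}

// Hypothetical helper that turns the three possible preg_match
// results (1, 0 or false) into something readable.
function describe_match($pattern, $subject)
{
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return 'error';
    }

    return $result === 1 ? 'match' : 'no match';
}

$found = describe_match('/\d+/', 'order 42');
$notFound = describe_match('/\d+/', 'no digits');
$broken = describe_match('/(unclosed/', 'order 42');
```

The strict comparisons (=== false, === 1) matter. With loose comparisons, 0 and false are treated alike, and "no match" becomes indistinguishable from "something went wrong".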

Conclusion

PHP is a great language.

But if you’ve worked much with PHP, you will either have grown to ignore the sad state of PHP’s scalar type handling, or been frustrated at the lack of good alternatives.

And for most small-to-medium sized projects, adding a revised type-system is unnecessary. Yet if you learned how to make (or even just use) a well-built type system, wouldn’t it make sense to use it in large projects? Or in any projects you cared enough about?

This area of PHP often chases developers into prettier languages. Don’t be one of those developers! Learn how to write cleaner code, by using a clean abstraction.