The Clean Architecture in PHP (2015)
THE PROBLEM WITH CODE
Writing code is easy. So easy that there are literally hundreds of books claiming they can teach you to do it in two weeks, ten days, or even twenty-four hours. That makes it sound really easy! The internet is littered with articles on how to do it. It seems like everyone is doing it and blogging about it.
Here’s a fun question: would you trust a surgeon or even a dentist who learned their profession from a couple of books that taught them how to operate in two weeks? Admittedly, writing code is nothing like opening up and fixing the human body, but as developers, we do deal with a lot of abstract concepts. Things that only exist as a collection of 1s and 0s. So much so that I’d definitely want an experienced, knowledgeable developer working on my project.
The problem with code is that good code, code that serves its purpose, has few or no defects, can survive and perform its purpose for a long time, and is easy to change, is quite difficult to accomplish.
Writing Good Code is Hard
If it were easy, everyone would be doing it
-Somebody, somewhere in history
Writing code is hard. Well, scratch that: writing code is easy. It’s so easy everyone is doing it. Let me start this chapter over.
If it were easy to be good at it, everyone would be good at it
-Me, I suppose
Writing code, especially in PHP, but in many other languages as well, is incredibly easy. Think about the barrier to entry: all you have to do is go download PHP on your Windows machine, type:
php -S localhost:1337
Now all of the PHP code in that directory is suddenly available to you via a browser. Hitting the ground running with PHP development is easy. On Linux, it’s even easier. Install PHP with your distribution’s package manager and type the same command as above. You don’t even have to download a zip file, extract it, worry about getting it in your path, etc.
Not only is getting a server up easy, but actually learning how to get things accomplished in PHP is incredibly easy. Just search the web for “PHP.” Go ahead, I’ll wait. From Google, I got 2,800,000,000 results. The internet is literally littered with articles, tutorials, and source code relating to PHP.
I chose my words very carefully. The internet is literally littered with PHP.
Writing Bad Code is Easy
Since PHP is incredibly easy to get started in, it makes sense that eventually it would gather a large following of developers: the good, the bad, and the ugly. PHP has been around since 1994 or so, and has been gathering developers ever since. At the time of this writing, that’s twenty years’ worth of code being written in PHP.
Since then, an absolute horde of poorly written PHP has shown up on the web, in the form of articles, tutorials, StackOverflow solutions, and open source code. It’s also fair to point out that some really stellar PHP has shown up as well. The problem is, writing code the good way (we’ll talk about what that means soon) typically tends to be harder. Doing it the down and dirty, quick, and easy to understand way, is, well, easier.
Poorly written PHP has proliferated across the web, and the rate at which it is turned out only increases with the popularity and adoption of the language.
Simply put: it’s just way too easy to write bad code in PHP, it’s way too easy to find bad code in PHP, it’s way too easy to suggest (via putting source code out there or writing tutorials) others write bad code, and it’s way too easy for developers to never “level up” their skills.
So why is bad code bad? Let’s discuss the results of bad code.
We Can’t Test Anything
We don’t have time to write tests, we need to get working software out the door.
-The Project Manager at a previous job
Who has time to write tests? Tests are hard, time consuming, and they don’t make anybody any money. At least according to project managers. All of this is absolutely correct. Writing good tests can be challenging. Writing good tests can be time consuming. Very rarely will you come across an instance in your life where someone cuts you a check specifically to write software tests.
The Project Manager at my last job who, painfully, was also my boss, absolutely put his foot down against writing tests. Working software out the door was our one and only goal; making that paycheck. What’s so incredibly ironic about this is that a robust test suite is the number one way to make it possible to write working software.
Writing tests is supremely important to having a stable, long lasting software application. The mountains of books, articles, and conference talks dedicated to the subject are a testament to that fact. It’s also a testament to how hard it is to test, or, more correctly, how important it is to test effectively.
Testing is the single most important means of preventing bugs from happening in your code. While it’s not bulletproof and can never catch everything, when executed effectively, it can become a quick, repeatable, and solid way to verify that a lot of the important things in your code (such as calculating taxes or commissions, or authentication) are working properly.
There is a direct correlation between how poorly you write your code, and how hard it is to test that code. Bad code is hard to test. So hard to test, in fact, that it leads some to declare testing pointless. The benefits of having tests in place though, cannot be argued.
Why does bad code make tests so hard? Think about a tax calculation function in our software. How hard would it be to test that tax functionality if it were spattered about controllers? Or worse yet, spattered about a random collection of .php files? You’d essentially have to cURL the application with a set of POST variables and then search through the generated HTML to find the tax rates. That’s utterly terrible.
What happens when someone goes in and changes around the tax rates in the database? Now your known set of data is gone. Simple: use a testing database and pump it full of data on each test run. What about when designers change up the layout of the product page, and now your code to “find” the tax rate needs to change? Front-end design should dictate neither business nor testing logic.
It is nearly impossible to test poorly written code.
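By contrast, here is a minimal sketch of what the tax logic might look like once it is pulled out of controllers and templates. The TaxCalculator class, its method name, and the 6% rate are all hypothetical, and the type declarations assume PHP 7 or later. The point is only that an isolated class can be verified without a web server, a database, or any HTML parsing.

```php
<?php
// Hypothetical example: TaxCalculator and its rates are illustrative only.
final class TaxCalculator
{
    /** @var array<string, float> tax rate per region, e.g. 0.06 means 6% */
    private $rates;

    public function __construct(array $rates)
    {
        $this->rates = $rates;
    }

    /** Returns the tax due, in cents, for an amount given in cents. */
    public function taxFor(int $amountInCents, string $region): int
    {
        if (!isset($this->rates[$region])) {
            throw new InvalidArgumentException("Unknown region: $region");
        }

        return (int) round($amountInCents * $this->rates[$region]);
    }
}

// The rates are handed in by the test, so the known set of data never changes
// underneath us, and a page redesign cannot break the check.
$calculator = new TaxCalculator(['MI' => 0.06]);
assert($calculator->taxFor(10000, 'MI') === 600);
```

In a real project the assertion would live in a test framework such as PHPUnit rather than in a bare assert() call.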
Change Breaks Everything
The biggest consequence of not being able to test the software is that change breaks everything. You’ve probably been there before: one little minute change seems to have dire consequences. Even worse, one small change in a specific portion of the application causes errors within a seemingly unrelated portion of the application. These are regression bugs: bugs introduced as a side effect of adding new features, fixing other bugs, upgrading a library, changing configuration settings, etc.
When discovered, these regression bugs often lead to exclamations of “We haven’t touched that code in forever!” When they’re discovered, it’s often unknown what caused them in the first place, due to the nature of changes usually happening in “unrelated” portions of the code. The time elapsed before discovering them is often large, especially for obscure portions of the application, or very specific circumstances needed to replicate the bug.
Change breaks everything, because we don’t have the proper architecture in place to gracefully make those changes. How often have you hacked something in, and ignored those alarm bells going off in your brain? Further, how often has that hack come around to bite you later in the form of a regression bug?
Without good clean architecture conducive to change, and/or a comprehensive test suite, change is a very risky venture for production applications. I’ve dealt with numerous systems that their knowledgeable developers declared “stable and working,” yet were utterly terrified of changing because, when they did, “it breaks.” Stable, huh?
We Live or Die by the Framework
Frameworks are fantastic. If written well, they speed up application development tremendously. Usually, however, when writing software within a framework, your code is so embedded into that framework that you’re essentially entering a long term contract with that framework, especially if you expect your project to be long lived.
Frameworks are born every year, and die every once in a while, too (read: CodeIgniter, Zend Framework 1, Symfony 1). If you’re writing your application in a framework, especially doing so the framework-documented way, you’re essentially tying the success and longevity of your application to that of the framework.
We’ll discuss this much more in later chapters, and go into a specific instance where my team and I failed to properly prepare our application for the death of our chosen framework. For now, know this: there is a way to write code, using a framework, in such a way that switching out the framework shouldn’t lead to a complete rewrite of the application.
We Want to Use All the Libraries
Composer and Packagist brought with them a huge proliferation of PHP libraries, frameworks, components, and packages. It’s now easier than it ever has been to solve problems in PHP. The wide range of available libraries, installed quickly and simply through Composer, makes it easy to use other developers’ code to solve your problems.
Just like the framework, though, using these libraries comes at a cost: if the developer decides to abandon a library, you’re faced with no choice but to eventually replace it with something else. If you’ve littered your code base with usages of this library, you now have a time consuming process to run through to upgrade your application to use some other library.
And of course, later, we’ll describe how to gracefully handle this problem in a way that involves minimal rewriting, and hopefully minimal bugs if you’ve written a good test suite to verify your success.
Writing Good Code
Writing good code is hard.
The goal of this book is to solve these problems with bad code. We’ll discuss how architecture plays a key role in both causing and solving these problems, and then discuss ways in which to correct or at least mitigate these issues, such that we can build strong, stable, and long-lasting software applications.
What is Architecture?
Whether you know it or not, any piece of software you have ever written has followed some sort of architecture, even if it were just your own. Software architecture is the structure that defines the flow of information through a software system. It is a set of decisions made about how software is organized and operates in order to meet the goals of that software.
Architecture can apply to the application as a whole, or might only apply to individual pieces of the application. Maybe you follow one architectural pattern on the client side of the application and a completely different one on the server side of the application. The means by which your client side application and server side application communicate can follow an architectural pattern as well.
What does Architecture Look Like?
Some examples of architecture include how you organize your files, whether you intermix your PHP code and your HTML code, or whether your code is procedural in nature or object-oriented. The architecture might be whether you interface with the database directly, or abstract it away in your code so that you’re moving across several layers to get at the data. Or maybe you’re interacting with an API on the front-end, using something like AngularJS, and your back-end PHP is simply an API layer that gives data to the front-end application.
All of these features of your application determine the architecture. Architecture is simply a set of attributes about how your code is laid out, organized, and how it interacts with other pieces or layers of the code.
Since describing your architecture can be pretty verbose, architectural patterns can also be named, and often are when they are shared and described within the industry. Following commonly defined architecture, rather than coming up with something on your own, makes your code easily readable and understandable by other developers, especially when that architecture is well documented, and makes it quite easy to describe your architecture.
For example, you might be able to just say you use the MVC architecture on the front-end and an API web service on the back-end. Developers familiar with these ideas should understand you pretty quickly.
Layers of Software
Often when people talk of software architecture, they mention layers of software. Layers, in object-oriented programming, are groups of classes that perform similar functions. Usually, layers are broken up by concern, where a concern is a particular set of functionality or information. Our concerns can be many, depending on the application, but can include: database interaction, business rules, API interactions, or the view (the UI).
In good software architecture, these concerns are each broken out into layers. Each layer is a separate set of code that should, in a perfect world, loosely interact with the other layers. For instance, when a web request comes in, we might pass it off to the control layer that processes the request, which pulls any necessary data from the database layer, and finally presents it to the view layer to render a UI to the user.
In a perfect world, these layers and their interactions are kept fairly separate, and specifically controlled. As you’re about to see, without these layers, software can become pretty messy and hard to maintain.
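To make the idea of layers a little more concrete, here is a rough sketch of the request flow just described. Every class name is invented for illustration, an in-memory array stands in for a real database, and PHP 7 or later is assumed. It is not meant as a finished design, only as a picture of three concerns living in three separate places.

```php
<?php
// Hypothetical names throughout; an array stands in for a real database.

// Data layer: knows where customers come from, nothing about HTML.
class CustomerData
{
    public function all(): array
    {
        return [['name' => 'Ada'], ['name' => 'Grace']];
    }
}

// View layer: knows how to render, nothing about storage.
class CustomerListView
{
    public function render(array $customers): string
    {
        $items = '';
        foreach ($customers as $customer) {
            $items .= '<li>' . htmlspecialchars($customer['name']) . '</li>';
        }

        return '<ul>' . $items . '</ul>';
    }
}

// Control layer: coordinates the other two for a single request.
class CustomerListController
{
    private $data;
    private $view;

    public function __construct(CustomerData $data, CustomerListView $view)
    {
        $this->data = $data;
        $this->view = $view;
    }

    public function handle(): string
    {
        return $this->view->render($this->data->all());
    }
}

$controller = new CustomerListController(new CustomerData(), new CustomerListView());
echo $controller->handle(), PHP_EOL; // <ul><li>Ada</li><li>Grace</li></ul>
```

Each class could now change for its own reasons: a new storage engine touches only the data layer, and a redesign touches only the view.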
Examples of Poor Architecture
Before we start discussing how to cleanly build the architecture of your application, let’s first take a look at some code with poor architecture. Analyzing the problems faced by poor architecture can help us in understanding why our good architectural decisions are important and what benefits we might gain from taking better architectural routes.
Dirty, In-line PHP
PHP has an easy entry point: it’s not hard to get up and running, and the internet is littered with code samples. This is good for the language and community as it creates a low barrier for starting. Unfortunately, this has its drawbacks in that most of those code samples found on the web aren’t of the highest caliber, and often lead new developers into writing code that looks similar to this:
<body>
    <?php $results = mysql_query(
        'SELECT * FROM customers ORDER BY name'
    ); ?>
    <h2>Customers</h2>
    <ul>
        <?php while ($customer = mysql_fetch_assoc($results)): ?>
            <li><?= $customer['name'] ?></li>
        <?php endwhile; ?>
    </ul>
</body>
You’ll find code just like this plastered all over blogs and tutorials, even on websites touting themselves as a professional resource (no names named). Sadly, there is quite a bit wrong with writing PHP this way, especially for anything but tiny applications.
The mysql_ Functions are Deprecated
Right off the bat, the most glaring issue with this code is the use of the deprecated mysql_ functions. Even worse: the functions were removed entirely in PHP 7.0, although the old extension can still be installed separately via PECL.
They have been deprecated for a reason: they are considered unsafe and insecure to use, and suitable alternatives (PDO and the mysqli_ functions) have been created.
Regardless of third party support, this will make upgrading PHP hard or impossible some day. Choosing to use these functions today is choosing heartache tomorrow.
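For comparison, here is a small sketch of the same query issued through PDO. To keep it self-contained, it assumes an in-memory SQLite database (which needs the pdo_sqlite extension) with a made-up customers table; a real application would connect to its own database instead. Note that this only addresses the deprecated functions, not the architectural problems discussed next.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real
// customers database so that this sketch can run on its own.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Create and fill an illustrative customers table.
$pdo->exec('CREATE TABLE customers (name TEXT)');
$insert = $pdo->prepare('INSERT INTO customers (name) VALUES (:name)');
foreach (['Grace', 'Ada'] as $name) {
    $insert->execute([':name' => $name]);
}

// The same query as the in-line example, issued through PDO.
$statement = $pdo->query('SELECT * FROM customers ORDER BY name');
$customers = $statement->fetchAll(PDO::FETCH_ASSOC);

foreach ($customers as $customer) {
    echo $customer['name'], PHP_EOL;
}
```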
One Layer to Rule Them All
This sample PHP is written in a monolithic style. A monolithic application is one with a single layer comprising everything that application does, with each different concern of the application squished together. These applications are not modular and thus do not provide reusable code, and are often hard to maintain. Remember we discussed that software is often layered, keeping the code responsible for interfacing with the data separate from the code responsible for displaying it to the user. This sample is the complete opposite of a layered approach.
Our sample has some code which retrieves data from a database back-end and a view layer that is responsible for displaying this information to the user, all together as one single layer. We’re literally querying right in the middle of our HTML. This monolithic approach is the biggest problem with this sample, and it is the direct cause of the next two problems.
A Refactoring Nightmare
Refactoring is pretty much out of the window. Consider these scenarios and what we would have to do to accomplish them:
· What if a table name or column name changes? How many different files will we have to update to make that change? What about switching to a library like PDO? What if we want to get data from a different data source, like a RESTful API?
· What if we decided we wanted to start using a templating language, like Twig or Blade? Our database logic is so tightly wound into our HTML that we would have to rewrite the application to get it out so that we can have nice templates to represent the data.
· What if we wanted to provide some kind of standardized way of displaying user names (like Kristopher W. instead of Kristopher Wilson)? How many places might we have displayed a user’s name that now have to be updated?
Simply put, the ability to refactor code is hampered by our lazy, dirty architecture that throws everything together without any concern for the future.
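Take the display name scenario as an example. If the formatting rule lived in exactly one place, changing it would be a one-line edit rather than a hunt through every template. The helper below is purely hypothetical (the function name and the exact format are assumptions drawn from the Kristopher W. example), it assumes PHP 7 or later, and it only handles plain single-byte names.

```php
<?php
// Hypothetical helper: the name and the "First L." format are assumptions.
function formatDisplayName(string $fullName): string
{
    // Split on any run of whitespace after trimming the ends.
    $parts = preg_split('/\s+/', trim($fullName));

    // A single-word name has nothing to abbreviate.
    if (count($parts) < 2) {
        return $parts[0];
    }

    $first = $parts[0];
    $last = end($parts);

    return $first . ' ' . strtoupper(substr($last, 0, 1)) . '.';
}

echo formatDisplayName('Kristopher Wilson'), PHP_EOL; // Kristopher W.
```

With in-line PHP scattered across pages there is no natural home for a function like this, which is exactly why the change ends up so expensive.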
An Untestable Pile of Mush
This approach results in code that is virtually untestable. Sure, an end-to-end testing tool like Behat will probably allow us to test this, but unit tests are out the window. We can’t test anything in isolation. How do we ensure that we got the expected number of users back? Using a live database, what is the expected number of users? And since we can’t test the $results variable directly, do we have to parse the HTML DOM? And what happens when the DOM changes?
These are going to be some poor, slow, error prone tests.
Testing only works when it can be executed repetitively and quickly. If it takes a long time to run the tests, the developer simply cannot rely on them through the process of development as it would incur too much time waiting to discover if some seemingly minute change broke anything unexpected elsewhere in the application.
Poor Man’s MVC
Usually, the next progression in architecture in the world of PHP development is the adoption of an MVC style architecture, which we’ll talk about in MVC. While this is a big step up from in-line, procedural PHP, it can still have several issues, especially if not implemented correctly. Take this example:
class CustomersController {
    public function indexAction() {
        $db = Db::getInstance();
        $customers = $db->fetchAll(
            'SELECT * FROM customers ORDER BY name'
        );
        return [
            'customers' => $customers
        ];
    }
}

<h2>Customers</h2>
<ul>
    <?php foreach ($this->customers as $customer): ?>
        <li><?= $customer['name'] ?></li>
    <?php endforeach; ?>
</ul>
This code looks better. We’re using controllers and views, so we’ve separated the presentation logic from the control logic. The controller’s indexAction() grabs all the customers and then returns them, which gets passed to the view so it can render the data. This should make the code much easier to test and refactor. Except it doesn’t.
Still Hard-coding Queries
This is obvious. We still haven’t solved the problems of having hard-coded queries in layers that don’t concern the database, such as controllers. See the comments above in A Refactoring Nightmare for more details.
Strong Coupling to the Db Class
We’ve moved away from using the deprecated mysql_ functions and instead have abstracted away the database into some Db class, which is good. Except we still suffer the pitfalls of not being able to refactor this code. The same questions raised above in A Refactoring Nightmare still apply. We can hardly change anything about our database layer without having to touch a large number of files that use it.
Still Hard to Test
We’ve made testing much easier now simply by extracting our code out of the HTML. We now have a controller doing all the processing, and then passing that data off to the view. This is much easier to test, as we can simply instantiate CustomersController, call the indexAction() method, and analyze the return value. But how many customers should we expect, and what are their names? Again, we can’t know this unless we go the complicated route of setting up a test database (a known state) before running our tests.
Since we are retrieving our Db instance right in the indexAction() method, there’s no way to mock it. If there were, we could simply set it up to return a known set of customers, and then validate that the indexAction() properly retrieved them.
Two Very Large Layers
1. This code is hard to test in isolation as it declares its database dependency in-line, and thus can’t be tested without it. We can’t override it. We could override the configuration so we could use a properly staged test database, which is good for integration testing, but unit testing is impossible. We simply can’t test this controller without the database class.
2. We’re still hard-coding queries, which ties us to a database, and a specific database flavor at that.
3. We’re retrieving an instance of the Db class, which tightly couples this implementation to that class. We talk about this in more detail in Coupling, The Enemy, but for now, understand that it makes it very hard to test this controller without bootstrapping our database class as well.
4. If we decide to rewrite our application layer, we lose everything. This is because our data domain is wrapped so tightly into our application services. Let’s say, for instance, that we’re using Zend Framework and this is a Zend Framework controller. What happens when we want to switch to Laravel? This would require us to rewrite all of our controllers, but since our data access logic is stored right in the controller, we have to rewrite that, too, especially if we switch to using Eloquent ORM, which ships with Laravel.
Poor Usage of Database Abstraction
Finally, we get smart and abstract away the data source using the Repository design pattern:
class CustomersController {
    public function usersAction() {
        $repository = new CustomersRepository();
        $customers = $repository->getAll();
        return [
            'customers' => $customers
        ];
    }
}

<h2>Customers</h2>
<ul>
    <?php foreach ($this->customers as $customer): ?>
        <li><?= $customer['name'] ?></li>
    <?php endforeach; ?>
</ul>
This code is much better than our original example, and even better than our second. We’re slowly converging on some good application architecture.
Instead of interfacing directly with the database, we’ve abstracted it away into a Repository class. The repository is responsible for understanding our data source and retrieving and saving data for us. Our controller doesn’t have to know anything about where the data comes from, so we’ve removed the bad, hard-coded queries from the controller. We could easily refactor CustomersRepository to get its data from a different source, but wouldn’t have to touch any code that uses the repository so long as the getAll() method’s signature and return result are still the same.
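The repository itself isn’t shown above, so here is one hedged guess at what its inside might look like. The hard-coded rows are an assumption standing in for a real query, the sorting mimics the ORDER BY name of the earlier examples, and PHP 7 or later is assumed. What matters is that the calling code only relies on getAll() returning rows with a name key.

```php
<?php
// Hypothetical sketch of what CustomersRepository might look like inside.
class CustomersRepository
{
    /** @return array[] rows shaped like ['name' => string], ordered by name */
    public function getAll(): array
    {
        // Today: a hard-coded list stands in for a database query.
        // Tomorrow this body could use PDO or call a REST API instead;
        // callers only depend on the shape of what comes back.
        $rows = [['name' => 'Grace'], ['name' => 'Ada']];

        usort($rows, function (array $a, array $b) {
            return strcmp($a['name'], $b['name']);
        });

        return $rows;
    }
}

// Calling code, as in usersAction(): no SQL and no knowledge of the source.
$repository = new CustomersRepository();
foreach ($repository->getAll() as $customer) {
    echo $customer['name'], PHP_EOL;
}
```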
While this is much better architecture, it still suffers some issues:
Strong Coupling to CustomersRepository
We’re using a concrete instance of the CustomersRepository, which means the controller is still tied to that implementation. For instance, this CustomersRepository probably connects to a database of some sort to retrieve the information. Now our controller is permanently tied to this implementation, unless we refactor it away. If we’re going to change out where or how our data is stored, we’re probably going to write a new class instead of completely changing the existing one. We discuss how to solve this in Dependency Injection.
Continuing Dependency Issues
We’re still declaring our dependency (CustomersRepository) right in our method, which makes it impossible to mock and test the usersAction() method in isolation (remember, we’d have to set up an entire known state in the database for this to work). This might be great for end-to-end testing of our application, but it isn’t so great for unit testing our application.
We’ll also talk about how to solve this in Dependency Injection.
So how Should this Code Look?
It’s pretty easy to pick apart some sample code and explain why it needs improvement, but it’s much harder to simply provide good code samples without going into quite a bit of discussion first. We’re going to solve this exact problem (listing customers) once we get to our Case Study at the end of the book, which will build on concepts we discuss in the next few chapters.
However, just like the days leading up to a holiday, everybody loves a sneak peek. We were actually really close in the last sample, and only had to make a few tweaks to make this some rock solid architecture. This is how we will solve this problem later:
class CustomersController extends AbstractActionController {
    protected $customerRepository;

    public function __construct(CustomerRepositoryInterface $repository) {
        $this->customerRepository = $repository;
    }

    public function indexAction() {
        return [
            'users' => $this->customerRepository->getAll()
        ];
    }
}
We’ve solved several problems:
1. We’re no longer tightly coupled to any repository implementation by using an interface. Whatever we get will be required to implement CustomerRepositoryInterface, and that should give us our data. We don’t care what, how, or where.
2. We can easily test now as we can mock the class being used by the controller and make it return a known set of data. Then we can test that the controller properly passes it off to the view.
3. We have nothing in here that should prevent us from ever upgrading to newer versions of PHP or libraries, unless PHP or some library drastically change how they work, which would require a big rewrite regardless of how we wrote our code.
4. Queries? We’re not even using queries. Again: we don’t even know where our data comes from at this layer. If we suddenly need to get data from a different place, no big deal: simply pass us a different object that implements CustomerRepositoryInterface.
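To illustrate the second point, here is a rough sketch of how such a test might look. Several things are assumptions made so the sketch can run on its own: the body of CustomerRepositoryInterface, the hand-written stub class, the omission of the framework’s AbstractActionController base class, and the use of PHP 7 type declarations. A real test would more likely use a mocking tool and a test framework such as PHPUnit.

```php
<?php
// Assumptions: the interface body, the stub, and dropping the framework
// base class (AbstractActionController) are all for illustration only.
interface CustomerRepositoryInterface
{
    public function getAll(): array;
}

class CustomersController
{
    protected $customerRepository;

    public function __construct(CustomerRepositoryInterface $repository)
    {
        $this->customerRepository = $repository;
    }

    public function indexAction(): array
    {
        return ['users' => $this->customerRepository->getAll()];
    }
}

// A hand-written stub that returns a known set of data; no database needed.
class StubCustomerRepository implements CustomerRepositoryInterface
{
    public function getAll(): array
    {
        return [['name' => 'Ada'], ['name' => 'Grace']];
    }
}

$controller = new CustomersController(new StubCustomerRepository());
$result = $controller->indexAction();

// The controller should hand the repository's data to the view untouched.
assert($result['users'] === [['name' => 'Ada'], ['name' => 'Grace']]);
```

Because the controller only asks for something that implements the interface, the stub and a database-backed repository are interchangeable from its point of view.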
If some of this doesn’t make much sense, don’t worry. We’re about to cover it all in-depth in the next chapters.
Costs of Poor Architecture
As we’ve just seen, taking a bad approach when developing your application can lead to several problems. Classically, using a bad architecture can lead to the following common problems, although it entirely depends on how the application was written:
1. Untestable. Poor architecture often results in code that is difficult to test. This especially happens when things are tightly coupled together and cannot be tested in isolation. We’ll talk about this in Coupling, The Enemy. Inability to test can lead to an unstable application.
2. Hard to Refactor. Developers tend to make iterative changes to the software they build as their understanding of and solution to a problem is strengthened. Users often request additional features and changes to existing applications. Both of these require reworking existing code, and software written with a poor architecture is hard to rework and refactor, especially without a strong test suite to guarantee nothing breaks. See #1.
3. Impossible to Upgrade. Code architected and written poorly is often very hard to upgrade, whether to new versions of PHP, new versions of underlying frameworks and libraries, or to new frameworks and libraries entirely. This can leave projects stuck in a limbo where upgrading is impossible.
Coupling, The Enemy
The main issue we were dealing with when we looked at various examples of poor architecture is coupling. Coupling is the amount of dependency one component has on another. If one component simply cannot function without another, it is highly coupled. If one component loosely depends on another and can function without it, it is loosely coupled.
The looser the coupling within your code base, the more flexibility you have in that code base. With a high amount of coupling, refactoring, such as extending new functionality to existing code, becomes a very dangerous task. With loosely coupled code, it becomes much easier to change things around and swap out solutions, as the code using a solution is not fully dependent upon it.
To get a better understanding of coupling, let’s look at two very different examples of highly coupled code.
Spaghetti Coupling
<body>
    <?php // $db is assumed to be an open mysqli connection ?>
    <?php $users = mysqli_query($db, 'SELECT * FROM users'); ?>
    <ul>
        <?php foreach ($users as $user): ?>
            <li><?= $user['name'] ?></li>
        <?php endforeach; ?>
    </ul>
</body>
This example, which is very similar to our first example of code with poor architecture, has a lot of coupling. Can the application function in any respect without the database? No, we have queries all over the code. It is highly coupled to the database.
Can the application function without a web browser? Well, technically yes, but who wants to read straight HTML? Can we get a list of users without it being formatted in HTML? No, we cannot. Our application is highly coupled to the browser.
OOP Coupling
class UsersController {
    public function indexAction() {
        $repo = new UserRepository();
        $users = $repo->getAll();
        return $users;
    }
}
In this example, we have a class, called UsersController, that uses another class, called UserRepository to get all the users. This code looks much better than the first example, but it still has a high level of coupling.
Can the UsersController function without the UserRepository? Definitely not, it’s highly coupled to it.
Why is Coupling the Enemy?
So what’s the big deal about all this coupling anyway? Who cares?
People who care about having loosely coupled code are:
1. Developers who refactor their code. Do you always get it right the first time? Do requirements never change on you? We often need to move things around or rework them, but that’s often hard to do when the code you’re reworking is so tightly bound to code in several other places. One little change here, a couple dozen regression bugs there.
2. Developers who like to test their code. Testing code can be an absolute pain if the code is tightly coupled. Often, we want to test just one component of an application at a time, in isolation – unit testing. But that’s impossible when one class requires a dozen other classes to run, and instantiates them itself.
3. Developers who like to reuse their code. Reusing code is great! Writing the same code twice sucks. Reusing one piece of code is absolutely impossible when it is tightly coupled to the rest of your application. You can’t just copy the class out and drop it in another project without either hacking away its coupling, or bringing everything else with it. For shame.
Simply put, coupling is the enemy of developers everywhere as it makes their future lives incredibly difficult. Don’t screw over your future self.
How do we Reduce Coupling?
There are quite a few ways we can reduce the amount of coupling within our codebase, but we’ll cover four basic, easy solutions:
1. Have fewer dependencies. This sounds like a no-brainer. Having fewer dependencies reduces the amount of coupling in your code by reducing the number of things to couple to. This does not mean, however, that we need to stop using dependencies. By making sure our classes and methods are short, and only have one purpose, and by breaking out complex routines into several classes and methods, we can reduce the number of dependencies each class itself needs, which makes it much easier to refactor classes in isolation.
2. Use Dependency Injection. We’ll cover this in the next chapter. Dependency injection provides us with a means to move the control of dependencies outside of our class and give it to a third party.
3. Use Interfaces, not Concrete Classes. As much as possible, we want to couple ourselves to interfaces, which provide a sort of contract of what to expect. Used together with dependency injection, interfaces let us write classes that know nothing about our dependencies, only that they request a specific format for the dependency, and let something else provide it. We’ll also cover this in the next chapter.
4. Use Adapters. Instead of coupling to something directly, couple to an adapter, which takes some third party code and transforms it into what we’d expect it to look and behave like. Combine this with #2 and #3 above, and we can safely use third party code without tightly coupling to it. We’ll cover this in Abstracting with Adapters.
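As a small preview, the four ideas can be combined in one rough sketch. Everything in it is hypothetical: VendorLogger stands in for third party code we do not control, and ActivityLogInterface, VendorLoggerAdapter, and CustomerRegistration are names invented for this illustration. PHP 7 or later is assumed. The later chapters cover each technique properly.

```php
<?php
// Everything here is hypothetical: VendorLogger stands in for third party
// code we do not control; the other names are invented for this sketch.
class VendorLogger
{
    public $lines = [];

    public function writeLine(string $severity, string $text)
    {
        $this->lines[] = strtoupper($severity) . ': ' . $text;
    }
}

// #3: the contract our own code couples to.
interface ActivityLogInterface
{
    public function info(string $message);
}

// #4: the adapter translates our contract into the vendor's API.
class VendorLoggerAdapter implements ActivityLogInterface
{
    private $vendorLogger;

    public function __construct(VendorLogger $vendorLogger)
    {
        $this->vendorLogger = $vendorLogger;
    }

    public function info(string $message)
    {
        $this->vendorLogger->writeLine('info', $message);
    }
}

// #1 and #2: one small class with a single dependency, injected from outside.
class CustomerRegistration
{
    private $log;

    public function __construct(ActivityLogInterface $log)
    {
        $this->log = $log;
    }

    public function register(string $name)
    {
        $this->log->info("Registered customer $name");
    }
}

$vendorLogger = new VendorLogger();
$registration = new CustomerRegistration(new VendorLoggerAdapter($vendorLogger));
$registration->register('Ada');

echo $vendorLogger->lines[0], PHP_EOL; // INFO: Registered customer Ada
```

If the vendor library were abandoned, only VendorLoggerAdapter would need replacing; CustomerRegistration never mentions the vendor class at all.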