13. Speed

At times you may find that your website is running slowly, particularly when running queries that return large amounts of data or are inefficiently constructed. In this chapter we’ll take a look at some speed considerations, including what you shouldn’t be doing when writing code, particularly in terms of your database, and how you can start to speed up your website.

13.1 A method for testing your server-side page load time

Your server-side page load time is the time elapsed from when your code starts being interpreted to when it has finished processing. This means you can store the time at the very start of the code and subtract it from the time at the very end. Remember, this means the very first lines of your code. Add the following line:

$startTime = microtime(true);

This will get the current timestamp, accurate to microseconds. We pass true to return this value as a floating point number (e.g. 1.375) rather than the default string.

To finish, all you’ll need to do is a simple calculation at the very bottom of your code which will output the result (note the parentheses around the subtraction, which stop the string concatenation from interfering with the arithmetic):

echo 'Page loaded in ' . (microtime(true) - $startTime) . ' seconds';

Run your page and you’ll see how long your page took to process server side. Remember, this doesn’t include the rendering time of your HTML, CSS or JavaScript, that’s different. You’ll need to use browser tools to monitor the page load time of any requests from your page. We’ll discuss these next.

13.2 HTML, CSS, JavaScript and other HTTP Requests

Some would argue that the size of a CSS or JavaScript file doesn’t matter too much, as the contents of the file are almost always cached by the browser being used to access it. This is true; however, the initial load time of an unnecessarily large file doesn’t make sense when steps can be taken to dramatically reduce its size: writing code carefully, minifying the contents of these files and serving them compressed. Each of these steps can reduce a file significantly and can noticeably reduce your page load time. Consideration also needs to be given to serving content to mobile devices with poor connection speeds. We also need to consider the amount of markup on a page, as this needs to be downloaded. Even if you have caching mechanisms in place, these won’t reduce the time it takes to re-download the markup every time a page is requested.

Reducing HTML/CSS/JavaScript file size

We’ll talk about CSS and JavaScript first, and then look at how the quality of your HTML makes a difference. CSS can be poorly written, resulting in duplicate styles, unnecessarily long and/or chained selectors, shorthand style rules going unused and style rules included for specific browsers. The list goes on.

Unfortunately, writing clean CSS comes with experience, but there are some steps you can take to push you in the right direction. There are various standards written on CSS that are good to follow as part of a project. You might also find dedicated tools helpful.

There are a variety of plugins available for different browsers that allow you to monitor the use of CSS within pages you’re viewing. I won’t mention a specific plugin, as what’s on offer varies from browser to browser. Search around and find something that suits you.

You can also run your CSS through the W3C CSS validator, which will pick up on any invalid CSS and let you know. Probably the best way to deal with CSS is to think about each selector as you write, and the process of writing good styles will eventually become natural. For example, let’s say you’re targeting an element, .name, within an unordered list housed in an overall person container, .person. You could do:

.person ul li .name { }

Or, you could simply write:

.person .name { }

This, of course, only applies if you don’t need to be more specific about which elements you’re targeting. Spending time going through your entire stylesheet and checking for small amendments like this one can save you a lot of bytes.

And, the same goes for JavaScript. Your code can be condensed in such a way that you reduce the amount of code you write. This is slightly trickier, and you generally need to know a lot more about JavaScript to be able to refactor it in this way.

Finally, you can minify CSS and JavaScript very quickly and easily, which condenses it into an almost unreadable state, usually on a single line with variables, functions and more replaced with shorter identifiers. Minification is a quick and easy way to reduce the file size of your CSS and JavaScript before you place it within a production environment. Be aware that on occasion, minifying your code can result in problems with its functionality. Good minifiers provide levels of minification that can be controlled, so you can specify how aggressive the minification should be. The last thing you should do is simply minify and go live; ensure every page is tested thoroughly first.

13.3 Loading resources from a CDN

A CDN (Content Delivery Network) is an infrastructure built to serve cached content from a location nearest to the person requesting the data. CDNs usually provide different levels of caching to suit your needs.

Let’s take jQuery as an example. You could upload the latest version of jQuery to your server, include it and it would be downloaded to a user’s computer, and cached by the browser for the next time it’s requested. Now think what would happen if you serve the file from a CDN, such as Google’s or jQuery’s own. The file path would look something like http://code.jquery.com/jquery-1.8.2.min.js or http://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js. If a user has visited another website that uses the same version of jQuery (extremely likely) and their browser has already downloaded and cached it, it won’t need to be downloaded again when they visit your website. This removes the need for an additional HTTP request and therefore brings you closer to a faster website. Delivering content from someone else’s CDN isn’t the only option, either. Files specific to your website such as logos, uploaded images, video files and other resources can also take advantage of a CDN. It’s more than likely that the delivery network will be faster than your server at delivering files efficiently and will therefore help to speed up requests. This is particularly useful for large downloads or streaming video content.

13.4 Rendering time

We’ll only touch on this because we’re mainly looking at server-side speed. This is still important because what you output with PHP will affect rendering speed within the browser. Let’s say you were outputting forum posts. Carefully choosing the way to structure the markup within a loop will vastly reduce the amount of HTML you output, increasing the speed at which your PHP is processed as well as the rendering speed of the browser.
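For example, when outputting posts in a loop, keeping the per-post markup lean pays off quickly, because every byte is multiplied by the number of posts on the page. A quick sketch (the post structure here is hypothetical):

foreach ($posts as $post) {
    // Every byte of markup here is repeated for each post, so keep it lean.
    echo '<article class="post">';
    echo '<h2>' . htmlspecialchars($post['title']) . '</h2>';
    echo '<p>' . htmlspecialchars($post['body']) . '</p>';
    echo '</article>';
}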

Bear in mind that inline styling should be avoided, as should inline JavaScript, and ensure you’ve set up your page so your browser can cache external CSS and JavaScript effectively. Loading resources like CSS and JavaScript inline (or lumped on the page) means they can’t be cached, resulting in larger download sizes and effectively slower load times.

If you need more help with this, use a well-respected tool like Google Page Speed.

13.5 Limit queries

The first thing to analyse is how many database queries you’re running per page, relative to how much data you’re returning. If you’re querying in loops, this is bad. If you’re querying more than once to retrieve data from two or more tables where the data is linked, this is also bad. The more queries you run per page, the slower your page will be. This is because your website will send a request to the database, wait for a response and then read what has been sent back; each request means another round trip of network packets that slows down your page. The answer to this problem is to get better at SQL, joining data from multiple tables within a single query so fewer requests are required.

Let’s say you’re returning a list of posts by a user. The table defines the user by a user_id field, and the username isn’t present, as this is stored in the users table, not the posts table. The bad way to do this would be to loop through the posts, and then place a separate query within the loop to retrieve the username based on the user_id. You’d be generating a huge number of queries, if not now then in the future as the result set grows, and this data can easily be returned in a different manner. Let’s take a look at an example. Our tables are posts and users.

SELECT `posts`.`title`, `posts`.`timestamp`, `users`.`username`
FROM `posts`
JOIN `users`
ON `posts`.`user_id` = `users`.`user_id`
ORDER BY `posts`.`timestamp` DESC
LIMIT 10

So, we’re returning all the data we need, plus an additional field, username, which comes from the users table, being matched by user_id. This is only one way of joining, but we can also use joins like LEFT JOIN and RIGHT JOIN depending on how your data may sit in either table. In this case, when a post is made, we know that a user will exist in the users table, which will be the user who made the post.
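To make this concrete, here’s a minimal sketch of running that query from PHP, assuming an existing PDO connection in $db (the connection itself isn’t shown). One query returns everything the loop needs, so there’s no per-post lookup of the username:

$sql = "SELECT `posts`.`title`, `posts`.`timestamp`, `users`.`username`
        FROM `posts`
        JOIN `users` ON `posts`.`user_id` = `users`.`user_id`
        ORDER BY `posts`.`timestamp` DESC
        LIMIT 10";

// A single round trip to the database; the loop just reads the results.
foreach ($db->query($sql) as $post) {
    echo $post['title'] . ' by ' . $post['username'];
}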

13.6 Optimise queries

You may also find that particular queries are performing slowly. Basic queries pulling data from a couple of tables shouldn’t be a problem, but that also depends on how much data you’re gathering. You may be familiar with the SELECT * syntax, and may currently be using it to pull data from a table. Selecting all field data from a table may not be necessary, and doing so may give you unused fields, slowing down your query unnecessarily. This can be particularly slow if you’re joining and then pulling all data from the joined table, when in fact you may only need a username, as in the example we’ve just looked at. If you only need to return the username of a user, only select that data. Selecting everything else in the table isn’t going to help you in terms of speed, while naming fields gives a clear indication of what data you’re retrieving when you look at the query.
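As a quick illustration (the table and field names here are hypothetical):

// Pulls every field in the row, including ones the page never uses.
$sql = "SELECT * FROM `users` WHERE `user_id` = 5";

// Pulls only the field that's actually needed, and shows intent at a glance.
$sql = "SELECT `username` FROM `users` WHERE `user_id` = 5";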

Defining which fields you need to pull isn’t the only way you can optimise a query. You may also be joining in such a way that your queries perform more slowly, or using slow MySQL functions within your query.

13.7 MySQL or PHP functions?

You can easily perform functions within queries to concatenate, format and organise the data returned. However, you can also do this within your application layer. You may find that a particular function has been better optimised within MySQL than it has in PHP, and you can measure the speed difference to determine which is better to use. It’s far better to test this yourself than to rely on others to tell you what’s better optimised, as your data set or processing may be different, and their advice on a particular method may be outdated or inaccurate. Using the method stated at the start of this chapter should be enough for testing this with large pages and large data sets.

Normally, this kind of thing doesn’t affect speed noticeably, but that depends on how extensively it’s being used. If something has been optimised to perform better within MySQL and you need to work with a lot of data, do it there. There are some cases where pulling data from a query and then working with it in PHP has its advantages. If in doubt, run some small tests to see what performs better.
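As a rough sketch of how such a test might look, using the timing method from the start of this chapter; this assumes an existing PDO connection in $db, and the table and field names are hypothetical:

// Concatenate first and last names inside MySQL.
$startTime = microtime(true);
$names = $db->query("SELECT CONCAT(`first_name`, ' ', `last_name`) FROM `users`")
            ->fetchAll(PDO::FETCH_COLUMN);
echo 'MySQL CONCAT: ' . (microtime(true) - $startTime) . ' seconds';

// Pull the raw fields and concatenate in PHP instead.
$startTime = microtime(true);
$names = array();
foreach ($db->query("SELECT `first_name`, `last_name` FROM `users`") as $row) {
    $names[] = $row['first_name'] . ' ' . $row['last_name'];
}
echo 'PHP concatenation: ' . (microtime(true) - $startTime) . ' seconds';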

13.8 Searching a database

It’s common to need to search a database for data, particularly if your website relies on data being pulled back from a user search. Using the LIKE keyword can have a drastic effect on speed if not used properly, or if used in the wrong situation. If you have a small collection of rows in a table and need to allow a user to search it, LIKE may be suitable as long as the data isn’t going to grow and grow in volume. If you do end up with a lot of data here, your website speed will decrease as time goes on and more rows need to be searched. For large sets of data, performance can be terrible. As a real life example, searching for a single keyword within 100,000 forum topics took an average of 20 seconds. You absolutely can’t afford for a page to take even 3 seconds to load, let alone 20 seconds.

So, is there an alternative solution to this? Well, yes, there are a few things that can be done to reduce the time taken by queries like this. We’ll look at two: the first is basic, and the second slightly more advanced.

Add a fulltext index

A fulltext index can be used on a table with the MyISAM engine type and can dramatically speed up searching for words within a table. You’ll need to add a FULLTEXT index on the field the search will be performed on. A query may look something like:

$sql = "SELECT `id`, `headline` FROM articles WHERE MATCH (headline) AGAINST ('{$keyword}')";
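The index itself only needs to be created once. A minimal sketch, assuming an existing PDO connection in $db (and note that in real code, $keyword above should be escaped or bound as a parameter rather than dropped straight into the query):

// One-off setup: add a FULLTEXT index on the headline field of a MyISAM table.
$db->exec("ALTER TABLE `articles` ADD FULLTEXT (`headline`)");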

Use a search server

Two popular search servers are Lucene and Sphinx. Lucene is slightly more complicated to get started with, so I’d recommend Sphinx if you’re new to this. A search server essentially takes over the search process and returns relevant information for the search term provided. Because MySQL isn’t well equipped to handle text search, a solution like this is perfect, as the work is handed off to software built for it. Typically, the data will be ingested into the software and will then be searched.

13.9 Looping

Looping can cause massive problems, as we’ve already seen with queries placed inside loops. Calling poorly performing functions on every iteration of a loop can cause speed issues too. You may also find speed issues if you’re looping through very large sets of data returned from a query or read from a file, e.g. a large XML feed.

You may have also come across the problem of looping infinitely. This may be easy to fix, as you can alter the condition within the loop. If you’re using a variable in the loop’s condition that is likely to change, ask yourself whether it could cause an infinite or very large loop at some stage.
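If a loop’s bounds depend on data you don’t fully control, a simple safety cap can stop it running away. A sketch, where fetchNextRow() and processRow() are hypothetical stand-ins for your own code and the limit is arbitrary:

$processed = 0;

while ($row = fetchNextRow()) {
    processRow($row);

    // Safety cap: stop before bad data can cause a runaway loop.
    if (++$processed >= 1000) {
        break;
    }
}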

13.10 File reading

Reading files can pose a problem if you’re using outdated methods or reading and processing large files. Large files will always be a challenge, but you can avoid reading a large file all at once by reading it in chunks. There are several methods you can use to do this, depending on how much control you have over the data.
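For example, you might read a large file a few kilobytes at a time rather than loading it into memory in one go (the file name here is hypothetical):

// Read the file in 8 KB chunks instead of all at once.
$handle = fopen('large-feed.xml', 'rb');

while (!feof($handle)) {
    $chunk = fread($handle, 8192);
    // Process $chunk here...
}

fclose($handle);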

Also ensure you’re not using slower methods than you need to. For example, when reading an entire file into a string, file_get_contents is preferable to a manual fopen and fread loop: the function uses memory mapping techniques where your operating system supports them, so you know you’ll get the fastest reading available to your server. This is just one example, but it’s always best to check the PHP manual to avoid using slower or deprecated functions.

You can also limit the amount of data you read in from a specific file. Reading from a 2 MB file doesn’t mean you have to access the entire file if you don’t need to; you’d simply specify the maxlen argument of the file_get_contents function. Simple decisions like this add up in terms of overall speed.
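For example, to read only the first kilobyte of a file (the offset argument, 0 here, has to be supplied before maxlen):

// Read the first 1,024 bytes of the file and nothing more.
$excerpt = file_get_contents('articles.xml', false, null, 0, 1024);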

Saying this, it’s doubtful you’d need to read from a file where you could store the data in a database, so think carefully about where you’re storing and retrieving data from.

13.11 Limiting user defined values

If your website allows users to change values such as how many posts are displayed per page, this needs to be taken into consideration with speed. If a user can change this value, then they can set it to anything. So, in short, any value that can be defined by your user needs to be limited. To do this, a simple if statement or ternary operation when injecting the value will allow you to fall back to a sensible default if the value defined is too high. Ensuring these checks are in place anywhere data can be manipulated means you can avoid excess bursts of data being processed and affecting your server speed.
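A minimal sketch, assuming the value arrives in the query string as per_page; the name, default and upper limit are all arbitrary:

// Cast to an integer, then fall back to the default if it's out of range.
$perPage = isset($_GET['per_page']) ? (int) $_GET['per_page'] : 10;

if ($perPage < 1 || $perPage > 50) {
    $perPage = 10;
}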

13.12 Caching

For high load websites where many visitors are accessing the same information again and again, it doesn’t make sense to serve the same content by making a request to your database every time. This is where caching comes in. Caching allows you to store data that would otherwise have to be generated by the server, thereby reducing load on the database. This data of course wouldn’t be cached forever in all cases; your cached data would have to be refreshed at some point to provide the updated version. Caching is particularly useful for data that will be accessed time and time again by everyone, or for data that just doesn’t change very regularly.

We’ve already briefly discussed CDN caching, but there are ways to cache directly with PHP. We’ll take a look at some of the options and a brief overview of each one, as your requirements for caching will certainly vary depending on your website.

PHP Accelerators

There are several accelerators that can be used in conjunction with PHP to cache data either on disk or in memory. This includes caching entire queries (although MySQL has its own functionality to cache queries too) and can dramatically speed up access to data that is requested time and again by lots of people, particularly if the data is unlikely to change for a long time, in which case it can be cached for a long period. Some website data can be cached forever, meaning that it will always be read from the cache unless flushed. Flushing simply regenerates the content and places it back in the cache. If, for example, your menu is generated from content within a database table, you could probably cache this for a long time, or forever. Only when you make a change to the menu would you need to flush the cache and pull through the new changes.

Memcached

A popular and relatively easy to set up caching solution for PHP is memcached. Although not exclusively designed for PHP, memcached is an effective solution for caching in memory and is widely documented with many examples. Setting up something like memcached isn’t too tricky, and with the documentation and perhaps a little browsing around for help, you could be up and running with memcached within a couple of hours. There are then several techniques that can be used to cache different aspects of your website, including caching data retrieved from queries (you’re not actually caching the query, you’re caching what is returned from it), PHP variables and of course, any output like HTML. If you’re new to memcached, it’s best to try it in isolation instead of trying to tie it directly into any code you already have. Trying some quick examples will let you slowly get to grips with using it and will help you start to understand how to check what’s cached and what isn’t. As mentioned earlier, memcached isn’t exclusive to PHP and can be used with other languages.
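A minimal sketch using PHP’s Memcached extension, assuming a memcached server running locally on the default port; buildMenuFromDatabase() is a hypothetical stand-in for your own code:

$cache = new Memcached();
$cache->addServer('localhost', 11211);

// Try the cache first; get() returns false on a miss.
$menu = $cache->get('site_menu');

if ($menu === false) {
    $menu = buildMenuFromDatabase();
    $cache->set('site_menu', $menu, 3600); // Cache for an hour.
}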

eAccelerator

Another option is to use eAccelerator. eAccelerator is a general purpose optimiser that can be very easily installed and will get to work straight away at caching files either on disk or in memory to speed your pages up ‘1-10 times’ as claimed by eAccelerator. There is a certain lack of control with eAccelerator, which means it’s not ideal if you need very specific caching rules.

APC

Similarly, APC (Alternative PHP Cache) can be used, and will be bundled with PHP 6 (not yet released at time of writing). APC is very easy to use and works similarly to memcached whereby you need to define what to cache and when to retrieve it. APC is fully documented with examples in the PHP documentation, so this can be referred to while developing.
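The pattern mirrors the memcached example above. A minimal sketch using APC’s user cache functions, where computeExpensiveResult() is a hypothetical stand-in:

$value = apc_fetch('expensive_result');

if ($value === false) {
    $value = computeExpensiveResult();
    apc_store('expensive_result', $value, 600); // Cache for ten minutes.
}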

13.13 Caching AJAX requests

AJAX requests to a server should be cached. Remember, an AJAX request is simply an asynchronous call to a file using JavaScript - meaning the file is accessed without a page load being required - and these responses should be cached so the browser can retrieve the file from storage instead of having to re-download the content each time. If you’re using a JavaScript library like jQuery, this happens by default when using the ajax method, but when caching isn’t wanted you can turn it off. Turning off caching will append a timestamp to the end of the file being requested within the query string. It may look something like:

articles.xml?_=1359464930724

This means the file path will essentially change with every request, so the browser will almost always re-download the content. This is only required when you’re requesting files that contain real-time information and will change frequently.