Web Programming - Zend PHP 5 Certification Study Guide (2014)

Web Programming

Although you will find it used in scenarios as diverse as quality control, point-of-sale systems and even jukeboxes, PHP was designed primarily as a web development language, and that remains its most common use today.

In this chapter, we focus on the features of PHP that make it such a great choice for developing web applications, as well as some web-related topics that you should be familiar with in order to take the exam.

Anatomy of a Web Page

Most people think of a web page as nothing more than a collection of HTML code. This is fine if you happen to be a web designer, but as a PHP developer, your knowledge must run much deeper if you want to take full advantage of what the Web has to offer.

From the point of view of the web server, the generation of a document starts with an HTTP request, in which the client requests access to a resource using one of a short list of methods. The client can also send a data payload (called the request body) along with its request. For example, if you are posting an HTML form, the payload could consist of the form data, while if you are uploading a file, the payload would consist of the file itself.

Once a request is received, the server decodes the data it has received and passes it on to the PHP interpreter (clearly, we are assuming that the request was made for a PHP script—otherwise, the server can choose a different handler or, in the case of static resources, such as images, output them directly).

Upon output, the server first writes a set of response headers to the client. These can contain information useful to the client, such as the type of content being returned, or its encoding, as well as data needed to maintain the client and the server in a stateful exchange (we’ll explain this later).

Forms and URLs

Most often, your script will interact with clients using one of two HTTP methods: GET and POST. From a technical perspective, the main difference between these two methods is that POST allows the client to send along a data payload, while GET only allows you to send data as part of the query string, which is part of the URL itself.

Of course, you could still submit a form using GET—but you would be somewhat limited in the size and type of data you could send. For example, you can only upload files using POST, and almost all browsers implement limitations on the length of the query string that confine the amount of data you can send with a GET operation.

By the standards of HTTP, POST should always be used for requests that change data in your application and GET requests should only be used for requests that query your data (such as searches, sorts, or page retrieval). Web indexers and browsers respect this and will not resubmit a POST request without explicit permission from the user so as to not cause duplicate information.

From an HTML perspective, the difference between GET and POST is limited to the method attribute of the <form> element:

Listing 5.1: An HTML form

<!--Form submitted with GET-->

<form action="index.php" method="GET">

List: <input type="text" name="list" /><br />

Order by:

<select name="orderby">

<option value="name">Name</option>

<option value="city">City</option>

<option value="zip">ZIP Code</option>

</select><br />

Sort order:

<select name="direction">

<option value="asc">Ascending</option>

<option value="desc">Descending</option>

</select>

</form>

<!--Form submitted with POST-->

<form action="index.php" method="POST">

<input type="hidden" name="login" value="1" />

<input type="text" name="user" />

<input type="password" name="pass" />

</form>

GET and URLs

When a form is submitted using the GET method, its values are encoded directly in the query string portion of the URL. For example, if you submit the form above by entering user in the List box and choosing to sort by Name in Ascending order, the browser will call up our index.php script with the following URL:

http://example.org/index.php?list=user&orderby=name&direction=asc

As you can see, the data has been encoded and appended to the end of the URL for our script. In order to access the data, we must now use the $_GET superglobal array. Each argument is accessible through an array key of the same name as the corresponding HTML element:

echo $_GET['list'];

You can create arrays by using array notation…

http://example.org/index.php?list=user&order[by]=column&order[dir]=asc

…and then access them using the following syntax:

echo $_GET['order']['by'];

echo $_GET['order']['dir'];

Note that there is nothing that stops you from creating URLs that already contain query data—there is no special trick to it, except that the data must be encoded using a particular mechanism that, in PHP, is provided by the urlencode() function:

$data = "Max & Ruby";

echo "http://www.phparch.com/index.php?name="

. urlencode ($data);

The PHP interpreter will automatically decode all incoming data for you, so there is no need to execute urldecode() on anything extracted from $_GET.
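
As a rough sketch of the encoding rules described above, the snippet below encodes a single value with urlencode(), builds a whole query string (including array notation) with http_build_query(), and decodes it again with parse_str(), which mirrors the decoding PHP applies when it fills $_GET. The variable names and the example.org URL are illustrative only.

```php
<?php
// Encode a single value: the spaces become "+", the ampersand "%26"
$name = urlencode("Max & Ruby");

// http_build_query() encodes a whole array, including nested keys
$query = http_build_query(array(
    'list'  => 'user',
    'order' => array('by' => 'column', 'dir' => 'asc'),
));

// parse_str() decodes a query string into an array
parse_str($query, $decoded);

echo "http://example.org/index.php?" . $query, "\n";
```

Building query strings from an array keeps the encoding in one place, which tends to be less error-prone than concatenating encoded fragments by hand.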

Using POST

When sending the form we introduced above with the method attribute set to POST, the data is accessible using the $_POST superglobal array. Just like $_GET, $_POST contains one array element named after each input name.

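// Use !empty() and isset() so that a missing field does not raise a notice

if (!empty($_POST['login'])) {

    if (isset($_POST['user'], $_POST['pass'])

        && $_POST['user'] === "admin"

        && $_POST['pass'] === "secretpassword") {

        // Handle login

    }

}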

In this example, we first check that the login form was submitted (the hidden login field is present and non-empty), then we validate that the user input is correct. Similar to GET input, we can again use array notation:

Listing 5.2: An HTML form with array notation

<form method="post">

<p>

Please choose all languages you currently know or

would like to learn in the next 12 months.

</p>

<p>

<label>

<input type="checkbox"

name="languages[]"

value="PHP" />

PHP

</label>

<label>

<input type="checkbox"

name="languages[]"

value="Perl" />

Perl

</label>

<label>

<input type="checkbox"

name="languages[]"

value="Ruby" />

Ruby

</label>

<br />

<input type="submit" value="Send" name="poll" />

</p>

</form>

The form above has three checkboxes, all named languages[]; these will all be added individually to an array called languages in the $_POST superglobal array—just like when you use an empty key (e.g. $array[] = "foo") to append a new element to an existing array in PHP. Once inside your script, you will be able to access these values as follows:

Listing 5.3: Handling POST input

foreach ($_POST['languages'] as $language) {

switch ($language) {

case 'PHP' :

echo "PHP? Awesome! <br />";

break;

case 'Perl' :

echo "Perl? Ew. Just Ew. <br />";

break;

case 'Ruby' :

echo "Ruby? Can you say... 'bandwagon?' <br />";

break;

default:

echo "Unknown language!";

}

}
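
One detail the listing above does not cover: when no checkbox is ticked, the browser sends nothing for languages[], so the key is missing from $_POST altogether. The following is a minimal sketch of a defensive reader, assuming a hypothetical helper named selected_languages() that takes the input array as a parameter so it can be tried without a real form submission.

```php
<?php
// Return only the submitted languages that appear in our own list.
// $post stands in for $_POST so the function can be tried in isolation.
function selected_languages(array $post)
{
    $known = array('PHP', 'Perl', 'Ruby');

    // No box ticked (or a scalar was sent): treat it as an empty selection
    if (!isset($post['languages']) || !is_array($post['languages'])) {
        return array();
    }

    $chosen = array();
    foreach ($post['languages'] as $language) {
        // Strict comparison also rejects nested arrays and numbers
        if (in_array($language, $known, true)) {
            $chosen[] = $language;
        }
    }
    return $chosen;
}

print_r(selected_languages(array('languages' => array('PHP', 'COBOL'))));
```

In a real script the call would be selected_languages($_POST), and the result could be passed to the foreach loop shown earlier.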

When You Don’t Know How Data Is Sent

If you need to write a script that is supposed to work just as well with both GET and POST requests, you can use the $_REQUEST superglobal array. This array is filled in using data from different sources in an order specified by the variables_order setting in your php.ini file (usually EGPCS, meaning Environment, GET, POST, Cookie and Server), or by request_order if that setting is present. Note that $_REQUEST only contains cookie, GET, and POST information.

The problem with using this approach is that, technically, you don’t know where the data comes from. This is a potentially major security issue that you should be fully aware of. This problem is discussed in more detail in the Security chapter.
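
One way to support both methods while still knowing where the data came from is to choose the source explicitly from the request method. This is only a sketch: request_input() is a hypothetical helper, and its parameters stand in for $_SERVER, $_GET and $_POST.

```php
<?php
// Pick the input array that matches the request method instead of
// relying on $_REQUEST, which may also contain cookie data.
function request_input(array $server, array $get, array $post)
{
    $method = isset($server['REQUEST_METHOD'])
        ? $server['REQUEST_METHOD']
        : 'GET';

    return ($method === 'POST') ? $post : $get;
}

// Typical call inside a script:
// $input = request_input($_SERVER, $_GET, $_POST);
```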

Managing File Uploads

File uploads are an important feature for many Web applications; improperly handled, they are also extremely dangerous: imagine how much damage could be done by allowing an arbitrary file to be uploaded to a sensitive location on your server’s hard drive!

A file can be uploaded through a “multi-part” HTTP POST transaction. From the perspective of building your file upload form, this simply means that you need to declare it in a slightly different way:

Listing 5.4: An HTML form with file upload

<form enctype="multipart/form-data" action="index.php"

method="post">

<input type="hidden" name="MAX_FILE_SIZE"

value="50000" />

<input name="filedata" type="file" />

<input type="submit" value="Send file" />

</form>

As you can see, the MAX_FILE_SIZE value is used to define the maximum file size allowed (in this case, 50,000 bytes); note, however, that this restriction is almost entirely meaningless, since it sits on the client side. Any moderately crafty attacker will be able to set this parameter to an arbitrary value: you can’t count on it to prevent a malicious actor from overwhelming your system by sending files so large that they deplete its resources.

You can limit the amount of data uploaded by a POST operation by modifying a number of configuration directives, such as post_max_size, max_input_time and upload_max_filesize.

Once a file is uploaded to the server, PHP stores it in a temporary location and makes it available to the script that was called by the POST transaction (index.php in the example above). It is up to the script to move the file to a safe location if it so chooses—the temporary copy is automatically destroyed when the script ends.

Inside your script, uploaded files will appear in the $_FILES superglobal array. Each element of this array will have a key corresponding to the name of the HTML element that uploaded a file (filedata, in this case). The element will, itself, be an array with the following elements:

$_FILES element   Description
name              The original name of the file
type              The MIME type of the file provided by the browser
size              The size (in bytes) of the file
tmp_name          The name of the file’s temporary location
error             The error code associated with this file. A value of UPLOAD_ERR_OK indicates a successful transfer, while any other error indicates that something went wrong (for example, the file was bigger than the maximum allowed size).

The real problem with file uploads is that most—but not all—of the information that ends up in $_FILES can be spoofed by submitting malicious information as part of the HTTP transaction. PHP provides some facilities that allow you to determine whether a file upload is legitimate. One of them is checking that the error element of your file upload information array is set to UPLOAD_ERR_OK. You should also check that size is not zero and that tmp_name is not set to none.

Finally, you can use is_uploaded_file() to determine that a would-be hacker hasn’t somehow managed to trick PHP into building a temporary file name that, in reality, points to a different location. Once you verify that the file is a legitimate upload, call move_uploaded_file() to move it to a permanent location. Note that a call to the latter function also checks whether the source file is a valid upload file, so there is no need to call is_uploaded_file() first.

One of the most common mistakes that developers make when dealing with uploaded files is using the name element of the file data array as the destination when moving it from its temporary location. Because this piece of information is passed by the client, doing so opens up a potentially catastrophic security problem in your code. You should, instead, either generate your own file names, or make sure that you filter the input data properly before using it (this is discussed in greater detail in the Security chapter).
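
The checks described in this section could be collected along the following lines. This is a sketch under stated assumptions, not a complete upload routine: upload_problem() and upload_destination() are hypothetical helpers, the .dat extension and the /var/uploads directory are placeholders, and the call to move_uploaded_file() is left in a comment because it only succeeds during a real upload.

```php
<?php
// Check one entry of $_FILES and return an error message, or null
// when the entry looks acceptable. $maxBytes is our own size limit.
function upload_problem(array $file, $maxBytes)
{
    if (!isset($file['error']) || $file['error'] !== UPLOAD_ERR_OK) {
        return 'upload failed';
    }
    if ($file['size'] <= 0 || $file['size'] > $maxBytes) {
        return 'unacceptable file size';
    }
    if ($file['tmp_name'] === '' || $file['tmp_name'] === 'none') {
        return 'missing temporary file';
    }
    return null;
}

// Build a destination that ignores the client-supplied file name
function upload_destination($directory)
{
    return rtrim($directory, '/') . '/' . uniqid('upload_', true) . '.dat';
}

// Typical use inside the receiving script:
// $file = $_FILES['filedata'];
// if (upload_problem($file, 50000) === null) {
//     move_uploaded_file($file['tmp_name'], upload_destination('/var/uploads'));
// }
```

Generating the name on the server means nothing from the name element ever reaches the file system.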

GET or POST?

PHP makes it very easy to handle data sent using either POST or GET. However, this doesn’t mean that you should choose one or the other at random.

From a design perspective, a POST transaction indicates that you intend to modify data (i.e., you are sending information over to the server). A GET transaction, on the other hand, indicates that you intend to retrieve data. These guidelines are routinely ignored by most web developers—much to the detriment of proper programming techniques. Even from a practical perspective, however, you will have to use POST in some circumstances. For example:

· You need your data to be transparently encoded using an arbitrary character set

· You need to send a multi-part form—for example, one that contains a file

· You are sending large amounts of data

HTTP Headers

As we mentioned at the beginning of the chapter, the server responds to an HTTP request by first sending a set of response headers that contain various tidbits of information about the data that is to follow, as well as other details of the transaction. These are simple strings in the form key: value, each terminated by a line break (a carriage return followed by a line feed). The headers are separated from the content by an empty line.

Although PHP and your web server will automatically take care of sending out a perfectly valid set of response headers, there are times when you will want to either overwrite the standard headers or provide new ones of your own.

This is an extremely easy process: all you need to do is call the header() function and provide it with a properly formed header. The only real catch (besides the fact that you should only output valid headers) is that header() must be called before any other output, including all HTML data and PHP output and any whitespace characters outside of PHP tags. If you fail to abide by this rule, two things will happen: your header will have no effect, and PHP may output an error.

Note that you may be able to output a header even after you have output some data if output buffering is on. Doing so, however, puts your code at the mercy of what is essentially a transparent feature that can be turned on and off at any time and is, therefore, a bad coding practice.

Redirection

The most common use of headers is to redirect the user to another page. To do this, we use the Location header:

header("Location: http://phparch.com");

Note that the header redirection method shown here merely requests that the client stop loading the current page and go elsewhere—it is up to the client to actually do so. To be safe, header redirects should be followed by a call to exit, to ensure that subsequent portions of your script are not called unexpectedly:

header("Location: http://phparch.com");

exit;

To stop browsers from emitting “Do you wish to re-post this form” messages when refreshing the page after submitting a form, you can use a header redirection to forward the user to the results page after processing the form.
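
A minimal sketch of this redirect-after-POST flow follows. The decision is kept in a function that only returns the target, so it can be tried without sending headers. The names save_comment() and after_post_target(), the comment field, and the two target paths are all assumptions made for illustration.

```php
<?php
// Hypothetical placeholder for the real form processing
function save_comment(array $post)
{
    return isset($post['comment']) && $post['comment'] !== '';
}

// Decide where to send the client after a form submission.
// Returns a URL to redirect to, or null when the page should render.
// $server and $post stand in for $_SERVER and $_POST.
function after_post_target(array $server, array $post)
{
    if (!isset($server['REQUEST_METHOD'])
        || $server['REQUEST_METHOD'] !== 'POST') {
        return null; // not a form submission: nothing to process
    }
    return save_comment($post) ? '/results.php' : '/form.php?error=1';
}

// Typical use at the top of a script, before any output:
// $target = after_post_target($_SERVER, $_POST);
// if ($target !== null) {
//     header("Location: " . $target, true, 303);
//     exit;
// }
```

The third argument to header() sets the response code; 303 tells the client to fetch the target with GET, so refreshing the results page does not resubmit the form.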

Compression

HTTP supports the transparent compression and decompression of data in transit during a transaction using the gzip algorithm. Compression will make a considerable impact on bandwidth usage—as much as a 90% decrease in file size. However, because it is performed on the fly, it uses up more resources than a typical request.

The level of compression is configurable, with 1 being the least compression (thus requiring the least amount of CPU usage) and 9 being the most compression (and highest CPU usage). The default is 6.

Turning on compression for any given page is easy, and because the browser’s Accept headers are taken into account, the page is automatically compressed for only those users whose browsers can handle the decompression process:

ob_start("ob_gzhandler");

Placing this line of code at the top of a page will invoke PHP’s output buffering mechanism, and cause it to transparently compress the script’s output.

You can also enable compression on a site-wide basis by changing a few configuration directives in your php.ini file:

zlib.output_compression = on

zlib.output_compression_level = 9

Notice how this approach lets you set the compression level. Since these settings can be turned on and off without changing your code, this is the best way of implementing compression within your application.
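
To get a feel for what the compression level means in practice, the following sketch compresses a block of repetitive markup with gzencode() at the lowest and highest levels and prints the sizes. It assumes the zlib extension is loaded (and PHP 5.4 or later for gzdecode()); the sample markup is made up, and real pages will compress less dramatically.

```php
<?php
// Repetitive markup, loosely similar to a long HTML list
$page = str_repeat("<li class=\"item\">Hello, world</li>\n", 500);

// Compress at the lowest and highest levels
$fast  = gzencode($page, 1);
$small = gzencode($page, 9);

printf("original: %d bytes\n", strlen($page));
printf("level 1:  %d bytes\n", strlen($fast));
printf("level 9:  %d bytes\n", strlen($small));

// Decompression restores the original data exactly
$restored = gzdecode($small);
```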

Client Side Caching

By default, most browsers will attempt to cache as much of the content they download as possible. This is done both in an effort to save time for the user, and as a way to reduce bandwidth usage on both ends of a transaction.

Caching is not always desirable, however, and it is sometimes necessary to instruct a browser on how you want to cache the output of your application.

Cache manipulation is considered something of a black art, because all browsers have quirks in the way they handle the instructions sent them by the server. Here’s an example:

header("Cache-Control: no-cache, must-revalidate");

header("Expires: Thu, 31 May 1984 04:35:00 GMT");

This set of headers tells the browser (or other proxies) not to cache the item at all, by setting a cache expiration date in the past. Sometimes, however, you might want to tell a browser to cache something for a finite length of time. For example, a PDF file generated on the fly may only contain “fresh” information for a fixed period of time, after which it must be reloaded. The following instruction tells the browser to keep the page in its cache for 30 days:

// 30 Days from now

// HTTP dates use a two-digit day and the literal "GMT"

$date = gmdate("D, d M Y H:i:s", time() + 2592000);

header("Expires: " . $date . " GMT");

header("Cache-Control: Public");

header("Pragma: Public");
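
The same idea can be wrapped in a small helper that returns the header lines instead of sending them, which makes the date format easy to check. This is a sketch only: cache_headers() is a hypothetical name, and the max-age directive is an addition that expresses the same lifetime in seconds for clients that prefer Cache-Control over Expires.

```php
<?php
// Build the headers for caching a response for $ttl seconds.
// $now can be passed in to get a predictable result; it defaults
// to the current time.
function cache_headers($ttl, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    return array(
        // HTTP dates use a two-digit day and the literal "GMT"
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $ttl) . ' GMT',
        'Cache-Control: public, max-age=' . (int) $ttl,
    );
}

// Typical use, before any output is sent:
// foreach (cache_headers(30 * 24 * 3600) as $line) {
//     header($line);
// }
print_r(cache_headers(2592000, 0));
```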

Cookies

Cookies allow your applications to store a small amount of textual data (typically, 4-6kB) on a web client. There are a number of possible uses for cookies, although their most common one is maintaining session state (explained in the next section). Cookies are typically set by the server using a response header, and subsequently made available by the client as a request header.

You should not think of cookies as a secure storage mechanism. Although you can transmit a cookie so that it is exchanged only when an HTTP transaction takes place securely (e.g., under HTTPS), you have no control over what happens to the cookie data while it’s sitting at the client’s side—or even whether the client will accept your cookie at all (most browsers allow their users to disable cookies). Therefore, cookies should always be treated as “tainted” until proven otherwise—a concept that we’ll examine in the Security chapter.

To set a cookie on the client, use the setcookie() function:

setcookie("hide_menu", "1");

This simple function call sets a cookie called “hide_menu” to a value of 1 for the remainder of the user’s browser session, at which time it is automatically deleted.

Should you wish to make a cookie persist between browser sessions, you will need to provide an expiration date. Expiration dates are provided to setcookie() in the UNIX timestamp format (the number of seconds that have passed since January 1, 1970). Remember that users or their browser settings can remove a cookie at any time and therefore it is unwise to rely on expiration dates too much.

setcookie("hide_menu", "1", time() + 86400);

This will instruct the browser to (try to) hang on to the cookie for a day.

There are three more arguments you can pass to setcookie(). They are, in order:

Argument   Description
path       Allows you to specify a path (relative to your website’s root) where the cookie will be accessible; the browser will only send a cookie to pages within this path.
domain     Allows you to limit access to the cookie to pages within a specific domain or hostname; note that you cannot set this value to a domain other than that of the page setting the cookie (e.g., the host www.phparch.com can set a cookie for hades.phparch.com, but not for www.microsoft.com).
secure     This requests that the browser only send this cookie as part of its request headers when communicating under HTTPS.
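
Put together, a call that uses all of these arguments might look like the sketch below. The helper name remember_menu_choice(), the /account/ path and the www.example.org domain are assumptions chosen for illustration.

```php
<?php
// Remember the menu preference for 30 days, limited to one path and
// host, and sent over HTTPS only. Returns the expiration timestamp.
function remember_menu_choice($hidden, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    $expires = $now + 30 * 86400;

    setcookie(
        'hide_menu',           // name
        $hidden ? '1' : '0',   // value
        $expires,              // expiration (UNIX timestamp)
        '/account/',           // path
        'www.example.org',     // domain
        true                   // secure
    );

    return $expires;
}
```

Like header(), setcookie() sends a response header, so the function has to be called before the script produces any output.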

Accessing Cookie Data

Cookie data is usually sent to the server using a single request header. The PHP interpreter takes care of automatically separating the individual cookies from the header and places them in the $_COOKIE superglobal array:

if ($_COOKIE['hide_menu'] == 1) {

// hide menu

}

Cookie values must be scalar; however, you can create arrays using the same array notation used for $_GET and $_POST:

setcookie("test_cookie[0]", "foo");

setcookie("test_cookie[1]", "bar");

setcookie("test_cookie[2]", "bar");

At the next request, $_COOKIE['test_cookie'] will automatically contain an array. You should, however, keep in mind that the amount of storage available is severely limited; therefore, you should keep the amount of data you store in cookies to a minimum, and use sessions instead.

Remember that setting cookies is a two-stage process: first, you send the cookie to the client, and then the client sends it back to you at the next request. Therefore, the $_COOKIE array will not be populated with new information until the next request comes along.

There is no way to “delete” a cookie, primarily because you really have no control over how cookies are stored and managed on the client side. You can, however, call setcookie() with an empty string and a negative timestamp, which will effectively empty the cookie; in most cases, the browser will remove it:

setcookie("hide_menu", false, -3600);

Sessions

HTTP is a stateless protocol: the web server does not know (or care) whether two requests come from the same user; each request is handled without regard to the context in which it happens. Sessions are used to create a measure of state between requests—even when there is a large time interval between them.

Sessions are maintained by passing a unique session identifier between requests, typically in a cookie, although it can also be passed in forms and GET query arguments. PHP handles sessions transparently through cookies and, when session.use_trans_sid is turned on in php.ini (it is off by default in PHP 5), URL rewriting. It generates a unique session ID and uses it to track a local data store (by default, a file in the system’s temporary directory) where session data is saved at the end of every request.

Using session.use_trans_sid to embed the session ID in the URL is a security risk: users could share their session ID by sending the URL to a third party, who could then hijack their session. Enabling session.use_only_cookies ensures that only cookie-based sessions are used. You should always use cookie-based sessions.

Sessions are started in one of two ways. You can either set PHP to start a new session automatically whenever a request is received by changing the session.auto_start configuration setting in your php.ini file, or you can explicitly call session_start() at the beginning of each script. Both approaches have their advantages and drawbacks. In particular, when sessions are started automatically, you obviously do not have to include a call to session_start() in every script. However, the session is started before your scripts are executed; this denies you the opportunity to load your classes before your session data is retrieved, and makes storing objects in the session impossible.

In addition, session_start() must be called before any output is sent to the browser, because it will try to set a cookie by sending a response header.

In the interest of security, it is a good idea to follow your call to session_start() with a call to session_regenerate_id() whenever you change a user’s privileges to prevent “session fixation” attacks. We explain this problem in greater detail in the Security chapter.
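
As an illustration of where the regeneration call belongs, here is a minimal sketch of a login step. The function name log_in_user() and the user_id key are hypothetical; the credential check itself is assumed to have happened already.

```php
<?php
// Mark the current visitor as logged in. Regenerating the ID at the
// moment privileges change means that an ID planted by an attacker
// before login is no longer valid afterwards.
function log_in_user($userId)
{
    // Start the session only if one is not already active
    if (session_id() === '') {
        session_start();
    }

    // true: also delete the data stored under the old ID
    session_regenerate_id(true);

    $_SESSION['user_id'] = $userId;
}
```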

With PHP 5.3, session_register(), session_unregister() and session_is_registered() were all marked as deprecated.

Accessing Session Data

Once the session has been started, you can access its data in the $_SESSION superglobal array:

Listing 5.5: Reading session data

// Set a session variable

$_SESSION['hide_menu'] = true;

// From here on, we can access hide_menu in $_SESSION

if ($_SESSION['hide_menu']) {

// Hide menu

}
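
Because $_SESSION behaves like an ordinary array, code that works on it can be written against a plain array and tried outside a web request. The sketch below counts page views this way; count_visit() and the visits key are hypothetical names.

```php
<?php
// Count page views in a session-like array. $session is passed by
// reference so the function can be tried with a plain array; in a
// real script you would pass $_SESSION after calling session_start().
function count_visit(array &$session)
{
    if (!isset($session['visits'])) {
        $session['visits'] = 0;
    }
    return ++$session['visits'];
}

$fake = array();
count_visit($fake);
echo count_visit($fake), "\n"; // prints 2: second call on the same array
```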

Session Handlers

Sessions are stored on disk by default, using PHP’s serialize() and unserialize() behavior to encode and decode the data.

However, PHP has the ability to change the session handler. The session handler is responsible for all session data I/O—meaning you can change both the encoding/decoding and the storage mechanism.

You can set the save handler either by changing the session.save_handler setting in PHP’s INI file, or by using the session_set_save_handler() function.

The default value for session.save_handler is files, but other handlers can be provided by extensions. For example, the memcache and memcached extensions both provide a session handler. Of course, you would first need to install and configure a memcached server.

To use these, simply modify your INI file:

Listing 5.6: Session settings for memcache

; memcached

session.save_handler = memcached

session.save_path = "host1:11211;host2:11211"

; memcache

session.save_handler = memcache

session.save_path = "tcp://host1:11211,tcp://host2:11211"

The real power, however, is in the ability to create your own session handlers by using session_set_save_handler(). To use this function, we simply define multiple callbacks, one for each of open, close, read, write, destroy, and gc (garbage collection).

Prior to PHP 5.4, we had to specify each of these separately as arguments to session_set_save_handler(), but with the addition of the SessionHandlerInterface we can now simply define a class that implements it and pass an instance of that class in as the single argument.

Additionally, PHP 5.4 exposes the standard session handler as the new SessionHandler class, which can be extended to change functionality.

Prior to PHP 5.4, we might have done the following:

Listing 5.7: Custom session handler class before 5.4

class JsonSessionHandler

{

    protected $save_path;

    // The second argument to open() is the session name, not the ID

    public function open($save_path, $session_name) {

        $this->save_path = $save_path;

        return is_writable($save_path);

    }

    // Build the data file name for a given session ID

    protected function file($session_id) {

        return $this->save_path . DIRECTORY_SEPARATOR

            . $session_id . '.json';

    }

    public function close() {

        return true;

    }

    public function read($session_id) {

        $file = $this->file($session_id);

        // read() must return a string, even for a new session

        return is_file($file)

            ? (string) json_decode(file_get_contents($file))

            : '';

    }

    public function write($session_id, $data) {

        $json = json_encode($data);

        return file_put_contents($this->file($session_id), $json) !== false;

    }

    public function destroy($session_id) {

        $file = $this->file($session_id);

        return !is_file($file) || unlink($file);

    }

    public function gc($maxlifetime) {

        $timeout = time() - $maxlifetime;

        $files = glob($this->save_path . DIRECTORY_SEPARATOR . '*.json');

        foreach ($files as $file) {

            if (filemtime($file) < $timeout) {

                unlink($file);

            }

        }

        return true;

    }

}

$handler = new JsonSessionHandler;

session_set_save_handler(

[$handler, 'open'],

[$handler, 'close'],

[$handler, 'read'],

[$handler, 'write'],

[$handler, 'destroy'],

[$handler, 'gc']

);

With PHP 5.4, however, we can shorten this in two ways. First by implementing SessionHandlerInterface, meaning we can pass our instance in directly:

Listing 5.8: Custom session handler class with 5.4

class JsonSessionHandler implements SessionHandlerInterface

{

...

}

$handler = new JsonSessionHandler;

session_set_save_handler($handler);

Second, we could also extend the SessionHandler class:

Listing 5.9: Extending a session handler class with 5.4

class JsonSessionHandler extends SessionHandler

{

public function read($session_id) {

$data = parent::read($session_id);

// read() must return a string, even for a new session

return (string) json_decode($data);

}

public function write($session_id, $data) {

$data = json_encode($data);

return parent::write($session_id, $data);

}

}

$handler = new JsonSessionHandler();

session_set_save_handler($handler);

In most cases you will use a custom handler to avoid writing to the disk. To allow scaling across multiple servers, we want to use a shared data store, e.g., memcache or MySQL. Therefore extending SessionHandler does not make sense and you’ll need to implement SessionHandlerInterface.
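
To show the shape of such an implementation without depending on a particular server, the sketch below implements SessionHandlerInterface on top of a plain array. The array is only a stand-in for a shared store such as memcache or MySQL, and ArraySessionHandler is a hypothetical name; a real handler would replace the array operations with calls to the store.

```php
<?php
// Session handler that keeps its data in an array. The array is only
// a stand-in for a shared store such as memcache or MySQL.
class ArraySessionHandler implements SessionHandlerInterface
{
    // session ID => array('data' => string, 'time' => last write)
    protected $store = array();

    // The #[\ReturnTypeWillChange] lines silence return-type notices
    // on PHP 8.1 and later; older versions treat them as comments.
    #[\ReturnTypeWillChange]
    public function open($save_path, $session_name)
    {
        return true; // nothing to prepare for an array
    }

    #[\ReturnTypeWillChange]
    public function close()
    {
        return true;
    }

    #[\ReturnTypeWillChange]
    public function read($session_id)
    {
        // Must return a string, even when the session is new
        return isset($this->store[$session_id])
            ? $this->store[$session_id]['data']
            : '';
    }

    #[\ReturnTypeWillChange]
    public function write($session_id, $data)
    {
        $this->store[$session_id] = array('data' => $data, 'time' => time());
        return true;
    }

    #[\ReturnTypeWillChange]
    public function destroy($session_id)
    {
        unset($this->store[$session_id]);
        return true;
    }

    #[\ReturnTypeWillChange]
    public function gc($maxlifetime)
    {
        // Drop entries that have not been written within $maxlifetime
        $timeout = time() - $maxlifetime;
        foreach ($this->store as $id => $entry) {
            if ($entry['time'] < $timeout) {
                unset($this->store[$id]);
            }
        }
        return true;
    }
}

// Typical registration, before session_start():
// session_set_save_handler(new ArraySessionHandler(), true);
```

Note that the data passed to write() is already a serialized string, so the handler only has to store and return it unchanged.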

Built-in HTTP Server

With PHP 5.4, a new built-in HTTP server (known as the cli-server) was added, which allows us to easily test our projects without the need for other server software.

To use the new cli-server, supply the -S and -t flags on the command line.

The -S flag enables the cli-server, and should be followed by the IP address and port to bind to. Bear in mind that binding to ports below 1024 typically requires root access.

The -t flag sets the document root. This is especially necessary when using custom routing.

If you want to use a router file, simply specify it as the last argument.

$ php -S 0.0.0.0:8080 -t ./public index.php

The command above will start cli-server, bound to all network interfaces on port 8080, with the ./public directory as the document root, and index.php as the router file. You can then access the server on http://localhost:8080/.

The output from the cli-server will look similar to this:

PHP 5.5.13 Development Server started at Thu Jun 19 01:34:33 2014

Listening on http://0.0.0.0:8080

Document root is /path/to/document/root

Press Ctrl-C to quit.

[Thu Jun 19 10:35:32 2014] 127.0.0.1:53492 [200]: /

[Thu Jun 19 10:35:33 2014] 127.0.0.1:53494 [200]: /index.php

[Thu Jun 19 10:35:34 2014] 127.0.0.1:53496 [200]: /info.php

As you can see, each request is shown in a running log.

The router file will be used for all requests, even static assets; to tell PHP that we wish to serve the file straight from disk, we simply return false. Otherwise, we handle the request entirely.

When using a router file, requests handled by the router will not be shown in the log output.

Listing 5.10: Router file

// Dummy router example, shows the requested resource,

// and dumps $_SERVER. Declared at the top level so that the

// class exists before the routing code below uses it.

class MyRouter {

    public static function route($path) {

        echo "Requested Resource: $path";

        var_dump($_SERVER);

    }

}

if (PHP_SAPI == 'cli-server') {

    // Ignore the query string when looking for a file

    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    // If the file exists on disk

    if (realpath(__DIR__ . "/" . $path)) {

        // serve as-is

        return false;

    } else {

        // route the request to our app (could be Zend

        // Framework, etc)

        MyRouter::route($_SERVER['REQUEST_URI']);

        return true;

    }

}

The cli-server is a great asset for development, as it supports complex custom routing and is super simple to use.

Summary

If we had to explain why PHP is the most popular web development language on earth, we’d probably pick all the reasons explained in this chapter. The language itself has an incredible set of features, and many extensions make working with specific technologies, like web services, much easier than on most other platforms. But it’s the simplicity of creating a web application capable of interacting with a client on so many levels and with so little effort that makes creating dynamic websites a breeze.

You should keep in mind that the vast majority of security issues that can afflict a PHP application are directly related to the topics presented in this chapter—don’t forget to read the Security chapter thoroughly.

A deep working knowledge of the subjects we covered in this chapter is paramount to good PHP development. Therefore, the exam often deals with them, even when a question is about a different topic. You should keep this in mind while preparing for the test.