PHP & MySQL: Novice to Ninja, 5th Edition (2012)

Chapter 5. Relational Database Design

Since Chapter 2, we’ve worked with a very simple database of jokes, composed of a single table named, appropriately enough, joke. While this database has served us well as an introduction to MySQL databases, there’s more to relational database design than can be understood from this simple example. In this chapter, we’ll expand on this database and learn a few new features of MySQL, in an effort to realize and appreciate the real power that relational databases have to offer. Be forewarned that I will cover several topics only in an informal, nonrigorous sort of way. As any computer science major will tell you, database design is a serious area of research, with tested and mathematically provable principles that, while useful, are beyond the scope of this text. For more complete coverage of database design concepts and SQL in general, pick up a copy ofSimply SQL. If you’re really into learning the hard principles behind relational databases, Database in Depth (Sebastopol: O’Reilly, 2005) is a worthwhile read.

Giving Credit Where Credit Is Due

To start off, let’s recall the structure of our joke table. It contains three columns: id, joketext, and jokedate. Together, these columns allow us to identify jokes (id), and keep track of their text (joketext) and the date they were entered (jokedate). For your reference, here’s the SQL code that creates this table and inserts a couple of entries:^[³²^]

chapter5/sql/jokes1.sql

# Code to create a simple joke table

CREATE TABLE joke (

id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

joketext TEXT,

jokedate DATE NOT NULL

) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;

# Adding jokes to the table

INSERT INTO joke SET

joketext = 'Why did the chicken cross the road? To get to the other

↵ side!',

jokedate = '2009-04-01';

INSERT INTO joke

(joketext, jokedate) VALUES (

'Knock-knock! Who\'s there? Boo! "Boo" who? Don\'t cry; it\'s only a

↵ joke!',

"2009-04-01"

);

Now, let’s say we wanted to track another piece of information about our jokes: the names of the people who submitted them. It would be natural to add a new column to our joke table for this. The SQL ALTER TABLE command (which we’ve yet to see) lets us do exactly that. Open up phpMyAdmin as you did in Chapter 2, select your database (ijdb if you used the name I suggested), and type this command:

ALTER TABLE joke ADD COLUMN authorname VARCHAR(255)

This code adds a column called authorname to your table. The type declared is a variable-length character string of up to 255 characters, VARCHAR(255)—plenty of space for even very esoteric names. Let’s also add a column for the authors’ email addresses:

ALTER TABLE joke ADD COLUMN authoremail VARCHAR(255)

For more information about the ALTER TABLE command, see Appendix B. Just to make sure the two columns were added properly, we’ll ask MySQL to describe the table to us:^[³³^]

DESCRIBE joke

This should give you a table of results like the one in Figure 5.1.

Our joke table now contains five columns

Figure 5.1. Our joke table now contains five columns

Looks good, right? Obviously, to accommodate this expanded table structure, we’d need to make changes to the HTML and PHP form code we wrote in Chapter 4 that allowed us to add new jokes to the database. Using UPDATE queries, we could now add author details to all the jokes in the table. But before we spend too much time on such changes, we should stop and consider whether this new table design was the right choice here. In this case, it turns out it wasn’t.

Rule of Thumb: Keep Entities Separate

As your knowledge of database driven websites continues to grow, you may decide that a personal joke list is too limited. In fact, you might receive more submitted jokes than you have original jokes of your own. Let’s say you decide to launch a website where people from all over the world can share jokes with each other. Adding the author’s name and email address to each joke certainly makes a lot of sense, but the method we used above leads to potential problems:

· What if a frequent contributor to your site named Joan Smith changed her email address? She might begin to submit new jokes using the new address, but her old address would still be attached to the jokes she’d submitted in the past. Looking at your database, you might simply think there were two people named Joan Smith who had submitted jokes. She might inform you of the change of address, and you may try to update all the old jokes with the new address, but if you missed just one joke, your database would still contain incorrect information. Database design experts refer to this sort of problem as an update anomaly .

· It would be natural for you to rely on your database to provide a list of all the people who’ve ever submitted jokes to your site. In fact, you could easily obtain a mailing list using the following query:

· SELECT DISTINCT authorname, authoremail

· FROM joke

The word DISTINCT in the above query stops MySQL from outputting duplicate result rows. For example, if Joan Smith submits 20 jokes to your site, using the DISTINCT option would cause her name to only appear once in the list instead of 20 times. Then, if for some reason, you decided to remove all the jokes that a particular author had submitted to your site, you’d remove any record of this person from the database in the process, and you’d no longer be able to email him or her with information about your site! Database design experts call this a delete anomaly . As your mailing list might be a major source of income for your site, it’s unwise to go throwing away an author’s email address just because you disliked the jokes that person submitted.

· You have no guarantee that Joan Smith will enter her name the same way each time; consider the variations: Joan Smith, J. Smith, Smith, Joan—you catch my drift. This makes keeping track of a particular author exceedingly difficult, especially if Joan Smith also has several email addresses she likes to use.

These problems—and more—can be dealt with very easily using established database design principles. Instead of storing the information for the authors in the joke table, let’s create an entirely new table for our list of authors. Just as we have an id column in our joke table to identify each joke with a unique number, we’ll use an identically named column in our new table to identify our authors. We can then use those author IDs in our joke table to associate authors with their jokes. The complete database layout is shown in Figure 5.2.

The authorid field associates each row in joke with a row in author

Figure 5.2. The authorid field associates each row in joke with a row in author

These tables show that there are three jokes and two authors. The authorid column of the joke table establishes a relationship between the two tables, indicating that Kevin Yank submitted jokes 1 and 2 and Joan Smith submitted joke 3. Notice that since each author now only appears once in the database, and independently of the jokes submitted, we’ve avoided all the potential problems just outlined. What’s really important to note about this database design is that we’re storing information about two types of things (jokes and authors), so it’s most appropriate to have two tables. This is a rule of thumb that you should always keep in mind when designing a database: each type of entity (or “thing”) about which you want to be able to store information should be given its own table. To set up the aforementioned database from scratch is fairly simple (involving just twoCREATE TABLE queries), but since we’d like to make these changes in a nondestructive manner (that is, without losing any of our precious knock-knock jokes), we will use the ALTER TABLE command again. First, we remove the author-related columns in the joke table:

ALTER TABLE joke DROP COLUMN authorname

ALTER TABLE joke DROP COLUMN authoremail

Now, we create our new table:

CREATE TABLE author (

id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

name VARCHAR(255),

email VARCHAR(255)

) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB

Finally, we add the authorid column to our joke table:

ALTER TABLE joke ADD COLUMN authorid INT

If you prefer, here are the CREATE TABLE commands that will create the two tables from scratch:

chapter5/sql/2tables.sql (excerpt)

# Code to create a simple joke table that stores an author ID

CREATE TABLE joke (

id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

joketext TEXT,

jokedate DATE NOT NULL,

authorid INT

) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;

# Code to create a simple author table

CREATE TABLE author (

id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

name VARCHAR(255),

email VARCHAR(255)

) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;

All that’s left to do is add some authors to the new table, and assign authors to all the existing jokes in the database by filling in the authorid column.^[³⁴^] Go ahead and do this now if you like—it should give you some practice with INSERT and UPDATE queries. If you’re rebuilding the database from scratch, however, here’s a series of INSERT queries that will do the trick:

chapter5/sql/2tables.sql (excerpt)

# Adding authors to the database

# We specify the IDs so they are known when we add the jokes below.

INSERT INTO author SET

id = 1,

name = 'Kevin Yank',

email = 'thatguy@kevinyank.com';

INSERT INTO author (id, name, email)

VALUES (2, 'Joan Smith', 'joan@example.com');

# Adding jokes to the database

INSERT INTO joke SET

joketext = 'Why did the chicken cross the road? To get to the

↵ other side!',

jokedate = '2012-04-01',

authorid = 1;

INSERT INTO joke (joketext, jokedate, authorid)

VALUES (

'Knock-knock! Who\'s there? Boo! "Boo" who? Don\'t cry; it\'s only

↵ a joke!',

'2012-04-01',

);

INSERT INTO joke (joketext, jokedate, authorid)

VALUES (

'A man walks into a bar. "Ouch."',

'2012-04-01',

);

SELECT with Multiple Tables

With your data now separated into two tables, it may seem that you’re complicating the process of data retrieval. Consider, for example, our original goal: to display a list of jokes with the name and email address of the author next to each joke. In the single-table solution, you could gain all the information needed to produce such a list using a single SELECT query in your PHP code:

try

{

$sql = 'SELECT joketext, authorname, authoremail FROM joke';

$result = $pdo->query($sql);

}

catch (PDOException $e)

{

$error = 'Error fetching jokes ' . $e->getMessage();

include 'error.html.php';

exit();

}

foreach ($result as $row)

{

$jokes[] = array(

'id' => $row['id'],

'text' => $row['joketext'],

'name' => $row['authorname'],

'email' => $row['authoremail']

);

}

With our new database layout, this would, at first, no longer seem possible. As the author details of each joke are no longer stored in the joke table, you might think that you’d have to fetch those details separately for each joke you wanted to display. The code required would involve a call to the PDO query method for each and every joke to be displayed. This would be messy and slow. As your database of jokes increased in size, the overhead of all those queries would drag down your site’s performance in a big way. Taking all this into account, it would seem that the “old way” was the better solution, despite its weaknesses. Fortunately, relational databases like MySQL are designed to make it easy to work with data stored in multiple tables! Using a new form of the SELECT statement, called a join , you can have the best of both worlds. Joins allow you to treat related data in multiple tables as if they were stored in a single table. Here’s what the syntax of a simple join looks like:

SELECT columns

FROM table1 INNER JOIN table2

ON condition(s) for data to be related

In your case, the columns we’re interested in are id and joketext in the joke table, and name and email in the author table. The condition for an entry in the joke table to be related to an entry in the author table is that the value of the authorid column in the joke table is equal to the value of the id column in the author table. Let’s look at an example of a join. The first two queries show you what’s contained in the two tables—they’re unnecessary to perform the join. The third query is where the action’s at:

SELECT id, LEFT(joketext, 20), authorid FROM joke

The results of this query are shown in Figure 5.3.

A peek at the contents of the joke table

Figure 5.3. A peek at the contents of the joke table

SELECT * FROM author

The results of this query are shown in Figure 5.4.

A peek at the contents of the author table

Figure 5.4. A peek at the contents of the author table

SELECT joke.id, LEFT(joketext, 20), name, email

FROM joke INNER JOIN author

ON authorid = author.id

Finally, Figure 5.5 shows the results of this query.

The results of your first join

Figure 5.5. The results of your first join

See? The results of the third SELECT—a join—group the values stored in the two tables into a single table of results, with related data correctly appearing together. Even though the data is stored in two tables, you can still access all the information you need to produce the joke list on your web page with a single database query. Note in the query that, since there are columns named id in both tables, you must specify the name of the table when you refer to either id column. The joke table’s id is referred to as joke.id, while the author table’s id column is author.id. If the table name is unspecified, MySQL won’t know which id you’re referring to, and will produce the error shown in Figure 5.6:

SELECT id, LEFT(joketext, 20), name, email

FROM joke INNER JOIN author

ON authorid = id

MySQL has a low tolerance for ambiguity

Figure 5.6. MySQL has a low tolerance for ambiguity

Now that you know how to access the data stored in your two tables efficiently, you can rewrite the code for your joke list to take advantage of joins:

chapter5/jokes/index.php (excerpt)

try

{

$sql = 'SELECT joke.id, joketext, name, email

FROM joke INNER JOIN author

ON authorid = author.id';

$result = $pdo->query($sql);

}

catch (PDOException $e)

{

$error = 'Error fetching jokes: ' . $e->getMessage();

include 'error.html.php';

exit();

}

foreach ($result as $row)

{

$jokes[] = array(

'id' => $row['id'],

'text' => $row['joketext'],

'name' => $row['name'],

'email' => $row['email']

);

}

include 'jokes.html.php';

You can then update your template to display the author information for each joke:

chapter5/jokes/jokes.html.php (excerpt)

<?php foreach ($jokes as $joke): ?>

<p>

<?php echo htmlspecialchars($joke['text'], ENT_QUOTES,

↵ 'UTF-8'); ?>

<input type="hidden" name="id" value="<?php echo

↵ $joke['id']; ?>">

(by <a href="mailto:<?php

echo htmlspecialchars($joke['email'], ENT_QUOTES,

'UTF-8'); ?>"><?php

echo htmlspecialchars($joke['name'], ENT_QUOTES,

'UTF-8'); ?></a>)

</p>

</blockquote>

</form>

<?php endforeach; ?>

The resulting display is shown in Figure 5.7.

I wrote all the best ones myself

Figure 5.7. I wrote all the best ones myself

The more you work with databases, the more you’ll come to realize the power of combining data from separate tables into a single table of results. Consider, for example, the following query, which displays a list of all jokes written by Joan Smith:

SELECT joketext

FROM joke INNER JOIN author

ON authorid = author.id

WHERE name = "Joan Smith"

The results that are output from this query, shown in Figure 5.8, come only from the joke table, but the query uses a join to let it search for jokes based on a value stored in the author table. There will be plenty more examples of clever queries like this throughout the book, but this example alone illustrates that the practical applications of joins are many and varied, and, in almost all cases, can save you a lot of work!

Joan’s joke

Figure 5.8. Joan’s joke

Simple Relationships

The type of database layout for a given situation is usually dictated by the form of relationship that exists between the data that it needs to store. In this section, I’ll examine the typical relationship types, and explain how best to represent them in a relational database. In the case of a simpleone-to-one relationship , a single table is all you’ll need. An example of a one-to-one relationship is the email address of each author in our joke database. Since there will be one email address for each author, and one author for each email address, there’s no reason to split the addresses into a separate table.^[³⁵^] A many-to-one relationship is a little more complicated, but you’ve already seen one of these as well. Each joke in our database is associated with just one author, but many jokes may have been written by that one author. This joke–author relationship is many-to-one. I’ve already covered the problems that result from storing the information associated with a joke’s author in the same table as the joke itself. In brief, it can result in many copies of the same data, which are difficult to keep synchronized and waste space. If we split the data into two tables and use an ID column to link them together (making joins possible as shown before), all these problems disappear. A one-to-many relationship is simply a many-to-one relationship seen from the opposite direction. As the joke–author relationship is many-to-one, the author–joke relationship is one-to-many (there is one author for, potentially, many jokes). This is easy to see in theory, but when you’re coming at a problem from the opposite direction, it’s less obvious. In the case of jokes and authors, we started with a library of jokes (the many) and then wanted to assign an author to each of them (the one). Let’s now look at a hypothetical design problem where we start with the one and want to add the many. Say we wanted to allow each of the authors in our database (the one) to have multiple email addresses (the many). When an inexperienced person in database design approaches a one-to-many relationship like this one, often the first thought is to try to store multiple values in a single database field, as shown in Figure 5.9.

Never overload a table field to store multiple values as is done here

Figure 5.9. Never overload a table field to store multiple values as is done here

This would work, but to retrieve a single email address from the database, we’d need to break up the string by searching for commas (or whatever special character you chose to use as a separator); it’s a not-so-simple, potentially time-consuming operation. Try to imagine the PHP code necessary to remove one particular email address from a specific author! In addition, you’d need to allow for much longer values in the email column, which could result in wasted disk space because the majority of authors would have just one email address. Now take a step back, and realize that this one-to-many relationship is just the same as the many-to-one relationship we faced between jokes and authors. The solution, therefore, is also the same: split the new entities (in this case, email addresses) into their own table. The resulting database structure is shown in Figure 5.10.

The authorid field associates each row of email with one row of author

Figure 5.10. The authorid field associates each row of email with one row of author

Using a join with this structure, we can easily list the email addresses associated with a particular author:

SELECT email

FROM author INNER JOIN email

ON authorid = author.id

WHERE name = "Kevin Yank"

Many-to-many Relationships

Okay, you now have a steadily growing database of jokes published on your website. It’s growing so quickly, in fact, that the number of jokes has become unmanageable! Your site visitors are faced with a mammoth page that contains hundreds of jokes without any structure whatsoever. We need to make a change. You decide to place your jokes into the following categories: Knock-knock jokes, “Crossing the road” jokes, Lawyer jokes, Light bulb jokes, and Political jokes. Remembering our rule of thumb from earlier, you identify joke categories as a new entity, and create a table for them:

CREATE TABLE category (

id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,

name VARCHAR(255)

) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB

Now you come to the daunting task of assigning categories to your jokes. It occurs to you that a “political” joke might also be a “crossing the road” joke, and a “knock-knock” joke might also be a “lawyer” joke. A single joke might belong to many categories, and each category will contain many jokes. This is a many-to-many relationship. Once again, many inexperienced developers begin to think of ways to store several values in a single column, because the obvious solution is to add a category column to the joke table and use it to list the IDs of those categories to which each joke belongs. A second rule of thumb would be useful here: if you need to store multiple values in a single field, your design is probably flawed. The correct way to represent a many-to-many relationship is by using a lookup table . This is a table that contains no actual data, but lists pairs of entries that are related. Figure 5.11 shows what the database design would look like for our joke categories.

The jokecategory table associates pairs of rows from the joke and category tables

Figure 5.11. The jokecategory table associates pairs of rows from the joke and category tables

The jokecategory table associates joke IDs (jokeid) with category IDs (categoryid). In this example, we can see that the joke that starts with “How many lawyers …” belongs to both the Lawyers and Light bulb categories. A lookup table is created in much the same way as is any other table. The difference lies in the choice of the primary key. Every table we’ve created so far has had a column named id that was designated to be the PRIMARY KEY when the table was created. Designating a column as a primary key tells MySQL to forbid two entries in that column from having the same value. It also speeds up join operations based on that column. In the case of a lookup table, there is no single column that we want to force to have unique values. Each joke ID may appear more than once, as a joke may belong to more than one category, and each category ID may appear more than once, as a category may contain many jokes. What we want to prevent is the same pair of values appearing in the table twice. And, since the sole purpose of this table is to facilitate joins, the speed benefits offered by a primary key would come in very handy. For this reason, we usually create lookup tables with a multicolumn primary key as follows:

CREATE TABLE jokecategory (

jokeid INT NOT NULL,

categoryid INT NOT NULL,

PRIMARY KEY (jokeid, categoryid)

) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB

This creates a table in which the jokeid and categoryid columns together form the primary key. This enforces the uniqueness that’s appropriate to a lookup table, preventing a particular joke from being assigned to a specific category more than once, and speeds up joins that make use of this table. ^[³⁶^] Now that your lookup table is in place and contains category assignments, you can use joins to create several interesting and practical queries. This query lists all jokes in the Knock-knock category:

SELECT joketext

FROM joke INNER JOIN jokecategory

ON joke.id = jokeid

INNER JOIN category

ON categoryid = category.id

WHERE name = "Knock-knock"

As you can see, this query uses two joins. First, it takes the joke table and joins it to the jokecategory table; then it takes that joined data and joins it to the category table. As your database structure becomes more complex, multijoin queries like this one become common. The following query lists the categories that contain jokes beginning with “How many lawyers …”:

SELECT name

FROM joke INNER JOIN jokecategory

ON joke.id = jokeid

INNER JOIN category

ON categoryid = category.id

WHERE joketext LIKE "How many lawyers%"

And this query—which also makes use of our author table to join together the contents of four tables—lists the names of all authors who have written knock-knock jokes:

SELECT author.name

FROM joke INNER JOIN author

ON authorid = author.id

INNER JOIN jokecategory

ON joke.id = jokeid

INNER JOIN category

ON categoryid = category.id

WHERE category.name = "Knock-knock"

One for Many, and Many for One

In this chapter, I explained the fundamentals of good database design, and we learned how MySQL and, for that matter, all relational database management systems provide support for the representation of different types of relationships between entities. From your initial understanding of one-to-one relationships, you should now have expanded your knowledge to include many-to-one, one-to-many, and many-to-many relationships. In the process, you learned a few new features of common SQL commands. In particular, you learned how to use a SELECT query to join data spread across multiple tables into a single set of results. With the increased expressiveness that multiple database tables bring, you’re now equipped to extend the simple “joke list” site you assembled in Chapter 4 to include authors and categories, and that’s exactly what Chapter 7 will be all about. Before you tackle this project, however, you should take some time to add to your PHP skills. Just as you spent this chapter learning some of the finer points of MySQL database design, Chapter 6 will teach you some of the subtleties of PHP programming—which will make the job of building a more complete joke database site much more fun.

^[³²^]If you ever need to re-create your database from scratch, you can use phpMyAdmin to drop all the tables and then go to the Import tab of the now-empty ijdb database and feed it this SQL file. phpMyAdmin will run all the commands it contains, thereby restoring the database. In this way, you can use the .sql files in the code archive for this book as database snapshots to load up whenever you need them.

^[³³^]Instead of typing the DESCRIBE query yourself, you could just select the joke table in phpMyAdmin and click the Structure tab. At this point in your database administration career, though, it’s advisable to take every opportunity you can to become familiar with SQL queries like DESCRIBE.

^[³⁴^]For now, you’ll have to do this manually. But rest assured, in Chapter 7 we’ll see how PHP can insert entries with the correct IDs automatically, reflecting the relationships between them.

^[³⁵^]There are exceptions to this rule. For example, if a single table grows very large with lots of columns, some of which are rarely used in SELECT queries, it can make sense to split those columns out into their own table. This can improve the performance of queries on the now-smaller table.

^[³⁶^]If you like, you can use the CREATE TABLE and INSERT commands in Example 7.1 to create the jokecategory table from scratch (and others, including the jokes within the tables) to follow along.