PHP and MySQL: The Missing Manual (2011)

Part 2. Dynamic Web Pages

Chapter 5. Better Searching with Regular Expressions

In the last chapter, you did one of the most common things programmers do: Write code that solves a problem, but is ugly, messy, and a little hard to understand. Unfortunately, most programmers leave code in that state because…well, it works.

Bad code is like sloppy plumbing or a poorly constructed house framing. At some point, things are going to go bad, and someone is going to have to fix problems. And, if you’ve ever had an electrician tell you what they’ve got to charge you because of the guy who did it wrong before him, then you know how expensive it is to fix someone else’s mistakes.

But here’s the thing: even good code is going to fail at some point. Anytime you’ve got a system where humans are involved, someone will eventually do something unexpected, or maybe just something you never thought about dealing with when you wrote your code. And that’s when you’re the electrician, trying to fix things when the customer’s unhappy, but there’s nobody else to blame.

So writing ugly code that works really isn’t an option. And the code in run_query.php right now is very ugly. It’s all those if statements, trying to figure out whether the user entered a CREATE or an UPDATE or an INSERT, or maybe a SELECT…or who knows what else? What you really need is a way to search the incoming query for all those keywords at one time. And then there’s converting things to uppercase, and dealing with whitespace, and making sure the SQL keyword you want is at the beginning of the query. It gets complicated, fast.

Unfortunately, there’s no elegant way to solve this problem with strpos and the string manipulation you’ve done so far. Fortunately, you’ve got another option: regular expressions. Regular expressions are like a giant keg of gunpowder: extremely powerful, and perfectly capable of blowing up your program and creating hours of frustration. There’s no way to get the power without the danger.

That’s okay, though, because you’re not running off to battle just yet. Before you’re done with run_query.php, you’ll have introduced regular expressions, and cut out all but one of those annoying if statements around searching through $query_text. Most important, your program will make more sense and thus be easier to troubleshoot when problems occur down the line.

WARNING

It’s pretty common knowledge that most people—and even most programmers—see regular expressions in particular as a complicated, difficult, black art of programming. That’s okay; you’re more than ready to tackle regular expressions. And once you understand how they work, you’ll wonder why anyone wouldn’t want to use them all over the place.

String Matching, Double-Time

So far, you’ve been using strpos to do string searching (Searching Within Text), and you’ve been passing your string into that function and then some additional characters or strings to look for. The problem is that using strpos in this way limits you to a single search string at a time. So you can search for UPDATE and you can search for DROP, but not at the same time.

Here’s where regular expressions come into the picture. A regular expression is just what it sounds like: a regular sequence of characters or numbers or some other pattern—an expression—you want to search for. So if you had a string like “abcdefghijklmnopqrstuvwxyz,” then you could search for the pattern, or regular expression, “abc.” It would show up once, so of course that doesn’t seem too regular.

But suppose you had an entire web page, and you wanted to search for links. You might use an expression like “<a” to find all the link elements. You might find none, or one, or ten; but with a regular expression, you can search for practically anything you want. But it does get a bit murky, so the best place to start is at the beginning.

A Simple String Searcher

Just about the simplest regular expression you can come up with is a single simple letter, like “a” or “m.” So the regular expression “a” will match any “a.” Doesn’t sound too difficult, does it?

In PHP, if you want to search using regular expressions, you use the preg_match function. While that sounds related to childbirth, it’s actually pronounced “p-reg,” and the name is short for “Perl-compatible regular expressions,” the flavor of regular expression that PHP uses. However you say it, you use it like this:

<?php

$string_to_search = "Martin OMC-28LJ";

$regex = "/OM/";

$num_matches = preg_match($regex, $string_to_search);

if ($num_matches > 0) {

echo "Found a match!";

} else {

echo "No match. Sorry.";

}

?>

WARNING

Be sure that the first thing you give to preg_match is the regular expression, not the string in which you want to search. This arrangement might seem backward from how you’ve been working, but you’ll soon be using the preg_match and related functions so often that putting the search string first will start to feel odd.

So there you go. Save that program as regex.php, and run it from the command line. You should get a result like this:

--(08:25 $)-> php regex.php

Found a match!

Admittedly, this isn’t very exciting. Before you can walk, though, you gotta crawl. And one of those crawling steps is understanding just how you write a regular expression.

First, regular expressions are just strings, so you wrap them in quotes. You’ll typically use double quotes (“) rather than single quotes (’) because you’ll need to do some funny escape characters, and PHP doesn’t do as much helpful processing on single-quoted strings as double-quoted ones.

Additionally, regular expressions begin and end with a forward slash. It’s then everything between those slashes that makes up the meat of the expression. So “/OM/” is a regular expression that searches for OM.

More specifically, “/OM/” searches for exactly OM. So it won’t match “om” or “Om” or “OhM.” It’s got to be a capital O followed by a capital M. In other words, at least so far, this is just like the string matching you’ve already done.

And preg_match has some wrinkles, too. First, as you’ve seen, it takes a regular expression first, and then the string in which to search. Then, it returns the number of matches, rather than the position at which a match was found. And here’s the first real wrinkle: as long as the search itself runs without an error, preg_match will never return anything other than 0 or 1. It returns 0 if there are no matches, 1 on the first match, and then it simply stops searching. (If something goes wrong, such as a malformed expression, it returns false instead.)

If you want to find all the matches, you can use preg_match_all. So preg_match(“/Mr/”, “Mr. Mranity”) returns 1, but preg_match_all(“/Mr/”, “Mr. Mranity”) returns 2.
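A quick way to see the difference between the two functions is to run both against the same string. This is only a minimal sketch: the variable names are made up for illustration, and calling preg_match_all with just two arguments assumes PHP 5.4 or later.

```php
<?php

// Hypothetical sample data for illustration.
$name = "Mr. Mranity";

// preg_match stops at the first match, so a successful search gives 0 or 1.
$first_only = preg_match("/Mr/", $name);

// preg_match_all keeps going and counts every match it finds.
$all_matches = preg_match_all("/Mr/", $name);

echo "preg_match: {$first_only}\n";
echo "preg_match_all: {$all_matches}\n";
```

If all you need is a yes-or-no answer, preg_match is usually enough, since it can stop searching as soon as it finds something.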

NOTE

There are also several additional things you can pass into (and get out of) preg_match and preg_match_all. You can find out about all this online at www.php.net/manual/en/function.preg-match.php. For now, though, just get comfortable with regular expressions.
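As one example of those extras, preg_match accepts an optional third argument that it fills with the text it matched. The sketch below reuses the guitar string from earlier; the variable names are assumptions chosen for illustration.

```php
<?php

// Sample string borrowed from the earlier example.
$string_to_search = "Martin OMC-28LJ";

// The optional third argument is filled in by preg_match.
// $matches[0] holds the text that matched the whole expression.
$found = preg_match("/OM/", $string_to_search, $matches);

if ($found) {
    echo "Matched text: {$matches[0]}\n";
} else {
    echo "No match. Sorry.\n";
}
```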

Search for One String…Or Another

So far, there’s not a lot that regular expressions seem to offer that you don’t already have with strpos. But there’s a lot more that you can do, and one of the coolest is searching for one string or another. To do this, you use a special character called the pipe. The pipe looks like a vertical line: |. (It’s usually above the backslash character, over on the right side of your keyboard.)

UNDER THE HOOD: WHICH QUOTE IS THE BEST QUOTE?

Almost every programming language seemingly treats single-quoted strings (‘My name is Bob’) and double-quoted strings (“I am a carpenter”) the same. However, also in almost every programming language, there’s a lot more going on than you may realize, all based upon which quotation mark you use.

In general, less processing is performed on single-quoted strings. But what processing occurs in the first place? Take the statement I’m going to the bank. If you put that in a single-quoted string, you get ‘I’m going to the bank’. But PHP is going to bark at you, because the single-quote in I’m looks like it’s ending the simple string ‘I’, and all the rest—m going to the bank—must be something else. Of course that’s not what you mean, so you do one of two things: you either switch to double quotes and move on, or you escape the single quote.

Escaping something means telling the programming language not to treat something as part of the language; it’s just part of the string. Typically, you escape characters by throwing a backslash (\) in front of the potentially problematic character. So in the string I’m going to the bank., you’d write it in single quotes like this: ‘I\’m going to the bank’. That \ tells PHP to drop the backslash itself and treat the character that follows as an ordinary part of the string.

What if you want to actually write a backslash? Suppose you’re writing a program for your great-great-great granddad, the one that still runs DOS on his 286? You might want to say, ‘Never, ever, ever type in \’del C:\*.*\’ and hit Return!’ Well, you handled the single-quotes handily, but now there’s a backslash inside the string that looks like it’s escaping the character that follows it: \*. In a single-quoted string PHP leaves a sequence like \* alone, so this happens to work, but a backslash that lands right before a quote or another backslash would change what the string means. The safe habit is to escape the backslash itself. So you just put in the escape character (the backslash) and then the character to be escaped: another backslash. The result is ‘Never, ever, ever type in \’del C:\\*.*\’ and hit Return!’

Other than the single quote (‘) and the backslash (\), PHP doesn’t do any other processing to your single-quoted strings. But there are lots of other things you might need processing for: a new line (\n), a tab (\t), or that slick way of inserting variables right into a string with {$variable} or just using $variable.

So with a single-quoted string, you get very little. With a double-quoted string, you get all the extra processing. As a result, most programmers tend to use double quotes. That way, they don’t have to think, “Now do I need extra processing on this string? Or can I use single quotes?”

One last note: extra processing really isn’t a performance issue in 99 percent of the applications you write. The processing involved in handling those extra escape characters and variables isn’t going to frustrate your customers or send server hard drives or RAM chips into a frenzy. You can happily use double-quoted strings all the time, and you’ll probably never notice any issues at all.
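To make the difference concrete, here is a small sketch that puts the same text in both kinds of quotes. The variable names are hypothetical; the behavior shown is just the single-quote and double-quote processing described above.

```php
<?php

$name = "Bob";

// Single quotes: only \' and \\ get special treatment.
// The $name and \n below stay exactly as typed.
$single = 'I\'m $name\n';

// Double quotes: $name is replaced by its value, and \n
// becomes a real new line character.
$double = "I'm $name\n";

echo $single . "\n";
echo $double;
```

The first echo shows the dollar sign and backslash literally, while the second shows Bob’s name followed by an actual line break.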

Anytime you want to search for one thing or another, you put those two things together inside of parentheses, separated by the pipe:

/(Mr|Dr)\. Smith/

First, though, notice a wrinkle: the backward slash (\). This character is escaping the period, as that period usually means, in a regular expression, “match any single character.” But in this case, you want to match an actual period, not anything. So \. will match a period, and nothing but a period.

/Mr\. Smith/ will match “Mr. Smith” but will skip right over “Dr. Smith.” But /(Mr|Dr)\. Smith/ would match either “Mr. Smith” or “Dr. Smith”.

Therefore, this little code snippet would find a match in both cases:

// This will match

echo "Matches: " . preg_match("/(Mr|Dr)\. Smith/", "Mr. Smith") . "\n";

// So will this

echo "Matches: " . preg_match("/(Mr|Dr)\. Smith/", "Dr. Smith") . "\n";
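To see why the backslash in front of the period matters, compare an expression that escapes the period with one that doesn’t. This is a sketch with made-up variable names and a made-up test string.

```php
<?php

// Unescaped: the period matches any single character.
$loose = "/(Mr|Dr). Smith/";

// Escaped: the period has to be a literal period.
$strict = "/(Mr|Dr)\. Smith/";

// Hypothetical input with an "x" where the period should be.
$typo = "Drx Smith";

echo "Loose on typo: " . preg_match($loose, $typo) . "\n";
echo "Strict on typo: " . preg_match($strict, $typo) . "\n";
echo "Strict on Dr. Smith: " . preg_match($strict, "Dr. Smith") . "\n";
```

Both expressions accept “Dr. Smith,” but only the loose one lets the typo through.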

With this new wrinkle, you should be able to make some extensive changes to the run_query.php script from the last chapter (Creating an HTML Form with a Big Empty Box). Open that file and take a look. Here’s the old version:

<?php

require '../../scripts/database_connection.php';

$query_text = $_REQUEST['query'];

$result = mysql_query($query_text);

if (!$result) {

die("<p>Error in executing the SQL query " . $query_text . ": " .

mysql_error() . "</p>");

}

$return_rows = false;

$uppercase_query_text = strtoupper($query_text);

$location = strpos($uppercase_query_text, "CREATE");

if ($location === false) {

$location = strpos($uppercase_query_text, "INSERT");

if ($location === false) {

$location = strpos($uppercase_query_text, "UPDATE");

if ($location === false) {

$location = strpos($uppercase_query_text, "DELETE");

if ($location === false) {

$location = strpos($uppercase_query_text, "DROP");

if ($location === false) {

// If we got here, it's not a CREATE, INSERT, UPDATE,

// DELETE, or DROP query. It should return rows.

$return_rows = true;

}

}

}

}

}

if ($return_rows) {

// We have rows to show from the query

echo "<p>Results from your query:</p>";

echo "<ul>";

while ($row = mysql_fetch_row($result)) {

echo "<li>{$row[0]}</li>";

}

echo "</ul>";

} else {

// No rows. Just report if the query ran or not

echo "<p>The following query was processed successfully:</p>";

echo "<p>{$query_text}</p>";

}

?>

All that if stuff is what makes it messy. But with regular expressions, you can make some pretty spectacular changes:

<?php

// require and database connection code

$return_rows = true;

if (preg_match("/(CREATE|INSERT|UPDATE|DELETE|DROP)/",

strtoupper($query_text))) {

$return_rows = false;

}

if ($return_rows) {

// display code

}

?>

NOTE

You may want to save this version as another file, or in another directory, so you can go back and see what you started with. In this book’s examples, you’ll find the original version of run_query.php in the Chapter 4 examples directory, and this new version in the Chapter 5 examples directory.

Take a close look here, especially at the fairly long condition for the if statement. Here’s the breakdown of what’s going on:

1. You start with setting $return_rows to true, instead of false. That’s because your regular expression search is checking whether you don’t have return rows. This version is easier to read than the older one, where you’re constantly doing a comparison, and then if there’s not a match, setting $return_rows to true.

2. Then, the if condition: it begins with preg_match. There’s no need to use preg_match_all, since you only care if the search strings are found at all, not if they’re found more than once.

3. The regular expression is actually pretty simple: it’s each keyword for a SQL statement that doesn’t return any rows, all separated by that pipe symbol. So it’s basically an expression for matching a string that contains CREATE or INSERT or UPDATE or DELETE or DROP.

4. This expression is evaluated against the uppercase version of $query_text. Not only do you not change the value of $query_text, but you don’t even really need to save the uppercase version. If you need an uppercase version again later, you can just call strtoupper again.

5. You know that preg_match returns 0 if there’s no match, and PHP sees 0 as false. preg_match returns 1 if there’s a match, which PHP sees as true. So you can just drop the whole preg_match in as your if statement’s condition, and know that if there’s a match, the if statement code will run; if there’s not a match, it won’t.

6. Inside the if, $return_rows is set to false, because a match means this is a query that doesn’t have return rows.

Not only is this code easier to read and more sensible to a human brain, but you’ve also cut 20 lines of code down to 4.
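If you want some reassurance that the short version behaves like the long one, you can run both side by side. The sketch below is an assumption-laden illustration: both function names are hypothetical, and the old logic is flattened into a loop over the same keywords rather than copied as nested if statements.

```php
<?php

// Hypothetical helper: the strpos approach from the old script,
// written as a loop over the same five keywords.
function returns_rows_old($query_text) {
    $uppercase_query_text = strtoupper($query_text);
    $keywords = array("CREATE", "INSERT", "UPDATE", "DELETE", "DROP");
    foreach ($keywords as $keyword) {
        if (strpos($uppercase_query_text, $keyword) !== false) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: the regular expression approach.
function returns_rows_new($query_text) {
    return !preg_match("/(CREATE|INSERT|UPDATE|DELETE|DROP)/",
                       strtoupper($query_text));
}

// Made-up sample queries for illustration.
$samples = array(
    "SELECT * FROM users",
    "drop table users",
    "INSERT INTO users VALUES (1)",
);

foreach ($samples as $sample) {
    $old = returns_rows_old($sample) ? "rows" : "no rows";
    $new = returns_rows_new($sample) ? "rows" : "no rows";
    echo "{$sample}: old says {$old}, new says {$new}\n";
}
```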

WARNING

It’s not always good to have fewer lines of code. Sometimes you can sacrifice readability and clarity to save a few lines, and that’s not helpful. But if you can condense four or five conditions into one or two, that usually is a good thing.

Get into Position

One of the problems with even this streamlined version of run_query.php is that it looks for a match anywhere within the input query. If you read the box on Get Specific with Position and Whitespace Trimming, you know there are still problems. You need to trim your user’s query string, and that’s pretty simple:

if (preg_match("/(CREATE|INSERT|UPDATE|DELETE|DROP)/",

trim(strtoupper($query_text)))) {

$return_rows = false;

}

But there’s another trickier problem: you really only want to search for those special keywords at the beginning of the query string. That prevents a query like this…

SELECT *

FROM registrar_activities

WHERE name = 'Update GPA'

OR name = 'Drop a class'

…from being mistaken as an UPDATE or DROP query. This query, a SELECT, returns rows, but if it’s interpreted as an UPDATE or DROP, your script will not show return rows.
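You can watch the mistake happen by feeding that exact query to the expression as it stands so far. This is a minimal sketch; the $is_no_rows_query variable is a made-up name.

```php
<?php

// The SELECT query from the text, stored as a string.
$query_text = "SELECT *
  FROM registrar_activities
 WHERE name = 'Update GPA'
    OR name = 'Drop a class'";

// With nothing tying the keywords to the start of the query,
// UPDATE and DROP are found inside the quoted values.
$is_no_rows_query = preg_match("/(CREATE|INSERT|UPDATE|DELETE|DROP)/",
                               trim(strtoupper($query_text)));

echo "Treated as a no-rows query? " .
     ($is_no_rows_query ? "yes" : "no") . "\n";
```

The answer comes back “yes,” even though this is a plain SELECT.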

It took some additional if conditions to get this to work before, but that was in the dark days before regular expressions. Now, it’s no problem to tell PHP, “I want this expression, but only at the beginning of the search string.”

To accomplish this feat of wizardry, just add the caret (^) to the beginning of your search string. ^ says, “At the beginning.”

// Matches

echo "Matches: " . preg_match("/^(Mr|Dr). Smith/",

"Dr. Smith") . "\n";

// Does NOT match

echo "Matches: " . preg_match("/^(Mr|Dr). Smith/",

" Dr. Smith") . "\n";

So in the first case, /^(Mr|Dr). Smith/ matches “Dr. Smith” because the string begins with “Dr. Smith” (“Mr. Smith” would be okay, too). But the second string does not match, because the ^ rejects the leading spaces.

Taking this back to your query runner, you’d do something like this:

if (preg_match("/^(CREATE|INSERT|UPDATE|DELETE|DROP)/",

trim(strtoupper($query_text)))) {

$return_rows = false;

}

That one little caret character makes all the difference. You can do the same thing with $ at the end of a string: it requires matches not at the beginning, but at the end of the search string:

// Does NOT match

echo "Matches: " . preg_match("/^(Mr|Dr). Smith$/",

"Dr. Smith ") . "\n";

// Matches

echo "Matches: " . preg_match("/^(Mr|Dr). Smith$/",

"Dr. Smith") . "\n";

WARNING

Make sure that your ^ and $ are inside the opening / and closing /. If you were to put, for example, /^(Mr|Dr). Smith/$, PHP would complain about that last $, saying that $ is an unknown modifier. This error is an easy one to make, and it can be pretty frustrating to track down if you don’t realize what you’ve done.

So in the first case, there’s no match because the regular expression, which uses $, doesn’t allow for the trailing spaces in “Dr. Smith ”. The second check does match, though, because there’s no leading space (which matches the ^(Mr|Dr) part) and no trailing space (which matches the Smith$ part).

In fact, when you have a ^ at the beginning of your expression and a $ at the end, you’re requiring an exact match not just within the search string but to the string itself. It’s like you’re saying that the search string should equal the regular expression. Of course if you were doing a real equals in PHP (with == or ===), you couldn’t have those nifty or statements with |, or any of the other cool things regular expressions offer.
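Here’s a sketch that puts that comparison side by side. The is_smith_title function and the list of names are hypothetical, invented just for this illustration.

```php
<?php

// Hypothetical helper: true only when the entire string fits the pattern.
function is_smith_title($name) {
    return preg_match("/^(Mr|Dr)\. Smith$/", $name) === 1;
}

// Made-up names to test against.
$names = array("Mr. Smith", "Dr. Smith", "Dr. Smithers", " Dr. Smith");

foreach ($names as $name) {
    // ^ and $ together: the whole string has to fit, but | still
    // allows either title.
    $regex = is_smith_title($name) ? "match" : "no match";

    // A plain comparison can only test one exact value at a time.
    $equals = ($name == "Mr. Smith") ? "equal" : "not equal";

    echo "[{$name}] regex: {$regex}, ==: {$equals}\n";
}
```

The regular expression accepts both titles and rejects the longer name and the leading space, while == only ever accepts the one string it was given.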

Ditching trim and strtoupper

As long as you’re simplifying your code with some regular expression goodness, take things further. Right now, you’re converting $query_text to all uppercase with strtoupper, and then searching for CREATE, INSERT, and the like within that uppercase version of the query.

But regular expressions are happy to be case-insensitive, and not care about whether they match upper or lowercase versions of a word. Just add an “i” to the end of your expression, after the closing forward slash:

// Matches

echo "Matches: " . preg_match("/^(MR|DR). sMiTH$/i",

"Dr. Smith") . "\n";

This expression produces a match, despite the case of the expression and the search string not matching. So you can change your search in run_query.php to take advantage of this:

$return_rows = true;

if (preg_match("/^(CREATE|INSERT|UPDATE|DELETE|DROP)/i",

trim($query_text))) {

$return_rows = false;

}

No more strtoupper, and a new “i” at the end of the expression. With this change, the sort of query shown in Figure 5-1 will happily be recognized as DROP, which returns no result rows.

Figure 5-1. Even though you’re not adding functionality with these regular expressions, you’re definitely improving your code. You’re searching for what you want in the original $query_text, instead of changing $query_text to work with your search. That’s the way it should be: search an unchanged input string whenever possible.

So what about trimming whitespace? Well, you really don’t need to trim $query_text; instead, in your regular expression, you just want to ignore leading spaces.

But think about that further: when you’re searching, are you truly ignoring something? No. That may be what the result is, but what you actually want to say is something like this:

1. Begin by matching any number of spaces—including the case where there are no spaces.

2. Then, after some indeterminate number of spaces, look for (CREATE|INSERT|UPDATE|DELETE|DROP).

So while you’re ignoring those spaces in your particular situation—figuring out if the query is a CREATE, or UPDATE, or whatever—you’re really just doing another type of matching.

Now, you know how to match a space: you just include it in your regular expression. So /^ Mr. Smith/ requires an opening space. “Mr. Smith” would not match, but “ Mr. Smith” would.

WARNING

Laying out type in books can be tricky. In the examples above, be sure you notice that the first “Mr. Smith” has no leading space, the second “ Mr. Smith” did have a space, and the regular expression, /^ Mr. Smith/ also had a space after the ^.

But that requires a space. How can you say that more than one space is okay? That’s when you need +. + says, “The thing that came just before me can appear any number of times.”

// Matches

echo "Matches: " . preg_match("/^ (MR|DR). sMiTH$/i",

" Dr. Smith") . "\n";

// Does NOT match

echo "Matches: " . preg_match("/^ (MR|DR). sMiTH$/i",

"   Dr. Smith") . "\n";

// Matches

echo "Matches: " . preg_match("/^ +(MR|DR). sMiTH$/i",

"   Dr. Smith") . "\n";

The first and second expressions look for exactly one space, and so the first entry matches, but the second—with multiple leading spaces—doesn’t. But the third expression accepts any number of spaces, so once again matches.

But try this:

// Does NOT match

echo "Matches: " . preg_match("/^ +(MR|DR). sMiTH$/i",

"Dr. Smith") . "\n";

Uh oh. Apparently “any number of spaces” for + really means, “any non-zero number of spaces.” If you are okay with nothing, or any number of characters, use *.

// Matches

echo "Matches: " . preg_match("/^ *(MR|DR). sMiTH$/i",

"Dr. Smith") . "\n";

So now you can look for spaces within your $query_text in run_query.php, and avoid touching the input string at all, even temporarily:

$return_rows = true;

if (preg_match("/^ *(CREATE|INSERT|UPDATE|DELETE|DROP)/i",

$query_text)) {

$return_rows = false;

}
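To check the difference between + and * against the query keywords, you might try a quick comparison like the sketch below. The pattern variables and sample inputs are assumptions made up for this illustration.

```php
<?php

// " +" needs at least one space before the keyword.
$plus_pattern = "/^ +(CREATE|INSERT|UPDATE|DELETE|DROP)/i";

// " *" accepts zero or more spaces before the keyword.
$star_pattern = "/^ *(CREATE|INSERT|UPDATE|DELETE|DROP)/i";

// Hypothetical inputs: no leading space, one space, several spaces.
$inputs = array(
    "drop table users",
    " drop table users",
    "    drop table users",
);

foreach ($inputs as $input) {
    $plus = preg_match($plus_pattern, $input);
    $star = preg_match($star_pattern, $input);
    echo "[{$input}] plus: {$plus}, star: {$star}\n";
}
```

Only the * version accepts all three inputs, which is why it’s the one that belongs in run_query.php.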

Searching for Sets of Characters

Take a look at Figure 5-2. Will your current version of run_query.php handle what’s typed in this text box?

Figure 5-2. This query looks like it would be no problem. But there’s trouble lurking here, at least with your regular expression as it currently stands. Can you see what that problem is?

There might be leading spaces—it’s not possible to tell just looking at the illustration, or even if you were looking at an actual browser. But even if there isn’t leading space, there’s something else here: a return. Your clever, endearing users have done something you’d probably never think about: hit Enter a few times before typing in their SQL.

Suddenly, your regular expression doesn’t match this as a DROP, despite your handling leading spaces and issues with capitalization. That’s because Enter produces some special characters: usually either \n, or in some situations, \r\n, or, just to keep things interesting, occasionally just \r.

NOTE

These are all just varying flavors of new line characters. \n is called the line feed character, and \r is called a carriage return. In general, Windows uses \r\n, Unix and Linux use \n, and Macs (in particular older, pre-OS X Macs) use \r.

Fortunately, there aren’t nearly as many cross-system problems with these characters as there were just a few years ago. You can pretty safely use \n to create a new line, but when you’re searching, you need to account for all the variations.

So what can you do? Well, it’s easy to account for multiple characters like this: \n* will match any number of new lines, and \r* will match any number of carriage returns. But what about \r\n? \r*\n* would match that, but what about spaces? You could do \r*\n* * and match Enter followed by spaces, but if you start to think about spaces and then Enters and then more spaces…and more Enters…it gets tricky again.

Of course, the whole point of regular expressions was to get away from that sort of thing. And, you can: you can search for any of a set of characters. That’s really what you want: accept any number (including zero) of any of a set of characters, a \r, a \n, or a space. You don’t care how many appear, or in what order, either.
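To see why a fixed order of characters falls short, try an input where the spaces and new lines are mixed together. The sketch below compares the fixed-order expression with one that uses a set of characters, written in square brackets; the variable names and the sample input are made up for illustration.

```php
<?php

// Hypothetical input: a space, a Windows-style new line, another
// space and new line, more spaces, and finally the keyword.
$query_text = " \r\n \n  drop table users";

// Fixed order: carriage returns, then new lines, then spaces.
// Whitespace in any other order stops the match.
$fixed_order = preg_match("/^\r*\n* *(CREATE|INSERT|UPDATE|DELETE|DROP)/i",
                          $query_text);

// A set accepts any of its characters, in any order, any number of times.
$any_order = preg_match("/^[ \r\n]*(CREATE|INSERT|UPDATE|DELETE|DROP)/i",
                        $query_text);

echo "Fixed order: {$fixed_order}\n";
echo "Any order: {$any_order}\n";
```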

FREQUENTLY ASKED QUESTION: BACK TO SQUARE ONE?

It may seem like all this regular expression work has just gotten you back to where you began: a search for CREATE or INSERT or UPDATE anywhere within $query_text. If you’re ignoring all the leading spaces, isn’t that just the same as $location = strpos($query_text, “CREATE”); and all its if-based brethren?

It may look like that at first glance, but you’re worlds away from all those if statements. First, to restate the obvious, you’ve got a script you should be happy to show any of your programmer friends. You’ve used regular expressions, and used them well, so you don’t have a litter box of conditions to sort through.

Second, your code is more sensible. It starts with the presumption that you’ll return rows. Then, based on a condition, it may change that presumption. This is natural human logic: start one way, if something else is going on, go another way. That’s a lot better than the sort of backward-logic of your earlier version of run_query.php.

But most importantly, you’re still not searching anywhere within $query_text for those SQL keywords. You’re only looking for them at the very start of the string, right after any leading spaces. So this sort of query…

SELECT *

FROM registrar_activities

WHERE name = 'Update GPA'

OR name = 'Drop a class'

…still comes across as a SELECT, and isn’t mistaken for a DROP, for example. And you did it without a lot of messy and obscure hard-to-read code.

You could do something like (\r|\n| )*, which is using the | to represent or again, and then the * applies to the entire group. But when you’re dealing with just single characters, you can skip the | and just put all the allowed characters into a set, which is indicated by square brackets ([ and ]).

$return_rows = true;

if (preg_match("/^[ \t\r\n]*(CREATE|INSERT|UPDATE|DELETE|DROP)/i",

$query_text)) {

$return_rows = false;

}

This code handles spaces, the two flavors of new lines, and tosses in \t for tab characters. So no matter how many leading spaces, tabs, or new lines are input, your regular expression is happy to handle them. In fact, this sort of whitespace matching is so common that regular expressions can use \s as an abbreviation for whitespace characters like those in [ \t\r\n]. So you could simplify things further:

$return_rows = true;

if (preg_match("/^\s*(CREATE|INSERT|UPDATE|DELETE|DROP)/i",

$query_text)) {

$return_rows = false;

}
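Before heading to the browser, you can check this last expression from the command line. The sketch below wraps the test in a function; the query_returns_rows name and the sample queries are hypothetical, chosen only for this illustration.

```php
<?php

// Hypothetical helper wrapping the final check from the text.
function query_returns_rows($query_text) {
    if (preg_match("/^\s*(CREATE|INSERT|UPDATE|DELETE|DROP)/i", $query_text)) {
        return false;
    }
    return true;
}

// Made-up sample queries with different kinds of leading whitespace.
$samples = array(
    "\r\n\r\n  drop table urls;",
    "\tInsert INTO urls VALUES (1)",
    "SELECT * FROM registrar_activities WHERE name = 'Drop a class'",
);

foreach ($samples as $sample) {
    echo (query_returns_rows($sample) ? "rows" : "no rows") . "\n";
}
```

The first two samples are reported as “no rows” despite the Enters and the tab, and the SELECT is still reported as “rows.”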

Try this out. Enter the SQL shown back in Figure 5-2, and submit your query. Your regular expression is just waiting to handle things. But wait…you’ll probably get something like Figure 5-3. What’s going on?

Figure 5-3. Just as you’re getting your regular expression and search code bulletproof, there’s a new error to deal with. This occurs before your search ever runs. But it definitely shows a problem: mysql_query did not seem to like those leading \r\n sequences.

The problem here isn’t your regular expression. It’s really that you’re trying to pass into mysql_query some queries that haven’t been screened much for problems—like all those extra \r\ns at the beginning.

In fact, there are lots of queries that will create problems for run_query.php, regardless of how clean your regular expression code is. Try entering this query:

SELECT *

FROM urls

WHERE description = 'home page'

That seems simple enough, but it’s still going to break your script. It doesn’t matter if you have anything in the urls table or not; you’ll still get an error (see Figure 5-4).

Figure 5-4. Don’t be misled by this error. You don’t have an error in your SQL; you just have some overly simplistic code in your script. No worries, though, with a good base of regular expressions under your belt, you’re ready to tackle more robust PHP and MySQL integration.

Frankly, you could spend weeks writing all the code required to really handle every possible SQL query, make sure the right things are accepted and the wrong ones aren’t, and to handle all the various types of queries.

But that’s not a good idea. Just taking in any old SQL query is, in fact, a very bad idea. What’s a much better idea is to take a step back, and think about what your users really need. It’s probably not a blank form, and so in the next chapter, you’ll give them what they need: a normal web form that just happens to talk to MySQL on the back end.

Regular Expressions: To Infinity and Beyond

It’s no exaggeration to say you’ve just barely scratched the surface of regular expressions. Although you’ve got a strong grasp of the basics (from matching to ^ and $ and the various flavors of preg_match, from position and whitespace to + and * and sets), there are more than a few trees that have died to produce all the paper educating programmers about regular expressions.

But don’t be freaked out or daunted, and don’t stop working your PHP and MySQL skills until you’ve “mastered” regular expressions. First, mastery is pretty elusive, and even the best regular expression guys (and girls) dip into Google to remember how to get the right sequence of characters within their slashes. Practice makes perfect, so look for chances to use regular expressions. And, as you get better at PHP, you’ll use them more often, and they’ll slowly become as familiar as PHP itself, or HTML, or any of the other things you’ve been doing over and over.

POWER USERS’ CLINIC: REGULAR EXPRESSIONS AREN’T JUST FOR PHP

As you’re probably seeing, it does take some work to get very far with regular expressions. There are lots of weird characters both to find on your keyboard, and to work into your expressions. And it doesn’t take long for a regular expression to start to look like something QBert might say: *SD)!!@8#.

But the world rewards you in more ways than you might realize. For instance, JavaScript also has complete support for regular expressions. Methods like replace() in JavaScript take in regular expressions, as does the match() method on strings. So nearly everything you’ve learned in PHP translates over.

You also get some nice benefits in HTML5. You can use regular expressions in an HTML5 form to provide patterns against which data is validated. So your work in PHP is helping you out in almost every aspect of web programming.

In fact, there’s hardly a serious programming language that doesn’t support regular expressions. When you decide to learn Ruby and Ruby on Rails (and you should), you’ll be swimming in regular expressions, and they’re also hugely helpful as you move into using testing frameworks like Cucumber or Capybara or TestUnit. If all that sounds intimidating, relax! You’ve got regular expressions down, even before you’ve learned what lots of these languages are.

The moral of this story? What you’re learning about SQL applies to more than MySQL, and what you’re learning about regular expressions applies to more than PHP. Your skills are growing, so use them all over the place.