Regular Expressions - Object-Oriented JavaScript Second Edition (2013)

Appendix D. Regular Expressions

When you use regular expressions (discussed in Chapter 4, Objects), you can match literal strings, for example:

> "some text".match(/me/);

["me"]

The real power of regular expressions, however, comes from matching patterns rather than literal strings. The following table lists the syntax you can use in your patterns, with examples of its use:

Pattern

Description

[abc]

Matches a class of characters.

> "some text".match(/[otx]/g);

["o", "t", "x", "t"]

[a-z]

A class of characters defined as a range. For example, [a-d] is the same as [abcd], [a-z] matches all lowercase letters, and [a-zA-Z0-9_] matches all letters, digits, and the underscore character.

> "Some Text".match(/[a-z]/g);

["o", "m", "e", "e", "x", "t"]

> "Some Text".match(/[a-zA-Z]/g);

["S", "o", "m", "e", "T", "e", "x", "t"]

[^abc]

Matches any character that is not in the class of characters.

> "Some Text".match(/[^a-z]/g);

["S", " ", "T"]
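A negated class is often handy for rejecting input that contains anything outside an allowed set. The following is a minimal sketch of that idea; the helper name isHex is made up for this example, and it relies on the test() method of regular expression objects:

```javascript
// Hypothetical helper: reports whether a string consists only of
// hexadecimal digits. The name is an assumption for illustration.
function isHex(text) {
  // [^0-9a-fA-F] matches any character outside the allowed ranges,
  // so a successful test() means the input contains an invalid character.
  // An empty string is treated as "not hex".
  return text.length > 0 && !/[^0-9a-fA-F]/.test(text);
}

console.log(isHex("C0FFEE"));  // true
console.log(isHex("coffee!")); // false
```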

a|b

Matches a or b. The pipe character means OR, and it can be used more than once.

> "Some Text".match(/t|T/g);

["T", "t"]

> "Some Text".match(/t|T|Some/g);

["Some", "T", "t"]

a(?=b)

Matches a only if followed by b.

> "Some Text".match(/Some(?=Tex)/g);

null

> "Some Text".match(/Some(?= Tex)/g);

["Some"]

a(?!b)

Matches a only when not followed by b.

> "Some Text".match(/Some(?! Tex)/g);

null

> "Some Text".match(/Some(?!Tex)/g);

["Some"]
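As a further illustration of the two lookahead forms, here is a small sketch that picks numbers out of a string depending on what follows them. The input string and variable names are invented for the example. Note that the text inside the lookahead is never part of the match:

```javascript
// Sample input, made up for this sketch.
const css = "width: 100px; height: 50%; margin: 8px";

// Digits only when followed by "px"; the "px" itself is not returned.
const pixels = css.match(/\d+(?=px)/g);
console.log(pixels); // ["100", "8"]

// Digits not followed by "px". The extra |\d alternative stops the engine
// from backtracking to a shorter run such as "10" (followed by "0").
const others = css.match(/\d+(?!px|\d)/g);
console.log(others); // ["50"]
```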

\

Escape character used to help you match the special characters used in patterns as literals.

> "R2-D2".match(/[2-3]/g);

["2", "2"]

> "R2-D2".match(/[2\-3]/g);

["2", "-", "2"]
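Escaping matters most when a pattern is built from text you do not control. The sketch below shows one common way to do it; escapeRegExp is a hypothetical helper name, not a built-in, and the replacement string uses $&, which stands for the whole matched text:

```javascript
// Hypothetical helper: puts a backslash in front of every character that
// has a special meaning in a pattern, so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Without escaping, "$" and "." would be treated as pattern syntax.
const literalPrice = new RegExp(escapeRegExp("$9.99"), "g");

console.log(escapeRegExp("$9.99"));                         // \$9\.99
console.log("Was $9.99, now $9.99!".match(literalPrice));   // ["$9.99", "$9.99"]
```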

\n

New line

\r

Carriage return

\f

Form feed

\t

Tab

\v

Vertical tab

\s

White space: the space character, any of the previous five escape sequences, and other Unicode space characters.

> "R2\n D2".match(/\s/g);

["\n", " "]

\S

Opposite of \s; matches everything but white space. Same as [^\s]:

> "R2\n D2".match(/\S/g);

["R", "2", "D", "2"]

\w

Any letter, number, or underscore. Same as [A-Za-z0-9_].

> "S0m3 text!".match(/\w/g);

["S", "0", "m", "3", "t", "e", "x", "t"]

\W

Opposite of \w.

> "S0m3 text!".match(/\W/g);

[" ", "!"]

\d

Matches a digit; same as [0-9].

> "R2-D2 and C-3PO".match(/\d/g);

["2", "2", "3"]

\D

Opposite of \d; matches non-digits. Same as [^0-9] or [^\d].

> "R2-D2 and C-3PO".match(/\D/g);

["R", "-", "D", " ", "a", "n", "d", " ", "C", "-", "P", "O"]
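The shorthand classes are frequently combined with replace() to clean up input. A minimal sketch, with helper names and sample strings invented for the example:

```javascript
// Hypothetical helper: removes every non-digit character.
function digitsOnly(text) {
  return text.replace(/\D/g, "");
}

// Hypothetical helper: turns each run of white space into a single space.
function collapseSpaces(text) {
  return text.replace(/\s+/g, " ");
}

console.log(digitsOnly("(555) 010-9999")); // "5550109999"
console.log(collapseSpaces("R2\n \tD2"));  // "R2 D2"
```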

\b

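Matches a word boundary: the position between a word character (\w) and a non-word character such as a space or punctuation, or the start or end of the input. The boundary itself has no width, so it is not included in the match.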

Matching R or D followed by 2:

> "R2D2 and C-3PO".match(/[RD]2/g);

["R2", "D2"]

Same as above but only at the end of a word:

> "R2D2 and C-3PO".match(/[RD]2\b/g);

["D2"]

Same pattern, but the input has a dash, which also ends a word:

> "R2-D2 and C-3PO".match(/[RD]2\b/g);

["R2", "D2"]

\B

The opposite of \b.

> "R2-D2 and C-3PO".match(/[RD]2\B/g);

null

> "R2D2 and C-3PO".match(/[RD]2\B/g);

["R2"]
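A typical use of \b and \B is telling a standalone word apart from the same letters inside a longer word. The following sketch uses an invented sentence to show the difference:

```javascript
// Sample sentence, made up for this sketch.
const sentence = "The cat scattered the catalog; cat!";

// Without boundaries, "cat" is also found inside longer words.
console.log(sentence.match(/cat/g).length);     // 4

// \b on both sides restricts the match to the standalone word.
console.log(sentence.match(/\bcat\b/g).length); // 2

// \B on both sides finds "cat" only when it is buried inside a word.
console.log(sentence.match(/\Bcat\B/g).length); // 1
```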

[\b]

Matches the backspace character.

\0

The null character.

\u0000

Matches a Unicode character, represented by a four-digit hexadecimal number.

> "стоян".match(/\u0441\u0442\u043E/);

["сто"]

\x00

Matches a character code represented by a two-digit hexadecimal number.

> "\x64";

"d"

> "dude".match(/\x64/g);

["d", "d"]

^

The beginning of the string to be matched. If you set the m modifier (multi-line), it matches the beginning of each line.

> "regular\nregular\nexpression".match(/r/g);

["r", "r", "r", "r", "r"]

> "regular\nregular\nexpression".match(/^r/g);

["r"]

> "regular\nregular\nexpression".match(/^r/mg);

["r", "r"]

$

Matches the end of the input or, when using the multiline modifier, the end of each line.

> "regular\nregular\nexpression".match(/r$/g);

null

> "regular\nregular\nexpression".match(/r$/mg);

["r", "r"]
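Using ^ and $ together is a common way to require that the whole input, not just a part of it, fits a pattern. Here is a small sketch; the helper name and sample strings are assumptions made for the example:

```javascript
// Hypothetical helper: true only when the entire input consists of digits.
function isAllDigits(text) {
  return /^\d+$/.test(text);
}

console.log(isAllDigits("2013"));         // true
console.log(isAllDigits("2013 edition")); // false

// With the m modifier, ^ anchors at the start of every line,
// so this picks the first word of each line.
const threeLines = "alpha one\nbeta two\ngamma three";
console.log(threeLines.match(/^\w+/mg)); // ["alpha", "beta", "gamma"]
```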

.

Matches any single character except line terminators such as the new line (\n) and the carriage return (\r).

> "regular".match(/r./g);

["re"]

> "regular".match(/r.../g);

["regu"]

*

Matches the preceding pattern if it occurs zero or more times. For example, /.*/ will match anything including nothing (an empty input).

> "".match(/.*/);

[""]

> "anything".match(/.*/);

["anything"]

> "anything".match(/n.*h/);

["nyth"]

Keep in mind that the pattern is "greedy", meaning it will match as much as possible:

> "anything within".match(/n.*h/g);

["nything with"]
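Greedy matching is a frequent source of surprises. JavaScript also accepts a question mark directly after a quantifier, which makes that quantifier "lazy" so that it matches as little as possible. The table does not list this form separately, so take the following as an illustrative sketch with an invented input:

```javascript
// Sample markup, made up for this sketch.
const markup = "<b>bold</b> and <i>italic</i>";

// Greedy: .* runs to the last ">" in the input.
console.log(markup.match(/<.*>/g));  // ["<b>bold</b> and <i>italic</i>"]

// Lazy: .*? stops at the first ">" it can reach.
console.log(markup.match(/<.*?>/g)); // ["<b>", "</b>", "<i>", "</i>"]
```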

?

Matches the preceding pattern if it occurs zero or one times.

> "anything".match(/ny?/g);

["ny", "n"]

+

Matches the preceding pattern if it occurs one or more times.

> "anything".match(/ny+/g);

["ny"]

> "R2-D2 and C-3PO".match(/[a-z]/gi);

["R", "D", "a", "n", "d", "C", "P", "O"]

> "R2-D2 and C-3PO".match(/[a-z]+/gi);

["R", "D", "and", "C", "PO"]

{n}

Matches the preceding pattern if it occurs exactly n times.

> "regular expression".match(/s/g);

["s", "s"]

> "regular expression".match(/s{2}/g);

["ss"]

> "regular expression".match(/\b\w{3}/g);

["reg", "exp"]

{min,max}

Matches the preceding pattern if it occurs between a min and max number of times. You can omit max (keeping the comma), which means there is no maximum, only a minimum. You cannot omit min.

An example where the input is "doodle" with the "o" repeated 10 times:

> "doooooooooodle".match(/o/g);

["o", "o", "o", "o", "o", "o", "o", "o", "o", "o"]

> "doooooooooodle".match(/o/g).length;

10

> "doooooooooodle".match(/o{2}/g);

["oo", "oo", "oo", "oo", "oo"]

> "doooooooooodle".match(/o{2,}/g);

["oooooooooo"]

> "doooooooooodle".match(/o{2,6}/g);

["oooooo", "oooo"]
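Counted repetition is useful for checking the shape of short, fixed-format values. The sketch below checks for a time written as H:MM or HH:MM; the helper name is hypothetical, and the pattern checks only the number of digits, not whether the values make sense as a time:

```javascript
// Hypothetical helper: one or two digits, a colon, then exactly two digits.
// It validates the shape only, so "99:99" is accepted as well.
function looksLikeTime(text) {
  return /^\d{1,2}:\d{2}$/.test(text);
}

console.log(looksLikeTime("9:05"));   // true
console.log(looksLikeTime("09:05"));  // true
console.log(looksLikeTime("9:5"));    // false
console.log(looksLikeTime("123:45")); // false
```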

(pattern)

When the pattern is in parentheses, it is remembered so that it can be used for replacements. These are also known as capturing patterns.

The captured matches are available in the replacement string as $1, $2, ... $9.

Matching all "r" occurrences and repeating them:

> "regular expression".replace(/(r)/g, '$1$1');

"rregularr exprression"

Matching "re" and turning it to "er":

> "regular expression".replace(/(r)(e)/g, '$2$1');

"ergular experssion"

(?:pattern)

Non-capturing pattern, not remembered and not available in $1, $2...

Here's an example of how "re" is matched, but the "r" is not remembered and the second pattern becomes $1:

> "regular expression".replace(/(?:r)(e)/g, '$1$1');

"eegular expeession"

Make sure you pay attention when a special character can have two meanings, as is the case with ^, ?, and \b.
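The following sketch puts the two meanings of each of these characters side by side. The sample strings are made up for the example:

```javascript
const sample = "cat act";

// ^ outside a class anchors the match to the start of the input...
console.log(sample.match(/^c/g));   // ["c"]
// ...while ^ as the first character inside a class negates the class.
console.log(sample.match(/[^c]/g)); // ["a", "t", " ", "a", "t"]

// \b outside a class is a zero-width word boundary...
console.log("R2 D2".replace(/\b/g, "|"));    // "|R2| |D2|"
// ...while [\b] matches an actual backspace character.
console.log("R2\bD2".replace(/[\b]/g, "|")); // "R2|D2"

// ? after a character makes that character optional...
console.log("color colour".match(/colou?r/g)); // ["color", "colour"]
// ...while ?: right after an opening parenthesis marks a non-capturing group.
console.log("ababab".match(/(?:ab)+/g));       // ["ababab"]
```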