Oracle PL/SQL Programming (2014)
Appendix A. Regular Expression Metacharacters and Function Parameters
This appendix describes the various regular expression metacharacters available starting with Oracle Database 10g. It also provides a summary of the syntax of the REGEXP_ functions. For more details on Oracle’s regular expression support, see Chapter 8.
Metacharacters
The “Initial release” column in Table A-1 through Table A-3 indicates which metacharacters were introduced in Oracle Database 10g Release 1 and which in Release 2.
Table A-1. Character-matching metacharacters
Syntax |
Initial release |
Description |
. |
10gR1 |
Matches any single character except for newline. Will match newline when the n flag is set. On Windows, Linux, and Unix platforms, chr(10) is recognized as the newline. |
[ ... ] |
10gR1 |
Defines a matching list that matches any character listed between the brackets. You may specify ranges of characters, as in a–z. These ranges are interpreted based on the NLS_SORT setting. A dash (-) is a literal when it occurs first or last in the list (e.g., [abc-]). A closing bracket (]) is a literal when it occurs first in the list (e.g., []abc]). A caret (^) in the first position makes the list a nonmatching list (see the next entry). |
[^ ... ] |
10gR1 |
Matches any character not listed between the brackets. Referred to as a nonmatching list. |
[:class:] |
10gR1 |
Matches any character that belongs to the specified character class. May only be used within a matching list: [[:class:]abc] is a valid expression, but [:class:]abc is not. Table A-5 lists the valid character class names. |
[.coll.] |
10gR1 |
Matches the specified collation element, which may be one or more characters. May only be used within a matching list. For example, the expression [[.ch.]] matches the Spanish letter ch. Table A-4 lists the valid collation elements. |
[=char=] |
10gR1 |
Matches all characters that share the same base character as char. May be used only within a matching list. For example, [[=e=]] matches any of: “eéëèÉËÈE”. |
\d |
10gR2 |
Matches any digit. Equivalent to [[:digit:]]. |
\D |
10gR2 |
Matches any nondigit. Equivalent to [^[:digit:]]. |
\w |
10gR2 |
Matches any word character. Word characters are defined to be alphabetic characters, numeric characters, and the underscore. |
\W |
10gR2 |
Matches any nonword character. |
\s |
10gR2 |
Matches any whitespace character. Equivalent to [[:space:]]. |
\S |
10gR2 |
Matches nonwhitespace characters. Equivalent to [^[:space:]]. |
Table A-2. Quantifiers
Syntax |
Initial release |
Description |
? |
10gR1 |
Zero or one. |
* |
10gR1 |
Zero or more. |
+ |
10gR1 |
One or more. |
{m} |
10gR1 |
Exactly m occurrences. |
{m,} |
10gR1 |
At least m occurrences. |
{m,n} |
10gR1 |
At least m, and at most n occurrences. |
+? |
10gR2 |
One or more, but nongreedy. |
?? |
10gR2 |
Zero or one, but nongreedy. |
{m}? |
10gR2 |
The same as {m}. |
{m,}? |
10gR2 |
At least m occurrences, but nongreedy and stops as soon as m occurrences are reached. |
{m,n}? |
10gR2 |
At least m, and at most n occurrences, but nongreedy; when possible, m occurrences are matched. |
Table A-3. Other metacharacters
Syntax |
Initial release |
Description |
| |
10gR1 |
Specifies an alternation. An alternation within a subexpression doesn’t extend beyond the subexpression. |
( ... ) |
10gR1 |
Defines a subexpresson. |
\n |
10gR1 |
References the text matched by the nth subexpression. Backreferences may range from \1 through \9. |
\ |
10gR1 |
When not followed by a digit, the \ is an escape character. For example, use the pattern \\1 to look for a single backslash followed by the digit 1; use \( to look for an opening parenthesis (rather than begin a subexpression), etc. |
^ |
10gR1 |
Anchors an expression to the beginning of the string (in multiline mode, to the beginning of a line). |
$ |
10gR1 |
Anchors an expression to the end of the string (in multiline mode, to the end of a line). |
\A |
10gR2 |
Anchors an expression to the beginning of the string regardless of whether multiline mode is specified. |
\Z |
10gR2 |
Anchors an expression to the end of the string, or a newline that happens to be ending a string, regardless of whether multiline mode is specified. |
\z |
10gR2 |
Anchors an expression to the end of the string regardless of whether multiline mode is specified. |
Table A-4. Collation elements
NLS_SORT |
Multicharacter collation elements |
|
|
XCROATIAN |
d_ lj nj |
D_ LJ Nj |
D_ Lj NJ |
XCZECH |
Ch |
CH |
Ch |
XCZECH_PUNCTUATION |
Ch |
CH |
Ch |
XDANISH |
aa oe |
AA OE |
Aa Oe |
XHUNGARIAN |
cs gy ly ny sz ty zs |
CS GY LY NY SZ TY ZS |
Cs Gy Ly Ny Sz Ty Zs |
XSLOVAK |
dz d_ ch |
DZ D_ CH |
Dz D_ Ch |
XSPANISH |
ch ll |
CH LL |
Ch Ll |
Table A-5. Supported character classes
Class |
Description |
[:alnum:] |
Alphanumeric characters (same as [:alpha:] + [:digit:]) |
[:alpha:] |
Alphabetic characters only |
[:blank:] |
Blank space characters, such as space and tab |
[:cntrl:] |
Nonprinting (control) characters |
[:digit:] |
Numeric digits |
[:graph:] |
Graphical characters (same as [:punct:] + [:upper:] + [:lower:] + [:digit:]) |
[:lower:] |
Lowercase letters |
[:print:] |
Printable characters |
[:punct:] |
Punctuation characters |
[:space:] |
Whitespace characters such as space, formfeed, newline, carriage return, horizontal tab, and vertical tab |
[:upper:] |
Uppercase letters |
[:xdigit:] |
Hexadecimal characters |
Functions and Parameters
The following subsection shows the syntax of Oracle’s regular expression functions. The meaning of the parameters is shown in Regular Expression Parameters.
Regular Expression Functions
The syntax for each regular expression function is shown next.
REGEXP_COUNT (Oracle Database 11g and later)
Returns a tally of occurrences of an expression in a target string. The syntax is:
REGEXP_COUNT(source_string, expression
[, position
[, match_parameter]])
REGEXP_INSTR
Returns the character position at which text can be found matching a regular expression in a target string. The syntax is:
REGEXP_INSTR(source_string, expression
[, position [, occurrence
[, return_option
[, match_parameter
[, subexpression]]]]])
REGEXP_LIKE
Determines whether a given string contains text matching an expression. This is a Boolean function, returning TRUE, FALSE, or NULL. The syntax is:
REGEXP_LIKE (source_string, expression
[, match_parameter])
REGEXP_REPLACE
Performs a regular expression search-and-replace operation (see Chapter 8 for details). The syntax is:
REGEXP_REPLACE(source_string, expression
[, replace_string
[, position [, occurrence
[, match_parameter]]]])
REGEXP_SUBSTR
Extracts text matching a regular expression from a string. The syntax is:
REGEXP_SUBSTR(source_string, expression
[, position [, occurrence
[, match_parameter
[, subexpression]]]])
Regular Expression Parameters
These are the parameters that may be included in the regular expression functions described in the preceding subsection:
source_string
Is a string to be searched.
expression
Is a regular expression describing the pattern of text that you seek.
replace_string
Is a string generating the replacement text to be used in a search-and-replace operation.
position
Is the character position within source_string at which to begin a search. This defaults to 1.
occurrence
Is the occurrence of the pattern you want to locate. This defaults to 1, giving you the first possible match.
return_option
Is valid only for REGEXP_INSTR, and determines whether the beginning or ending character position is returned for text matching a pattern. The default is 0, for the beginning. Use 1 to return the ending position.
match_parameter
Is a text string through which you may specify options to vary the behavior of the regular expression matching engine:
‘c’
Requests a case-sensitive search. (By default, your NLS_SORT setting determines whether a search is case-sensitive.)
‘i’
Requests a case insensitive search.
‘n’
Allows the period to match newline characters. By default, the period does not match newlines.
‘m’
Changes the definition of line with respect to the ^ and $ metacharacters. By default, line means the entire target string. Using the m option, however, causes the definition of line to change from the entire target string to any line within that string, where lines are delimited by newline characters.
subexpression (Oracle Database 11g and later)
Is a number (0–9) identifying which subexpression to match on. The default is 0 and signifies that subexpressions will not be used.
You can specify multiple match parameters in any order. For example, ‘in’ means the same as ‘ni’. If you specify conflicting options (such as ‘ic’), the last option (‘c’, in this case) is the one that takes precedence.