POSIX Metacharacters - Introduction to Regular Expressions in SAS (2014)

Appendix C. POSIX Metacharacters

Throughout the book, we discussed metacharacters of all types that adhere to Perl standards (the de facto standard across the industry) because that is the syntax SAS uses, and it is all you need when you’re running within the SAS environment. However, if you ever need to push the RegEx processing to a system outside of SAS, there is no guarantee that Perl-style metacharacters will work, because not all systems (older ones in particular) use Perl syntax.

Note: Before you attempt this more advanced application, find out which RegEx syntax the target system supports. You might not need to change the RegEx coding at all.

The exact applications of the metacharacters described in this appendix are outside the scope of this text but are provided here for the advanced reader who is interested in them. For example, although we have not covered it, POSIX metacharacters might be needed when you are performing in-database fuzzy matching with PROC SQL.

[[:alpha:]]

This metacharacter matches any alphabetic character and is equivalent to [a-zA-Z].

[[:^alpha:]]

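This metacharacter matches any non-alphabetic character and is equivalent to [^a-zA-Z]. Note that the [[:^name:]] negation form used throughout this appendix is a Perl extension; a system that implements only standard POSIX bracket expressions expects the caret outside the class name, as in [^[:alpha:]].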
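
As a rough illustration of the two classes above, the following sketch compares [[:alpha:]] with its explicit range, and the Perl-style negation with the standard POSIX negation. Go's regexp package is used purely as a stand-in for a system outside SAS; that choice, the helper name countMatches, and the sample string are assumptions made for this example. Other engines might behave differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Go's regexp package stands in for "a system outside SAS"; that choice is
// an assumption of this sketch, not something the text prescribes.
var (
	posixAlpha    = regexp.MustCompile(`[[:alpha:]]`)
	explicitAlpha = regexp.MustCompile(`[a-zA-Z]`)
	perlNonAlpha  = regexp.MustCompile(`[[:^alpha:]]`) // Perl-style negation
	posixNonAlpha = regexp.MustCompile(`[^[:alpha:]]`) // standard POSIX negation
)

// countMatches returns the number of non-overlapping matches of re in s.
func countMatches(re *regexp.Regexp, s string) int {
	return len(re.FindAllString(s, -1))
}

func main() {
	sample := "SAS 9.4"
	fmt.Println(countMatches(posixAlpha, sample))    // 3
	fmt.Println(countMatches(explicitAlpha, sample)) // 3
	fmt.Println(countMatches(perlNonAlpha, sample))  // 4
	fmt.Println(countMatches(posixNonAlpha, sample)) // 4
}
```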

[[:alnum:]]

This metacharacter matches any alphanumeric character and is equivalent to [a-zA-Z0-9].

[[:^alnum:]]

This metacharacter matches any non-alphanumeric character and is equivalent to [^a-zA-Z0-9].
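
One plausible use of the negated class is cleaning free text into something identifier-like. The sketch below, written in Go only as an example of a non-SAS engine, replaces each run of non-alphanumeric characters with an underscore. The helper toIdentifier and the sample value are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// nonAlnumRun matches one or more consecutive non-alphanumeric characters.
var nonAlnumRun = regexp.MustCompile(`[[:^alnum:]]+`)

// toIdentifier is a hypothetical helper that replaces every run of
// non-alphanumeric characters with a single underscore.
func toIdentifier(s string) string {
	return nonAlnumRun.ReplaceAllString(s, "_")
}

func main() {
	fmt.Println(toIdentifier("patient id (2014)")) // patient_id_2014_
}
```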

[[:ascii:]]

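This metacharacter matches any ASCII character and is equivalent to [\0-\177], the octal range for character codes 0 through 127 (i.e., it does not match characters outside the ASCII range, such as accented letters and other Unicode characters).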

[[:^ascii:]]

This metacharacter matches any non-ASCII character and is equivalent to [^\0-\177] (i.e., it matches only characters outside the ASCII range, such as accented letters and other Unicode characters).
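
A common reason to reach for these two classes is to flag values that contain anything beyond plain ASCII. The following is a minimal sketch in Go, chosen here only as an example engine; hasNonASCII is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// nonASCII matches a single character outside the range 0 through 127.
var nonASCII = regexp.MustCompile(`[[:^ascii:]]`)

// hasNonASCII reports whether s contains at least one non-ASCII character.
func hasNonASCII(s string) bool {
	return nonASCII.MatchString(s)
}

func main() {
	fmt.Println(hasNonASCII("resume"))           // false
	fmt.Println(hasNonASCII("r\u00e9sum\u00e9")) // true (contains U+00E9)
}
```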

[[:blank:]]

This metacharacter matches any blank character (a space or a horizontal tab) and is equivalent to [ \t].

[[:^blank:]]

This metacharacter matches any non-blank character (anything other than a space or a horizontal tab) and is equivalent to [^ \t].

[[:cntrl:]]

This metacharacter matches any control character, such as a tab, newline, or carriage return (in ASCII, character codes 0 through 31 and 127).

[[:^cntrl:]]

This metacharacter matches any non-control character (in ASCII, anything other than character codes 0 through 31 and 127).
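
The blank and control classes overlap only on the tab character, which is easy to lose track of. The sketch below tests four sample characters against both classes. It uses Go's regexp package as an assumed stand-in for a non-SAS system, and the variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	blankClass = regexp.MustCompile(`[[:blank:]]`)
	cntrlClass = regexp.MustCompile(`[[:cntrl:]]`)
)

func main() {
	// Space is blank only, tab is both, newline is control only,
	// and an ordinary letter is neither.
	for _, c := range []string{" ", "\t", "\n", "A"} {
		fmt.Printf("%q blank=%t cntrl=%t\n",
			c, blankClass.MatchString(c), cntrlClass.MatchString(c))
	}
}
```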

[[:digit:]]

This metacharacter matches any digit character and is equivalent to \d or [0-9].

[[:^digit:]]

This metacharacter matches any non-digit character and is equivalent to \D or [^0-9].
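
For instance, the negated digit class can strip formatting from a phone number so that only the digits remain. This is a small sketch in Go, used here as an assumed example engine; digitsOnly is a hypothetical helper and the phone number is made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// nonDigit matches any single character that is not 0 through 9.
var nonDigit = regexp.MustCompile(`[[:^digit:]]`)

// digitsOnly removes every non-digit character from s.
func digitsOnly(s string) string {
	return nonDigit.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(digitsOnly("(919) 555-0100")) // 9195550100
}
```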

[[:graph:]]

This metacharacter matches any visible character and is equivalent to [[:alnum:][:punct:]]. In other words, if you can see it when printed on a piece of paper, then it is matched by this metacharacter.

[[:^graph:]]

This metacharacter matches any character that is not visible, including the space character, and is equivalent to [^[:alnum:][:punct:]]. If you can’t see it printed on a piece of paper, then it is matched by this metacharacter.

[[:lower:]]

This metacharacter matches any lowercase alphabetic character and is equivalent to [a-z].

[[:^lower:]]

This metacharacter matches anything except a lowercase alphabetic character and is equivalent to [^a-z].

[[:print:]]

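This metacharacter matches any printable character, that is, any visible character (see [[:graph:]]) or a space. It does not print anything; the name refers to the characters that it matches.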

[[:^print:]]

This metacharacter matches any non-printable character, such as a control character.
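
The practical difference between [[:graph:]] and [[:print:]] is the space character, as the following sketch tries to show. Go's regexp package is an assumption made for this example, and another engine could classify characters differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Each pattern is anchored so that it tests exactly one character.
var (
	graphClass = regexp.MustCompile(`^[[:graph:]]$`)
	printClass = regexp.MustCompile(`^[[:print:]]$`)
)

func main() {
	// Letters and punctuation are in both classes, a space is in
	// [[:print:]] only, and a tab is in neither.
	for _, c := range []string{"A", "!", " ", "\t"} {
		fmt.Printf("%q graph=%t print=%t\n",
			c, graphClass.MatchString(c), printClass.MatchString(c))
	}
}
```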

[[:punct:]]

This metacharacter matches any visible punctuation or symbol character.

[[:^punct:]]

This metacharacter matches anything except visible punctuation or symbol characters.
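
As a quick illustration, the punctuation class can remove symbols from text before a fuzzy comparison. The sketch below uses Go as an assumed example engine; stripPunct is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// punctClass matches one visible punctuation or symbol character.
var punctClass = regexp.MustCompile(`[[:punct:]]`)

// stripPunct deletes every punctuation or symbol character from s.
func stripPunct(s string) string {
	return punctClass.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(stripPunct("Hello, world!")) // Hello world
}
```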

[[:space:]]

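This metacharacter matches any whitespace character (such as a space, tab, or newline) and is essentially equivalent to \s. Some engines, including Perl versions before 5.18, also let [[:space:]] match the vertical tab, which \s does not.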

[[:^space:]]

This metacharacter matches anything except a whitespace character and is essentially equivalent to \S.
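
Whether [[:space:]] and \s agree on every character depends on the engine, so it is worth checking on the target system. The sketch below runs that check in Go, an assumed example engine in which the two differ on the vertical tab. Results on another system may not be the same.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	posixSpace = regexp.MustCompile(`[[:space:]]`)
	perlSpace  = regexp.MustCompile(`\s`)
)

func main() {
	// In Go's regexp package, both patterns match space, tab, and newline,
	// but only [[:space:]] matches the vertical tab ("\v").
	for _, c := range []string{" ", "\t", "\n", "\v"} {
		fmt.Printf("%q [[:space:]]=%t \\s=%t\n",
			c, posixSpace.MatchString(c), perlSpace.MatchString(c))
	}
}
```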

[[:upper:]]

This metacharacter matches any uppercase alphabetic character and is equivalent to [A-Z].

[[:^upper:]]

This metacharacter matches anything except an uppercase alphabetic character and is equivalent to [^A-Z].
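
One small use of the uppercase class is pulling the capital letters out of a phrase to form an acronym. This sketch is written in Go as an assumed example engine; the helper name initials is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// upperClass matches a single uppercase alphabetic character.
var upperClass = regexp.MustCompile(`[[:upper:]]`)

// initials concatenates every uppercase letter found in s, in order.
func initials(s string) string {
	return strings.Join(upperClass.FindAllString(s, -1), "")
}

func main() {
	fmt.Println(initials("Statistical Analysis System")) // SAS
}
```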

[[:word:]]

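This metacharacter matches any word character (a letter, a digit, or an underscore) and is equivalent to \w. It is a Perl extension rather than part of the POSIX standard, so a strictly POSIX system might not recognize it.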

[[:^word:]]

This metacharacter matches any non-word character and is equivalent to \W.
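
Because the underscore counts as a word character, [[:word:]]+ keeps names such as proc_sql in one piece. The following sketch shows this in Go, an engine that happens to accept the class; the helper name wordTokens and the sample text are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRun matches one or more consecutive word characters.
var wordRun = regexp.MustCompile(`[[:word:]]+`)

// wordTokens returns each run of word characters found in s.
func wordTokens(s string) []string {
	return wordRun.FindAllString(s, -1)
}

func main() {
	fmt.Println(wordTokens("proc_sql; run!")) // [proc_sql run]
}
```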

[[:xdigit:]]

This metacharacter matches any hexadecimal digit and is equivalent to [0-9a-fA-F].

[[:^xdigit:]]

This metacharacter matches any character that is not a hexadecimal digit and is equivalent to [^0-9a-fA-F].
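
A typical application is checking that a value consists entirely of hexadecimal digits. The sketch below anchors the class at both ends of the string. It uses Go as an assumed example engine, and isHex is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// hexOnly matches a non-empty string made up solely of hexadecimal digits.
var hexOnly = regexp.MustCompile(`^[[:xdigit:]]+$`)

// isHex reports whether s is a non-empty string of hexadecimal digits.
func isHex(s string) bool {
	return hexOnly.MatchString(s)
}

func main() {
	fmt.Println(isHex("1A2b")) // true
	fmt.Println(isHex("1G"))   // false
}
```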