Regular expression language quick reference
Regex entities match values deterministically by evaluating them against regular expressions.
A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs. Regular expressions can contain both special and ordinary characters. Most ordinary characters, like ‘A’, ‘a’, or ‘0’, are the simplest regular expressions; they simply match themselves. You can concatenate ordinary characters, so last matches the string ‘last’.
Some characters, like ‘|’ or ‘(’, are special. Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted.
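
The difference between ordinary and special characters can be sketched as follows. This is a minimal illustration that assumes Python's `re` module as a stand-in engine; the engine behind the regex entities may differ in detail, and the sample strings are made up for the example.

```python
import re

# Ordinary characters match themselves, so "last" finds the text "last".
literal = re.search(r"last", "at long last")

# "|" is special (alternation): this pattern means "a" OR "b".
either = re.findall(r"a|b", "a|b")

# A backslash in front of the special character matches it literally.
pipe = re.findall(r"\|", "a|b")

print(literal.group(), either, pipe)
```

Unescaped, the vertical bar never matches itself; it only separates the two alternatives.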
Each section in this quick reference lists a particular category of characters, operators, and constructs that you can use to define regular expressions.
Supported regex features
The following sections describe supported regex features.
The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally.
Escaped character | Description | Pattern | Matches |
---|---|---|---|
\a | Matches a bell character, \u0007 | \a | “\u0007” in “Error!” + ‘\u0007’ |
\b | In a character class, matches a backspace, \u0008 | [\b]{3,} | “\b\b\b\b” in “\b\b\b\b” |
\t | Matches a tab, \u0009. | (\w+)\t | “item1\t”, “item2\t” in “item1\titem2\t” |
\r | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | “\r\nThese” in “\r\nThese are\ntwo lines.” |
\v | Matches a vertical tab, \u000B. | [\v]{2,} | “\v\v\v” in “\v\v\v” |
\f | Matches a form feed, \u000C. | [\f]{2,} | “\f\f\f” in “\f\f\f” |
\n | Matches a new line, \u000A. | \r\n(\w+) | “\r\nThese” in “\r\nThese are\ntwo lines.” |
\ nnn | Uses octal representation to specify a character (nnn consists of two or three digits). | \w\040\w | “a b”, “c d” in “a bc d” |
\x nn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | “a b”, “c d” in “a bc d” |
\u nnnn | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | “a b”, “c d” in “a bc d” |
\ | When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, \* is the same as \x2A, and \. is the same as \x2E. This allows the regular expression engine to disambiguate language elements (such as * or ?) and character literals (represented by \* or \?). | \d+[\+-x\*]\d+ | “2+2” and “3*9” in “(2+2) * 3*9” |
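
A few of the escapes in the table can be tried out with the sketch below. It assumes Python's `re` module as a stand-in engine, which happens to accept the same octal, hexadecimal, and Unicode escape forms; other engines may not.

```python
import re

text = "a bc d"

# Three equivalent ways to write a space: octal, two-digit hex, four-digit Unicode.
octal_hits = re.findall(r"\w\040\w", text)
hex_hits = re.findall(r"\w\x20\w", text)
unicode_hits = re.findall(r"\w\u0020\w", text)

# \t matches a tab; group(0) returns the whole match including the tab.
tab_hits = [m.group(0) for m in re.finditer(r"(\w+)\t", "item1\titem2\t")]

# \+ and \* are literal plus and star inside the character class.
arith_hits = re.findall(r"\d+[\+-x\*]\d+", "(2+2) * 3*9")

# \* and \x2A both denote a literal asterisk.
star_same = re.findall(r"\*", "3*9") == re.findall(r"\x2A", "3*9")

print(octal_hits, hex_hits, unicode_hits, tab_hits, arith_hits, star_same)
```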
A character class matches any one of a set of characters. Character classes include the language elements listed in the following table.
Character class | Description | Pattern | Matches |
---|---|---|---|
[ character_group ] | Matches any single character in character_group. | [ae] | “a” in “gray” |
[^ character_group ] | Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | “r”, “g”, “n” in “reign” |
[ first - last ] | Character range: Matches any single character in the range from first to last. | [A-Z] | “A”, “B” in “AB123” |
. | Wildcard: Matches any single character except \n. | a.e | “ave” in “nave” |
\w | Matches any word character. | \w | “I”, “D”, “A”, “1”, “3” in “ID A1.3” |
\W | Matches any non-word character. | \W | “ ”, “.” in “ID A1.3” |
\s | Matches any white-space character. | \w\s | “D ” in “ID A1.3” |
\S | Matches any non-white-space character. | \s\S | “ _” in “int __ctr” |
\d | Matches any decimal digit. | \d | “4” in “4 = IV” |
\D | Matches any character other than a decimal digit. | \D | “ ”, “=”, “ ”, “I”, “V” in “4 = IV” |
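
The character classes above can be checked against the table's own sample strings. The sketch assumes Python's `re` module as a stand-in engine; the variable names are illustrative only.

```python
import re

group_hits = re.findall(r"[ae]", "gray")         # any one of a, e
negated_hits = re.findall(r"[^aei]", "reign")    # anything except a, e, i
range_hits = re.findall(r"[A-Z]", "AB123")       # uppercase range
wildcard_hits = re.findall(r"a.e", "nave")       # "." is any character except \n
word_hits = re.findall(r"\w", "ID A1.3")         # word characters
nonword_hits = re.findall(r"\W", "ID A1.3")      # everything else
nondigit_hits = re.findall(r"\D", "4 = IV")      # non-digits
space_hits = re.findall(r"\s\S", "int __ctr")    # white space, then non-white space

print(group_hits, negated_hits, range_hits, wildcard_hits)
print(word_hits, nonword_hits, nondigit_hits, space_hits)
```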
Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters. The metacharacters listed in the following table are anchors.
Assertion | Description | Pattern | Matches |
---|---|---|---|
^ | By default, the match must start at the beginning of the string; in multiline mode, it must start at the beginning of the line. | ^\d{3} | “901” in “901-333-” |
$ | By default, the match must occur at the end of the string or before \n at the end of the string; in multiline mode, it must occur before the end of the line or before \n at the end of the line. | -\d{3}$ | “-333” in “-901-333” |
\A | The match must occur at the start of the string. | \A\d{3} | “901” in “901-333-” |
\Z | The match must occur at the end of the string or before \n at the end of the string. | -\d{3}\Z | “-333” in “-901-333” |
\z | The match must occur at the end of the string. | -\d{3}\z | “-333” in “-901-333” |
\b | The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character. | \b\w+\s\w+\b | “them theme”, “them them” in “them theme them them” |
\B | The match must not occur on a \b boundary. | \Bend\w*\b | “ends”, “ender” in “end sends endure lender” |
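
The anchors can be exercised with the sketch below, which assumes Python's `re` module as a stand-in engine. Python's `\Z` matches only at the very end of the string, which differs from the table's description, so `\Z` and `\z` are left out here. The two-line phone string is a made-up example.

```python
import re

start_hits = re.findall(r"^\d{3}", "901-333-")
end_hits = re.findall(r"-\d{3}$", "-901-333")

# "$" also matches just before a newline that ends the string.
end_before_newline = re.findall(r"-\d{3}$", "-901-333\n")

# In multiline mode "^" matches at the start of every line.
two_lines = "901-333\n212-555"
single_mode = re.findall(r"^\d{3}", two_lines)
multi_mode = re.findall(r"^\d{3}", two_lines, re.MULTILINE)

# \b requires a word boundary, \B forbids one.
boundary_hits = re.findall(r"\b\w+\s\w+\b", "them theme them them")
inner_hits = re.findall(r"\Bend\w*\b", "end sends endure lender")

print(start_hits, end_hits, end_before_newline)
print(single_mode, multi_mode, boundary_hits, inner_hits)
```

Anchors consume no characters, which is why `\Bend\w*\b` skips “end” and “endure”: in both, “end” begins on a word boundary.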
Grouping constructs delineate subexpressions of a regular expression and typically capture substrings of an input string. Grouping constructs include the language elements listed in the following table. Only non-capturing groups are supported.
Grouping construct | Description | Pattern | Matches |
---|---|---|---|
(?: subexpression ) | Defines a non-capturing group. | (?:[0-9]{4})\|(?:[0-9] [0-9] [0-9] [0-9]) | “1234” in “the number is 1234” |
(?= subexpression ) | Zero-width positive lookahead assertion. | \w+(?=\.) | “is”, “ran”, and “out” in “He is. The dog ran. The sun is out.” |
(?! subexpression ) | Zero-width negative lookahead assertion. | \b(?!un)\w+\b | “sure”, “used” in “unsure sure unity used” |
(?<= subexpression ) | Zero-width positive lookbehind assertion. | (?<=19)\d{2}\b | “99”, “50”, “05” in “1851 1999 1950 1905 2003” |
(?<! subexpression ) | Zero-width negative lookbehind assertion. | (?<!19)\d{2}\b | “51”, “03” in “1851 1999 1950 1905 2003” |
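
The grouping and lookaround constructs can be tried against the table's sample strings. The sketch assumes Python's `re` module as a stand-in engine; the longer “1 2 3 4” input is an added example, not taken from the table.

```python
import re

# Non-capturing groups only structure the pattern; findall returns whole matches.
digits = re.findall(
    r"(?:[0-9]{4})|(?:[0-9] [0-9] [0-9] [0-9])",
    "the number is 1234 or 1 2 3 4",
)

# Lookahead: words directly followed by a period (the period is not consumed).
before_period = re.findall(r"\w+(?=\.)", "He is. The dog ran. The sun is out.")

# Negative lookahead: whole words that do not start with "un".
not_un = re.findall(r"\b(?!un)\w+\b", "unsure sure unity used")

years = "1851 1999 1950 1905 2003"
after_19 = re.findall(r"(?<=19)\d{2}\b", years)      # lookbehind: preceded by "19"
not_after_19 = re.findall(r"(?<!19)\d{2}\b", years)  # negative lookbehind

print(digits, before_period, not_un, after_19, not_after_19)
```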
A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur. Quantifiers include the language elements listed in the following table.
Quantifier | Description | Pattern | Matches |
---|---|---|---|
\* | Matches the previous element zero or more times. | \d*\.\d | “.0”, “19.9”, “219.9” |
\+ | Matches the previous element one or more times. | be+ | “bee” in “been”, “be” in “bent” |
? | Matches the previous element zero or one time. | rai?n | “ran”, “rain” |
{ n } | Matches the previous element exactly n times. | ,\d{3} | “,043” in “1,043.6”, “,876”, “,543”, and “,210” in “9,876,543,210” |
{ n ,} | Matches the previous element at least n times. | \d{2,} | “166”, “29”, “1930” |
{ n , m } | Matches the previous element at least n times, but no more than m times. | \d{3,5} | “166”, “17668” |
*?, +?, ?? | Same as for *, + and ?, but as few times as possible. | <.*?> | The ‘*’, ‘+’, and ‘?’ quantifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against ‘<a> b <c>’, it will match the entire string, and not just ‘<a>’. Adding ? after the quantifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only ‘<a>’. |
{ n }? | Matches the preceding element exactly n times. | ,\d{3}? | “,043” in “1,043.6”, “,876”, “,543”, and “,210” in “9,876,543,210” |
{ n ,}? | Matches the previous element at least n times, but as few times as possible. | \d{2,}? | “166”, “29”, “1930” |
{ n , m }? | Matches the previous element between n and m times, but as few times as possible. | \d{3,5}? | “166”, “17668” |
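
The difference between greedy and lazy quantifiers can be sketched as follows, assuming Python's `re` module as a stand-in engine. Whether the regex entities search inside a value or require the whole value to match is not stated here, so both cases are shown for the lazy `{n,}?` form.

```python
import re

tag_text = "<a> b <c>"
greedy = re.search(r"<.*>", tag_text).group()   # runs to the last ">"
lazy = re.search(r"<.*?>", tag_text).group()    # stops at the first ">"

plus_hits = re.findall(r"be+", "been bent")
optional_hits = re.findall(r"rai?n", "ran rain")
exact_hits = re.findall(r",\d{3}", "9,876,543,210")

# When searching, a lazy quantifier settles for the minimum count...
lazy_search = re.search(r"\d{2,}?", "166").group()
# ...but when the whole value must match, it still has to consume everything.
lazy_full = re.fullmatch(r"\d{2,}?", "166").group()

print(greedy, lazy, plus_hits, optional_hits, exact_hits, lazy_search, lazy_full)
```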
Alternation constructs modify a regular expression to enable either/or matching. These constructs include the language elements listed in the following table.
Alternation construct | Description | Pattern | Matches |
---|---|---|---|
\| | Matches any one element separated by the vertical bar (\|) character. | th(?:e\|is\|at) | “the”, “this” in “this is the day.” |
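
Alternation inside a non-capturing group can be sketched as follows, assuming Python's `re` module as a stand-in engine. The second input string is an added example, not taken from the table.

```python
import re

# "th" followed by exactly one of the alternatives e, is, or at.
day_hits = re.findall(r"th(?:e|is|at)", "this is the day.")

# The match ends as soon as one alternative succeeds, so "theme" yields "the".
other_hits = re.findall(r"th(?:e|is|at)", "that theme")

print(day_hits, other_hits)
```

Matches are reported in the order they occur in the input, so “this” comes before “the” for the first string.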