String Functions (Regular Expressions)¶
This subset of string functions can be used to perform operations on strings that match a regular expression (often referred to as “regex”).
In this Topic:
List of Regex Functions¶
|REGEXP||Alias for RLIKE.|
|REGEXP_LIKE||Alias for RLIKE.|
Regex General Usage Notes¶
In these topics, “subject” refers to the item to operate on and “pattern” refers to the regular expression:
The subject is typically a variable column, while the pattern is typically a constant, but this is not required; every argument to a regular expression function can be either a constant or variable.
Patterns support the full POSIX ERE (Extended Regular Expression) syntax. For details, see POSIX basic and extended (Wikipedia).
Patterns are implicitly anchored at both ends, e.g.
'^ABC$'. For example, to match any string starting with ABC, the pattern would be
Patterns additionally support the following Perl backslash-sequences:
Note that the backslash character needs to be escaped (i.e.
\). For more information, see Regex Escape Characters and Caveats in this topic.
By default, the POSIX wildcard character
.(in the pattern) does not include newline characters
\n(in the subject) as matches.
To also match newline characters, either replace
patternargument, or use the
sparameter in the
parametersargument (described below).
All the regular expression functions support Unicode. A single Unicode character always counts as one character (i.e. the POSIX meta-character
.matches exactly one Unicode character), regardless of the byte-length of the corresponding binary representation of that character. Also, for functions that take or return subject offsets, a single Unicode character counts as 1.
Most regular expression functions support an optional
parameters argument as the very last input. The
parameters argument is a VARCHAR string that specializes the matching
behavior of the regular expression function. The following parameters are supported:
||Enables case-sensitive matching.|
||Enables case-insensitive matching.|
||Enables multi-line mode (i.e. meta-characters
||Extracts sub-matches; applies only to REGEXP_INSTR and REGEXP_SUBSTR.|
||Enables the POSIX wildcard character
The default string is simply
c, which specifies:
- Case-sensitive matching.
- Single-line mode.
- No sub-match extraction, except for REGEXP_REPLACE, which always uses sub-match extraction.
- POSIX wildcard character
.does not match
When specifying multiple parameters, the string is entered with no spaces or delimiters. For example,
cis specifies case-sensitive matching in multi-line mode with POSIX wildcard matching.
i are included in the
parameters string, the one that occurs last in the string dictates whether the function performs case-sensitive or case-insensitive
Snowflake does not support backreferences in regular expression patterns (known as “squares” in formal language theory); however, backreferences are supported in the replacement string of the REGEXP_REPLACE function.
Regex Corner Cases¶
An empty pattern (i.e.
'') matches nothing, not even an empty subject.
The exception is REGEXP_LIKE, in which the empty pattern matches the empty subject because the pattern is implicitly anchored at both ends, i.e.
'' automatically becomes
An empty group (i.e. sub-expression
()), matches the space in between characters, including the beginning and end of the subject.
Regex Escape Characters and Caveats¶
As mandated by the POSIX standard, the single backslash character
\ is used to escape meta-characters (e.g.
\?). The backslash character is also used for so-called
Note that the backslash character is further used to insert control characters into SQL strings (e.g.
\n to insert a newline). As a result, to insert a single backslash character into
a SQL string literal, the backslash character needs to be escaped (i.e.
For example, to insert the backreference
\1 into a replacement string literal of REGEXP_LIKE, you may need to use