Regular Expression (String) Functions¶
These functions are a subset of string functions that can be used to perform operations on strings that match a regular expression.
In this Topic:
List of Functions¶
|REGEXP||Alias for RLIKE.|
|REGEXP_LIKE||Alias for RLIKE.|
General Usage Notes¶
For this discussion, “subject” refers to the item to operate on and “pattern” refers to the regular expression.
The subject is typically a variable column, while the pattern is typically a constant, but this is not required; every argument to a regular expression function can be either a constant or variable.
Patterns support the full POSIX Extended Regular Expression (ERE) syntax. See POSIX basic and extended (in Wikipedia) for details.
Patterns additionally support the following Perl backslash-sequences:
By default, the POSIX wildcard character
.(in the function pattern) does not include newline characters
\n(in the function subject) as matches. To also match newline characters, either replace
(.|\n)in the pattern argument, or use the
sparameter in the parameters argument (described below).
All the regular expression functions support Unicode. A single Unicode character always counts as one character (i.e. the POSIX meta-character
.will match exactly one Unicode character), regardless of the byte-length of the corresponding binary representation of that character. Also, for functions that take or return subject offsets, a single Unicode character counts as 1.
All regular expression functions support an optional parameters argument as the very last input. The parameters argument is a VARCHAR string that specializes the matching behavior of the regular expression function. The following parameters are supported:
|m||Multi-line mode, i.e. treats the meta-characters
|e||Extract sub-matches. Affects the behavior of REGEXP_INSTR and REGEXP_SUBSTR.|
|s||Allow the wildcard character
The default is
- Case-sensitive matching
- Single-line mode
- No sub-match extraction, except for REGEXP_REPLACE, which always uses sub-match extraction
.wildcard character does not match
If both the
i parameters are set, the last one in the parameters string dictates whether the function performs case-sensitive or case-insensitive matching.
Snowflake does not support backreferences (known as “squares” in formal language theory) in patterns; however, backreferences in the replacement string of the REGEXP_REPLACE function are supported.
An empty pattern
'' matches nothing, not even an empty subject.
The exception is REGEXP_LIKE, in which the empty pattern matches the empty subject because the pattern is implicitly anchored at both ends, i.e.
'' automatically becomes
An empty group, i.e. sub-expression
(), matches the space in between characters, including the beginning and end of the subject.
Escape Characters and Caveats¶
As mandated by the POSIX standard, the single backslash character
\ is used to escape meta-characters (e.g.
\?). The backslash character is also used for so-called backslash-sequences (e.g.
Note that the backslash character is further used to insert control characters into SQL strings (e.g.
\n to insert a newline). As a result, to insert a single backslash character into a SQL string literal, the backslash character
needs to be escaped, i.e.
Additionally, many command line clients, including HenPlus (and consequently
sfsql), pre-process user input for control characters. As a result, to insert a single backslash character into a SQL string literal from such a client,
the backslash character needs to be double-escaped, i.e.
For example, to insert the backreference
\1 into a replacement string literal of REGEXP_LIKE, you may need to use