Regular Expression (String) Functions

These functions are a subset of string functions that can be used to perform operations on strings that match a regular expression.

In this Topic:

List of Functions

Function Notes
REGEXP Alias for RLIKE.
REGEXP_COUNT  
REGEXP_INSTR  
REGEXP_LIKE Alias for RLIKE.
REGEXP_REPLACE  
REGEXP_SUBSTR  
RLIKE  

General Usage Notes

For this discussion, “subject” refers to the item to operate on and “pattern” refers to the regular expression.

  • The subject is typically a variable column, while the pattern is typically a constant, but this is not required; every argument to a regular expression function can be either a constant or variable.

  • Patterns support the full POSIX Extended Regular Expression (ERE) syntax. For details, see POSIX basic and extended.

  • Patterns additionally support the following Perl backslash-sequences:

    \d, \s, \w, \D, \S, \W, \b, \B

    For details, see Character classes or Backslash sequences.

  • By default, the POSIX wildcard character . (in the function pattern) does not include newline characters \n (in the function subject) as matches. To also match newline characters, either replace . with (.|\n) in the pattern argument, or use the s parameter in the parameters argument (described below).

  • All the regular expression functions support Unicode. A single Unicode character always counts as one character (i.e. the POSIX meta-character . will match exactly one Unicode character), regardless of the byte-length of the corresponding binary representation of that character. Also, for functions that take or return subject offsets, a single Unicode character counts as 1.

Parameters Argument

All regular expression functions support an optional parameters argument as the very last input. The parameters argument is a VARCHAR string that specializes the matching behavior of the regular expression function. The following parameters are supported:

Parameter Effect
c Case-sensitive matching.
i Case-insensitive matching.
m Multi-line mode, i.e. treats the meta-characters ^ and $ as the beginning and end of any line of the subject. By default, ^ and $ mark the beginning and end of the entire subject.
e Extract sub-matches. Affects the behavior of REGEXP_INSTR and REGEXP_SUBSTR.
s Allow the wildcard character . to match \n (default is False).

The default is c, i.e.:

  • Case-sensitive matching
  • Single-line mode
  • No sub-match extraction, except for REGEXP_REPLACE, which always uses sub-match extraction
  • The . wildcard character does not match \n newline characters

If both the c and i parameters are set, the last one in the parameters string dictates whether the function performs case-sensitive or case-insensitive matching.

Limitations

Snowflake does not support backreferences (known as “squares” in formal language theory) in patterns; however, backreferences in the replacement string of the REGEXP_REPLACE function are supported.

Corner Cases

An empty pattern '' matches nothing, not even an empty subject.

The exception is REGEXP_LIKE, in which the empty pattern matches the empty subject because the pattern is implicitly anchored at both ends, i.e. '' automatically becomes '^$'.

An empty group, i.e. sub-expression (), matches the space in between characters, including the beginning and end of the subject.

Escape Characters and Caveats

As mandated by the POSIX standard, the single backslash character \ is used to escape meta-characters (e.g. \* or \?). The backslash character is also used for so-called backslash-sequences (e.g. \w).

Note that the backslash character is further used to insert control characters into SQL strings (e.g. \n to insert a newline). As a result, to insert a single backslash character into a SQL string literal, the backslash character needs to be escaped, i.e. \\ becomes \.

Additionally, many command line clients, including HenPlus (and consequently sfsql), pre-process user input for control characters. As a result, to insert a single backslash character into a SQL string literal from such a client, the backslash character needs to be double-escaped, i.e. \\\\ becomes \.

For example, to insert the backreference \1 into a replacement string literal of REGEXP_LIKE, you may need to use \\\\1.