String Functions (Regular Expressions)

This subset of string functions can be used to perform operations on strings that match a regular expression (often referred to as “regex”).

In this Topic:

List of Regex Functions

Function Notes

Regex General Usage Notes

In these topics, “subject” refers to the item to operate on and “pattern” refers to the regular expression:

  • The subject is typically a variable column, while the pattern is typically a constant, but this is not required; every argument to a regular expression function can be either a constant or variable.

  • Patterns support the full POSIX ERE (Extended Regular Expression) syntax. For details, see POSIX basic and extended (Wikipedia).

  • Patterns are implicitly anchored at both ends, e.g. 'ABC' automatically becomes '^ABC$'. For example, to match any string starting with ABC, the pattern would be 'ABC.*'.

  • Patterns additionally support the following Perl backslash-sequences:

    \d , \s , \w , \D , \S , \W , \b , \B

    For details, see Character classes (Wikipedia) or Backslash sequences (Perl documentation).

    Note that the backslash character needs to be escaped (i.e. \\ becomes \). For more information, see Regex Escape Characters and Caveats in this topic.

  • By default, the POSIX wildcard character . (in the pattern) does not include newline characters \n (in the subject) as matches.

    To also match newline characters, either replace . with (.|\n) in the pattern argument, or use the s parameter in the parameters argument (described below).

  • All the regular expression functions support Unicode. A single Unicode character always counts as one character (i.e. the POSIX meta-character . matches exactly one Unicode character), regardless of the byte-length of the corresponding binary representation of that character. Also, for functions that take or return subject offsets, a single Unicode character counts as 1.

Regex Parameters

Most regular expression functions support an optional parameters argument as the very last input. The parameters argument is a VARCHAR string that specializes the matching behavior of the regular expression function. The following parameters are supported:

Parameter Effect
c Enables case-sensitive matching.
i Enables case-insensitive matching.
m Enables multi-line mode (i.e. meta-characters ^ and $ mark the beginning and end of any line of the subject). By default, multi-line mode is disabled (i.e. ^ and $ mark the beginning and end of the entire subject).
e Extracts sub-matches; applies only to REGEXP_INSTR and REGEXP_SUBSTR.
s Enables the POSIX wildcard character . to match \n. By default, wildcard character matching is disabled.

The default string is simply c, which specifies:

  • Case-sensitive matching.
  • Single-line mode.
  • No sub-match extraction, except for REGEXP_REPLACE, which always uses sub-match extraction.
  • POSIX wildcard character . does not match \n newline characters.

When specifying multiple parameters, the string is entered with no spaces or delimiters. For example, cis specifies case-sensitive matching in multi-line mode with POSIX wildcard matching.

If both c and i are included in the parameters string, the one that occurs last in the string dictates whether the function performs case-sensitive or case-insensitive matching.

Regex Backreferences

Snowflake does not support backreferences in regular expression patterns (known as “squares” in formal language theory); however, backreferences are supported in the replacement string of the REGEXP_REPLACE function.

Regex Corner Cases

An empty pattern (i.e. '') matches nothing, not even an empty subject.

The exception is REGEXP_LIKE, in which the empty pattern matches the empty subject because the pattern is implicitly anchored at both ends, i.e. '' automatically becomes '^$'.

An empty group (i.e. sub-expression ()), matches the space in between characters, including the beginning and end of the subject.

Regex Escape Characters and Caveats

As mandated by the POSIX standard, the single backslash character \ is used to escape meta-characters (e.g. \* or \?). The backslash character is also used for so-called backslash-sequences (e.g. \w).

Note that the backslash character is further used to insert control characters into SQL strings (e.g. \n to insert a newline). As a result, to insert a single backslash character into a SQL string literal, the backslash character needs to be escaped (i.e. \\ becomes \).

For example, to insert the backreference \1 into a replacement string literal of REGEXP_LIKE, you may need to use \\\\1.