Categories:

String Functions (Regular Expressions)

REGEXP_INSTR

Returns the position of the specified occurrence of the regular expression pattern in the string subject. If no match is found, returns 0.

See also String Functions (Regular Expressions).

Syntax

REGEXP_INSTR( <subject> , <pattern> [ , <position> [ , <occurrence> [ , <option> [ , <regexp_parameters> [ , <group_num> ] ] ] ] ] )

Arguments

Required:

subject

The string to search for matches.

pattern

Pattern to match.

Optional:

position

Number of characters from the beginning of the string where the function starts searching for matches.

Default: 1 (the search for a match starts at the first character on the left)

occurrence

Specifies which occurrence of the pattern to match. The function skips the first occurrence - 1 matches.

Default: 1

option

Specifies whether to return the offset of the first character of the match (0) or the offset of the first character following the end of the match (1).

Default: 0

regexp_parameters

String of one or more characters that specifies the regular expression parameters used for searching for matches. The supported values are:

  • c: case-sensitive.

  • i: case-insensitive.

  • m: multi-line mode.

  • e: extract sub-matches.

  • s: the ‘.’ wildcard also matches newline.

For more details, see the regular expression parameters documentation.

Default: c

Note

By default, REGEXP_INSTR returns the begin or end character offset for the entire matching part of the subject. However, if the e (for “extract”) parameter is specified, REGEXP_INSTR returns the begin or end character offset for the part of the subject that matches the first sub-expression in the pattern. If e is specified but a group_num is not also specified, then the group_num defaults to 1 (the first group). If there is no sub-expression in the pattern, REGEXP_INSTR behaves as if e was not set. For examples that use e, see Examples in this topic.

group_num

The group_num parameter specifies which group to extract. Groups are specified by using parentheses in the regular expression.

If a group_num is specified, Snowflake allows extraction even if the 'e' option was not also specified. The 'e' is implied.

Snowflake supports up to 1024 groups.

For examples that use group_num, see the Examples in this topic.

Usage Notes

  • Positions are 1-based, not 0-based. For example, the position of the letter “M” in “MAN” is 1, not 0.

  • For additional usage notes, see the General Usage Notes for regular expression functions.

Collation Details

Arguments with collation specifications are currently not supported.

Examples

Basic Examples

The next few examples use this data:

CREATE TABLE demo1 (id INT, string1 VARCHAR);
INSERT INTO demo1 (id, string1) VALUES 
    (1, 'nevermore1, nevermore2, nevermore3.')
    ;

Search for a matching string. In this case, the string is “nevermore” followed by a single decimal digit, e.g. “nevermore1”:

select id, string1,
      regexp_substr(string1, 'nevermore\\d') AS "SUBSTRING", 
      regexp_instr( string1, 'nevermore\\d') AS "POSITION"
    from demo1
    order by id;
+----+-------------------------------------+------------+----------+
| ID | STRING1                             | SUBSTRING  | POSITION |
|----+-------------------------------------+------------+----------|
|  1 | nevermore1, nevermore2, nevermore3. | nevermore1 |        1 |
+----+-------------------------------------+------------+----------+

Search for a matching string, but starting at the 5th character in the string, rather than at the 1st character in the string:

select id, string1,
      regexp_substr(string1, 'nevermore\\d', 5) AS "SUBSTRING", 
      regexp_instr( string1, 'nevermore\\d', 5) AS "POSITION"
    from demo1
    order by id;
+----+-------------------------------------+------------+----------+
| ID | STRING1                             | SUBSTRING  | POSITION |
|----+-------------------------------------+------------+----------|
|  1 | nevermore1, nevermore2, nevermore3. | nevermore2 |       13 |
+----+-------------------------------------+------------+----------+

Search for a matching string, but look for the 3rd match rather than the 1st match:

select id, string1,
      regexp_substr(string1, 'nevermore\\d', 1, 3) AS "SUBSTRING", 
      regexp_instr( string1, 'nevermore\\d', 1, 3) AS "POSITION"
    from demo1
    order by id;
+----+-------------------------------------+------------+----------+
| ID | STRING1                             | SUBSTRING  | POSITION |
|----+-------------------------------------+------------+----------|
|  1 | nevermore1, nevermore2, nevermore3. | nevermore3 |       25 |
+----+-------------------------------------+------------+----------+

This query is nearly identical the the previous query, but this shows how to use the option parameter to indicate whether you want the position of the matching expression, or the position of the first character after the matching expression:

select id, string1,
       regexp_substr(string1, 'nevermore\\d', 1, 3) AS "SUBSTRING", 
       regexp_instr( string1, 'nevermore\\d', 1, 3, 0) AS "START_POSITION",
       regexp_instr( string1, 'nevermore\\d', 1, 3, 1) AS "AFTER_POSITION"
    from demo1
    order by id;
+----+-------------------------------------+------------+----------------+----------------+
| ID | STRING1                             | SUBSTRING  | START_POSITION | AFTER_POSITION |
|----+-------------------------------------+------------+----------------+----------------|
|  1 | nevermore1, nevermore2, nevermore3. | nevermore3 |             25 |             35 |
+----+-------------------------------------+------------+----------------+----------------+

This query shows that if you search for an occurrence beyond the last actual occurrence, the position returned is 0:

select id, string1, 
       regexp_substr(string1, 'nevermore', 1, 4) AS "SUBSTRING",
       regexp_instr( string1, 'nevermore', 1, 4) AS "POSITION"
    from demo1
    order by id;
+----+-------------------------------------+-----------+----------+
| ID | STRING1                             | SUBSTRING | POSITION |
|----+-------------------------------------+-----------+----------|
|  1 | nevermore1, nevermore2, nevermore3. | NULL      |        0 |
+----+-------------------------------------+-----------+----------+

Examples of Capture Groups

This section shows how to use the “group” feature of regular expressions.

The first few examples in this section don’t use capture groups; the section starts with some simple examples and then continues on with examples that use capture groups.

These examples use the strings created below:

CREATE TABLE demo2 (id INT, string1 VARCHAR);
INSERT INTO demo2 (id, string1) VALUES 
    -- A string with multiple occurrences of the word "the".
    (2, 'It was the best of times, it was the worst of times.'),
    -- A string with multiple occurrences of the word "the" and with extra
    -- blanks between words.
    (3, 'In    the   string   the   extra   spaces  are   redundant.'),
    -- A string with the character sequence "the" inside multiple words 
    -- ("thespian" and "theater"), but without the word "the" by itself.
    (4, 'A thespian theater is nearby.')
    ;

The next example looks for:

  • the word “the”

  • followed by one or more non-word characters (for example, the whitespace separating words)

  • followed by one or more word characters.

“Word characters” include not only the letters a-z and A-Z, but also the underscore (“_”) and the decimal digits 0-9, but not whitespace, punctuation, etc.

select id, string1,
       regexp_substr(string1, 'the\\W+\\w+') as "SUBSTRING",
       regexp_instr(string1, 'the\\W+\\w+') as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+--------------+----------+
| ID | STRING1                                                     | SUBSTRING    | POSITION |
|----+-------------------------------------------------------------+--------------+----------|
|  2 | It was the best of times, it was the worst of times.        | the best     |        8 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | the   string |        7 |
|  4 | A thespian theater is nearby.                               | NULL         |        0 |
+----+-------------------------------------------------------------+--------------+----------+

Starting from position 1 of the string, look for the 2nd occurrence of

  • the word “the”

  • followed by one or more non-word characters

  • followed by one or more word characters.

select id, string1, 
       regexp_substr(string1, 'the\\W+\\w+', 1, 2) as "SUBSTRING",
       regexp_instr(string1, 'the\\W+\\w+', 1, 2) as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+-------------+----------+
| ID | STRING1                                                     | SUBSTRING   | POSITION |
|----+-------------------------------------------------------------+-------------+----------|
|  2 | It was the best of times, it was the worst of times.        | the worst   |       34 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | the   extra |       22 |
|  4 | A thespian theater is nearby.                               | NULL        |        0 |
+----+-------------------------------------------------------------+-------------+----------+

This example is similar to the preceding example, but adds capture groups. Rather than returning the position of the entire match, this query returns the position of only the “group”, i.e. the portion of the substring that matches the part of the regular expression in parentheses. In this case, the returned value should be the position of the word after “the”.

select id, string1,
       regexp_substr(string1, 'the\\W+(\\w+)', 1, 2,    'e', 1) as "SUBSTRING",
       regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, 'e', 1) as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1                                                     | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
|  2 | It was the best of times, it was the worst of times.        | worst     |       38 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | extra     |       28 |
|  4 | A thespian theater is nearby.                               | NULL      |        0 |
+----+-------------------------------------------------------------+-----------+----------+

If you specify the 'e' (“extract”) parameter but don’t specify the group_num, then the group_num defaults to 1:

select id, string1,
       regexp_substr(string1, 'the\\W+(\\w+)', 1, 2,    'e') as "SUBSTRING",
       regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, 'e') as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1                                                     | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
|  2 | It was the best of times, it was the worst of times.        | worst     |       38 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | extra     |       28 |
|  4 | A thespian theater is nearby.                               | NULL      |        0 |
+----+-------------------------------------------------------------+-----------+----------+

If you specify a group_num, Snowflake assumes that you want to extract, even if you didn’t specify 'e' (“extract”) as one of the parameters:

select id, string1,
       regexp_substr(string1, 'the\\W+(\\w+)', 1, 2,    '', 1) as "SUBSTRING",
       regexp_instr( string1, 'the\\W+(\\w+)', 1, 2, 0, '', 1) as "POSITION"
    from demo2
    order by id;
+----+-------------------------------------------------------------+-----------+----------+
| ID | STRING1                                                     | SUBSTRING | POSITION |
|----+-------------------------------------------------------------+-----------+----------|
|  2 | It was the best of times, it was the worst of times.        | worst     |       38 |
|  3 | In    the   string   the   extra   spaces  are   redundant. | extra     |       28 |
|  4 | A thespian theater is nearby.                               | NULL      |        0 |
+----+-------------------------------------------------------------+-----------+----------+

This example shows how to retrieve the position of second word from the first, second, and third matches of a two-word pattern in which the first word is A. This also shows that trying to go beyond the last pattern causes Snowflake to return 0.

CREATE TABLE demo3 (id INT, string1 VARCHAR);
INSERT INTO demo3 (id, string1) VALUES
    (5, 'A MAN A PLAN A CANAL')
    ;
select id, string1,
    regexp_substr(string1, 'A\\W+(\\w+)', 1, 1,    'e', 1) as "SUBSTRING1",
    regexp_instr( string1, 'A\\W+(\\w+)', 1, 1, 0, 'e', 1) as "POSITION1",
    regexp_substr(string1, 'A\\W+(\\w+)', 1, 2,    'e', 1) as "SUBSTRING2",
    regexp_instr( string1, 'A\\W+(\\w+)', 1, 2, 0, 'e', 1) as "POSITION2",
    regexp_substr(string1, 'A\\W+(\\w+)', 1, 3,    'e', 1) as "SUBSTRING3",
    regexp_instr( string1, 'A\\W+(\\w+)', 1, 3, 0, 'e', 1) as "POSITION3",
    regexp_substr(string1, 'A\\W+(\\w+)', 1, 4,    'e', 1) as "SUBSTRING4",
    regexp_instr( string1, 'A\\W+(\\w+)', 1, 4, 0, 'e', 1) as "POSITION4"
    from demo3;
+----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------+
| ID | STRING1              | SUBSTRING1 | POSITION1 | SUBSTRING2 | POSITION2 | SUBSTRING3 | POSITION3 | SUBSTRING4 | POSITION4 |
|----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------|
|  5 | A MAN A PLAN A CANAL | MAN        |         3 | PLAN       |         9 | CANAL      |        16 | NULL       |         0 |
+----+----------------------+------------+-----------+------------+-----------+------------+-----------+------------+-----------+

This example shows how to retrieve the position of first, second, and third groups within the first occurrence of the pattern. In this case, the returned values are the positions of the individual letters of the word MAN.

select id, string1, 
    regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1,    'e', 1) as "SUBSTR1",
    regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 1) as "POS1",
    regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1,    'e', 2) as "SUBSTR2",
    regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 2) as "POS2",
    regexp_substr(string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1,    'e', 3) as "SUBSTR3",
    regexp_instr( string1, 'A\\W+(\\w)(\\w)(\\w)', 1, 1, 0, 'e', 3) as "POS3"
    from demo3;
+----+----------------------+---------+------+---------+------+---------+------+
| ID | STRING1              | SUBSTR1 | POS1 | SUBSTR2 | POS2 | SUBSTR3 | POS3 |
|----+----------------------+---------+------+---------+------+---------+------|
|  5 | A MAN A PLAN A CANAL | M       |    3 | A       |    4 | N       |    5 |
+----+----------------------+---------+------+---------+------+---------+------+

Additional Examples

The following example matches occurrences of the word was. Matching begins at the 1st character in the string and returns the position in the string of the character following the first occurrence:

select regexp_instr('It was the best of times, it was the worst of times', '\\bwas\\b', 1, 1) as "result" from dual;

+--------+
| result |
|--------|
|      4 |
+--------+

The following example returns the offset of the first character of the part of the string that matches the pattern. Matching begins at the 1st character in the string and returns the first occurrence of the pattern:

select regexp_instr('It was the best of times, it was the worst of times', 'the\\W+(\\w+)',1,1,0) as "result" from dual;

+--------+
| result |
|--------|
|      8 |
+--------+

The following example is the same as the previous example, but uses the e parameter to return the character offset for the part of the subject that matches the first sub-expression in the pattern (i.e. the first set of word characters after the):

select regexp_instr('It was the best of times, it was the worst of times', 'the\\W+(\\w+)',1,1,0,'e') as "result" from dual;

+--------+
| result |
|--------|
|     12 |
+--------+

The following example matches occurrences of words ending in st preceded by 2 or more alphabetic characters (case-insensitive). Matching begins at the 15th character in the string and returns the position in the string of the character following the first occurrence:

select regexp_instr('It was the best of times, it was the worst of times', '[[:alpha:]]{2,}st', 15, 1) as "result" from dual;

+--------+
| result |
|--------|
|     38 |
+--------+

Prepare the examples:

create or replace table message(body varchar(255));
insert into message values
('Hellooo World'),
('How are you doing today?'),
('the quick brown fox jumps over the lazy dog'),
('PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS');

Return the offset of the first character in the first match:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b') as result from message;

---------------------------------------------+-----------------------------------+
                    body                     | result                            |
---------------------------------------------+-----------------------------------+
 Hellooo World                               | 1                                 |
 How are you doing today?                    | 1                                 |
 the quick brown fox jumps over the lazy dog | 11                                |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 0                                 |
---------------------------------------------+-----------------------------------+

First character in the first match, starting at the third character in subject:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3) as result from message;

---------------------------------------------+--------------------------------------+
                    body                     | result                               |
---------------------------------------------+--------------------------------------+
 Hellooo World                               | 3                                    |
 How are you doing today?                    | 9                                    |
 the quick brown fox jumps over the lazy dog | 11                                   |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 0                                    |
---------------------------------------------+--------------------------------------+

First character in the third match, starting at the third character in subject:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3) as result from message;

---------------------------------------------+-----------------------------------------+
                    body                     | result                                  |
---------------------------------------------+-----------------------------------------+
 Hellooo World                               | 0                                       |
 How are you doing today?                    | 19                                      |
 the quick brown fox jumps over the lazy dog | 27                                      |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 0                                       |
---------------------------------------------+-----------------------------------------+

Last character in the third match, starting at the third character in subject:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3, 1) as result from message;

---------------------------------------------+--------------------------------------------+
                    body                     | result                                     |
---------------------------------------------+--------------------------------------------+
 Hellooo World                               | 0                                          |
 How are you doing today?                    | 24                                         |
 the quick brown fox jumps over the lazy dog | 31                                         |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 0                                          |
---------------------------------------------+--------------------------------------------+

Last character in the third match, starting at the third character in subject, case-insensitive matching:

select body, regexp_instr(body, '\\b\\S*o\\S*\\b', 3, 3, 1, 'i') as result from message;

---------------------------------------------+-------------------------------------------------+
                    body                     | result                                          |
---------------------------------------------+-------------------------------------------------+
 Hellooo World                               | 0                                               |
 How are you doing today?                    | 24                                              |
 the quick brown fox jumps over the lazy dog | 31                                              |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 35                                              |
---------------------------------------------+-------------------------------------------------+
select body, regexp_instr(body, '\\S*(o)\\S*\\b', 1, 1, 0, 'i') as result from message;

---------------------------------------------+-------------------------------------------------+
                    body                     | result                                          |
---------------------------------------------+-------------------------------------------------+
 Hellooo World                               | 1                                               |
 How are you doing today?                    | 1                                               |
 the quick brown fox jumps over the lazy dog | 11                                              |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 9                                               |
---------------------------------------------+-------------------------------------------------+
select body, regexp_instr(body, '\\S*(o)\\S*\\b', 1, 1, 0, 'ie') as result from message;

---------------------------------------------+--------------------------------------------------+
                    body                     | result                                           |
---------------------------------------------+--------------------------------------------------+
 Hellooo World                               | 7                                                |
 How are you doing today?                    | 2                                                |
 the quick brown fox jumps over the lazy dog | 13                                               |
 PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS     | 10                                               |
---------------------------------------------+--------------------------------------------------+