Categories:

String Functions (Regular Expressions)

REGEXP_SUBSTR

Returns the substring that matches a regular expression within a string. If no match is found, returns NULL.

See also String Functions (Regular Expressions).

Syntax

REGEXP_SUBSTR( <subject> , <pattern> [ , <position> , <occurrence> , <parameters> ] )

Arguments

Required:

subject

Subject to match.

pattern

Pattern to match.

Optional:

position

Number of characters from the beginning of the string where the function starts searching for matches.

Default: 1 (the search for a match starts at the first character on the left)

occurrence

Specifies which occurrence of the pattern to match. The function skips the first occurrence - 1 matches.

Default: 1

parameters

String of one or more characters that specifies the regular expression parameters used for searching for matches. The supported values are:

c , i , m , e , s

For more details, see Parameters.

Default: c

Note

By default, REGEXP_SUBSTR returns the entire matching pattern in the subject. However, if the e (for “extract”) parameter is specified, REGEXP_SUBSTR only returns the part of the subject that matches the first sub-expression in the pattern. If there is no sub-expression in the pattern, REGEXP_SUBSTR behaves as if e was not set. If there is more than one sub-expression in the pattern, the additional sub-expressions have no effect. For an example, see Examples (in this topic).

Usage Notes

  • Parentheses (( )) and square brackets ([ ]) currently must be double-escaped to parse them as literal strings.

    For example:

    SELECT REGEXP_SUBSTR('Customers - (NY)','\\([[:alnum:]\-]+\\)') as customers;
    
      +-----------+
      | CUSTOMERS |
      |-----------|
      | (NY)      |
      +-----------+
    
  • For additional usage notes, see the General Usage Notes for regular expression functions.

Collation Details

Arguments with collation specifications are currently not supported.

Examples

The following example matches a substring composed of word characters starting with the and an unlimited number of non-word characters. Matching begins at the 1st character in the string and returns the second occurrence of the substring:

select regexp_substr('It was the best of times, it was the worst of times', 'the\\W+(\\w+)', 1, 2) as "result" from dual;

+-----------+
| result    |
|-----------|
| the worst |
+-----------+

The following example is the same as the previous example, but uses the e parameter to return only the part of the string that matches the first sub-expression in the pattern (i.e. the first set of word characters after the). As in the previous example, matching begins at the 1st character in the string and returns the second occurrence of the substring:

select regexp_substr('It was the best of times, it was the worst of times', 'the\\W+(\\w+)', 1, 2, 'e') as "result" from dual;

+--------+
| result |
|--------|
| worst  |
+--------+

The following example illustrates overlapping occurrences:

-- Prepare example
create or replace table overlap (id number, a string);

insert into overlap values (1,',abc,def,ghi,jkl,');
insert into overlap values (2,',abc,,def,,ghi,,jkl,');

select * from overlap;

select id, regexp_substr(a,'[[:punct:]][[:alnum:]]+[[:punct:]]', 1, 2) from overlap;

+----+-------------------------------------------------------------+
| ID | REGEXP_SUBSTR(A,'[[:PUNCT:]][[:ALNUM:]]+[[:PUNCT:]]', 1, 2) |
|----+-------------------------------------------------------------|
|  1 | ,ghi,                                                       |
|  2 | ,def,                                                       |
+----+-------------------------------------------------------------+

The following example creates a JSON object from an Apache HTTP Server access log using pattern matching and concatenation:

-- Prepare example
CREATE OR REPLACE TABLE log (logs varchar);

INSERT INTO log (logs) VALUES
('127.0.0.1 - - [10/Jan/2018:16:55:36 -0800] "GET / HTTP/1.0" 200 2216'),
('192.168.2.20 - - [14/Feb/2018:10:27:10 -0800] "GET /cgi-bin/try/ HTTP/1.0" 200 3395');
SELECT
  '{ "ip_addr":"'
  || REGEXP_SUBSTR (logs,'\\b\\d{1,3}\.\\d{1,3}\.\\d{1,3}\.\\d{1,3}\\b')
  || '", "date":"'
  || REGEXP_SUBSTR (logs,'([\\w:\/]+\\s[+\-]\\d{4})')
  || '", "request":"'
  || REGEXP_SUBSTR (logs,'\"((\\S+) (\\S+) (\\S+))\"', 1, 1, 'e')
  || '", "status":"'
  || REGEXP_SUBSTR (logs,'(\\d{3}) \\d+', 1, 1, 'e')
  || '", "size":"'
  || REGEXP_SUBSTR (logs,'\\d{3} (\\d+)', 1, 1, 'e')
  || '"}' as Apache_HTTP_Server_Access
  FROM log;

+-----------------------------------------------------------------------------------------------------------------------------------------+
| APACHE_HTTP_SERVER_ACCESS                                                                                                               |
|-----------------------------------------------------------------------------------------------------------------------------------------|
| { "ip_addr":"127.0.0.1", "date":"10/Jan/2018:16:55:36 -0800", "request":"GET / HTTP/1.0", "status":"200", "size":"2216"}                |
| { "ip_addr":"192.168.2.20", "date":"14/Feb/2018:10:27:10 -0800", "request":"GET /cgi-bin/try/ HTTP/1.0", "status":"200", "size":"3395"} |
+-----------------------------------------------------------------------------------------------------------------------------------------+