Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to:
Databricks SQL
Databricks Runtime
Extracts the first string in str that matches the regexp expression and corresponds to the regex group index.
Syntax
regexp_extract(str, regexp [, idx] )
Arguments
str: ASTRINGexpression to be matched.regexp: ASTRINGexpression with a matching pattern.idx: An optional integral number expression greater or equal 0 with default 1.
Returns
A STRING.
If str, regexp, or idx is NULL, the result is NULL.
If regexp does not match any part of str, the result is an empty string.
If the overall pattern matches but the group specified by idx does not participate in the match, the result is an empty string.
The regexp string must be a Java regular expression. See Java regex patterns for the supported syntax.
When using literals, use raw-literal (r prefix) to avoid escape character pre-processing.
regexp may contain multiple groups. Capture groups are defined by enclosing part of the pattern in parentheses (...).
idx indicates which regex group to extract.
An idx of 0 means matching the entire regular expression.
The default value of idx is 1, meaning the first capture group is returned when idx is omitted.
If idx is greater than the number of capture groups defined in regexp, Databricks raises an error.
Only the first match of regexp in str is considered. To extract all matches, use regexp_extract_all.
Common error conditions
Examples
Extract specific capture groups
> SELECT regexp_extract('100-200', '(\\d+)-(\\d+)', 1);
100
> SELECT regexp_extract('100-200', '(\\d+)-(\\d+)', 2);
200
Use idx 0 to return the entire match
> SELECT regexp_extract('Order: 100-200', '(\\d+)-(\\d+)', 0);
100-200
Default idx is 1
When idx is omitted, the first capture group is returned.
> SELECT regexp_extract('abc123def', '([a-z]+)(\\d+)');
abc
No match returns an empty string
> SELECT regexp_extract('hello world', '(\\d+)', 1);
NULL input returns NULL
> SELECT regexp_extract(NULL, '(\\d+)', 1);
NULL
Use raw string literals to avoid double escaping
In a regular SQL string, backslashes must be doubled (\\d to mean \d). Use the r prefix for raw string literals to write patterns more naturally.
> SELECT regexp_extract('abc123', r'(\d+)', 1);
123
Extract a domain from a URL
> SELECT regexp_extract('https://www.databricks.com/product', r'https?://([^/]+)', 1);
www.databricks.com
Extract a component from a date string
> SELECT regexp_extract('Event on 2024-03-15', r'(\d{4})-(\d{2})-(\d{2})', 0);
2024-03-15
> SELECT regexp_extract('Event on 2024-03-15', r'(\d{4})-(\d{2})-(\d{2})', 2);
03
Optional capture group that does not participate in the match
When a group is optional and does not match, an empty string is returned for that group.
> SELECT regexp_extract('ac', r'(a)(b)?(c)', 2);
Invalid regex pattern
> SELECT regexp_extract('abc', '[invalid', 0);
Error: INVALID_PARAMETER_VALUE.PATTERN