
Introduction
regexp_matches() and regexp_match() are two similar string functions that support regular expression matching directly in the PostgreSQL database. regexp_matches() was added in PostgreSQL 8.3, and regexp_match() was added in PostgreSQL 10 (keep reading to see how ChatGPT struggled to answer this question).
As noted in the PostgreSQL 10 release notes, regexp_match():
"... only returns results from the first match so it does not need to return a set, making it easier to use for simple cases."
The differences between these two functions are nuanced, however, and they are easiest to understand with some examples.
Test data
You can create a short table with some test data in a local PostgreSQL database like so:
$ psql
psql (16.1)
Type "help" for help.
tobias=# CREATE TABLE patterns (value text);
CREATE TABLE
tobias=# INSERT INTO patterns VALUES ('foo'), ('bar'), ('foobar'), ('foo1barfoo2bar');
INSERT 0 4
tobias=# SELECT * FROM patterns;
value
----------------
foo
bar
foobar
foo1barfoo2bar
(4 rows)
regexp_matches()
Since regexp_matches() was added first to PostgreSQL, let's see how it works and perhaps why its companion regexp_match() was added later on.
tobias=# SELECT
tobias-# value
tobias-# , regexp_matches(value, 'foo\d?', 'g')
tobias-# , regexp_matches(value, 'foo\d?', 'g') is null AS is_null
tobias-# FROM patterns;
value | regexp_matches | is_null
----------------+----------------+---------
foo | {foo} | f
foobar | {foo} | f
foo1barfoo2bar | {foo1} | f
foo1barfoo2bar | {foo2} | f
(4 rows)
Observations:
- The example uses the g flag, since according to the documentation, "if you only want the first match, it's easier and more efficient to use regexp_match()."
- One row each is returned for the values "foo" and "bar", whereas two rows are returned, one for each match, in the value "foo1barfoo2bar".
- No row is returned for the value "bar" since it doesn't match the pattern.
regexp_match()
regexp_match() may be helpful for simpler use cases. Here is the same query (without the inapplicable g flag):
tobias=# SELECT
tobias-# value
tobias-# , regexp_match(value, 'foo\d?')
tobias-# , regexp_match(value, 'foo\d?') is null AS is_null
tobias-# FROM patterns;
value | regexp_match | is_null
----------------+--------------+---------
foo | {foo} | f
bar | | t
foobar | {foo} | f
foo1barfoo2bar | {foo1} | f
(4 rows)
Observations:
- regexp_match() returns a NULL if the pattern doesn't match, instead of excluding the row from the result set.
- Only the first matching pattern in the value "foo1barfoo2bar" is returned; if you wanted to match both values, you would need to specify a single pattern that matched both values (for example, (foo\d?).+(foo\d?) in this example).
- The number of rows in the result set matches the source table.
In general, I find regexp_match() more intuitive and easier to reason about, so it's my first choice for regular expression matching in PostgreSQL.
If you're using a version of PostgreSQL older than 10, the documentation notes that regexp_matches() can be placed in a sub-select to include non-matching rows in the result, for example:
tobias=# SELECT
tobias-# value
tobias-# , (SELECT regexp_matches(value, '(foo\d?)')) AS regexp_matches
tobias-# , (SELECT regexp_matches(value, '(foo\d?)')) IS NULL AS is_null
tobias-# FROM patterns;
value | regexp_matches | is_null
----------------+----------------+---------
foo | {foo} | f
bar | | t
foobar | {foo} | f
foo1barfoo2bar | {foo1} | f
(4 rows)
Note, the g flag should not be used in this form as it might return more than one row. If it does, you'll receive the error, "more than one row returned by a subquery used as an expression."
Bonus: regexp_substr()
Another function, regexp_substr(), was added in PostgreSQL 15. In the simple case where (as in these examples) there is only one pattern in the regular expression, if you are using PostgreSQL 15 or above, this function returns the matched pattern or NULL, not an array or NULL as with regexp_match().
tobias=# SELECT
tobias-# value
tobias-# , regexp_substr(value, 'foo\d?')
tobias-# , regexp_substr(value, 'foo\d?') is null AS is_null
tobias-# FROM patterns;
value | regexp_substr | is_null
----------------+---------------+---------
foo | foo | f
bar | | t
foobar | foo | f
foo1barfoo2bar | foo1 | f
(4 rows)
Observations:
- This function returns the matched pattern or NULL, like regexp_match().
- Unlike regexp_match(), the result is not an array (it does not appear within { and } above).
In cases where there is only one pattern in the regular expression, if you are using PostgreSQL 15 or above, regexp_substr() is a good choice.
Conclusion
regexp_matches(), regexp_match(), and regexp_substr() are unique and powerful functions for pattern matching in PostgreSQL. As a general rule, it may be helpful to start with the newest function available to you, and work backwards to the original regexp_matches() in case the newer functions do not meet your needs. Nonetheless, it's helpful to understand the differences between them when writing queries, and I hope this post has helped explain some of the nuances.
Annex: ChatGPT on Regular Expression Functions in PostgreSQL
As an experiment, I tried using ChatGPT to help me find the version of PostgreSQL in which regexp_match() was added. It struggled to find the right answer. Here is the full transcript.
Me: when was regexp_match added to postgres?
ChatGPT: regexp_match() was introduced in PostgreSQL 9.1, which was released on September 12, 2011. This function provides a way to perform regular expression matching in SQL queries and return matched substrings.
Me: I don't see it in the documentation for postgres 9.1
ChatGPT: You’re right to question this. The regexp_match() function was actually introduced in PostgreSQL 8.4, which was released on July 1, 2009. It supports regular expression matching and returns a text array containing the matched substrings.
By PostgreSQL 9.1, additional enhancements to regular expression functions were added, but regexp_match() itself dates back to version 8.4.
Me: I don't see it in 8.4 either
ChatGPT: You’re absolutely right to double-check. The regexp_match() function was actually introduced in PostgreSQL 9.0, which was released on September 20, 2010.
In PostgreSQL 9.0, regexp_match() was added to provide a more precise way to extract matching text from a string using regular expressions, returning a text array. This complemented existing regex functions like ~, ~*, and substring().
If you’re reviewing older documentation, PostgreSQL 9.0 is the correct starting point for regexp_match().
Me: I don't see it in 9.0 either. I do see it in the Postgres 12 documentation
ChatGPT: I checked the PostgreSQL release history — regexp_match() was actually introduced in PostgreSQL 9.1. It appears in the PostgreSQL 9.1 release notes and the corresponding documentation.
The function may have been easy to overlook because it was added alongside other enhancements, but it’s indeed part of the PostgreSQL 9.1 feature set. If you’re seeing it only in PostgreSQL 12 documentation, it’s possible the references in earlier versions were less prominent or structured differently.
Me: It's not there.
ChatGPT: You’re right, and I appreciate your persistence. After reviewing the PostgreSQL documentation history more carefully, the regexp_match() function was actually introduced in PostgreSQL 10, released on October 5, 2017.
Earlier versions like 9.0, 9.1, or even 9.6 did not include regexp_match(). PostgreSQL 10 introduced it as part of improved pattern matching capabilities.
Thanks for your patience — this is the correct answer.
Me: How do you know it's the correct answer?
ChatGPT: I apologize for the earlier inaccuracies. After reviewing the PostgreSQL documentation and available resources, I can confirm that the regexp_match() function was introduced in PostgreSQL 10, released on October 5, 2017. This function returns the first substring that matches a POSIX regular expression within a given string, providing a straightforward method for pattern matching. Prior to PostgreSQL 10, similar functionality was available through the regexp_matches() function, which returns all matches as a set of text arrays.