public inbox for [email protected]  
help / color / mirror / Atom feed
Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
4+ messages / 3 participants
[nested] [flat]

* Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
@ 2026-02-09 06:49 Giuliano Gagliardi <[email protected]>
  2026-02-11 19:48 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <[email protected]>
  2026-02-11 19:56 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <[email protected]>
  0 siblings, 2 replies; 4+ messages in thread

From: Giuliano Gagliardi @ 2026-02-09 06:49 UTC (permalink / raw)
  To: [email protected]

I noticed the following two issues while looking at the code that handles
REFRESH MATERIALIZED VIEW CONCURRENTLY (refresh_by_match_merge() in matview.c):

1.

At the beginning of the function, there is some code that checks for duplicate
rows, but it does not catch the following case:

CREATE TABLE t(a text, b text);
INSERT INTO t VALUES('test', null);
CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
CREATE UNIQUE INDEX ON m(a);
INSERT INTO t VALUES('test', null); -- t now contains two identical rows

REFRESH MATERIALIZED VIEW CONCURRENTLY m;
     -> no error, but m still contains only one row!
REFRESH MATERIALIZED VIEW m;
     -> error (as expected)

2.

Do I understand correctly that the join creating the "diff" table is given
equality conditions for all columns referenced in any unique indexes? This
led me to think that a unique index on a column with many null entries
would enlarge the "diff" table.

In the following example, creating the second unique index noticeably worsens
the performance of REFRESH MATERIALIZED VIEW CONCURRENTLY:

CREATE MATERIALIZED VIEW s AS SELECT generate_series as x, null as y FROM generate_series(1, 1000000);
CREATE UNIQUE INDEX ON s(x);
REFRESH MATERIALIZED VIEW CONCURRENTLY s;
     -> runs for ~1700 ms
CREATE UNIQUE INDEX ON s(y);
REFRESH MATERIALIZED VIEW CONCURRENTLY s;
     -> runs for ~9000 ms

Kind regards,
Giuliano







^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-09 06:49 Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Giuliano Gagliardi <[email protected]>
@ 2026-02-11 19:48 ` Rafia Sabih <[email protected]>
  1 sibling, 0 replies; 4+ messages in thread

From: Rafia Sabih @ 2026-02-11 19:48 UTC (permalink / raw)
  To: Giuliano Gagliardi <[email protected]>; +Cc: [email protected]

On Sun, 8 Feb 2026 at 22:49, Giuliano Gagliardi <[email protected]> wrote:

> I noticed the following two issues while looking at the code that handles
> REFRESH MATERIALIZED VIEW CONCURRENTLY (refresh_by_match_merge() in
> matview.c):
>
> 1.
>
> At the beginning of the function, there is some code that checks for
> duplicate
> rows, but it does not catch the following case:
>
> CREATE TABLE t(a text, b text);
> INSERT INTO t VALUES('test', null);
> CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
> CREATE UNIQUE INDEX ON m(a);
> INSERT INTO t VALUES('test', null); -- t now contains two identical rows
>
> REFRESH MATERIALIZED VIEW CONCURRENTLY m;
>      -> no error, but m still contains only one row!
> REFRESH MATERIALIZED VIEW m;
>      -> error (as expected)
>
Interesting issue and thanks for pointing it out.
Going over the code in the function you mentioned(refresh_by_match_merge()
in matview.c), I found out that it is explicitly checking for the columns
where it is not NULL.
appendStringInfo(&querybuf,
"SELECT newdata.*::%s FROM %s newdata "
"WHERE newdata.* IS NOT NULL AND EXISTS "
"(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
"AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
"AND newdata2.ctid OPERATOR(pg_catalog.<>) "
"newdata.ctid)",
It is mentioned in the comments above as well that it checks for the
duplicates in the rows without NULLs.
However, if I changed the query as in the attached patch, it errors out as
otherwise I would have expected.
Honestly, I do not understand why it is checking for duplicates excluding
null values.
Behaviour wise this definitely seems like a bug, but I am not sure if the
attached patch is the right way to fix it.

-- 
Regards,
Rafia Sabih
CYBERTEC PostgreSQL International GmbH


Attachments:

  [application/octet-stream] 0001-Check-for-duplicate-rows-with-NULLs.patch (1.2K, 3-0001-Check-for-duplicate-rows-with-NULLs.patch)
  download | inline diff:
From 0ca4b1214c86f1b959e4339cc9b8e0a1c6aa0fd3 Mon Sep 17 00:00:00 2001
From: Rafia Sabih <[email protected]>
Date: Wed, 11 Feb 2026 11:38:07 -0800
Subject: [PATCH] Check for duplicate rows with NULLs

In case of concurrent refresh of materialized view, check for
columns with NULL values. If there are duplicate entries with NULLS
then error out.
---
 src/backend/commands/matview.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..48bb2582a92 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -648,9 +648,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
 					 "SELECT newdata.*::%s FROM %s newdata "
-					 "WHERE newdata.* IS NOT NULL AND EXISTS "
-					 "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
-					 "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
+					 "WHERE EXISTS "
+					 "(SELECT 1 FROM %s newdata2 WHERE "
+					 " newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
 					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
 					 "newdata.ctid)",
 					 tempname, tempname, tempname);
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-09 06:49 Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Giuliano Gagliardi <[email protected]>
@ 2026-02-11 19:56 ` surya poondla <[email protected]>
  2026-02-12 00:34   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <[email protected]>
  1 sibling, 1 reply; 4+ messages in thread

From: surya poondla @ 2026-02-11 19:56 UTC (permalink / raw)
  To: Giuliano Gagliardi <[email protected]>; +Cc: [email protected]

Hi Giuliano,

Thank you for the test case, yes I am able to reproduce the behavior for
issue1

I noticed the following two issues while looking at the code that handles
> REFRESH MATERIALIZED VIEW CONCURRENTLY (refresh_by_match_merge() in
> matview.c):
>
> 1.
>
> At the beginning of the function, there is some code that checks for
> duplicate
> rows, but it does not catch the following case:
>
> CREATE TABLE t(a text, b text);
> INSERT INTO t VALUES('test', null);
> CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
> CREATE UNIQUE INDEX ON m(a);
> INSERT INTO t VALUES('test', null); -- t now contains two identical rows
>
> REFRESH MATERIALIZED VIEW CONCURRENTLY m;
>      -> no error, but m still contains only one row!
> REFRESH MATERIALIZED VIEW m;
>      -> error (as expected)
>
> Adding the output here for a complete picture.
postgres=# CREATE TABLE t(a text, b text);
CREATE TABLE
postgres=# INSERT INTO t VALUES('test', null);
INSERT 0 1
postgres=# CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# INSERT INTO t VALUES('test', null);
INSERT 0 1
postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m;
REFRESH MATERIALIZED VIEW
postgres=# SELECT * FROM m;
  a   | b
------+---
 test |
(1 row)
postgres=# REFRESH MATERIALIZED VIEW m;
ERROR:  could not create unique index "m_a_idx"
DETAIL:  Key (a)=(test) is duplicated.
postgres=# SELECT * FROM m;
  a   | b
------+---
 test |
(1 row)

Yes, I believe "REFRESH MATERIALIZED VIEW CONCURRENTLY m;" should ideally
throw the same error as REFRESH MATERIALIZED VIEW m;

I am still trying to understand the CONCURRENTLY behavior in detail and
will share more of my findings on this potential issue.

Regards,
Surya Poondla


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-09 06:49 Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Giuliano Gagliardi <[email protected]>
  2026-02-11 19:56 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <[email protected]>
@ 2026-02-12 00:34   ` surya poondla <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: surya poondla @ 2026-02-12 00:34 UTC (permalink / raw)
  To: Giuliano Gagliardi <[email protected]>; +Cc: [email protected]

Hi Giuliano,

Regarding the issue 1,
>
> I noticed the following two issues while looking at the code that handles
> REFRESH MATERIALIZED VIEW CONCURRENTLY (refresh_by_match_merge() in
> matview.c):
>
> 1.
>
> At the beginning of the function, there is some code that checks for
> duplicate
> rows, but it does not catch the following case:
>
> CREATE TABLE t(a text, b text);
> INSERT INTO t VALUES('test', null);
> CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
> CREATE UNIQUE INDEX ON m(a);
> INSERT INTO t VALUES('test', null); -- t now contains two identical rows
>
> REFRESH MATERIALIZED VIEW CONCURRENTLY m;
>      -> no error, but m still contains only one row!
> REFRESH MATERIALIZED VIEW m;
>      -> error (as expected)
>
>
Thank you for the pointers, I made a patch in refresh_by_match_merge()
which reports an error in the REFRESH MATERIALIZED VIEW CONCURRENTLY case
too.

The issue was REFRESH MATERIALIZED VIEW CONCURRENTLY was incorrectly
skipping duplicate detection for rows containing any NULL values. This was
happening because the "WHERE newdata.* IS NOT NULL" condition returns false
if any column contains NULL.
My patch removes the "IS NOT NULL" preconditions from the duplicate
detection query. The query now correctly checks all rows using the record
equality operator (*=), which treats NULL as equal to NULL (i.e True).

Here is the output with my patch:
postgres=# CREATE TABLE t(a text, b text);
CREATE TABLE
postgres=#   INSERT INTO t VALUES('test', null);
INSERT 0 1
postgres=#
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# SELECT * FROM m;
  a   | b
------+---
 test |
(1 row)

postgres=# SELECT * FROM t;
  a   | b
------+---
 test |
(1 row)

postgres=# INSERT INTO t VALUES('test', null);
INSERT 0 1
postgres=# SELECT * FROM t;
  a   | b
------+---
 test |
 test |
(2 rows)

postgres=# SELECT * FROM m;
  a   | b
------+---
 test |
(1 row)

postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m;
2026-02-11 15:57:46.751 PST [39510] ERROR:  new data for materialized view
"m" contains duplicate rows
2026-02-11 15:57:46.751 PST [39510] DETAIL:  Row: (test,)
2026-02-11 15:57:46.751 PST [39510] STATEMENT:  REFRESH MATERIALIZED VIEW
CONCURRENTLY m;
ERROR:  new data for materialized view "m" contains duplicate rows
DETAIL:  Row: (test,)
postgres=#
postgres=# REFRESH MATERIALIZED VIEW m;
2026-02-11 15:57:55.877 PST [39510] ERROR:  could not create unique index
"m_a_idx"
2026-02-11 15:57:55.877 PST [39510] DETAIL:  Key (a)=(test) is duplicated.
2026-02-11 15:57:55.877 PST [39510] STATEMENT:  REFRESH MATERIALIZED VIEW m;
ERROR:  could not create unique index "m_a_idx"
DETAIL:  Key (a)=(test) is duplicated.
postgres=#

Regards,
Surya Poondla


Attachments:

  [application/octet-stream] 0001-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect.patch (3.9K, 3-0001-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect.patch)
  download | inline diff:
From d26e084ed00cde21d917cd4dc52f2ab1467fdc19 Mon Sep 17 00:00:00 2001
From: spoondla <[email protected]>
Date: Wed, 11 Feb 2026 16:06:56 -0800
Subject: [PATCH] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY to detect NULL
 containing duplicates.

Issue:
REFRESH MATERIALIZED VIEW CONCURRENTLY was incorrectly skipping duplicate
detection for rows containing any NULL values. This would cause the refresh
to silently succeed while leaving the materialized view with stale data,
rather than properly reporting duplicate row errors.

The bug occurred because the duplicate check used "WHERE newdata.* IS NOT NULL"
which returns false if any column contains NULL. When duplicates existed in
rows with NULLs (e.g., two rows of ('test', NULL)), the check was skipped
entirely. The subsequent FULL OUTER JOIN would then match both duplicate rows
to the same old row, producing an empty diff, causing no updates to be applied.

Fix:
The fix removes the "IS NOT NULL" preconditions from the duplicate detection
query. The query now correctly checks all rows using the record equality
operator (*=), which treats NULL as equal to NULL. This matches the same
equality semantics used by the FULL OUTER JOIN in the diff query, ensuring
consistent duplicate detection.

Note:
The non-concurrent REFRESH was unaffected since it rebuilds indexes from
scratch, which properly detects duplicates during index creation.
---
 src/backend/commands/matview.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..6aaf7a68c20 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -635,11 +635,13 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
 
 	/*
-	 * We need to ensure that there are not duplicate rows without NULLs in
-	 * the new data set before we can count on the "diff" results.  Check for
-	 * that in a way that allows showing the first duplicated row found.  Even
-	 * after we pass this test, a unique index on the materialized view may
-	 * find a duplicate key problem.
+	 * We need to ensure that there are not duplicate rows in the new data set
+	 * before we can count on the "diff" results.  Check for that in a way
+	 * that allows showing the first duplicated row found.  We check for
+	 * duplicates using the record equality operator (*=), which treats NULLs
+	 * as equal to each other - the same semantics used by the FULL OUTER JOIN
+	 * in the diff query below.  Even after we pass this test, a unique index
+	 * on the materialized view may find a duplicate key problem.
 	 *
 	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack to
 	 * keep ".*" from being expanded into multiple columns in a SELECT list.
@@ -648,9 +650,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
 					 "SELECT newdata.*::%s FROM %s newdata "
-					 "WHERE newdata.* IS NOT NULL AND EXISTS "
-					 "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
-					 "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
+					 "WHERE EXISTS "
+					 "(SELECT 1 FROM %s newdata2 "
+					 "WHERE newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
 					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
 					 "newdata.ctid)",
 					 tempname, tempname, tempname);
@@ -667,7 +669,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 		 */
 		ereport(ERROR,
 				(errcode(ERRCODE_CARDINALITY_VIOLATION),
-				 errmsg("new data for materialized view \"%s\" contains duplicate rows without any null columns",
+				 errmsg("new data for materialized view \"%s\" contains duplicate rows",
 						RelationGetRelationName(matviewRel)),
 				 errdetail("Row: %s",
 						   SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1))));
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2026-02-12 00:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-09 06:49 Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Giuliano Gagliardi <[email protected]>
2026-02-11 19:48 ` Rafia Sabih <[email protected]>
2026-02-11 19:56 ` surya poondla <[email protected]>
2026-02-12 00:34   ` surya poondla <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox