MIME-Version: 1.0
References: <CAMjNa7eFzTQ5=oZMQiB2bMkez5KP4A77JC7SRjeVEkOrh7cUHw@mail.gmail.com>
 <CACLU5mST1LhC3ibaKGNch_=06S2cmbjR4PnoUSupKs+rtgdeyw@mail.gmail.com>
 <CAMjNa7fJUwcOxf+qV8g+tCQ-3E-YAiKgE_Qs6u-xjdxe12T0SQ@mail.gmail.com> <CAMjNa7egcgUMf2tdQ1qeTYj1J1bBvyth3thoZPioujusFsBd4Q@mail.gmail.com>
In-Reply-To: <CAMjNa7egcgUMf2tdQ1qeTYj1J1bBvyth3thoZPioujusFsBd4Q@mail.gmail.com>
From: Dharin Shah <dharinshah95@gmail.com>
Date: Thu, 15 Jan 2026 19:46:15 +0100
Message-ID: <CAOj6k6fDKg4EOpzmO1sJYARwyu8dkmX-BFCEymQc_6okw9ORVw@mail.gmail.com>
Subject: Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW
To: Adam Brusselback <adambrusselback@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: multipart/mixed; boundary="0000000000007a3551064871a43f"
Archived-At: <https://www.postgresql.org/message-id/CAOj6k6fDKg4EOpzmO1sJYARwyu8dkmX-BFCEymQc_6okw9ORVw%40mail.gmail.com>
Precedence: bulk

--0000000000007a3551064871a43f
Content-Type: multipart/alternative; boundary="0000000000007a3550064871a43d"

--0000000000007a3550064871a43d
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hey Adam,

Apologies for the delay. As promised on Discord, I reviewed the current
patch (CF 6305) and wanted to share findings that line up with the
thread’s design discussion, plus one additional correctness bug that I
could reproduce.

1. In the non-concurrent REFRESH ... WHERE ... path, the UPSERT SQL is
built from the unique index metadata. The code currently uses indnatts
when building the ON CONFLICT (...) target list. That count includes
INCLUDE columns, so for an index like:

  CREATE UNIQUE INDEX ON mv(id) INCLUDE (extra);

the generated statement effectively becomes ON CONFLICT (id, extra) ...,
which fails with:

  ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification

The fix appears straightforward: use indnkeyatts (key attributes only) when
generating the conflict target, and also when deciding which columns count
as “key” columns for the UPDATE SET clause. I’ve attached a minimal repro
SQL script (test_include_bug.sql).

2. A smaller test-quality issue: the regression script has a comment
“Subqueries -> Error”, but the expected output shows no error for the
schema-qualified subquery. There is no explicit check forbidding subqueries
in transformRefreshWhereClause(), so schema-qualified subqueries appear to
be allowed.

Moving on to the broader questions:


> I believe the issue is that the DELETE -> INSERT strategy leaves a
> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
> reads, the moment we delete the rows, we lose the physical lock on them. If
> a concurrent transaction inserts a colliding row during that gap, the
> materialized view ends up inconsistent with the base query (or hits a
> constraint violation).


The consistency gap in the non-concurrent mode matches what I’d expect: with
ROW EXCLUSIVE you allow concurrent readers and writers, and a pure DELETE →
INSERT approach opens a window in which the old tuple is gone and a
concurrent session can insert a conflicting logical row.

That said, I think the patch would benefit from explicitly defining the
intended safety model:
1. Is the goal to be safe against concurrent DML on the base tables only
(i.e., refresh sees a snapshot and updates the MV accordingly), or also
against concurrent partial refreshes and direct writes to the MV (when
maintenance is enabled)?
2. Should the non-concurrent partial refresh be “best effort” like normal
DML (the user coordinates concurrent access), or should it be
“maintenance-like” (serialized / logically safe by default)?

If the intent is “safe by default”, I’d encourage documenting very clearly
what is guaranteed, and adding regression tests or README-style notes for
the footguns.

From a reviewer standpoint, I think the feature concept is sound and
valuable, but it needs a crisp statement of semantics and safety
boundaries. The tricky part is exactly what you called out: incremental
refresh raises concurrency questions that don’t arise with a full rebuild
under strong locks.

I’m happy to keep reviewing iterations (especially around the advisory-lock
approach), and I’ll attach the reproduction scripts and notes I used.

As a possible staging approach: it might be simplest to start with a
conservative serialization model for non-concurrent WHERE (while still
allowing readers), and then iterate toward finer-grained logical locking
if/when needed for throughput.


Thanks,
Dharin


On Sun, Jan 4, 2026 at 3:56 AM Adam Brusselback <adambrusselback@gmail.com>
wrote:

> Hi all,
>
> I've been running some more concurrency tests against this patch
> (specifically looking for race conditions), and I found a flaw in the
> implementation for the REFRESH ... WHERE ... mode (without CONCURRENTLY).
>
> I believe the issue is that the DELETE -> INSERT strategy leaves a
> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
> reads, the moment we delete the rows, we lose the physical lock on them. If
> a concurrent transaction inserts a colliding row during that gap, the
> materialized view ends up inconsistent with the base query (or hits a
> constraint violation).
>
> I initially was using SELECT ... FOR UPDATE to lock the rows before
> modification, but that lock is (now that I know) obviously lost when the
> row is deleted.
>
> My plan is to replace that row-locking strategy with transaction-level
> advisory locks inside the refresh logic:
>
> Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid,
> hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE clause.
>
> This effectively locks the "logical" ID of the row, preventing concurrent
> refreshes on the same ID even while the physical tuple is temporarily gone.
> Hash collisions should not have any correctness issues that I can think of.
>
> However, before I sink time into implementing that fix:
>
> Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE
> ... in core?
> If the community feels this feature is a footgun or conceptually wrong for
> Postgres, I'd rather know now before spending more time on this.
>
> If the feature concept is sound, does the advisory lock approach seem like
> the right way to handle the concurrency safety here?
>
> Thanks,
> Adam Brusselback
>


--0000000000007a3550064871a43d--
--0000000000007a3551064871a43f
Content-Type: application/octet-stream; name="test_include_bug.sql"
Content-Disposition: attachment; filename="test_include_bug.sql"
Content-Transfer-Encoding: 7bit
Content-ID: <f_mkfsu8eo0>
X-Attachment-Id: f_mkfsu8eo0

DROP TABLE IF EXISTS t_include CASCADE;
CREATE TABLE t_include (
    id INT PRIMARY KEY,
    data TEXT,
    extra TEXT
);

INSERT INTO t_include VALUES
    (1, 'data1', 'extra1'),
    (2, 'data2', 'extra2');

CREATE MATERIALIZED VIEW mv_include AS SELECT * FROM t_include;

-- Create index with INCLUDE column
CREATE UNIQUE INDEX idx_include ON mv_include(id) INCLUDE (extra);

\echo 'Index structure:'
SELECT
    indnatts as total_attrs,
    indnkeyatts as key_attrs
FROM pg_index
WHERE indexrelid = 'idx_include'::regclass;

\echo 'Attempting partial refresh with INCLUDE index...'
UPDATE t_include SET data = 'updated1', extra = 'updated_extra1' WHERE id = 1;

\set ON_ERROR_STOP 0
REFRESH MATERIALIZED VIEW mv_include WHERE id = 1;
\set ON_ERROR_STOP 1

\echo ''
\echo 'Expected error: "no unique or exclusion constraint matching ON CONFLICT"'
\echo 'Root cause: Code uses indnatts (2) instead of indnkeyatts (1)'
\echo 'Generates: ON CONFLICT (id, extra) but constraint is only on (id)'
--0000000000007a3551064871a43f--