Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vgSNb-00DCPe-2q for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Jan 2026 18:47:36 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vgSMa-00017S-2b for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Jan 2026 18:46:33 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vgSMa-00017H-0Q for pgsql-hackers@lists.postgresql.org; Thu, 15 Jan 2026 18:46:32 +0000 Received: from mail-ej1-x62d.google.com ([2a00:1450:4864:20::62d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vgSMX-000byk-1b for pgsql-hackers@lists.postgresql.org; Thu, 15 Jan 2026 18:46:31 +0000 Received: by mail-ej1-x62d.google.com with SMTP id a640c23a62f3a-b8715a4d9fdso169148266b.0 for ; Thu, 15 Jan 2026 10:46:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768502789; x=1769107589; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=c3KDtXzBGYRcRDf1WDYTYzpyLhYXnwOMQqvGD0pQDEE=; b=dFyxuFR8vj+ZNttC5txepAzW4JBOjlFXu1DYQPm1UOBAWSAgPXexRhf5gKvnYmmtcX 3Ih2myCTEvfbOFtMKoWzF9l8k1bbioFFW1p8uPMXKYkwxVYgQi8cZ7wTn/jliyKa1JW4 ZrVNRj4S1V7otVywyQx191fwXdacA3Wwc1VYMzrneo4g/wLhUvDGqK5Cng1x0gCzkbwv HStKKU5V0/xKgIFskzSmEnQzJTNA/hhWbgZM3gny4boYcJUt2huMmTXTu2CjNYRpOJ6L vpLdn/LBwzAnWKZfKskGYow95B3QStnnv58ndRc+Q9a9/AYbTyoDoZhg7d/07/VpKh8/ bu/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768502789; x=1769107589; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=c3KDtXzBGYRcRDf1WDYTYzpyLhYXnwOMQqvGD0pQDEE=; b=fisKpJV3BhX0Dw0ALC4tVNltIEtucxbstMap3+So/ZHNq0PlWvRE/KgIBaw4uAx59R 1hQgqfAPqgSiy6fSiQPHR0Fu3UzSKt3lPTVebgFrldqibj/ungl861IQQ/S4QN+e3TU+ tochzlECbKpr9846sqlhQjPYK1bYboU1Tq9daYn5j5TEaz76Pk4pKz8h76wJ6/rd6sLN w2K0nDd14LR0mCDeJU2aCF1qfJzq0BmLsZkDLNr4BcSED3E5o+XJq+/0OOhQoCMc3H+f cQZTEWl0ioiQg6PX5a/MQBKr4oUUH5pqAoRmn8vmx2mAHWJTovP+gASiNtl9v+e1r7Fq Lvxw== X-Gm-Message-State: AOJu0YzJ5a+0d0v02/GeKlOvsoUxDRi8GHQ3p1VgzkKqYqAlQg2035Sn M00iAFxwNzB649KRqUC5kIDbKEPfmqYhd/pf2gijiqUgXXhE2j3B1Y74f2EhTJSjVDj/C/mddBU Ge7Zn53Rfx7h4Lo+j13Vsd1cGzlkLaqlgVYgQ X-Gm-Gg: AY/fxX4v0ZcyyfHDNUdz3qdn0D8KfYjRiGofUIHDKreG5lFmfMjVH48XrcLm7pAluhL Our07PUk/OOjrWkDXD9E1q32kmB1rWR2XyiJrzocmZio2z28QbZVDuEH57WGrnr8iXFNNdg5q38 DS+WJMkMPStKF/4pKpe2xps+puqtJnPOP8PBonHgToQ+4QvrTrIzHIUq0gJj8zRuS2LmVwIY3Ft sDKtCDsjvuaJmKiZ2nk/HMFr3Z7JQ0rmM+JideA50RQNVBvtKq7iTOh6PCGDzHfIJ5NFhOyn4Jc 7DHVU7RL/PdGr4uXSwvVpMkqjW/D7g== X-Received: by 2002:a17:906:478f:b0:b84:2023:8fb2 with SMTP id a640c23a62f3a-b87968b6ad2mr1219466b.5.1768502788225; Thu, 15 Jan 2026 10:46:28 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Dharin Shah Date: Thu, 15 Jan 2026 19:46:15 +0100 X-Gm-Features: AZwV_Qj2_kIp4CK2VKix14PskmcF2a0ABZuV_n5RReuIg-lNBgw9wK2vIoL5eK4 Message-ID: Subject: Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW To: Adam Brusselback Cc: PostgreSQL Hackers Content-Type: multipart/mixed; boundary="0000000000007a3551064871a43f" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000007a3551064871a43f Content-Type: multipart/alternative; boundary="0000000000007a3550064871a43d" --0000000000007a3550064871a43d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hey Adam, Apologies for the delay, and as promised on discord, I did a review of the current patch (cf 6305) and wanted to share findings that line up with the thread=E2=80=99s design discussion, plus one additional correctness bug tha= t I could reproduce. 1. In the non-concurrent REFRESH ... WHERE .... path, the UPSERT SQL is built using the unique index metadata. The code currently uses indnatts when building the ON Conflict (...) target list. That includes INCLUDE columns, so for an index like: CREATE UNIQUE INDEX ON mv(id) INCLUDE (extra); the generated statement becomes effectively ON CONFLICT (id, extra) ..., which fails with: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification The fix appears straightforward: use indnkeyatts (key attributes only) when generating the conflict target, and also when deciding which columns are =E2=80=9Ckey=E2=80=9D for the UPDATE SET clause. I=E2=80=99ve attached a mi= nimal repro SQL script (repro_include_issue.sql) 2. Another small test quality issue: the regression script has a comment =E2=80=9CSubqueries -> Error=E2=80=9D but the expected output shows no erro= r for the schema-qualified subquery. There is no explicit check forbidding subqueries in transformRefreshWhereClause(), so schema-qualified subqueries appear allowed. Moving on to broader questions > I believe the issue is that the DELETE -> INSERT strategy leaves a > consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurre= nt > reads, the moment we delete the rows, we lose the physical lock on them. = If > a concurrent transaction inserts a colliding row during that gap, the > materialized view ends up inconsistent with the base query (or hits a > constraint violation). Consistency gap in the non-concurrent mode matches what I=E2=80=99d expect:= with ROW EXCLUSIVE you allow concurrent readers/writers, and a pure DELETE =E2= =86=92 INSERT approach can create a window where the old tuple is gone and a concurrent session can insert a conflicting logical row. That said, I think it would help the patch to explicitly define the intended safety model: 1. Is the goal to be safe against concurrent DML on base tables only (i.e., refresh sees a snapshot and updates MV accordingly), or also to be safe against concurrent partial refreshes and direct writes to the MV (when maintenance is enabled)? 2. Should the non-concurrent partial refresh be =E2=80=9Cbest effort=E2=80= =9D like normal DML (user coordinates), or should it be =E2=80=9Cmaintenance-like=E2=80=9D = (serialized / logically safe by default)? If the intent is =E2=80=9Csafe by default=E2=80=9D, I=E2=80=99d encourage d= ocumenting very clearly what=E2=80=99s guaranteed, and adding regression/README-style notes for foo= tguns From a reviewer standpoint, I think the feature concept is sound and valuable, but it needs a crisp statement of semantics and safety boundaries. The tricky part is exactly what you called out: incremental refresh implies concurrency questions that aren=E2=80=99t present with full= rebuild + strong locks. I=E2=80=99m happy to keep reviewing iterations (especially around the advis= ory lock approach), and I=E2=80=99ll attach the reproduction scripts and notes I use= d. As a possible staging approach: it might be simplest to start with a conservative serialization model for non-concurrent WHERE (while still allowing readers), and then iterate toward finer-grained logical locking if/when needed for throughput. Thanks, Dharin On Sun, Jan 4, 2026 at 3:56=E2=80=AFAM Adam Brusselback wrote: > Hi all, > > I've been running some more concurrency tests against this patch > (specifically looking for race conditions), and I found a flaw in the > implementation for the REFRESH ... WHERE ... mode (without CONCURRENTLY)= . > > I believe the issue is that the DELETE -> INSERT strategy leaves a > consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurre= nt > reads, the moment we delete the rows, we lose the physical lock on them. = If > a concurrent transaction inserts a colliding row during that gap, the > materialized view ends up inconsistent with the base query (or hits a > constraint violation). > > I initially was using SELECT ... FOR UPDATE to lock the rows before > modification, but that lock is (now that I know) obviously lost when the > row is deleted. > > My plan is to replace that row-locking strategy with transaction-level > advisory locks inside the refresh logic: > > Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid, > hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE clause. > > This effectively locks the "logical" ID of the row, preventing concurrent > refreshes on the same ID even while the physical tuple is temporarily gon= e. > Hash collisions should not have any correctness issues that I can think o= f. > > However, before I sink time into implementing that fix: > > Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE > ... in core? > If the community feels this feature is a footgun or conceptually wrong fo= r > Postgres, I'd rather know now before spending more time on this. > > If the feature concept is sound, does the advisory lock approach seem lik= e > the right way to handle the concurrency safety here? > > Thanks, > Adam Brusselback > --0000000000007a3550064871a43d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hey Adam,

Apologies for the delay, and = as promised on discord, I did a review of the current patch (cf 6305) and w= anted to share findings that line up with the thread=E2=80=99s design discu= ssion, plus one additional correctness bug that I could reproduce.

1. In the non-concurrent REFRESH ... WHERE .... path, the UPSERT SQL = is built using the unique index metadata. The code currently uses indnatts = when building the ON Conflict (...) target list. That includes INCLUDE colu= mns, so for an index like:

CREATE UNIQUE INDEX ON = mv(id) INCLUDE (extra);
the generated statement becomes effective= ly ON CONFLICT (id, extra) ..., which fails with:
ERROR: there is= no unique or exclusion constraint matching the ON CONFLICT specification

The fix appears straightforward: use indnkeyat= ts (key attributes only) when generating the conflict target, and also when= deciding which columns are =E2=80=9Ckey=E2=80=9D for the UPDATE SET clause= . I=E2=80=99ve attached a minimal repro SQL script (repro_include_issue.sql= )

2. Another small test quality issue: the regress= ion script has a comment =E2=80=9CSubqueries -> Error=E2=80=9D but the e= xpected output shows no error for the schema-qualified subquery. There is n= o explicit check forbidding subqueries in transformRefreshWhereClause(), so= schema-qualified subqueries appear allowed.

Moving on t= o broader questions=C2=A0

=C2=A0
I believe the issue is that the= DELETE -> INSERT strategy leaves a consistency gap. Since we relied on = ROW EXCLUSIVE locks to allow concurrent reads, the moment we delete the row= s, we lose the physical lock on them. If a concurrent transaction inserts a= colliding row during that gap, the materialized view ends up inconsistent = with the base query (or hits a constraint violation).

=
Consistency gap in the non-concurrent mode matches what I=E2=80= =99d expect: with ROW EXCLUSIVE you allow concurrent readers/writers, and a= pure DELETE =E2=86=92 INSERT approach can create a window where the old tu= ple is gone and a concurrent session can insert a conflicting logical row.= =C2=A0

That said, I think it would help the patch = to explicitly define the intended safety model:
1. Is the goal to be saf= e against concurrent DML on base tables only (i.e., refresh sees a snapshot= and updates MV accordingly), or also to be safe against concurrent partial= refreshes and direct writes to the MV (when maintenance is enabled)?
2.= Should the non-concurrent partial refresh be =E2=80=9Cbest effort=E2=80=9D= like normal DML (user coordinates), or should it be =E2=80=9Cmaintenance-l= ike=E2=80=9D (serialized / logically safe by default)?

If the intent is =E2=80=9Csafe by default=E2=80=9D, I=E2=80=99d en= courage documenting very clearly what=E2=80=99s guaranteed, and adding regr= ession/README-style notes for footguns=C2=A0

From = a reviewer standpoint, I think the feature concept is sound and valuable, b= ut it needs a crisp statement of semantics and safety boundaries. The trick= y part is exactly what you called out: incremental refresh implies concurre= ncy questions that aren=E2=80=99t present with full rebuild + strong locks.=

I=E2=80=99m happy to keep reviewing iterations (especially around t= he advisory lock approach), and I=E2=80=99ll attach the reproduction script= s and notes I used.

As a possible staging approach= : it might be simplest to start with a conservative serialization model for= non-concurrent WHERE (while still allowing readers), and then iterate towa= rd finer-grained logical locking if/when needed for throughput.
<= br>

Thanks,
Dharin

=
On Sun= , Jan 4, 2026 at 3:56=E2=80=AFAM Adam Brusselback <adambrusselback@gmail.com>= wrote:
Hi all,

I've been running some more concurrency tests a= gainst this patch (specifically looking for race conditions), and I found a= flaw in the implementation for the=C2=A0 REFRESH ... WHERE ... mode (witho= ut CONCURRENTLY).

I believe the issue is that the DELETE -> INSER= T strategy leaves a consistency gap. Since we relied on ROW EXCLUSIVE locks= to allow concurrent reads, the moment we delete the rows, we lose the phys= ical lock on them. If a concurrent transaction inserts a colliding row duri= ng that gap, the materialized view ends up inconsistent with the base query= (or hits a constraint violation).

I initially was using SELECT ... = FOR UPDATE to lock the rows before modification, but that lock is (now that= I know) obviously lost when the row is deleted.

My plan is to repla= ce that row-locking strategy with transaction-level advisory locks inside t= he refresh logic:

Before the DELETE, run a SELECT pg_advisory_xact_l= ock(mv_oid, hashtext(ROW(unique_keys)::text)) for the rows matching the WHE= RE clause.

This effectively locks the "logical" ID of the = row, preventing concurrent refreshes on the same ID even while the physical= tuple is temporarily gone. Hash collisions should not have any correctness= issues that I can think of.

However, before I sink time into implem= enting that fix:

Is there general interest in having REFRESH MATERIA= LIZED VIEW ... WHERE ... in core?
If the community feels this feature is= a footgun or conceptually wrong for Postgres, I'd rather know now befo= re spending more time on this.

If the feature concept is sound, does= the advisory lock approach seem like the right way to handle the concurren= cy safety here?

Thanks,
Adam Brusselback
--0000000000007a3550064871a43d-- --0000000000007a3551064871a43f Content-Type: application/octet-stream; name="test_include_bug.sql" Content-Disposition: attachment; filename="test_include_bug.sql" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_mkfsu8eo0 CkRST1AgVEFCTEUgSUYgRVhJU1RTIHRfaW5jbHVkZSBDQVNDQURFOwpDUkVBVEUgVEFCTEUgdF9p bmNsdWRlICgKICAgIGlkIElOVCBQUklNQVJZIEtFWSwKICAgIGRhdGEgVEVYVCwKICAgIGV4dHJh IFRFWFQKKTsKCklOU0VSVCBJTlRPIHRfaW5jbHVkZSBWQUxVRVMKICAgICgxLCAnZGF0YTEnLCAn ZXh0cmExJyksCiAgICAoMiwgJ2RhdGEyJywgJ2V4dHJhMicpOwoKQ1JFQVRFIE1BVEVSSUFMSVpF RCBWSUVXIG12X2luY2x1ZGUgQVMgU0VMRUNUICogRlJPTSB0X2luY2x1ZGU7CgotLSBDcmVhdGUg aW5kZXggd2l0aCBJTkNMVURFIGNvbHVtbgpDUkVBVEUgVU5JUVVFIElOREVYIGlkeF9pbmNsdWRl IE9OIG12X2luY2x1ZGUoaWQpIElOQ0xVREUgKGV4dHJhKTsKClxlY2hvICdJbmRleCBzdHJ1Y3R1 cmU6JwpTRUxFQ1QKICAgIGluZG5hdHRzIGFzIHRvdGFsX2F0dHJzLAogICAgaW5kbmtleWF0dHMg YXMga2V5X2F0dHJzCkZST00gcGdfaW5kZXgKV0hFUkUgaW5kZXhyZWxpZCA9ICdpZHhfaW5jbHVk ZSc6OnJlZ2NsYXNzOwoKXGVjaG8gJ0F0dGVtcHRpbmcgcGFydGlhbCByZWZyZXNoIHdpdGggSU5D TFVERSBpbmRleC4uLicKVVBEQVRFIHRfaW5jbHVkZSBTRVQgZGF0YSA9ICd1cGRhdGVkMScsIGV4 dHJhID0gJ3VwZGF0ZWRfZXh0cmExJyBXSEVSRSBpZCA9IDE7Cgpcc2V0IE9OX0VSUk9SX1NUT1Ag MApSRUZSRVNIIE1BVEVSSUFMSVpFRCBWSUVXIG12X2luY2x1ZGUgV0hFUkUgaWQgPSAxOwpcc2V0 IE9OX0VSUk9SX1NUT1AgMQoKXGVjaG8gJycKXGVjaG8gJ0V4cGVjdGVkIGVycm9yOiAibm8gdW5p cXVlIG9yIGV4Y2x1c2lvbiBjb25zdHJhaW50IG1hdGNoaW5nIE9OIENPTkZMSUNUIicKXGVjaG8g J1Jvb3QgY2F1c2U6IENvZGUgdXNlcyBpbmRuYXR0cyAoMikgaW5zdGVhZCBvZiBpbmRua2V5YXR0 cyAoMSknClxlY2hvICdHZW5lcmF0ZXM6IE9OIENPTkZMSUNUIChpZCwgZXh0cmEpIGJ1dCBjb25z dHJhaW50IGlzIG9ubHkgb24gKGlkKScK --0000000000007a3551064871a43f--