public inbox for [email protected]
help / color / mirror / Atom feedPossible G2-item at SERIALIZABLE
10+ messages / 4 participants
[nested] [flat]
* Possible G2-item at SERIALIZABLE
@ 2026-05-22 16:44 Kyle Kingsbury <[email protected]>
0 siblings, 2 replies; 10+ messages in thread
From: Kyle Kingsbury @ 2026-05-22 16:44 UTC (permalink / raw)
To: [email protected]
Dear Postgres maintainers,
I think I may have found a violation of Serializability in PostgreSQL
18.4 (Debian 18.4-1.pgdg13+1); the version from the PostgreSQL repo.
## Schema
I create three tables, `txn0`, `txn1`, and `txn2`. Each has identical
structure: an integer primary key `id`, a secondary key `sk` (not used
here), and a text `val`.
create table if not exists txn0 (
id int not null primary key,
sk int not null,
val text
);
## Transactions
All transactions run at SERIALIZABLE isolation, using JDBC transactions
or explicit BEGIN/COMMIT; both seem to produce the problem. Transactions
are randomly generated, and perform a mix of either appends or reads by
primary key. Reads are simple:
select (val) from txn1 where id = ?;
We append unique integers to `val` using `INSERT ... ON CONFLICT` and
the `CONCAT` function, like so:
insert into txn1 as t (id, sk, val) values (?, ?, ?) on conflict (id) do
update set val = CONCAT(t.val, ',', ?) where t.id = ?;
## Phenomenon
Consider this test run:
https://s3.amazonaws.com/jepsen.io/analyses/postgres-18.4/g2-item.zip
This test performed two transactions concurrently, using separate clients.
T1: [[:r 491 [1 2 3]] [:r 597 [2 3 4]] [:append 633 3] [:r 632 nil]]
T2: [[:append 632 2] [:r 630 [1 2 3 4]] [:r 633 nil] [:r 632 [2]]]
T1 performed two unrelated reads, then appended 3 to id 633, and read id
632, finding nothing. T2 appended 2 to id 632, performed an unrelated
read, then read key 633, finding nothing, then read its own append to 632.
The problem is that there is no way these transactions could execute in
(apparent) total order, because neither observed the other's write. Each
has a read-write anti-dependency on the other. This is one of the
canonical "dangerous structures" that Postgres' SSI is supposed to prevent.
Here are the exact SQL statements and their responses, as observed by
the client, ordered by the time their responses arrived.
T2: ["insert into txn2 as t (id, sk, val) values (?, ?, ?) on conflict
(id) do update set val = CONCAT(t.val, ',', ?) where t.id = ?" 632 632
"2" "2" 632] -> [#:next.jdbc{:update-count 1}]
T2: ["select (val) from txn0 where id = ? " 630] -> [{:val "1,2,3,4"}]
T2: ["select (val) from txn0 where id = ? " 633] -> []
T1: ["select (val) from txn1 where id = ? " 491] -> [{:val "1,2,3"}]
T1: ["select (val) from txn2 where id = ? " 597] -> [{:val "2,3,4"}]
T2: ["select (val) from txn2 where id = ? " 632] -> [{:val "2"}]
T1: ["insert into txn0 as t (id, sk, val) values (?, ?, ?) on conflict
(id) do update set val = CONCAT(t.val, ',', ?) where t.id = ?" 633 633
"3" "3" 633] -> [#:next.jdbc{:update-count 1}]
T1: ["select (val) from txn2 where id = ? " 632] -> []
For detailed timing and process information, search for these strings in
`jepsen.log`.
## Manual Repro
I've tried these same commands, and several other anomalies, at the psql
shell, but I can't seem to reproduce it there--perhaps the order is
slightly different due to request-response delay, or I'm getting the
BEGIN/COMMIT timing (which is missing from this log) wrong, or maybe it
depends on other transactions (e.g. for the update vs insert state), or
it's a probabilistic bug. I am able to reproduce it at much slower
timings--e.g. with 100-millisecond delays between statements in each
transaction. However, of the examples I've investigated by hand, each
has involved COMMITs executed at just about the same time. That might be
a hint?
## Test
My test runs in Clojure, a Lisp on the JVM. I use the Postgres JDBC
driver, org.postgresql/postgresql "42.7.11", and next.jdbc "1.3.1093".
The test itself lives at:
https://github.com/jepsen-io/postgres.
With commit 6c2bcc3f43085d3b0f21a5d78ba2b0e0e559ea8f, you can run:
lein run test-all -n n1 -w append --time-limit 30 --concurrency 30 --
log-sql --isolation serializable --max-writes-per-key 4 --leave-db-
running --key-types primary --upsert-types on-conflict --test-count 20
--nemesis none
On my machine, this spits out roughly one G2-item anomaly every 20 seconds.
Its transactions are generated by jepsen.sql, which lives here:
https://github.com/jepsen-io/sql
This is essentially the same test I ran which found a Serializability
bug in 12.3: https://jepsen.io/analyses/postgresql-12.3. I've just
broken it up into two libraries and added a few complications.
The workload which produces these results is:
https://github.com/jepsen-io/sql/
blob/6e34b76e2ac6a1c3edb5a9cab1c835eda4ca4c5e/src/jepsen/sql/append.clj.
I am relatively confident that isolation levels and transactions are
being used correctly, because when I adjust the isolation level from
Serializable to (e.g.) Read Committed, I see vastly different anomalies.
I can also reproduce the problem both using JDBC's transaction calls,
and with explicit BEGIN; SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE; ...; COMMIT;, which makes me think that it's not a bug in
the way next.jdbc or the JDBC driver handle transactions.
## System Information
This is a basically stock Debian 13 LXC container.
$ uname -a
Linux n1 6.17.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon
May 11 10:30:58 UTC 2 x86_64 GNU/Linux
The host machine runs Linux Mint 22.3 Zena. It's a Threadripper 7980X,
if that's relevant--happy to provide additional hardware details.
I install `postgresql-common` from the usual Debian repos, then run `/
usr/share/postgresql-common/pgdg/apt.postgresql.org.sh` to add
Postgres's official repos, then `apt install postgresql-18 postgresql-
client-18`, which as of 2026-05-22, installs Postgresql 18.4 (Debian
18.4-1.pgdg13+1). I leave the configuration files essentially stock,
except for enabling network access and setting `autovacuum_naptime = 5s`.
--
If y'all have any luck reproducing this, I'd love to hear about it!
Yours truly,
--Kyle
Attachments:
[image/png] pg-g2-item.png (19.3K, 2-pg-g2-item.png)
download | view image
[image/svg+xml] pg-g2-item.svg (3.2K, 3-pg-g2-item.svg)
download | view image
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-05-30 18:47 Andrey Borodin <[email protected]>
parent: Kyle Kingsbury <[email protected]>
1 sibling, 1 reply; 10+ messages in thread
From: Andrey Borodin @ 2026-05-30 18:47 UTC (permalink / raw)
To: Kyle Kingsbury <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>; [email protected]; Kirk Wolak <[email protected]>
> On 22 May 2026, at 21:44, Kyle Kingsbury <[email protected]> wrote:
>
> If y'all have any luck reproducing this, I'd love to hear about it!
It looks like I've reproduced something similar with savepoints on HEAD and
REL_18_STABLE. I can't tell if it's really all what you observe. But, probably, part of it.
If that error is raised inside a subtransaction, ROLLBACK TO SAVEPOINT swallows it
and the transaction is free to COMMIT, defeating serializable isolation.
I did not try beyond 18, but I think it always has been there.
PFA attached isolation tester and hand-wavy fix. But I suspect there are more G-items
around.
Full Jepsen test you proposed is currently tried by Nik's machines(in cc).
Best regards, Andrey Borodin.
Attachments:
[application/octet-stream] v1-0001-Add-isolation-test-SSI-serialization-failure-swal.patch (4.2K, 2-v1-0001-Add-isolation-test-SSI-serialization-failure-swal.patch)
download | inline diff:
From 5f0e1dce5974c48eaecb855d4676e497abfc0b1a Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Sat, 30 May 2026 23:07:39 +0500
Subject: [PATCH v1 1/2] Add isolation test: SSI serialization failure
swallowed by SAVEPOINT
If that error is raised inside a subtransaction, ROLLBACK TO SAVEPOINT
swallows it and the transaction is free to COMMIT, defeating
serializable isolation.
---
.../expected/serializable-savepoint.out | 25 +++++++++
src/test/isolation/isolation_schedule | 2 +
.../specs/serializable-savepoint.spec | 52 +++++++++++++++++++
3 files changed, 79 insertions(+)
create mode 100644 src/test/isolation/expected/serializable-savepoint.out
create mode 100644 src/test/isolation/specs/serializable-savepoint.spec
diff --git a/src/test/isolation/expected/serializable-savepoint.out b/src/test/isolation/expected/serializable-savepoint.out
new file mode 100644
index 00000000000..0dd4f787efd
--- /dev/null
+++ b/src/test/isolation/expected/serializable-savepoint.out
@@ -0,0 +1,25 @@
+Parsed test spec with 3 sessions
+
+starting permutation: r1 w2 w1 c1 sp2 r2 rb2 c2 rall
+step r1: SELECT v FROM t WHERE id = 2;
+v
+-
+0
+(1 row)
+
+step w2: UPDATE t SET v = 1 WHERE id = 2;
+step w1: UPDATE t SET v = 1 WHERE id = 1;
+step c1: COMMIT;
+step sp2: SAVEPOINT f;
+step r2: SELECT v FROM t WHERE id = 1;
+ERROR: could not serialize access due to read/write dependencies among transactions
+step rb2: ROLLBACK TO SAVEPOINT f;
+step c2: COMMIT;
+ERROR: could not serialize access due to read/write dependencies among transactions
+step rall: SELECT id, v FROM t ORDER BY id;
+id|v
+--+-
+ 1|1
+ 2|0
+(2 rows)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 15c33fad4c5..62a334ad3d9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -127,3 +127,5 @@ test: matview-write-skew
test: lock-nowait
test: for-portion-of
test: ddl-dependency-locking
+
+test: serializable-savepoint
diff --git a/src/test/isolation/specs/serializable-savepoint.spec b/src/test/isolation/specs/serializable-savepoint.spec
new file mode 100644
index 00000000000..de61b615529
--- /dev/null
+++ b/src/test/isolation/specs/serializable-savepoint.spec
@@ -0,0 +1,52 @@
+# Test that a serialization failure raised while *checking* a read (the
+# "conflict out to pivot, during read" cancellation) cannot be discarded by
+# rolling back to a SAVEPOINT.
+#
+# s1 and s2 form the classic write-skew dangerous structure under SERIALIZABLE:
+# s1 reads row 2 and writes row 1
+# s2 writes row 2 and reads row 1
+# s1 commits first. When s2 then reads row 1 it is correctly identified as the
+# pivot of a dangerous structure and PostgreSQL raises
+# ERROR: could not serialize access ...
+# (Canceled on conflict out to pivot ..., during read).
+#
+# Crucially, s2's *write* to row 2 happened BEFORE the savepoint, so it is not
+# undone. s2 only wraps the offending READ in a SAVEPOINT, rolls back to it
+# (swallowing the error) and commits. That COMMIT must fail: allowing it
+# leaves both s1's and s2's writes committed, which is the write-skew anomaly
+# SSI is supposed to prevent (no serial order exists).
+
+setup
+{
+ CREATE TABLE t (id int PRIMARY KEY, v int);
+ INSERT INTO t VALUES (1, 0), (2, 0);
+}
+
+teardown
+{
+ DROP TABLE t;
+}
+
+session s1
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step r1 { SELECT v FROM t WHERE id = 2; }
+step w1 { UPDATE t SET v = 1 WHERE id = 1; }
+step c1 { COMMIT; }
+
+session s2
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step w2 { UPDATE t SET v = 1 WHERE id = 2; }
+step sp2 { SAVEPOINT f; }
+step r2 { SELECT v FROM t WHERE id = 1; }
+step rb2 { ROLLBACK TO SAVEPOINT f; }
+step c2 { COMMIT; }
+
+# Used to observe the final committed state.
+session s3
+step rall { SELECT id, v FROM t ORDER BY id; }
+
+# s2 takes its snapshot at w2 (before s1 commits), writes row 2, then after s1
+# commits it reads row 1 inside a savepoint and is cancelled. After rolling
+# back to the savepoint it must not be able to commit. If it does (the bug),
+# rall shows both rows updated -- the non-serializable write-skew outcome.
+permutation r1 w2 w1 c1 sp2 r2 rb2 c2 rall
--
2.50.1 (Apple Git-155)
[application/octet-stream] v1-0002-Doom-serializable-transaction-before-raising-seri.patch (2.5K, 3-v1-0002-Doom-serializable-transaction-before-raising-seri.patch)
download | inline diff:
From 28762ab32aaf01a07d8394886d6464de3a91cdb7 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Sat, 30 May 2026 23:07:47 +0500
Subject: [PATCH v1 2/2] Doom serializable transaction before raising
serialization error
OnConflict_CheckForSerializationFailure() cancels the current
serializable transaction with ereport(ERROR) when it is identified as a
pivot, both on the "during write" and the "during read" paths. In both
cases it released SerializableXactHashLock and raised the error without
setting SXACT_FLAG_DOOMED on MySerializableXact first; the writer is only
doomed afterwards, which the ereport() longjmp skips for the current
transaction.
If the error is caught by a subtransaction abort (ROLLBACK TO
SAVEPOINT), the transaction is not doomed and is allowed to COMMIT,
which violates serializability. Set SXACT_FLAG_DOOMED on ourselves
before raising the error so that PreCommit_CheckForSerializationFailure()
catches the transaction at commit time even if the error was swallowed.
---
src/backend/storage/lmgr/predicate.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 0ae85b7d5b4..461e899727b 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -4589,6 +4589,12 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
*/
if (MySerializableXact == writer)
{
+ /*
+ * Mark ourselves doomed before raising the error. Otherwise a
+ * subtransaction abort (ROLLBACK TO SAVEPOINT) could swallow this
+ * error and let the transaction commit anyway, defeating SSI.
+ */
+ MySerializableXact->flags |= SXACT_FLAG_DOOMED;
LWLockRelease(SerializableXactHashLock);
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
@@ -4598,10 +4604,13 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
}
else if (SxactIsPrepared(writer))
{
- LWLockRelease(SerializableXactHashLock);
-
/* if we're not the writer, we have to be the reader */
Assert(MySerializableXact == reader);
+
+ /* See comment above: doom ourselves before raising the error. */
+ MySerializableXact->flags |= SXACT_FLAG_DOOMED;
+ LWLockRelease(SerializableXactHashLock);
+
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
--
2.50.1 (Apple Git-155)
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-05-31 13:19 Andrey Borodin <[email protected]>
parent: Kyle Kingsbury <[email protected]>
1 sibling, 1 reply; 10+ messages in thread
From: Andrey Borodin @ 2026-05-31 13:19 UTC (permalink / raw)
To: Kyle Kingsbury <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>
Hello Kyle!
> On 22 May 2026, at 21:44, Kyle Kingsbury <[email protected]> wrote:
>
> I am relatively confident that isolation levels and transactions are
> being used correctly, because when I adjust the isolation level from
> Serializable to (e.g.) Read Committed, I see vastly different anomalies.
> I can also reproduce the problem both using JDBC's transaction calls,
> and with explicit BEGIN; SET TRANSACTION ISOLATION LEVEL
> SERIALIZABLE; ...; COMMIT;, which makes me think that it's not a bug in
> the way next.jdbc or the JDBC driver handle transactions.
I agree the multi-statement path looks correct, and I don't think next.jdbc or the
driver are at fault either. But I think there's a narrower issue that both of your
checks leave untouched: single-operation transactions are never wrapped in
a transaction at all.
In append's invoke!, a transaction is only opened when it has more than one op [0].
Could you confirm whether you still observe G2-item anomalies with a server
configured default_transaction_isolation = 'serializable' (so that the
single-statement operations are certainly Serializable)? If they persist under that
setting I'll dig further...
Best regards, Andrey Borodin.
[0] https://github.com/jepsen-io/sql/blob/6e34b76e2ac6a1c3edb5a9cab1c835eda4ca4c5e/src/jepsen/sql/append...
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-06-01 01:49 Kyle Kingsbury <[email protected]>
parent: Andrey Borodin <[email protected]>
0 siblings, 1 reply; 10+ messages in thread
From: Kyle Kingsbury @ 2026-06-01 01:49 UTC (permalink / raw)
To: Andrey Borodin <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>
On 5/31/26 08:19, Andrey Borodin wrote:
> Could you confirm whether you still observe G2-item anomalies with a server
> configured default_transaction_isolation = 'serializable' (so that the
> single-statement operations are certainly Serializable)? If they persist under that
> setting I'll dig further...
<sigh>
Yes, this was it, thank you. I've been reworking a whole bunch of things
to make these tests portable between DBs and totally lost the
session-wide transaction isolation. This behavior disappears with:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Thank you kindly, Andrey. :-)
--Kyle
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-06-01 06:18 Andrey Borodin <[email protected]>
parent: Kyle Kingsbury <[email protected]>
0 siblings, 1 reply; 10+ messages in thread
From: Andrey Borodin @ 2026-06-01 06:18 UTC (permalink / raw)
To: Kyle Kingsbury <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>
> On 1 Jun 2026, at 06:49, Kyle Kingsbury <[email protected]> wrote:
>
> This behavior disappears
Glad that was it. There's a separate case I'd still like your eyes on: savepoints.
README-SSI says predicate locks must survive a subtransaction rollback, because the
subxact's reads still affect what the top transaction writes. By that same reasoning,
shouldn't a serialization failure raised while checking such a read DooM the top
transaction, rather than being swallowed by ROLLBACK TO SAVEPOINT?
Best regards, Andrey Borodin.
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-06-01 10:11 Andrey Borodin <[email protected]>
parent: Andrey Borodin <[email protected]>
0 siblings, 0 replies; 10+ messages in thread
From: Andrey Borodin @ 2026-06-01 10:11 UTC (permalink / raw)
To: Kyle Kingsbury <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>
> On 1 Jun 2026, at 11:18, Andrey Borodin <[email protected]> wrote:
>
> There's a separate case
Two more SSI false-negatives in the same area, found by Nik's machines
and Mark-bot while triaging original report.
INSERT ... ON CONFLICT reads the conflicting ("arbiter") row to decide
what to do, but doesn't take an SIREAD predicate lock on it. So when
the statement ultimately writes no tuple, that read leaves no trace for
SSI and a concurrent modification of the same row can produce a
non-serializable all-commit result:
* ON CONFLICT DO UPDATE ... WHERE <false> — the no-op update branch; the
conflict row is observed but not updated.
* ON CONFLICT DO NOTHING with a concurrent DELETE of the conflict row.
Replacing the ON CONFLICT with a plain SELECT of the same row aborts
correctly, which is what convinced me the schedules are genuinely
non-serializable and the gap is in the ON CONFLICT probe path.
Both come from the same place - check_exclusion_or_unique_constraint() in
execIndexing.c finds the arbiter tuple but never calls PredicateLockTID().
Adding that lock there fixes both.
There's also one more false negative, but it is in -DTEST_SUMMARIZE_SERIAL
and IMO worth working only when we deal with what we have in production
cases.
Best regards, Andrey Borodin.
Attachments:
[application/octet-stream] 0001-Add-isolation-tests-for-ON-CONFLICT-no-op-SSI-false-.patch (10.3K, 2-0001-Add-isolation-tests-for-ON-CONFLICT-no-op-SSI-false-.patch)
download | inline diff:
From fd43ff3f82d4edb5df6ca59e13aeeaa12c296da9 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Mon, 1 Jun 2026 12:31:34 +0500
Subject: [PATCH 1/2] Add isolation tests for ON CONFLICT no-op SSI false
negatives
INSERT ... ON CONFLICT DO NOTHING, and DO UPDATE with an unsatisfied
WHERE, read the conflicting row but take no SIREAD lock, so a concurrent
UPDATE/DELETE of it can escape write-skew detection under SERIALIZABLE.
Each spec contrasts a plain-SELECT control with the ON CONFLICT variant.
The expected output encodes the serializable result (one transaction
cancelled), so these tests fail until the accompanying fix is applied.
---
.../serializable-onconflict-do-nothing.out | 68 ++++++++++++++++++
.../serializable-onconflict-noop-update.out | 69 +++++++++++++++++++
src/test/isolation/isolation_schedule | 3 +
.../serializable-onconflict-do-nothing.spec | 42 +++++++++++
.../serializable-onconflict-noop-update.spec | 42 +++++++++++
5 files changed, 224 insertions(+)
create mode 100644 src/test/isolation/expected/serializable-onconflict-do-nothing.out
create mode 100644 src/test/isolation/expected/serializable-onconflict-noop-update.out
create mode 100644 src/test/isolation/specs/serializable-onconflict-do-nothing.spec
create mode 100644 src/test/isolation/specs/serializable-onconflict-noop-update.spec
diff --git a/src/test/isolation/expected/serializable-onconflict-do-nothing.out b/src/test/isolation/expected/serializable-onconflict-do-nothing.out
new file mode 100644
index 00000000000..45ef582465d
--- /dev/null
+++ b/src/test/isolation/expected/serializable-onconflict-do-nothing.out
@@ -0,0 +1,68 @@
+Parsed test spec with 3 sessions
+
+starting permutation: reset s1read s2r2 s1w2 s2w1 s1c s2c ck
+step reset: DELETE FROM noc; INSERT INTO noc VALUES (1, 0), (2, 0);
+step s1read: SELECT v FROM noc WHERE k = 1;
+v
+-
+0
+(1 row)
+
+step s2r2: SELECT v FROM noc WHERE k = 2;
+v
+-
+0
+(1 row)
+
+step s1w2: UPDATE noc SET v = 1 WHERE k = 2;
+step s2w1: DELETE FROM noc WHERE k = 1;
+step s1c: COMMIT;
+step s2c: COMMIT;
+ERROR: could not serialize access due to read/write dependencies among transactions
+step ck: SELECT k, v FROM noc ORDER BY k;
+ SELECT CASE WHEN NOT EXISTS (SELECT 1 FROM noc WHERE k = 1)
+ AND (SELECT v FROM noc WHERE k = 2) = 1
+ THEN 'bad_both_committed'
+ ELSE 'ok_aborted_or_serial' END AS invariant;
+k|v
+-+-
+1|0
+2|1
+(2 rows)
+
+invariant
+--------------------
+ok_aborted_or_serial
+(1 row)
+
+
+starting permutation: reset s1noop s2r2 s1w2 s2w1 s1c s2c ck
+step reset: DELETE FROM noc; INSERT INTO noc VALUES (1, 0), (2, 0);
+step s1noop: INSERT INTO noc(k, v) VALUES (1, 99) ON CONFLICT (k) DO NOTHING;
+step s2r2: SELECT v FROM noc WHERE k = 2;
+v
+-
+0
+(1 row)
+
+step s1w2: UPDATE noc SET v = 1 WHERE k = 2;
+step s2w1: DELETE FROM noc WHERE k = 1;
+step s1c: COMMIT;
+step s2c: COMMIT;
+ERROR: could not serialize access due to read/write dependencies among transactions
+step ck: SELECT k, v FROM noc ORDER BY k;
+ SELECT CASE WHEN NOT EXISTS (SELECT 1 FROM noc WHERE k = 1)
+ AND (SELECT v FROM noc WHERE k = 2) = 1
+ THEN 'bad_both_committed'
+ ELSE 'ok_aborted_or_serial' END AS invariant;
+k|v
+-+-
+1|0
+2|1
+(2 rows)
+
+invariant
+--------------------
+ok_aborted_or_serial
+(1 row)
+
diff --git a/src/test/isolation/expected/serializable-onconflict-noop-update.out b/src/test/isolation/expected/serializable-onconflict-noop-update.out
new file mode 100644
index 00000000000..af8ce068cc3
--- /dev/null
+++ b/src/test/isolation/expected/serializable-onconflict-noop-update.out
@@ -0,0 +1,69 @@
+Parsed test spec with 3 sessions
+
+starting permutation: reset s1read s2r2 s1w2 s2w1 s1c s2c ck
+step reset: DELETE FROM noc; INSERT INTO noc VALUES (1, 0), (2, 0);
+step s1read: SELECT v FROM noc WHERE k = 1;
+v
+-
+0
+(1 row)
+
+step s2r2: SELECT v FROM noc WHERE k = 2;
+v
+-
+0
+(1 row)
+
+step s1w2: UPDATE noc SET v = 1 WHERE k = 2;
+step s2w1: UPDATE noc SET v = 42 WHERE k = 1;
+step s1c: COMMIT;
+step s2c: COMMIT;
+ERROR: could not serialize access due to read/write dependencies among transactions
+step ck: SELECT k, v FROM noc ORDER BY k;
+ SELECT CASE WHEN (SELECT v FROM noc WHERE k = 1) = 42
+ AND (SELECT v FROM noc WHERE k = 2) = 1
+ THEN 'bad_both_committed'
+ ELSE 'ok_aborted_or_serial' END AS invariant;
+k|v
+-+-
+1|0
+2|1
+(2 rows)
+
+invariant
+--------------------
+ok_aborted_or_serial
+(1 row)
+
+
+starting permutation: reset s1noop s2r2 s1w2 s2w1 s1c s2c ck
+step reset: DELETE FROM noc; INSERT INTO noc VALUES (1, 0), (2, 0);
+step s1noop: INSERT INTO noc(k, v) VALUES (1, 99) ON CONFLICT (k) DO UPDATE SET v = 100 WHERE noc.v = 42;
+step s2r2: SELECT v FROM noc WHERE k = 2;
+v
+-
+0
+(1 row)
+
+step s1w2: UPDATE noc SET v = 1 WHERE k = 2;
+step s2w1: UPDATE noc SET v = 42 WHERE k = 1; <waiting ...>
+step s1c: COMMIT;
+step s2w1: <... completed>
+ERROR: could not serialize access due to read/write dependencies among transactions
+step s2c: COMMIT;
+step ck: SELECT k, v FROM noc ORDER BY k;
+ SELECT CASE WHEN (SELECT v FROM noc WHERE k = 1) = 42
+ AND (SELECT v FROM noc WHERE k = 2) = 1
+ THEN 'bad_both_committed'
+ ELSE 'ok_aborted_or_serial' END AS invariant;
+k|v
+-+-
+1|0
+2|1
+(2 rows)
+
+invariant
+--------------------
+ok_aborted_or_serial
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 15c33fad4c5..313e8abba6f 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -127,3 +127,6 @@ test: matview-write-skew
test: lock-nowait
test: for-portion-of
test: ddl-dependency-locking
+
+test: serializable-onconflict-noop-update
+test: serializable-onconflict-do-nothing
diff --git a/src/test/isolation/specs/serializable-onconflict-do-nothing.spec b/src/test/isolation/specs/serializable-onconflict-do-nothing.spec
new file mode 100644
index 00000000000..da8cd10d3f5
--- /dev/null
+++ b/src/test/isolation/specs/serializable-onconflict-do-nothing.spec
@@ -0,0 +1,42 @@
+# Write-skew via INSERT ... ON CONFLICT DO NOTHING with a concurrent delete of
+# the conflicting row under SERIALIZABLE.
+#
+# s1 examines k=1 through the ON CONFLICT arbiter and writes k=2
+# s2 reads k=2 and deletes k=1
+#
+# This is a dangerous structure, so SSI must cancel one transaction.
+#
+# The question this exercises: the arbiter's conflict probe is a read that
+# decides the statement's outcome -- exactly like the plain SELECT in the
+# control permutation -- so it must participate in SSI and take an SIREAD lock.
+# Until that read was predicate-locked, only the control was cancelled while
+# the ON CONFLICT DO NOTHING variant committed a non-serializable result.
+
+setup { CREATE TABLE noc (k int PRIMARY KEY, v int NOT NULL); }
+teardown { DROP TABLE noc; }
+
+session s1
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step s1read { SELECT v FROM noc WHERE k = 1; }
+step s1noop { INSERT INTO noc(k, v) VALUES (1, 99) ON CONFLICT (k) DO NOTHING; }
+step s1w2 { UPDATE noc SET v = 1 WHERE k = 2; }
+step s1c { COMMIT; }
+
+session s2
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step s2r2 { SELECT v FROM noc WHERE k = 2; }
+step s2w1 { DELETE FROM noc WHERE k = 1; }
+step s2c { COMMIT; }
+
+# Non-transactional helper: resets the rows before each permutation and reports
+# the final committed state plus a serializability invariant.
+session ctl
+step reset { DELETE FROM noc; INSERT INTO noc VALUES (1, 0), (2, 0); }
+step ck { SELECT k, v FROM noc ORDER BY k;
+ SELECT CASE WHEN NOT EXISTS (SELECT 1 FROM noc WHERE k = 1)
+ AND (SELECT v FROM noc WHERE k = 2) = 1
+ THEN 'bad_both_committed'
+ ELSE 'ok_aborted_or_serial' END AS invariant; }
+
+permutation reset s1read s2r2 s1w2 s2w1 s1c s2c ck
+permutation reset s1noop s2r2 s1w2 s2w1 s1c s2c ck
diff --git a/src/test/isolation/specs/serializable-onconflict-noop-update.spec b/src/test/isolation/specs/serializable-onconflict-noop-update.spec
new file mode 100644
index 00000000000..44a52f4953c
--- /dev/null
+++ b/src/test/isolation/specs/serializable-onconflict-noop-update.spec
@@ -0,0 +1,42 @@
+# Write-skew via INSERT ... ON CONFLICT DO UPDATE with an unsatisfied WHERE
+# (a no-op update) under SERIALIZABLE.
+#
+# s1 examines k=1 through the ON CONFLICT arbiter and writes k=2
+# s2 reads k=2 and writes k=1
+#
+# This is a dangerous structure, so SSI must cancel one transaction.
+#
+# The question this exercises: the arbiter's conflict probe is a read that
+# decides the statement's outcome -- exactly like the plain SELECT in the
+# control permutation -- so it must participate in SSI and take an SIREAD lock.
+# Until that read was predicate-locked, only the control was cancelled while
+# the ON CONFLICT no-op variant committed a non-serializable result.
+
+setup { CREATE TABLE noc (k int PRIMARY KEY, v int NOT NULL); }
+teardown { DROP TABLE noc; }
+
+session s1
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step s1read { SELECT v FROM noc WHERE k = 1; }
+step s1noop { INSERT INTO noc(k, v) VALUES (1, 99) ON CONFLICT (k) DO UPDATE SET v = 100 WHERE noc.v = 42; }
+step s1w2 { UPDATE noc SET v = 1 WHERE k = 2; }
+step s1c { COMMIT; }
+
+session s2
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step s2r2 { SELECT v FROM noc WHERE k = 2; }
+step s2w1 { UPDATE noc SET v = 42 WHERE k = 1; }
+step s2c { COMMIT; }
+
+# Non-transactional helper: resets the rows before each permutation and reports
+# the final committed state plus a serializability invariant.
+session ctl
+step reset { DELETE FROM noc; INSERT INTO noc VALUES (1, 0), (2, 0); }
+step ck { SELECT k, v FROM noc ORDER BY k;
+ SELECT CASE WHEN (SELECT v FROM noc WHERE k = 1) = 42
+ AND (SELECT v FROM noc WHERE k = 2) = 1
+ THEN 'bad_both_committed'
+ ELSE 'ok_aborted_or_serial' END AS invariant; }
+
+permutation reset s1read s2r2 s1w2 s2w1 s1c s2c ck
+permutation reset s1noop s2r2 s1w2 s2w1 s1c s2c ck
--
2.50.1 (Apple Git-155)
[application/octet-stream] 0002-Predicate-lock-the-conflicting-row-in-INSERT-.-ON-CO.patch (2.8K, 3-0002-Predicate-lock-the-conflicting-row-in-INSERT-.-ON-CO.patch)
download | inline diff:
From 17c2e40d2c2bb8a65fa9e2ca69de8db8cc68ba93 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Mon, 1 Jun 2026 12:31:34 +0500
Subject: [PATCH 2/2] Predicate-lock the conflicting row in INSERT ... ON
CONFLICT
Under SERIALIZABLE the arbiter probe reads the conflicting row to decide
the statement's outcome but took no SIREAD lock. When ON CONFLICT then
writes no tuple (DO NOTHING, or DO UPDATE with an unsatisfied WHERE), a
concurrent UPDATE/DELETE of that row missed the rw-antidependency and a
write skew could commit. Lock the tuple where the arbiter finds it.
---
src/backend/executor/execIndexing.c | 32 +++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index eb383812901..d49e87a0eda 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -108,12 +108,14 @@
#include "access/genam.h"
#include "access/relscan.h"
+#include "access/sysattr.h"
#include "access/tableam.h"
#include "access/xact.h"
#include "catalog/index.h"
#include "executor/executor.h"
#include "nodes/nodeFuncs.h"
#include "storage/lmgr.h"
+#include "storage/predicate.h"
#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/multirangetypes.h"
@@ -906,9 +908,39 @@ retry:
*/
if (violationOK)
{
+ Datum xminDatum;
+ bool xminIsnull;
+ TransactionId xmin;
+
conflict = true;
if (conflictTid)
*conflictTid = existing_slot->tts_tid;
+
+ /*
+ * This conflicting row determined the outcome of the INSERT ...
+ * ON CONFLICT, so for serializability it was read just as a SELECT
+ * of it would be. Record an SIREAD lock so that a concurrent
+ * modification of the row creates the necessary rw-antidependency,
+ * even when ON CONFLICT writes no tuple (DO NOTHING, or DO UPDATE
+ * with an unsatisfied WHERE). A no-op outside SERIALIZABLE.
+ *
+ * XXX A reviewer familiar with predicate.c should confirm the
+ * xmin handling here. slot_getsysattr() returns the raw xmin
+ * (HeapTupleHeaderGetRawXmin), while the other PredicateLockTID()
+ * call sites pass the frozen-aware HeapTupleHeaderGetXmin(). This
+ * value only feeds the "did this xact write the tuple" early-out
+ * in PredicateLockTID(), and a frozen tuple can never belong to the
+ * current transaction, so the raw value should be equivalent here
+ * -- but the inconsistency with the other call sites is worth a
+ * second look.
+ */
+ xminDatum = slot_getsysattr(existing_slot,
+ MinTransactionIdAttributeNumber,
+ &xminIsnull);
+ Assert(!xminIsnull);
+ xmin = DatumGetTransactionId(xminDatum);
+ PredicateLockTID(heap, &existing_slot->tts_tid,
+ estate->es_snapshot, xmin);
break;
}
--
2.50.1 (Apple Git-155)
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-06-01 22:00 Zsolt Parragi <[email protected]>
parent: Andrey Borodin <[email protected]>
0 siblings, 1 reply; 10+ messages in thread
From: Zsolt Parragi @ 2026-06-01 22:00 UTC (permalink / raw)
To: [email protected]
+ /*
+ * Mark ourselves doomed before raising the error. Otherwise a
+ * subtransaction abort (ROLLBACK TO SAVEPOINT) could swallow this
+ * error and let the transaction commit anyway, defeating SSI.
+ */
+ MySerializableXact->flags |= SXACT_FLAG_DOOMED;
LWLockRelease(SerializableXactHashLock);
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
I wonder if it would make sense to introduce a
"DoomMyselfAndRaiseSerializationFailure" helper for this, while fixing
all occurrences we can find? That would make it more explicit, and
also less repeated.
> PFA attached isolation tester and hand-wavy fix. But I suspect there are more G-items
> around.
There are at least 2 more reproducible with the isolation tester in
CheckForSerializableConflictOut ("conflict out to old pivot %u", and
"conflict out to old committed transaction %u" directly below it).
Probably "Canceled on conflict out to old pivot." also should have the
same changes?
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-06-02 08:18 Andrey Borodin <[email protected]>
parent: Zsolt Parragi <[email protected]>
0 siblings, 1 reply; 10+ messages in thread
From: Andrey Borodin @ 2026-06-02 08:18 UTC (permalink / raw)
To: Zsolt Parragi <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>
> On 2 Jun 2026, at 03:00, Zsolt Parragi <[email protected]> wrote:
>
> + /*
> + * Mark ourselves doomed before raising the error. Otherwise a
> + * subtransaction abort (ROLLBACK TO SAVEPOINT) could swallow this
> + * error and let the transaction commit anyway, defeating SSI.
> + */
> + MySerializableXact->flags |= SXACT_FLAG_DOOMED;
> LWLockRelease(SerializableXactHashLock);
> ereport(ERROR,
> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>
> I wonder if it would make sense to introduce a
> "DoomMyselfAndRaiseSerializationFailure" helper for this, while fixing
> all occurrences we can find? That would make it more explicit, and
> also less repeated.
>
>> PFA attached isolation tester and hand-wavy fix. But I suspect there are more G-items
>> around.
>
> There are at least 2 more reproducible with the isolation tester in
> CheckForSerializableConflictOut ("conflict out to old pivot %u", and
> "conflict out to old committed transaction %u" directly below it).
> Probably "Canceled on conflict out to old pivot." also should have the
> same changes?
Thanks for looking into this!
I agree fix needs some refinement and can be improved.
However, let's settle on the bug first.
Do you think that recovering serialization error with ROLLBACK TO SAVEPOINT is a bug?
Honestly, I'm not entirely convinced myself. AFAIU SSI docs do not describe this case
clearly.
Best regards, Andrey Borodin.
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-06-02 23:03 Zsolt Parragi <[email protected]>
parent: Andrey Borodin <[email protected]>
0 siblings, 1 reply; 10+ messages in thread
From: Zsolt Parragi @ 2026-06-02 23:03 UTC (permalink / raw)
To: [email protected]
Hello
> AFAIU SSI docs do not describe this case clearly.
I agree on that part.
> Do you think that recovering serialization error with ROLLBACK TO SAVEPOINT is a bug?
It does seem like a bug to me.
s1 reads row 2, writes row 1, commits first
s2 writes row 2, then reads row 1 inside a savepoint which gets rolled back
s1 --> s2 (s1 read row 2 before s2 wrote it)
s2 --> s1 (s2 read row 1 before it saw s1's write)
Serialization is defined over what each transaction observed ("SSI is
based on the observation"), and whether those observations can be
arranged into one consistent order. I think a rolled-back read still
matters because we can't roll back the fact that the transaction
observed state.
Also, if we try to use the rolled back read for something later, like this:
SAVEPOINT s;
SELECT (balance >= 100) AS do_bonus FROM accounts WHERE id = 1 \gset
ROLLBACK TO SAVEPOINT s;
\if :do_bonus
UPDATE accounts SET bonus = bonus + 10 WHERE id = 2;
\endif
COMMIT;
it will abort on master, the questionable behavior only happens if the
read is unused. Based on this, I think even an unused read should
correctly abort for consistency.
I think we can also argue for this based on the point about locking
from README-SSI:
"Because reads in a subtransaction may cause that subtransaction
to roll back, thereby affecting what is written by the top level
transaction, predicate locks must survive a subtransaction rollback."
^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: Possible G2-item at SERIALIZABLE
@ 2026-06-17 17:39 Andrey Borodin <[email protected]>
parent: Zsolt Parragi <[email protected]>
0 siblings, 0 replies; 10+ messages in thread
From: Andrey Borodin @ 2026-06-17 17:39 UTC (permalink / raw)
To: Zsolt Parragi <[email protected]>; +Cc: [email protected]
> On 3 Jun 2026, at 04:03, Zsolt Parragi <[email protected]> wrote:
>
> even an unused read should
> correctly abort for consistency
Yes, unused read is still a read. That makes sense.
Let's work towards fixes. I agree with your idea about introducing helpers
and tracking all cases where we need to DooM. Would you like to propose a
next patch version where you fix all know cases?
Best regards, Andrey Borodin.
^ permalink raw reply [nested|flat] 10+ messages in thread
end of thread, other threads:[~2026-06-17 17:39 UTC | newest]
Thread overview: 10+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-05-22 16:44 Possible G2-item at SERIALIZABLE Kyle Kingsbury <[email protected]>
2026-05-30 18:47 ` Andrey Borodin <[email protected]>
2026-06-01 22:00 ` Zsolt Parragi <[email protected]>
2026-06-02 08:18 ` Andrey Borodin <[email protected]>
2026-06-02 23:03 ` Zsolt Parragi <[email protected]>
2026-06-17 17:39 ` Andrey Borodin <[email protected]>
2026-05-31 13:19 ` Andrey Borodin <[email protected]>
2026-06-01 01:49 ` Kyle Kingsbury <[email protected]>
2026-06-01 06:18 ` Andrey Borodin <[email protected]>
2026-06-01 10:11 ` Andrey Borodin <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox