public inbox for [email protected]  
Possible premature SNAPBUILD_CONSISTENT with DB-specific running_xacts
2+ messages / 2 participants

From: SATYANARAYANA NARLAPURAM @ 2026-04-19 18:58 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; Álvaro Herrera <[email protected]>; Antonin Houska <[email protected]>

Hi hackers,

A cluster-wide decoder must never have its snapshot-builder state changed
by a database-specific running_xacts record. I am adding a check that
returns early in that case.
Otherwise, I think a cluster-wide decoder can move to the
SNAPBUILD_CONSISTENT state immediately even though transactions older
than nextXid are still in progress in a different database (and so are
not tracked by the running_xacts record). This race is now possible when
cluster-wide decoders and Repack run concurrently.


Attached a patch to fix this. Thoughts?

Thanks
Satya


Attachments:

  [application/octet-stream] v1-snapbuild-only.patch (1.3K, 3-v1-snapbuild-only.patch)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index c8309b9..8953647 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1157,6 +1157,21 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
 	ReorderBufferTXN *txn;
 	TransactionId xmin;
 
+	/*
+	 * A database-specific xl_running_xacts record only describes XIDs of
+	 * transactions running in running->dbid; XIDs of transactions in other
+	 * databases (including possibly our own) are missing from xids[] and not
+	 * accounted for in oldestRunningXid/nextXid.  Such a record may only be
+	 * consumed by a decoder that itself opted out of cluster-wide tracking
+	 * (db_specific == true).  Otherwise we could mark the snapshot
+	 * SNAPBUILD_CONSISTENT while transactions older than running->nextXid are
+	 * still in progress in another database, causing their later commits to
+	 * be silently dropped from the decoded change stream (data loss in
+	 * downstream subscribers).
+	 */
+	if (!db_specific && OidIsValid(running->dbid))
+		return;
+
 	/*
 	 * If we're not consistent yet, inspect the record to see whether it
 	 * allows to get closer to being consistent. If we are consistent, dump
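
For readers following the reasoning, the effect of the guard can be modelled outside PostgreSQL. The following is a minimal standalone sketch, not the actual snapbuild.c code: DecoderModel, RunningXactsModel, model_process_running_xacts() and the simplified rule "no running XIDs listed means consistent" are all hypothetical stand-ins. Only the condition `!db_specific && OidIsValid(running->dbid)` is taken from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t TransactionId;

#define InvalidOid ((Oid) 0)
#define OidIsValid(oid) ((oid) != InvalidOid)

/* Hypothetical, trimmed-down stand-in for xl_running_xacts. */
typedef struct RunningXactsModel
{
	Oid			dbid;		/* InvalidOid: record covers the whole cluster */
	int			xcnt;		/* number of running XIDs listed in the record */
	TransactionId nextXid;	/* next XID to be assigned when record was logged */
} RunningXactsModel;

typedef enum ModelState
{
	MODEL_START,
	MODEL_CONSISTENT
} ModelState;

/* Hypothetical stand-in for the decoder's snapshot-builder state. */
typedef struct DecoderModel
{
	bool		db_specific;	/* decoder opted out of cluster-wide tracking */
	ModelState	state;
} DecoderModel;

/*
 * Returns true if the record was consumed, false if it was ignored.
 *
 * Assumption: the real consistency check is far more involved; here a
 * record that lists no running XIDs is treated as proof of consistency.
 */
static bool
model_process_running_xacts(DecoderModel *dec, const RunningXactsModel *running)
{
	assert(dec != NULL && running != NULL);

	/*
	 * Guard from the patch: a cluster-wide decoder ignores a record that
	 * only describes one database, because XIDs running elsewhere are
	 * missing from it.
	 */
	if (!dec->db_specific && OidIsValid(running->dbid))
		return false;

	if (dec->state != MODEL_CONSISTENT && running->xcnt == 0)
		dec->state = MODEL_CONSISTENT;

	return true;
}
```

In this model, a cluster-wide decoder that sees a database-specific record with no listed XIDs stays in MODEL_START, whereas without the guard it would jump straight to MODEL_CONSISTENT, which is the premature transition described in the report.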



From: Amit Kapila @ 2026-04-20 06:04 UTC (permalink / raw)
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Álvaro Herrera <[email protected]>; Antonin Houska <[email protected]>

On Mon, Apr 20, 2026 at 12:29 AM SATYANARAYANA NARLAPURAM
<[email protected]> wrote:
>
> A cluster-wide decoder must never have its snapshot-builder state changed
> by a database-specific running_xacts record. I am adding a check that
> returns early in that case.
> Otherwise, I think a cluster-wide decoder can move to the
> SNAPBUILD_CONSISTENT state immediately even though transactions older
> than nextXid are still in progress in a different database (and so are
> not tracked by the running_xacts record). This race is now possible when
> cluster-wide decoders and Repack run concurrently.
>

I think this has been discussed previously; see [1]. As I understand
it, we have not agreed on the need for this db-specific handling in
the snapbuilder; see [2][3]. So adding more improvements or fixes on
top of it doesn't sound advisable.

[1] - https://www.postgresql.org/message-id/CAA4eK1KWDbBk4FgbbWdivQLrPPzR4zgvfnHK4WjWC78rbuRVbg%40mail.gma...
[2] - https://www.postgresql.org/message-id/cdgw4sbbfcgk6du3iv54r2dgiy4tfywoklbotlmj4irxavdcr3%40glxfw5jj2...
[3] - https://www.postgresql.org/message-id/pveffyxhnuurhb44uzqlwo3rkyzorkfh2rot7uwzlf2axhfvbp%407nrs2omys...

-- 
With Regards,
Amit Kapila.




