public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Buffer locking is special (hints, checksums, AIO writes)
5+ messages / 2 participants
[nested] [flat]

* Re: Buffer locking is special (hints, checksums, AIO writes)
@ 2026-02-09 22:16  Andres Freund <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Andres Freund @ 2026-02-09 22:16 UTC (permalink / raw)
  To: Antonin Houska <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Heikki Linnakangas <[email protected]>; Melanie Plageman <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers; Thomas Munro <[email protected]>; Noah Misch <[email protected]>; Robert Haas <[email protected]>; Michael Paquier <[email protected]>

Hi,

On 2026-02-09 12:42:25 +0100, Antonin Houska wrote:
> Andres Freund <[email protected]> wrote:
> 
> > On 2026-01-12 12:45:03 -0500, Andres Freund wrote:
> > > I'm doing another pass through 0003 and will push that if I don't find
> > > anything significant.
> > 
> > Done, after adjust two comments in minor ways.
> 
> I suppose this is commit 0b96e734c590.
> 
> While troubleshooting REPACK issue [1], I realized that
> HeapTupleSatisfiesMVCCBatch() can also be called during logical decoding - in
> that case we need to use a historic MVCC snapshot.

Huh. Indeed. That's unintentional - the path should never have been reached,
we are checking that an MVCC snapshot is used. Unfortunately, somebody
(i.e. probably me) at some point defined the relevant macro as

/* This macro encodes the knowledge of which snapshots are MVCC-safe */
#define IsMVCCSnapshot(snapshot)  \
	((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
	 (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)

Which makes sense for some places, but not for plenty others.

The reason this didn't cause more widespread issues is that during logical
decoding we mostly don't use sequential scans etc that are affected by the
these paths.


> My proposal to fix the problem is attached.

That's imo not at all the right fix - it'd make visibility during seqscans
checking noticeably slower.


I think we ought to instead restrict the page-at-a-time scans to only happen
with "real" mvcc snapshots. I.e. this:

	/*
	 * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
	 */
	if (!(snapshot && IsMVCCSnapshot(snapshot)))
		scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;

should trigger for historic snapshots as well.


Does that fix the issue for you?


What's your reproducer?


Greetings,

Andres Freund






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Buffer locking is special (hints, checksums, AIO writes)
@ 2026-02-10 07:46  Antonin Houska <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Antonin Houska @ 2026-02-10 07:46 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Heikki Linnakangas <[email protected]>; Melanie Plageman <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers; Thomas Munro <[email protected]>; Noah Misch <[email protected]>; Robert Haas <[email protected]>; Michael Paquier <[email protected]>

Andres Freund <[email protected]> wrote:

> > While troubleshooting REPACK issue [1], I realized that
> > HeapTupleSatisfiesMVCCBatch() can also be called during logical decoding - in
> > that case we need to use a historic MVCC snapshot.
> 
> Huh. Indeed. That's unintentional - the path should never have been reached,
> we are checking that an MVCC snapshot is used. Unfortunately, somebody
> (i.e. probably me) at some point defined the relevant macro as
> 
> /* This macro encodes the knowledge of which snapshots are MVCC-safe */
> #define IsMVCCSnapshot(snapshot)  \
> 	((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
> 	 (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
> 
> Which makes sense for some places, but not for plenty others.
> 
> The reason this didn't cause more widespread issues is that during logical
> decoding we mostly don't use sequential scans etc that are affected by the
> these paths.

> > My proposal to fix the problem is attached.
> 
> That's imo not at all the right fix - it'd make visibility during seqscans
> checking noticeably slower.

ok

> I think we ought to instead restrict the page-at-a-time scans to only happen
> with "real" mvcc snapshots. I.e. this:
> 
> 	/*
> 	 * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
> 	 */
> 	if (!(snapshot && IsMVCCSnapshot(snapshot)))
> 		scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;
>
> should trigger for historic snapshots as well.

I suppose you mean changing it to

	if (!(snapshot && IsMVCCSnapshot(snapshot) &&
		  !IsHistoricMVCCSnapshot(snapshot)))
		scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;

> Does that fix the issue for you?

Yes, with this change, I don't hit the problem anymore.

> What's your reproducer?

Check out this branch

https://github.com/michail-nikolaev/postgres/tree/repack_concurrently_repro_22

and run t/008_repack_concurrently.pl in contrib/amcheck. The error we saw in
most cases was "ERROR: cache lookup failed for relation". I noticed that the
related pg_class entries had hint bits set incorrectly, so I added the
following to see when exactly it happens (actually I used lower elevel first,
to find out that the decoding worker is responsible for the problem):

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 75ae268d753..ebf38460873 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -73,6 +73,7 @@
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "commands/cluster.h"
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
@@ -938,6 +939,8 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
                                                HeapTupleHeaderGetRawXmin(tuple));
                else
                {
+                       if (am_decoding_for_repack())
+                               elog(PANIC, "HEAP_XMIN_INVALID set");
                        /* it must have aborted or crashed */
                        SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
                                                InvalidTransactionId);

The backtrace looked like:

    #3  0x0000000000c367ad errfinish (postgres + 0x8367ad)
    #4  0x0000000000517ee1 HeapTupleSatisfiesMVCC (postgres + 0x117ee1)
    #5  0x0000000000518e14 HeapTupleSatisfiesMVCCBatch (postgres + 0x118e14)
    #6  0x000000000050483e page_collect_tuples (postgres + 0x10483e)
    #7  0x0000000000504a8e heap_prepare_pagescan (postgres + 0x104a8e)
    #8  0x0000000000505344 heapgettup_pagemode (postgres + 0x105344)
    #9  0x0000000000505cff heap_getnextslot (postgres + 0x105cff)
    #10 0x000000000052ca08 table_scan_getnextslot (postgres + 0x12ca08)
    #11 0x000000000052d4ed systable_getnext (postgres + 0x12d4ed)
    #12 0x0000000000c2098e ScanPgRelation (postgres + 0x82098e)
    #13 0x0000000000c23aa2 RelationReloadIndexInfo (postgres + 0x823aa2)
    #14 0x0000000000c24376 RelationRebuildRelation (postgres + 0x824376)
    #15 0x0000000000c23784 RelationIdGetRelation (postgres + 0x823784)
    #16 0x00000000004b558f relation_open (postgres + 0xb558f)
    #17 0x000000000052e073 index_open (postgres + 0x12e073)
    #18 0x000000000052d0c5 systable_beginscan (postgres + 0x12d0c5)
    #19 0x0000000000c2097e ScanPgRelation (postgres + 0x82097e)
    #20 0x0000000000c23dde RelationReloadNailed (postgres + 0x823dde)
    #21 0x0000000000c24399 RelationRebuildRelation (postgres + 0x824399)
    #22 0x0000000000c23784 RelationIdGetRelation (postgres + 0x823784)
    #23 0x00000000004b558f relation_open (postgres + 0xb558f)
    #24 0x0000000000573ab1 table_open (postgres + 0x173ab1)
    #25 0x0000000000c20919 ScanPgRelation (postgres + 0x820919)
    #26 0x0000000000c21c98 RelationBuildDesc (postgres + 0x821c98)
    #27 0x0000000000c237d6 RelationIdGetRelation (postgres + 0x8237d6)
    #28 0x00000000009a028a ReorderBufferProcessTXN (postgres + 0x5a028a)
    #29 0x00000000009a10ed ReorderBufferReplay (postgres + 0x5a10ed)
    #30 0x00000000009a116b ReorderBufferCommit (postgres + 0x5a116b)
    #31 0x000000000098c7fe DecodeCommit (postgres + 0x58c7fe)
    #32 0x000000000098ba67 xact_decode (postgres + 0x58ba67)
    #33 0x000000000098b685 LogicalDecodingProcessRecord (postgres + 0x58b685)
    #34 0x00000000006a3e4b decode_concurrent_changes (postgres + 0x2a3e4b)
    #35 0x00000000006a5ea3 repack_worker_internal (postgres + 0x2a5ea3)
    #36 0x00000000006a5d5e RepackWorkerMain (postgres + 0x2a5d5e)
    #37 0x0000000000961b2b BackgroundWorkerMain (postgres + 0x561b2b)
    #38 0x0000000000964a61 postmaster_child_launch (postgres + 0x564a61)
    #39 0x000000000096b58d StartBackgroundWorker (postgres + 0x56b58d)
    #40 0x000000000096b80a maybe_start_bgworkers (postgres + 0x56b80a)
    #41 0x000000000096a5d9 LaunchMissingBackgroundProcesses (postgres + 0x56a5d9)
    #42 0x00000000009683fc ServerLoop (postgres + 0x5683fc)
    #43 0x0000000000967d37 PostmasterMain (postgres + 0x567d37)
    #44 0x0000000000813421 main (postgres + 0x413421)

Thanks.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Buffer locking is special (hints, checksums, AIO writes)
@ 2026-02-10 16:49  Andres Freund <[email protected]>
  parent: Antonin Houska <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Andres Freund @ 2026-02-10 16:49 UTC (permalink / raw)
  To: Antonin Houska <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Heikki Linnakangas <[email protected]>; Melanie Plageman <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers; Thomas Munro <[email protected]>; Noah Misch <[email protected]>; Robert Haas <[email protected]>; Michael Paquier <[email protected]>

Hi,

On 2026-02-10 08:46:27 +0100, Antonin Houska wrote:
> Andres Freund <[email protected]> wrote:
> > I think we ought to instead restrict the page-at-a-time scans to only happen
> > with "real" mvcc snapshots. I.e. this:
> > 
> > 	/*
> > 	 * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
> > 	 */
> > 	if (!(snapshot && IsMVCCSnapshot(snapshot)))
> > 		scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;
> >
> > should trigger for historic snapshots as well.
> 
> I suppose you mean changing it to
> 
> 	if (!(snapshot && IsMVCCSnapshot(snapshot) &&
> 		  !IsHistoricMVCCSnapshot(snapshot)))
> 		scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;

Yes.

For something committable, I think we should probably split IsMVCCSnapshot
into IsMVCCSnapshot(), just accepting SNAPSHOT_MVCC, and IsMVCCLikeSnapshot()
accepting both SNAPSHOT_MVCC and SNAPSHOT_HISTORIC_MVCC. And then go through
all the existing callers of IsMVCCSnapshot() - only about half should stay
as-is, I think.


> > Does that fix the issue for you?
> 
> Yes, with this change, I don't hit the problem anymore.

Great!

Greetings,

Andres Freund






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Buffer locking is special (hints, checksums, AIO writes)
@ 2026-02-12 10:36  Antonin Houska <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Antonin Houska @ 2026-02-12 10:36 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Heikki Linnakangas <[email protected]>; Melanie Plageman <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers; Thomas Munro <[email protected]>; Noah Misch <[email protected]>; Robert Haas <[email protected]>; Michael Paquier <[email protected]>

Andres Freund <[email protected]> wrote:

> For something committable, I think we should probably split IsMVCCSnapshot
> into IsMVCCSnapshot(), just accepting SNAPSHOT_MVCC, and IsMVCCLikeSnapshot()
> accepting both SNAPSHOT_MVCC and SNAPSHOT_HISTORIC_MVCC. And then go through
> all the existing callers of IsMVCCSnapshot() - only about half should stay
> as-is, I think.

The attached patch tries to do that.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Attachments:

  [text/x-diff] 0001-Refine-checking-of-snapshot-type.patch (3.3K, 2-0001-Refine-checking-of-snapshot-type.patch)
  download | inline diff:
From dcdbaf3095e632a1f7f65f3abc43eccff0249d4c Mon Sep 17 00:00:00 2001
From: Antonin Houska <[email protected]>
Date: Thu, 12 Feb 2026 11:14:00 +0100
Subject: [PATCH] Refine checking of snapshot type.

It appears to be confusing if IsMVCCSnapshot() evaluates to true for both
"regular" and "historic" MVCC snapshot. This patch restricts the meaning of
the macro to the "regular" MVCC snapshot, and introduces a new macro
IsMVCCLikeSnapshot() to recognize both types.

IsMVCCLikeSnapshot() is only used in functions that can (supposedly) be called
during logical decoding.
---
 src/backend/access/heap/heapam_handler.c | 2 +-
 src/backend/access/index/indexam.c       | 2 +-
 src/backend/access/nbtree/nbtree.c       | 2 +-
 src/include/utils/snapmgr.h              | 6 ++++--
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cbef73e5d4b..332c788bab2 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -159,7 +159,7 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Only in a non-MVCC snapshot can more than one member of the HOT
 		 * chain be visible.
 		 */
-		*call_again = !IsMVCCSnapshot(snapshot);
+		*call_again = !IsMVCCLikeSnapshot(snapshot);
 
 		slot->tts_tableOid = RelationGetRelid(scan->rel);
 		ExecStoreBufferHeapTuple(&bslot->base.tupdata, slot, hscan->xs_cbuf);
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 4ed0508c605..80623350d6f 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -445,7 +445,7 @@ index_markpos(IndexScanDesc scan)
 void
 index_restrpos(IndexScanDesc scan)
 {
-	Assert(IsMVCCSnapshot(scan->xs_snapshot));
+	Assert(IsMVCCLikeSnapshot(scan->xs_snapshot));
 
 	SCAN_CHECKS;
 	CHECK_SCAN_PROCEDURE(amrestrpos);
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 3dec1ee657d..07ba5997fbc 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -422,7 +422,7 @@ btrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	 * Note: so->dropPin should never change across rescans.
 	 */
 	so->dropPin = (!scan->xs_want_itup &&
-				   IsMVCCSnapshot(scan->xs_snapshot) &&
+				   IsMVCCLikeSnapshot(scan->xs_snapshot) &&
 				   RelationNeedsWAL(scan->indexRelation) &&
 				   scan->heapRelation != NULL);
 
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b8c01a291a1..dd5aaae6953 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -53,12 +53,14 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
 
 /* This macro encodes the knowledge of which snapshots are MVCC-safe */
 #define IsMVCCSnapshot(snapshot)  \
-	((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
-	 (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
+	((snapshot)->snapshot_type == SNAPSHOT_MVCC)
 
 #define IsHistoricMVCCSnapshot(snapshot)  \
 	((snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
 
+#define IsMVCCLikeSnapshot(snapshot)  \
+	(IsMVCCSnapshot(snapshot) || IsHistoricMVCCSnapshot(snapshot))
+
 extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Buffer locking is special (hints, checksums, AIO writes)
@ 2026-03-11 23:09  Andres Freund <[email protected]>
  parent: Antonin Houska <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Andres Freund @ 2026-03-11 23:09 UTC (permalink / raw)
  To: Antonin Houska <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Heikki Linnakangas <[email protected]>; Melanie Plageman <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers; Thomas Munro <[email protected]>; Noah Misch <[email protected]>; Robert Haas <[email protected]>; Michael Paquier <[email protected]>

Hi,

On 2026-02-12 11:36:08 +0100, Antonin Houska wrote:
> Andres Freund <[email protected]> wrote:
>
> > For something committable, I think we should probably split IsMVCCSnapshot
> > into IsMVCCSnapshot(), just accepting SNAPSHOT_MVCC, and IsMVCCLikeSnapshot()
> > accepting both SNAPSHOT_MVCC and SNAPSHOT_HISTORIC_MVCC. And then go through
> > all the existing callers of IsMVCCSnapshot() - only about half should stay
> > as-is, I think.
>
> The attached patch tries to do that.

Thanks!


> From dcdbaf3095e632a1f7f65f3abc43eccff0249d4c Mon Sep 17 00:00:00 2001
> From: Antonin Houska <[email protected]>
> Date: Thu, 12 Feb 2026 11:14:00 +0100
> Subject: [PATCH] Refine checking of snapshot type.
>
> It appears to be confusing if IsMVCCSnapshot() evaluates to true for both
> "regular" and "historic" MVCC snapshot. This patch restricts the meaning of
> the macro to the "regular" MVCC snapshot, and introduces a new macro
> IsMVCCLikeSnapshot() to recognize both types.
>
> IsMVCCLikeSnapshot() is only used in functions that can (supposedly) be called
> during logical decoding.

I think I agree with where you selected IsMVCCSnapshot() and where you
selected IsMVCCLikeSnapshot().


> diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
> index b8c01a291a1..dd5aaae6953 100644
> --- a/src/include/utils/snapmgr.h
> +++ b/src/include/utils/snapmgr.h
> @@ -53,12 +53,14 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
>
>  /* This macro encodes the knowledge of which snapshots are MVCC-safe */
>  #define IsMVCCSnapshot(snapshot)  \
> -	((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
> -	 (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
> +	((snapshot)->snapshot_type == SNAPSHOT_MVCC)
>
>  #define IsHistoricMVCCSnapshot(snapshot)  \
>  	((snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
>
> +#define IsMVCCLikeSnapshot(snapshot)  \
> +	(IsMVCCSnapshot(snapshot) || IsHistoricMVCCSnapshot(snapshot))
> +
>  extern Snapshot GetTransactionSnapshot(void);
>  extern Snapshot GetLatestSnapshot(void);
>  extern void SnapshotSetCommandId(CommandId curcid);

Probably need to update the comments a bit.  What about something like


/*
 * Is the snapshot implemented as an MVCC snapshot (i.e. it uses
 * SNAPSHOT_MVCC).  If so, there will be at most be one visible row in a chain
 * of updated tuples, and each visible tuple will be seen exactly once.
 */
#define IsMVCCSnapshot(snapshot)  \
...

/*
 * Is the snapshot either an MVCC snapshot or has equivalent visibility
 * semantics (see IsMVCCSnapshot()).
 */
#define IsMVCCLikeSnapshot(snapshot)  \


Greetings,

Andres Freund





^ permalink  raw  reply  [nested|flat] 5+ messages in thread


end of thread, other threads:[~2026-03-11 23:09 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-09 22:16 Re: Buffer locking is special (hints, checksums, AIO writes) Andres Freund <[email protected]>
2026-02-10 07:46 ` Antonin Houska <[email protected]>
2026-02-10 16:49   ` Andres Freund <[email protected]>
2026-02-12 10:36     ` Antonin Houska <[email protected]>
2026-03-11 23:09       ` Andres Freund <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox