public inbox for [email protected]  
help / color / mirror / Atom feed
From: Matthias van de Meent <[email protected]>
To: Álvaro Herrera <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Michail Nikolaev <[email protected]>
Subject: Re: [SP-]GiST IOS visibility bug (was: Why doens't GiST require super-exclusive lock)
Date: Fri, 5 Jun 2026 10:28:12 +0200
Message-ID: <CAEze2WiQ61UjWbqc5d9x3pTWWw24MsW2g7Be-+o4io1fKSQuVg@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAEze2WgH13m=MDST58KLo-NkZpbwBEt4xNWcgtghWBwRj3J0+A@mail.gmail.com>
	<[email protected]>

On Tue, 3 Feb 2026 at 23:39, Álvaro Herrera <[email protected]> wrote:
>
> On 2025-Apr-25, Matthias van de Meent wrote:
>
> > Hi,
> >
> > In the original thread [0] we established that GIST and SP-GiST
> > support index-only scans, but don't have sufficient interlocking to
> > prevent dead tuples from being revived for their scan by vacuum.
>
> Hello,
>
> This commitfest entry
> https://commitfest.postgresql.org/patch/5721/
> is stale.  I would have brought it forward because it's tagged as a bug
> fix, but the patches look a bit intimidating to consider it a
> backpatchable bug fix.  So, what's the status here?

Sorry for the delay. I've finally gotten back to this patch, as it's
the simpler option vs the patch in the original thread, even if it may
have a higher performance impact for GIST and SP-GIST IOS scans.

Attached is v02, which is a simple rebase with some language and
wrapping changes in comments, as well as some fixes for assert-enabled
builds.


Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)


Attachments:

  [application/octet-stream] v02-0002-GIST-Fix-visibility-issues-in-IOS.patch (6.5K, 2-v02-0002-GIST-Fix-visibility-issues-in-IOS.patch)
  download | inline diff:
From dec0eeb84b7780c4de74176ba25c305134e7b439 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <[email protected]>
Date: Fri, 25 Apr 2025 16:14:26 +0200
Subject: [PATCH v02 2/4] GIST: Fix visibility issues in IOS

Previously, GIST IOS could buffer tuples from pages while VACUUM came
along and cleaned up an ALL_DEAD tuple, marking the tuple's page
ALL_VISIBLE again and making IOS mistakenly believe the tuple is indeed
visible.

With this commit, buffer pins now conflict with GIST vacuum, and we now
do visibility checks on heap items returned from index pages while we
hold the pin, so that the IOS infrastructure knows to recheck the heap
page even if that heap page has become ALL_VISIBLE after we released
the index page.

Idea from Heikki Linnakangas

Backpatch-through: 14
---
 src/backend/access/gist/gistget.c    | 36 ++++++++++++++++++++++++----
 src/backend/access/gist/gistscan.c   | 14 +++++++++++
 src/backend/access/gist/gistvacuum.c |  6 ++---
 src/include/access/gist_private.h    |  5 ++++
 4 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 4d7c100d737..20a9cc22409 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -471,8 +471,9 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
 			so->pageData[so->nPageData].offnum = i;
 
 			/*
-			 * In an index-only scan, also fetch the data from the tuple.  The
-			 * reconstructed tuples are stored in pageDataCxt.
+			 * In an index-only scan, also fetch the data from the tuple and
+			 * its visibility state.  The reconstructed tuples are stored in
+			 * pageDataCxt.
 			 */
 			if (scan->xs_want_itup)
 			{
@@ -480,6 +481,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
 				so->pageData[so->nPageData].recontup =
 					gistFetchTuple(giststate, r, it);
 				MemoryContextSwitchTo(oldcxt);
+
+				so->pageData[so->nPageData].visrecheck =
+					table_index_vischeck_tuple(scan->heapRelation,
+											   &so->vmbuf, &it->t_tid);
 			}
 			so->nPageData++;
 		}
@@ -507,10 +512,16 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
 				item->data.heap.recheckDistances = recheck_distances;
 
 				/*
-				 * In an index-only scan, also fetch the data from the tuple.
+				 * In an index-only scan, also fetch the data from the tuple
+				 * and its visibility state.
 				 */
 				if (scan->xs_want_itup)
+				{
 					item->data.heap.recontup = gistFetchTuple(giststate, r, it);
+					item->data.heap.visrecheck =
+						table_index_vischeck_tuple(scan->heapRelation,
+												   &so->vmbuf, &it->t_tid);
+				}
 			}
 			else
 			{
@@ -595,9 +606,16 @@ getNextNearest(IndexScanDesc scan)
 												 item->distances,
 												 item->data.heap.recheckDistances);
 
-			/* in an index-only scan, also return the reconstructed tuple. */
+			/*
+			 * In an index-only scan, also return the reconstructed tuple and
+			 * the visibility check we did when we still had a pin on the
+			 * index page.
+			 */
 			if (scan->xs_want_itup)
+			{
 				scan->xs_hitup = item->data.heap.recontup;
+				scan->xs_visrecheck = item->data.heap.visrecheck;
+			}
 			res = true;
 		}
 		else
@@ -682,9 +700,17 @@ gistgettuple(IndexScanDesc scan, ScanDirection dir)
 				scan->xs_heaptid = so->pageData[so->curPageData].heapPtr;
 				scan->xs_recheck = so->pageData[so->curPageData].recheck;
 
-				/* in an index-only scan, also return the reconstructed tuple */
+				/*
+				 * In an index-only scan, also return the reconstructed tuple
+				 * and the visibility check we did when we still had a pin on
+				 * the index page.
+				 */
 				if (scan->xs_want_itup)
+				{
 					scan->xs_hitup = so->pageData[so->curPageData].recontup;
+					scan->xs_visrecheck =
+						so->pageData[so->curPageData].visrecheck;
+				}
 
 				so->curPageData++;
 
diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c
index c65f93abdae..70b593f2898 100644
--- a/src/backend/access/gist/gistscan.c
+++ b/src/backend/access/gist/gistscan.c
@@ -340,6 +340,13 @@ gistrescan(IndexScanDesc scan, ScanKey key, int nkeys,
 			pfree(fn_extras);
 	}
 
+	/* release resources used in index-only scans */
+	if (BufferIsValid(so->vmbuf))
+	{
+		ReleaseBuffer(so->vmbuf);
+		so->vmbuf = InvalidBuffer;
+	}
+
 	/* any previous xs_hitup will have been pfree'd in context resets above */
 	scan->xs_hitup = NULL;
 }
@@ -349,6 +356,13 @@ gistendscan(IndexScanDesc scan)
 {
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 
+	/* release resources used in index-only scans */
+	if (BufferIsValid(so->vmbuf))
+	{
+		ReleaseBuffer(so->vmbuf);
+		so->vmbuf = InvalidBuffer;
+	}
+
 	/*
 	 * freeGISTstate is enough to clean up everything made by gistbeginscan,
 	 * as well as the queueCxt if there is a separate context for it.
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 686a0418054..1b9acec5654 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -326,10 +326,10 @@ restart:
 	recurse_to = InvalidBlockNumber;
 
 	/*
-	 * We are not going to stay here for a long time, aggressively grab an
-	 * exclusive lock.
+	 * We are not going to stay here for a long time, aggressively grab a
+	 * cleanup lock.
 	 */
-	LockBuffer(buffer, GIST_EXCLUSIVE);
+	LockBufferForCleanup(buffer);
 	page = BufferGetPage(buffer);
 
 	if (gistPageRecyclable(page))
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 44514f1cb8d..b9002f0f0bf 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -22,6 +22,7 @@
 #include "storage/buffile.h"
 #include "utils/hsearch.h"
 #include "access/genam.h"
+#include "tableam.h"
 
 /*
  * Maximum number of "halves" a page can be split into in one operation.
@@ -124,6 +125,8 @@ typedef struct GISTSearchHeapItem
 								 * index-only scans */
 	OffsetNumber offnum;		/* track offset in page to mark tuple as
 								 * LP_DEAD */
+	uint8		visrecheck;		/* Cached visibility check result for this
+								 * heap pointer. */
 } GISTSearchHeapItem;
 
 /* Unvisited item, either index page or heap tuple */
@@ -176,6 +179,8 @@ typedef struct GISTScanOpaqueData
 	OffsetNumber curPageData;	/* next item to return */
 	MemoryContext pageDataCxt;	/* context holding the fetched tuples, for
 								 * index-only scans */
+	/* info used by Index-Only Scans */
+	Buffer		vmbuf;			/* reusable buffer for IOS' vm lookups */
 } GISTScanOpaqueData;
 
 typedef GISTScanOpaqueData *GISTScanOpaque;
-- 
2.50.1 (Apple Git-155)



  [application/octet-stream] v02-0001-Expose-visibility-checking-shim-for-index-usage.patch (12.8K, 3-v02-0001-Expose-visibility-checking-shim-for-index-usage.patch)
  download | inline diff:
From 065720d1e6ad31dd4b092f3ceb10c6a88200683e Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <[email protected]>
Date: Fri, 25 Apr 2025 14:57:06 +0200
Subject: [PATCH v02 1/4] Expose visibility checking shim for index usage

This will be the backbone of a fix for visibility issues in gist and
sp-gist's index-only scan code in later commits.

The issue to be solved is index scans caching tuples in memory that
are being removed by a concurrently running vacuum; to the point that
the VACUUM process has removed the tuple and marked its page as
ALL_VISIBLE by the time the IOS code returns that tuple through
amgettuple.

While the structure of IndexScanDesc is updated, this new field is
located in an alignment gap, thus making this new field safe to use.

Backpatch-through: 14
---
 src/backend/access/index/indexam.c       |  6 ++
 src/backend/executor/nodeIndexonlyscan.c | 86 ++++++++++++++++--------
 src/backend/utils/adt/selfuncs.c         | 81 ++++++++++++++--------
 src/include/access/relscan.h             |  1 +
 src/include/access/tableam.h             | 53 +++++++++++++++
 5 files changed, 169 insertions(+), 58 deletions(-)

diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 7967e939847..d4b457803d4 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -606,6 +606,12 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction)
 	/* XXX: we should assert that a snapshot is pushed or registered */
 	Assert(TransactionIdIsValid(RecentXmin));
 
+	/*
+	 * Reset xs_visrecheck, so we don't confuse the next tuple's visibility
+	 * state with that of the previous.
+	 */
+	scan->xs_visrecheck = TMVC_Unchecked;
+
 	/*
 	 * The AM's amgettuple proc finds the next index entry matching the scan
 	 * keys, and puts the TID into scan->xs_heaptid.  It should also set
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index d52012e8a69..50dc9682970 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -124,13 +124,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
 	{
 		bool		tuple_from_heap = false;
+		TMVC_Result vischeck = scandesc->xs_visrecheck;
 
 		CHECK_FOR_INTERRUPTS();
 
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
 		 * which all tuples are known visible to everybody.  In any case,
-		 * we'll use the index tuple not the heap tuple as the data source.
+		 * we'll use the index tuple not the heap tuple as the data source. We
+		 * can skip the VM lookup if the index has already issued that VM
+		 * lookup, indicated by a non-Unchecked value in xs_visrecheck.
 		 *
 		 * Note on Memory Ordering Effects: visibilitymap_get_status does not
 		 * lock the visibility map buffer, and therefore the result we read
@@ -160,37 +163,62 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 *
 		 * It's worth going through this complexity to avoid needing to lock
 		 * the VM buffer, which could cause significant contention.
+		 *
+		 * The index doing these checks for us doesn't materially change these
+		 * considerations.
 		 */
-		if (!VM_ALL_VISIBLE(scandesc->heapRelation,
-							ItemPointerGetBlockNumber(tid),
-							&node->ioss_VMBuffer))
-		{
-			/*
-			 * Rats, we have to visit the heap to check visibility.
-			 */
-			InstrCountTuples2(node, 1);
-			if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
-				continue;		/* no visible tuple, try next index entry */
+		if (vischeck == TMVC_Unchecked)
+			vischeck = table_index_vischeck_tuple(scandesc->heapRelation,
+												  &node->ioss_VMBuffer,
+												  tid);
 
-			ExecClearTuple(node->ioss_TableSlot);
+		Assert(vischeck != TMVC_Unchecked);
 
-			/*
-			 * Only MVCC snapshots are supported here, so there should be no
-			 * need to keep following the HOT chain once a visible entry has
-			 * been found.  If we did want to allow that, we'd need to keep
-			 * more state to remember not to call index_getnext_tid next time.
-			 */
-			if (scandesc->xs_heap_continue)
-				elog(ERROR, "non-MVCC snapshots are not supported in index-only scans");
-
-			/*
-			 * Note: at this point we are holding a pin on the heap page, as
-			 * recorded in scandesc->xs_cbuf.  We could release that pin now,
-			 * but it's not clear whether it's a win to do so.  The next index
-			 * entry might require a visit to the same heap page.
-			 */
-
-			tuple_from_heap = true;
+		switch (vischeck)
+		{
+			case TMVC_Unchecked:
+				elog(ERROR, "Failed to check visibility for tuple");
+
+				/*
+				 * In case of compilers that don't undertand that elog(ERROR)
+				 * doens't exit, but which do have a functional
+				 * -Wimplicit-fallthrough warning:
+				 */
+				/* fallthrough */
+			case TMVC_MaybeVisible:
+				{
+					/*
+					 * Rats, we have to visit the heap to check visibility.
+					 */
+					InstrCountTuples2(node, 1);
+					if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
+						continue;	/* no visible tuple, try next index entry */
+
+					ExecClearTuple(node->ioss_TableSlot);
+
+					/*
+					 * Only MVCC snapshots are supported here, so there should
+					 * be no need to keep following the HOT chain once a
+					 * visible entry has been found.  If we did want to allow
+					 * that, we'd need to keep more state to remember not to
+					 * call index_getnext_tid next time.
+					 */
+					if (scandesc->xs_heap_continue)
+						elog(ERROR, "non-MVCC snapshots are not supported in index-only scans");
+
+					/*
+					 * Note: at this point we are holding a pin on the heap
+					 * page, as recorded in scandesc->xs_cbuf.  We could
+					 * release that pin now, but it's not clear whether it's a
+					 * win to do so. The next index entry might require a
+					 * visit to the same heap page.
+					 */
+
+					tuple_from_heap = true;
+					break;
+				}
+			case TMVC_Visible:
+				break;
 		}
 
 		/*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index b9449b4574a..c30a22e5640 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7195,49 +7195,71 @@ get_actual_variable_endpoint(Relation heapRel,
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
+	index_scan->xs_visrecheck = TMVC_Unchecked;
 
 	/* Fetch first/next tuple in specified direction */
 	while ((tid = index_getnext_tid(index_scan, indexscandir)) != NULL)
 	{
 		BlockNumber block = ItemPointerGetBlockNumber(tid);
+		TMVC_Result visres = index_scan->xs_visrecheck;
 
-		if (!VM_ALL_VISIBLE(heapRel,
-							block,
-							&vmbuffer))
+		if (visres == TMVC_Unchecked)
+			visres = table_index_vischeck_tuple(heapRel, &vmbuffer, tid);
+
+		Assert(visres != TMVC_Unchecked);
+
+		switch (visres)
 		{
-			/* Rats, we have to visit the heap to check visibility */
-			if (!index_fetch_heap(index_scan, tableslot))
-			{
+			case TMVC_Unchecked:
+				elog(ERROR, "Failed to check visibility for tuple");
+
 				/*
-				 * No visible tuple for this index entry, so we need to
-				 * advance to the next entry.  Before doing so, count heap
-				 * page fetches and give up if we've done too many.
-				 *
-				 * We don't charge a page fetch if this is the same heap page
-				 * as the previous tuple.  This is on the conservative side,
-				 * since other recently-accessed pages are probably still in
-				 * buffers too; but it's good enough for this heuristic.
+				 * In case of compilers that don't undertand that elog(ERROR)
+				 * doens't exit, and which have -Wimplicit-fallthrough:
 				 */
+				/* fallthrough */
+			case TMVC_MaybeVisible:
+				{
+					/* Rats, we have to visit the heap to check visibility */
+					if (!index_fetch_heap(index_scan, tableslot))
+					{
+						/*
+						 * No visible tuple for this index entry, so we need
+						 * to advance to the next entry.  Before doing so,
+						 * count heap page fetches and give up if we've done
+						 * too many.
+						 *
+						 * We don't charge a page fetch if this is the same
+						 * heap page as the previous tuple.  This is on the
+						 * conservative side, since other recently-accessed
+						 * pages are probably still in buffers too; but it's
+						 * good enough for this heuristic.
+						 */
 #define VISITED_PAGES_LIMIT 100
 
-				if (block != last_heap_block)
-				{
-					last_heap_block = block;
-					n_visited_heap_pages++;
-					if (n_visited_heap_pages > VISITED_PAGES_LIMIT)
-						break;
-				}
+						if (block != last_heap_block)
+						{
+							last_heap_block = block;
+							n_visited_heap_pages++;
+							if (n_visited_heap_pages > VISITED_PAGES_LIMIT)
+								goto exit_loop;
+						}
 
-				continue;		/* no visible tuple, try next index entry */
-			}
+						continue;	/* no visible tuple, try next index entry */
+					}
 
-			/* We don't actually need the heap tuple for anything */
-			ExecClearTuple(tableslot);
+					/* We don't actually need the heap tuple for anything */
+					ExecClearTuple(tableslot);
 
-			/*
-			 * We don't care whether there's more than one visible tuple in
-			 * the HOT chain; if any are visible, that's good enough.
-			 */
+					/*
+					 * We don't care whether there's more than one visible
+					 * tuple in the HOT chain; if any are visible, that's good
+					 * enough.
+					 */
+					break;
+				}
+			case TMVC_Visible:
+				break;
 		}
 
 		/*
@@ -7270,6 +7292,7 @@ get_actual_variable_endpoint(Relation heapRel,
 		have_data = true;
 		break;
 	}
+exit_loop:
 
 	if (vmbuffer != InvalidBuffer)
 		ReleaseBuffer(vmbuffer);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 2ea06a67a63..a2c96111617 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -188,6 +188,7 @@ typedef struct IndexScanDescData
 	IndexFetchTableData *xs_heapfetch;
 
 	bool		xs_recheck;		/* T means scan keys must be rechecked */
+	uint8		xs_visrecheck;	/* TMVC_Result; see tableam.h */
 
 	/*
 	 * When fetching with an ordering operator, the values of the ORDER BY
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f2c36696bca..77a91df22e3 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -24,6 +24,7 @@
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
+#include "visibilitymap.h"
 
 
 #define DEFAULT_TABLE_ACCESS_METHOD	"heap"
@@ -276,6 +277,28 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/*
+ * Output of table_index_vischeck_tuple()
+ *
+ * Index-only scans need to know the visibility of the associated table tuples
+ * before they can return the index tuple.  If the index tuple is known to be
+ * visible with a cheap check, we can return it directly without requesting
+ * the visibility info from the table AM directly.
+ *
+ * This AM API exposes a cheap visibility checking API to indexes, allowing
+ * these indexes to check multiple tuples worth of visibility info at once,
+ * and allowing the AM to store these checks, improving the pinning ergonomics
+ * of index AMs by allowing a scan to cache index tuples in memory without
+ * holding pins on those index tuples' pages until the index tuples were
+ * returned.
+ */
+typedef enum TMVC_Result
+{
+	TMVC_Unchecked,
+	TMVC_MaybeVisible,
+	TMVC_Visible,
+}			TMVC_Result;
+
 /*
  * "options" flag bits for table_tuple_insert.  Access methods may define
  * their own bits for internal use, as long as they don't collide with these.
@@ -1414,6 +1437,36 @@ table_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate)
 	return rel->rd_tableam->index_delete_tuples(rel, delstate);
 }
 
+/*
+ * Determine rough visibility information of index tuples based on each TID.
+ *
+ * Determines which entries from index AM caller's TM_IndexVisibilityCheckOp
+ * state point to TMVC_VISIBLE or TMVC_MAYBE_VISIBLE table tuples, at low IO
+ * overhead.  For the heap AM, the implementation is effectively a wrapper
+ * around VM_ALL_FROZEN.
+ *
+ * On return, all TM_VisChecks indicated by checkop->checktids will have been
+ * updated with the correct visibility status.
+ *
+ * Note that there is no value for "definitely dead" tuples, as the Heap AM
+ * doesn't have an efficient method to determine that a tuple is dead to all
+ * users, as it would have to go into the heap.  If and when AMs are built
+ * that would support VM checks with an equivalent to VM_ALL_DEAD this
+ * decision can be reconsidered.
+ */
+static inline TMVC_Result
+table_index_vischeck_tuple(Relation rel, Buffer *vmbuffer, ItemPointer tid)
+{
+	BlockNumber blkno = ItemPointerGetBlockNumberNoCheck(tid);
+	TMVC_Result res;
+
+	if (VM_ALL_VISIBLE(rel, blkno, vmbuffer))
+		res = TMVC_Visible;
+	else
+		res = TMVC_MaybeVisible;
+
+	return res;
+}
 
 /* ----------------------------------------------------------------------------
  *  Functions for manipulations of physical tuples.
-- 
2.50.1 (Apple Git-155)



  [application/octet-stream] v02-0003-SP-GIST-Fix-visibility-issues-in-IOS.patch (10.4K, 4-v02-0003-SP-GIST-Fix-visibility-issues-in-IOS.patch)
  download | inline diff:
From 9c3cdecefb981bf3dca3a3e57f7f02090157362b Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <[email protected]>
Date: Fri, 25 Apr 2025 17:29:42 +0200
Subject: [PATCH v02 3/4] SP-GIST: Fix visibility issues in IOS

Previously, SP-GIST IOS could buffer tuples from pages while VACUUM came
along and cleaned up an ALL_DEAD tuple, marking the tuple's page
ALL_VISIBLE again and making IOS mistakenly believe the tuple is indeed
visible.

With this commit, buffer pins now conflict with SP-GIST vacuum, and we now
do visibility checks on heap items returned from index pages while we
hold the pin, so that the IOS infrastructure knows to recheck the heap
page even if that heap page has become ALL_VISIBLE after we released the
index page.

Idea from Heikki Linnakangas

Backpatch-through: 14
---
 src/backend/access/spgist/spgscan.c   | 73 ++++++++++++++++++++++-----
 src/backend/access/spgist/spgvacuum.c |  2 +-
 src/include/access/spgist_private.h   |  5 ++
 3 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 2cc5f06f5d7..455d587dfda 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -31,7 +31,8 @@
 typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
 							   Datum leafValue, bool isNull,
 							   SpGistLeafTuple leafTuple, bool recheck,
-							   bool recheckDistances, double *distances);
+							   bool recheckDistances, double *distances,
+							   TMVC_Result vischeck);
 
 /*
  * Pairing heap comparison function for the SpGistSearchItem queue.
@@ -143,6 +144,7 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
 	startEntry->traversalValue = NULL;
 	startEntry->recheck = false;
 	startEntry->recheckDistances = false;
+	startEntry->visrecheck = TMVC_Unchecked;
 
 	spgAddSearchItemToQueue(so, startEntry);
 }
@@ -190,6 +192,11 @@ resetSpGistScanOpaque(SpGistScanOpaque so)
 
 		for (i = 0; i < so->nPtrs; i++)
 			pfree(so->reconTups[i]);
+
+		if (BufferIsValid(so->vmbuf))
+			ReleaseBuffer(so->vmbuf);
+
+		so->vmbuf = InvalidBuffer;
 	}
 	so->iPtr = so->nPtrs = 0;
 }
@@ -381,6 +388,13 @@ spgrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	if (scankey && scan->numberOfKeys > 0)
 		memcpy(scan->keyData, scankey, scan->numberOfKeys * sizeof(ScanKeyData));
 
+	/* prepare index-only scan requirements */
+	if (scan->xs_want_itup)
+	{
+		if (so->visrecheck == NULL)
+			so->visrecheck = palloc(MaxIndexTuplesPerPage);
+	}
+
 	/* initialize order-by data if needed */
 	if (orderbys && scan->numberOfOrderBys > 0)
 	{
@@ -448,6 +462,9 @@ spgendscan(IndexScanDesc scan)
 		pfree(scan->xs_orderbynulls);
 	}
 
+	if (BufferIsValid(so->vmbuf))
+		ReleaseBuffer(so->vmbuf);
+
 	pfree(so);
 }
 
@@ -497,6 +514,7 @@ spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
 	item->isLeaf = true;
 	item->recheck = recheck;
 	item->recheckDistances = recheckDistances;
+	item->visrecheck = TMVC_Unchecked;
 
 	return item;
 }
@@ -510,7 +528,8 @@ spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
 static bool
 spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
 			SpGistLeafTuple leafTuple, bool isnull,
-			bool *reportedSome, storeRes_func storeRes)
+			bool *reportedSome, storeRes_func storeRes,
+			Relation tableRel)
 {
 	Datum		leafValue;
 	double	   *distances;
@@ -566,6 +585,15 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
 
 	if (result)
 	{
+		TMVC_Result vischeck = TMVC_Unchecked;
+
+		if (so->want_itup)
+		{
+			Assert(item != NULL);
+			vischeck = table_index_vischeck_tuple(tableRel, &so->vmbuf,
+												  &item->heapPtr);
+		}
+
 		/* item passes the scankeys */
 		if (so->numberOfNonNullOrderBys > 0)
 		{
@@ -579,6 +607,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
 														isnull,
 														distances);
 
+			heapItem->visrecheck = vischeck;
+
 			spgAddSearchItemToQueue(so, heapItem);
 
 			MemoryContextSwitchTo(oldCxt);
@@ -588,7 +618,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
 			/* non-ordered scan, so report the item right away */
 			Assert(!recheckDistances);
 			storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
-					 leafTuple, recheck, false, NULL);
+					 leafTuple, recheck, false, NULL, vischeck);
 			*reportedSome = true;
 		}
 	}
@@ -760,7 +790,8 @@ spgTestLeafTuple(SpGistScanOpaque so,
 				 Page page, OffsetNumber offset,
 				 bool isnull, bool isroot,
 				 bool *reportedSome,
-				 storeRes_func storeRes)
+				 storeRes_func storeRes,
+				 Relation tablerel)
 {
 	SpGistLeafTuple leafTuple = (SpGistLeafTuple)
 		PageGetItem(page, PageGetItemId(page, offset));
@@ -796,7 +827,8 @@ spgTestLeafTuple(SpGistScanOpaque so,
 
 	Assert(ItemPointerIsValid(&leafTuple->heapPtr));
 
-	spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
+	spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes,
+				tablerel);
 
 	return SGLT_GET_NEXTOFFSET(leafTuple);
 }
@@ -809,8 +841,8 @@ spgTestLeafTuple(SpGistScanOpaque so,
  * next page boundary once we have reported at least one tuple.
  */
 static void
-spgWalk(Relation index, SpGistScanOpaque so, bool scanWholeIndex,
-		storeRes_func storeRes)
+spgWalk(Relation index, Relation table, SpGistScanOpaque so,
+		bool scanWholeIndex, storeRes_func storeRes)
 {
 	Buffer		buffer = InvalidBuffer;
 	bool		reportedSome = false;
@@ -832,7 +864,8 @@ redirect:
 			Assert(so->numberOfNonNullOrderBys > 0);
 			storeRes(so, &item->heapPtr, item->value, item->isNull,
 					 item->leafTuple, item->recheck,
-					 item->recheckDistances, item->distances);
+					 item->recheckDistances, item->distances,
+					 item->visrecheck);
 			reportedSome = true;
 		}
 		else
@@ -871,7 +904,8 @@ redirect:
 					for (offset = FirstOffsetNumber; offset <= max; offset++)
 						(void) spgTestLeafTuple(so, item, page, offset,
 												isnull, true,
-												&reportedSome, storeRes);
+												&reportedSome, storeRes,
+												table);
 				}
 				else
 				{
@@ -881,7 +915,8 @@ redirect:
 						Assert(offset >= FirstOffsetNumber && offset <= max);
 						offset = spgTestLeafTuple(so, item, page, offset,
 												  isnull, false,
-												  &reportedSome, storeRes);
+												  &reportedSome, storeRes,
+												  table);
 						if (offset == SpGistRedirectOffsetNumber)
 							goto redirect;
 					}
@@ -926,7 +961,8 @@ static void
 storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
 			Datum leafValue, bool isnull,
 			SpGistLeafTuple leafTuple, bool recheck,
-			bool recheckDistances, double *distances)
+			bool recheckDistances, double *distances,
+			TMVC_Result vischeck)
 {
 	Assert(!recheckDistances && !distances);
 	tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -944,7 +980,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 	so->tbm = tbm;
 	so->ntids = 0;
 
-	spgWalk(scan->indexRelation, so, true, storeBitmap);
+	spgWalk(scan->indexRelation, scan->heapRelation, so, true, storeBitmap);
 
 	return so->ntids;
 }
@@ -954,7 +990,8 @@ static void
 storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
 			  Datum leafValue, bool isnull,
 			  SpGistLeafTuple leafTuple, bool recheck,
-			  bool recheckDistances, double *nonNullDistances)
+			  bool recheckDistances, double *nonNullDistances,
+			  TMVC_Result vischeck)
 {
 	Assert(so->nPtrs < MaxIndexTuplesPerPage);
 	so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -1013,6 +1050,8 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
 		so->reconTups[so->nPtrs] = heap_form_tuple(so->reconTupDesc,
 												   leafDatums,
 												   leafIsnulls);
+
+		so->visrecheck[so->nPtrs] = vischeck;
 	}
 	so->nPtrs++;
 }
@@ -1037,6 +1076,11 @@ spggettuple(IndexScanDesc scan, ScanDirection dir)
 			scan->xs_recheck = so->recheck[so->iPtr];
 			scan->xs_hitup = so->reconTups[so->iPtr];
 
+			if (so->want_itup)
+				scan->xs_visrecheck = so->visrecheck[so->iPtr];
+
+			Assert(!scan->xs_want_itup || scan->xs_visrecheck != TMVC_Unchecked);
+
 			if (so->numberOfOrderBys > 0)
 				index_store_float8_orderby_distances(scan, so->orderByTypes,
 													 so->distances[so->iPtr],
@@ -1065,7 +1109,8 @@ spggettuple(IndexScanDesc scan, ScanDirection dir)
 		}
 		so->iPtr = so->nPtrs = 0;
 
-		spgWalk(scan->indexRelation, so, false, storeGettuple);
+		spgWalk(scan->indexRelation, scan->heapRelation, so, false,
+				storeGettuple);
 
 		if (so->nPtrs == 0)
 			break;				/* must have completed scan */
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index c461f8dc02d..49f21559094 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -625,7 +625,7 @@ spgvacuumpage(spgBulkDeleteState *bds, Buffer buffer)
 	BlockNumber blkno = BufferGetBlockNumber(buffer);
 	Page		page;
 
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+	LockBufferForCleanup(buffer);
 	page = BufferGetPage(buffer);
 
 	if (PageIsNew(page))
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index ec6d6f5f74d..f28e896593f 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -21,6 +21,7 @@
 #include "storage/buf.h"
 #include "utils/geo_decls.h"
 #include "utils/relcache.h"
+#include "tableam.h"
 
 
 typedef struct SpGistOptions
@@ -175,6 +176,7 @@ typedef struct SpGistSearchItem
 	bool		isLeaf;			/* SearchItem is heap item */
 	bool		recheck;		/* qual recheck is needed */
 	bool		recheckDistances;	/* distance recheck is needed */
+	uint8		visrecheck;		/* IOS: TMVC_Result of contained heap tuple */
 
 	/* array with numberOfOrderBys entries */
 	double		distances[FLEXIBLE_ARRAY_MEMBER];
@@ -223,6 +225,7 @@ typedef struct SpGistScanOpaqueData
 
 	/* These fields are only used in amgettuple scans: */
 	bool		want_itup;		/* are we reconstructing tuples? */
+	Buffer		vmbuf;			/* IOS: used for table_index_vischeck_tuples */
 	TupleDesc	reconTupDesc;	/* if so, descriptor for reconstructed tuples */
 	int			nPtrs;			/* number of TIDs found on current page */
 	int			iPtr;			/* index for scanning through same */
@@ -231,6 +234,8 @@ typedef struct SpGistScanOpaqueData
 	bool		recheckDistances[MaxIndexTuplesPerPage];	/* distance recheck
 															 * flags */
 	HeapTuple	reconTups[MaxIndexTuplesPerPage];	/* reconstructed tuples */
+	/* support for IOS */
+	uint8	   *visrecheck;		/* IOS vis check results, counted by nPtrs */
 
 	/* distances (for recheck) */
 	IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
-- 
2.50.1 (Apple Git-155)



  [application/octet-stream] v02-0004-Test-for-IOS-Vacuum-race-conditions-in-index-AMs.patch (11.5K, 5-v02-0004-Test-for-IOS-Vacuum-race-conditions-in-index-AMs.patch)
  download | inline diff:
From 87bb9c767b7e8729a21500ae984b9bb003a73a50 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <[email protected]>
Date: Fri, 21 Mar 2025 16:41:31 +0100
Subject: [PATCH v02 4/4] Test for IOS/Vacuum race conditions in index AMs

Add regression tests that demonstrate wrong results can occur with index-only
scans in GiST and SP-GiST indexes when encountering tuples being removed by a
concurrent VACUUM operation.

With these tests the index AMs are also expected to not block VACUUM even when
they're used inside a cursor.

Reported-by: Peter Geoghegan <[email protected]>
Co-authored-by: Matthias van de Meent <[email protected]>
Co-authored-by: Peter Geoghegan <[email protected]>
Co-authored-by: Michail Nikolaev <[email protected]>
Discussion: https://postgr.es/m/flat/CANtu0oi0rkR%2BFsgyLXnGZ-uW2950-urApAWLhy-%2BV1WJD%3D_ZXA%40mail.gmail.com
---
 .../expected/index-only-scan-gist-vacuum.out  |  53 +++++++++
 .../index-only-scan-spgist-vacuum.out         |  53 +++++++++
 src/test/isolation/isolation_schedule         |   2 +
 .../specs/index-only-scan-gist-vacuum.spec    | 112 ++++++++++++++++++
 .../specs/index-only-scan-spgist-vacuum.spec  | 112 ++++++++++++++++++
 5 files changed, 332 insertions(+)
 create mode 100644 src/test/isolation/expected/index-only-scan-gist-vacuum.out
 create mode 100644 src/test/isolation/expected/index-only-scan-spgist-vacuum.out
 create mode 100644 src/test/isolation/specs/index-only-scan-gist-vacuum.spec
 create mode 100644 src/test/isolation/specs/index-only-scan-spgist-vacuum.spec

diff --git a/src/test/isolation/expected/index-only-scan-gist-vacuum.out b/src/test/isolation/expected/index-only-scan-gist-vacuum.out
new file mode 100644
index 00000000000..b7c02ee9529
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-gist-vacuum.out
@@ -0,0 +1,53 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s2_mod s1_begin s1_prepare_sorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit
+step s2_mod: 
+  DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+
+step s1_begin: BEGIN;
+step s1_prepare_sorted: 
+	DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)';
+
+step s1_fetch_1: 
+	FETCH FROM foo;
+
+                 x
+------------------
+1.4142135623730951
+(1 row)
+
+step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock;
+step s1_fetch_all: 
+	FETCH ALL FROM foo;
+
+x
+-
+(0 rows)
+
+step s1_commit: COMMIT;
+
+starting permutation: s2_mod s1_begin s1_prepare_unsorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit
+step s2_mod: 
+  DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+
+step s1_begin: BEGIN;
+step s1_prepare_unsorted: 
+	DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a;
+
+step s1_fetch_1: 
+	FETCH FROM foo;
+
+a    
+-----
+(1,1)
+(1 row)
+
+step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock;
+step s1_fetch_all: 
+	FETCH ALL FROM foo;
+
+a
+-
+(0 rows)
+
+step s1_commit: COMMIT;
diff --git a/src/test/isolation/expected/index-only-scan-spgist-vacuum.out b/src/test/isolation/expected/index-only-scan-spgist-vacuum.out
new file mode 100644
index 00000000000..b7c02ee9529
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-spgist-vacuum.out
@@ -0,0 +1,53 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s2_mod s1_begin s1_prepare_sorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit
+step s2_mod: 
+  DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+
+step s1_begin: BEGIN;
+step s1_prepare_sorted: 
+	DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)';
+
+step s1_fetch_1: 
+	FETCH FROM foo;
+
+                 x
+------------------
+1.4142135623730951
+(1 row)
+
+step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock;
+step s1_fetch_all: 
+	FETCH ALL FROM foo;
+
+x
+-
+(0 rows)
+
+step s1_commit: COMMIT;
+
+starting permutation: s2_mod s1_begin s1_prepare_unsorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit
+step s2_mod: 
+  DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+
+step s1_begin: BEGIN;
+step s1_prepare_unsorted: 
+	DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a;
+
+step s1_fetch_1: 
+	FETCH FROM foo;
+
+a    
+-----
+(1,1)
+(1 row)
+
+step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock;
+step s1_fetch_all: 
+	FETCH ALL FROM foo;
+
+a
+-
+(0 rows)
+
+step s1_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 15c33fad4c5..78969f1cf73 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -18,6 +18,8 @@ test: two-ids
 test: multiple-row-versions
 test: index-only-scan
 test: index-only-bitmapscan
+test: index-only-scan-gist-vacuum
+test: index-only-scan-spgist-vacuum
 test: predicate-lock-hot-tuple
 test: update-conflict-out
 test: deadlock-simple
diff --git a/src/test/isolation/specs/index-only-scan-gist-vacuum.spec b/src/test/isolation/specs/index-only-scan-gist-vacuum.spec
new file mode 100644
index 00000000000..9d241b25920
--- /dev/null
+++ b/src/test/isolation/specs/index-only-scan-gist-vacuum.spec
@@ -0,0 +1,112 @@
+# index-only-scan test showing wrong results with GiST
+#
+setup
+{
+	-- by using a low fillfactor and a wide tuple we can get multiple blocks
+	-- with just few rows
+	CREATE TABLE ios_needs_cleanup_lock (a point NOT NULL, b int not null, pad char(1024) default '')
+	WITH (AUTOVACUUM_ENABLED = false, FILLFACTOR = 10);
+
+	INSERT INTO ios_needs_cleanup_lock SELECT point(g.i, g.i), g.i FROM generate_series(1, 10) g(i);
+
+	CREATE INDEX ios_spgist_a ON ios_needs_cleanup_lock USING gist(a);
+}
+setup
+{
+	VACUUM (ANALYZE) ios_needs_cleanup_lock;
+}
+
+teardown
+{
+	DROP TABLE ios_needs_cleanup_lock;
+}
+
+
+session s1
+
+# Force an index-only scan, where possible:
+setup {
+	SET enable_bitmapscan = false;
+	SET enable_indexonlyscan = true;
+	SET enable_indexscan = true;
+}
+
+step s1_begin { BEGIN; }
+step s1_commit { COMMIT; }
+
+step s1_prepare_sorted {
+	DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)';
+}
+
+step s1_prepare_unsorted {
+	DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a;
+}
+
+step s1_fetch_1 {
+	FETCH FROM foo;
+}
+
+step s1_fetch_all {
+	FETCH ALL FROM foo;
+}
+
+
+session s2
+
+# Don't delete row 1 so we have a row for the cursor to "rest" on.
+step s2_mod
+{
+  DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+}
+
+# Disable truncation, as otherwise we'll just wait for a timeout while trying
+# to acquire the lock
+step s2_vacuum  { VACUUM (TRUNCATE false) ios_needs_cleanup_lock; }
+
+permutation
+  # delete nearly all rows, to make issue visible
+  s2_mod
+  # create a cursor
+  s1_begin
+  s1_prepare_sorted
+
+  # fetch one row from the cursor, that ensures the index scan portion is done
+  # before the vacuum in the next step
+  s1_fetch_1
+
+  # with the bug this vacuum will mark pages as all-visible that the scan in
+  # the next step then considers all-visible, despite all rows from those
+  # pages having been removed.
+  # Because this should block on buffer-level locks, this won't ever be
+  # considered "blocked" by isolation tester, and so we only have a single
+  # step we can work with concurrently.
+  s2_vacuum
+
+  # if this returns any rows, we're busted
+  s1_fetch_all
+
+  s1_commit
+
+permutation
+  # delete nearly all rows, to make issue visible
+  s2_mod
+  # create a cursor
+  s1_begin
+  s1_prepare_unsorted
+
+  # fetch one row from the cursor, that ensures the index scan portion is done
+  # before the vacuum in the next step
+  s1_fetch_1
+
+  # with the bug this vacuum will mark pages as all-visible that the scan in
+  # the next step then considers all-visible, despite all rows from those
+  # pages having been removed.
+  # Because this should block on buffer-level locks, this won't ever be
+  # considered "blocked" by isolation tester, and so we only have a single
+  # step we can work with concurrently.
+  s2_vacuum
+
+  # if this returns any rows, we're busted
+  s1_fetch_all
+
+  s1_commit
diff --git a/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec b/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec
new file mode 100644
index 00000000000..cd621d4f7f2
--- /dev/null
+++ b/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec
@@ -0,0 +1,112 @@
+# index-only-scan test showing wrong results with SPGiST
+#
+setup
+{
+	-- by using a low fillfactor and a wide tuple we can get multiple blocks
+	-- with just few rows
+	CREATE TABLE ios_needs_cleanup_lock (a point NOT NULL, b int not null, pad char(1024) default '')
+	WITH (AUTOVACUUM_ENABLED = false, FILLFACTOR = 10);
+
+	INSERT INTO ios_needs_cleanup_lock SELECT point(g.i, g.i), g.i FROM generate_series(1, 10) g(i);
+
+	CREATE INDEX ios_spgist_a ON ios_needs_cleanup_lock USING spgist(a);
+}
+setup
+{
+	VACUUM (ANALYZE) ios_needs_cleanup_lock;
+}
+
+teardown
+{
+	DROP TABLE ios_needs_cleanup_lock;
+}
+
+
+session s1
+
+# Force an index-only scan, where possible:
+setup {
+	SET enable_bitmapscan = false;
+	SET enable_indexonlyscan = true;
+	SET enable_indexscan = true;
+}
+
+step s1_begin { BEGIN; }
+step s1_commit { COMMIT; }
+
+step s1_prepare_sorted {
+	DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)';
+}
+
+step s1_prepare_unsorted {
+	DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a;
+}
+
+step s1_fetch_1 {
+	FETCH FROM foo;
+}
+
+step s1_fetch_all {
+	FETCH ALL FROM foo;
+}
+
+
+session s2
+
+# Don't delete row 1 so we have a row for the cursor to "rest" on.
+step s2_mod
+{
+  DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+}
+
+# Disable truncation, as otherwise we'll just wait for a timeout while trying
+# to acquire the lock
+step s2_vacuum  { VACUUM (TRUNCATE false) ios_needs_cleanup_lock; }
+
+permutation
+  # delete nearly all rows, to make issue visible
+  s2_mod
+  # create a cursor
+  s1_begin
+  s1_prepare_sorted
+
+  # fetch one row from the cursor, that ensures the index scan portion is done
+  # before the vacuum in the next step
+  s1_fetch_1
+
+  # with the bug this vacuum will mark pages as all-visible that the scan in
+  # the next step then considers all-visible, despite all rows from those
+  # pages having been removed.
+  # Because this should block on buffer-level locks, this won't ever be
+  # considered "blocked" by isolation tester, and so we only have a single
+  # step we can work with concurrently.
+  s2_vacuum
+
+  # if this returns any rows, we're busted
+  s1_fetch_all
+
+  s1_commit
+
+permutation
+  # delete nearly all rows, to make issue visible
+  s2_mod
+  # create a cursor
+  s1_begin
+  s1_prepare_unsorted
+
+  # fetch one row from the cursor, that ensures the index scan portion is done
+  # before the vacuum in the next step
+  s1_fetch_1
+
+  # with the bug this vacuum will mark pages as all-visible that the scan in
+  # the next step then considers all-visible, despite all rows from those
+  # pages having been removed.
+  # Because this should block on buffer-level locks, this won't ever be
+  # considered "blocked" by isolation tester, and so we only have a single
+  # step we can work with concurrently.
+  s2_vacuum
+
+  # if this returns any rows, we're busted
+  s1_fetch_all
+
+  s1_commit
-- 
2.50.1 (Apple Git-155)



view thread (3+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: [SP-]GiST IOS visibility bug (was: Why doens't GiST require super-exclusive lock)
  In-Reply-To: <CAEze2WiQ61UjWbqc5d9x3pTWWw24MsW2g7Be-+o4io1fKSQuVg@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox