public inbox for [email protected]  
help / color / mirror / Atom feed
From: Michail Nikolaev <[email protected]>
To: Zhijie Hou (Fujitsu) <[email protected]>
Cc: Amit Kapila <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Andres Freund <[email protected]>
Subject: Re: [BUG?] check_exclusion_or_unique_constraint false negative
Date: Sat, 23 Nov 2024 17:20:41 +0100
Message-ID: <CANtu0oj8LzpjCvF1zSwdLJxPddhbdY0=uh=7-wT6vwhmju_-PQ@mail.gmail.com> (raw)
In-Reply-To: <CANtu0og_4FVsRMxXue8SXUn03MwBAT0SiZSib_wcPMLDkpn-RA@mail.gmail.com>
References: <CANtu0oiktqQ2pwExoXqDpByXNCJa-KE5vQRodTRnmFHN_+qwHg@mail.gmail.com>
	<CANtu0ohU2XRV9shtu14CffLPDS1x10q7ebOGf-vX0p+45_L8jw@mail.gmail.com>
	<CANtu0oh0tspW-xWzDGWP9ehz96KPt9aUP1c9JYhdBYxKsB0jpA@mail.gmail.com>
	<CANtu0ohUB9ky45iiMAYN1fGyt82+cg=+UYBom=P7drb+=97G9w@mail.gmail.com>
	<TYAPR01MB56921C9C3D21B0D62FF76330F5B22@TYAPR01MB5692.jpnprd01.prod.outlook.com>
	<CANtu0og=5v4j8onS4nyJ4zMPdh-EPFxmiEi5PLoyZrmqHA6RKw@mail.gmail.com>
	<CAA4eK1Jfb0xviXYon-_TvHNKeAY7ngAeo++Knu-0RPR6EkSBjA@mail.gmail.com>
	<CANtu0ohHmYXsK5bxU9Thcq1FbELLAk0S2Zap0r8AnU3OTmcCOA@mail.gmail.com>
	<CAA4eK1+_V1PWXrrgAM01p+CByP6JwYRxejZrcxOu83a-v_+zZg@mail.gmail.com>
	<CANtu0ogDDQnXbrv6p7Xtc2dT_MZ1fjdPgB9-0B5Lw1b4pQGd2A@mail.gmail.com>
	<OS0PR01MB5716FFD8DBBADB55E8E6935994852@OS0PR01MB5716.jpnprd01.prod.outlook.com>
	<CANtu0oiziTBM8+WDtkktMZv0rhGBroYGWwqSQW+MzOWpmk-XEw@mail.gmail.com>
	<OS0PR01MB5716E30952F542E256DD72E294802@OS0PR01MB5716.jpnprd01.prod.outlook.com>
	<CANtu0oh69b+VCiASX86dF_eY=9=A2RmMQ_+0+uxZ_Zir+oNhhw@mail.gmail.com>
	<CANtu0og_4FVsRMxXue8SXUn03MwBAT0SiZSib_wcPMLDkpn-RA@mail.gmail.com>

Hello, everyone!

A rebased version is attached. Also, fixed potential race in the test and
detailed commit messages.

Best regards,
Mikhail.

>


Attachments:

  [application/octet-stream] v3-0001-Fix-possible-lost-tuples-in-non-MVCC-index-scans-.patch (18.4K, 3-v3-0001-Fix-possible-lost-tuples-in-non-MVCC-index-scans-.patch)
  download | inline diff:
From f1d6b9d0096e18c9f187914d3722ba8675bc964d Mon Sep 17 00:00:00 2001
From: nkey <[email protected]>
Date: Sat, 23 Nov 2024 13:14:40 +0100
Subject: [PATCH v3 1/2] Fix possible lost tuples in non-MVCC index scans using
 SnapshotDirty

In certain scenarios, non-MVCC index scans using SnapshotDirty can miss tuples that are deleted and re-inserted concurrently during the scan. This issue arises because the scan might skip over deleted tuples and fail to see newly inserted ones if the page content is cached.

To address this, we modify the SnapshotDirty mechanism to track the maximum xmax (the highest transaction ID of deleting transactions) encountered during the scan. If this xmax is newer than the latest completed transaction ID at the start of the scan, we retry the index scan to ensure all relevant tuples are observed.

Key changes include:

* Updated HeapTupleSatisfiesDirty to record the maximum xmax seen.
* Modified InitDirtySnapshot to accept an optional parameter for tracking xmax.
* Added logic in index uniqueness checks (_bt_check_unique) and replication tuple searches (RelationFindReplTupleByIndex) to retry scans based on the tracked xmax.
* Introduced a new function ReadLastCompletedFullTransactionId to obtain the latest completed transaction ID.
* Updated documentation in nbtree/README to explain the issue and the solution.
* Added a regression test to cover this edge case.

This fix ensures that non-MVCC index scans are more robust in the face of concurrent data modifications, preventing potential data inconsistencies.
---
 contrib/pgstattuple/pgstattuple.c             |  2 +-
 src/backend/access/heap/heapam_handler.c      |  2 +-
 src/backend/access/heap/heapam_visibility.c   | 20 +++++++-
 src/backend/access/nbtree/README              | 10 ++++
 src/backend/access/nbtree/nbtinsert.c         |  2 +-
 src/backend/access/transam/varsup.c           | 11 +++++
 src/backend/executor/execIndexing.c           | 29 +++++++++++-
 src/backend/executor/execReplication.c        | 26 +++++++++-
 src/backend/replication/logical/origin.c      |  2 +-
 src/include/access/transam.h                  | 16 +++++++
 src/include/utils/snapmgr.h                   |  8 +++-
 src/include/utils/snapshot.h                  |  5 +-
 src/test/modules/test_misc/meson.build        |  1 +
 .../test_misc/t/007_dirty_index_scan.pl       | 47 +++++++++++++++++++
 14 files changed, 169 insertions(+), 12 deletions(-)
 create mode 100644 src/test/modules/test_misc/t/007_dirty_index_scan.pl

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 48cb8f59c4f..bc310fcb332 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -335,7 +335,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 	scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false);
 	hscan = (HeapScanDesc) scan;
 
-	InitDirtySnapshot(SnapshotDirty);
+	InitDirtySnapshot(SnapshotDirty, NULL);
 
 	nblocks = hscan->rs_nblocks;	/* # blocks to be scanned */
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index a8d95e0f1c1..4e99ea61c6e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -402,7 +402,7 @@ tuple_lock_retry:
 			 *
 			 * Loop here to deal with updated or busy tuples
 			 */
-			InitDirtySnapshot(SnapshotDirty);
+			InitDirtySnapshot(SnapshotDirty, NULL);
 			for (;;)
 			{
 				if (ItemPointerIndicatesMovedPartitions(tid))
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 9243feed01f..9001ce4362c 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -719,6 +719,12 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 		return TM_Deleted;		/* deleted by other */
 }
 
+static void UpdateDirtyMaxXmax(Snapshot snapshot, TransactionId xmax)
+{
+	if (snapshot->xip != NULL)
+		snapshot->xip[0] = TransactionIdNewer(xmax, snapshot->xip[0]);
+}
+
 /*
  * HeapTupleSatisfiesDirty
  *		True iff heap tuple is valid including effects of open transactions.
@@ -737,7 +743,9 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
  * Similarly for snapshot->xmax and the tuple's xmax.  If the tuple was
  * inserted speculatively, meaning that the inserter might still back down
  * on the insertion without aborting the whole transaction, the associated
- * token is also returned in snapshot->speculativeToken.
+ * token is also returned in snapshot->speculativeToken. If xip is != NULL
+ * xip[0] may be set to xid of deleter if it newer than previously store
+ * value.
  */
 static bool
 HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
@@ -750,6 +758,10 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 	snapshot->speculativeToken = 0;
+	/*
+	 * We intentionally keep snapshot->xip values unchanged as they should
+	 * be reset by logic out of the single heap fetch.
+	 */
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -870,6 +882,7 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	{
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 			return true;
+		UpdateDirtyMaxXmax(snapshot, HeapTupleHeaderGetRawXmax(tuple));
 		return false;			/* updated by other */
 	}
 
@@ -893,7 +906,10 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 			return true;
 		}
 		if (TransactionIdDidCommit(xmax))
+		{
+			UpdateDirtyMaxXmax(snapshot, xmax);
 			return false;
+		}
 		/* it must have aborted or crashed */
 		return true;
 	}
@@ -902,6 +918,7 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	{
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 			return true;
+		UpdateDirtyMaxXmax(snapshot, HeapTupleHeaderGetRawXmax(tuple));
 		return false;
 	}
 
@@ -931,6 +948,7 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 
 	SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
 				HeapTupleHeaderGetRawXmax(tuple));
+	UpdateDirtyMaxXmax(snapshot, HeapTupleHeaderGetRawXmax(tuple));
 	return false;				/* updated by other */
 }
 
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 52e646c7f75..6de72a29ca0 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -489,6 +489,16 @@ on the leaf page at all when the page's LSN has changed.  (That won't work
 with an unlogged index, so for now we don't ever apply the "don't hold
 onto pin" optimization there.)
 
+Despite the locking protocol in place, it is still possible to receive an
+incorrect result during non-MVCC scans. This issue can occur if a concurrent
+transaction deletes a tuple and inserts a new tuple with a new TID in the
+same page. If the scan has already visited the page and cached its content
+in the buffer cache, it might skip the old tuple due to deletion and miss
+the new tuple because of the cache. This is a known limitation of the
+SnapshotDirty and SnapshotAny non-MVCC scans. However, for SnapshotDirty,
+it is possible to work around this limitation by using the returned max(xmax)
+to compare it with the latest committed transaction before the scan started.
+
 Fastpath For Index Insertion
 ----------------------------
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 99043da8412..ef63d62778e 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -427,7 +427,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 	/* Assume unique until we find a duplicate */
 	*is_unique = true;
 
-	InitDirtySnapshot(SnapshotDirty);
+	InitDirtySnapshot(SnapshotDirty, NULL);
 
 	page = BufferGetPage(insertstate->buf);
 	opaque = BTPageGetOpaque(page);
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index cfe8c6cf8dc..43026e58406 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -296,6 +296,17 @@ ReadNextFullTransactionId(void)
 	return fullXid;
 }
 
+FullTransactionId ReadLastCompletedFullTransactionId(void)
+{
+	FullTransactionId fullXid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	fullXid = TransamVariables->latestCompletedXid;
+	LWLockRelease(XidGenLock);
+
+	return fullXid;
+}
+
 /*
  * Advance nextXid to the value after a given xid.  The epoch is inferred.
  * This must only be called during recovery or from two-phase start-up code.
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index f9a2fac79e4..34e09dee17f 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -117,6 +117,7 @@
 #include "utils/multirangetypes.h"
 #include "utils/rangetypes.h"
 #include "utils/snapmgr.h"
+#include "utils/injection_point.h"
 
 /* waitMode argument to check_exclusion_or_unique_constraint() */
 typedef enum
@@ -711,6 +712,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 	IndexScanDesc index_scan;
 	ScanKeyData scankeys[INDEX_MAX_KEYS];
 	SnapshotData DirtySnapshot;
+	TransactionId maxXmax,
+				  latestCompletedXid;
 	int			i;
 	bool		conflict;
 	bool		found_self;
@@ -773,9 +776,10 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 
 	/*
 	 * Search the tuples that are in the index for any violations, including
-	 * tuples that aren't visible yet.
+	 * tuples that aren't visible yet. Also, detect cases index scan skip the
+	 * tuple in case of parallel update after index page content was cached.
 	 */
-	InitDirtySnapshot(DirtySnapshot);
+	InitDirtySnapshot(DirtySnapshot, &maxXmax);
 
 	for (i = 0; i < indnkeyatts; i++)
 	{
@@ -809,6 +813,12 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
+	/*
+	 * Each time we retry - remember last completed transaction before start
+	 * of the scan. Aso reset maxXmax.
+	 */
+	latestCompletedXid = XidFromFullTransactionId(ReadLastCompletedFullTransactionId());
+	maxXmax = InvalidTransactionId;
 	index_scan = index_beginscan(heap, index, &DirtySnapshot, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
@@ -924,6 +934,19 @@ retry:
 	}
 
 	index_endscan(index_scan);
+	/*
+	 * Check for the case when index scan fetched records before some other
+	 * transaction deleted tuple and inserted a new one.
+	 */
+	if (!conflict && TransactionIdIsValid(maxXmax) && !TransactionIdIsCurrentTransactionId(maxXmax))
+	{
+		/*
+		 * If we have skipped some tuple because it was deleted, but deletion happened after
+		 * start of the index scan - retry to be sure.
+		 */
+		if (TransactionIdPrecedes(latestCompletedXid, maxXmax))
+			goto retry;
+	}
 
 	/*
 	 * Ordinarily, at this point the search should have found the originally
@@ -937,6 +960,8 @@ retry:
 
 	ExecDropSingleTupleTableSlot(existing_slot);
 
+	if (!conflict)
+		INJECTION_POINT("check_exclusion_or_unique_constraint_no_conflict");
 	return !conflict;
 }
 
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 54025c9f150..cf8e847262a 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -229,6 +229,8 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	IndexScanDesc scan;
 	SnapshotData snap;
 	TransactionId xwait;
+	TransactionId maxXmax,
+				  latestCompletedXid;
 	Relation	idxrel;
 	bool		found;
 	TypeCacheEntry **eq = NULL;
@@ -239,7 +241,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 
 	isIdxSafeToSkipDuplicates = (GetRelationIdentityOrPK(rel) == idxoid);
 
-	InitDirtySnapshot(snap);
+	InitDirtySnapshot(snap, &maxXmax);
 
 	/* Build scan key. */
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
@@ -249,6 +251,12 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 
 retry:
 	found = false;
+	/*
+	 * Each time we retry - remember last completed transaction before start
+ 	 * of the scan. Aso reset maxXmax.
+ 	 */
+	maxXmax = InvalidTransactionId;
+	latestCompletedXid = XidFromFullTransactionId(ReadLastCompletedFullTransactionId());
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
@@ -288,6 +296,20 @@ retry:
 		break;
 	}
 
+	/*
+	 * Check for the case when index scan fetched records before some other
+	 * transaction deleted tuple and inserted a new one.
+	 */
+	if (!found && TransactionIdIsValid(maxXmax) && !TransactionIdIsCurrentTransactionId(maxXmax))
+	{
+		/*
+		 * If we have skipped some tuple because it was deleted, but deletion happened after
+		 * start of the index scan - retry to be sure.
+		 */
+		if (TransactionIdPrecedes(latestCompletedXid, maxXmax))
+			goto retry;
+	}
+
 	/* Found tuple, try to lock it in the lockmode. */
 	if (found)
 	{
@@ -411,7 +433,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 	eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
 
 	/* Start a heap scan. */
-	InitDirtySnapshot(snap);
+	InitDirtySnapshot(snap, NULL);
 	scan = table_beginscan(rel, &snap, 0, NULL);
 	scanslot = table_slot_create(rel, NULL);
 
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index baf696d8e68..e1e52c639d1 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -282,7 +282,7 @@ replorigin_create(const char *roname)
 	 * to the exclusive lock there's no danger that new rows can appear while
 	 * we're checking.
 	 */
-	InitDirtySnapshot(SnapshotDirty);
+	InitDirtySnapshot(SnapshotDirty, NULL);
 
 	rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 28a2d287fd5..aae0ad90c2e 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -288,6 +288,7 @@ extern void VarsupShmemInit(void);
 extern FullTransactionId GetNewTransactionId(bool isSubXact);
 extern void AdvanceNextFullTransactionIdPastXid(TransactionId xid);
 extern FullTransactionId ReadNextFullTransactionId(void);
+extern FullTransactionId ReadLastCompletedFullTransactionId(void);
 extern void SetTransactionIdLimit(TransactionId oldest_datfrozenxid,
 								  Oid oldest_datoid);
 extern void AdvanceOldestClogXid(TransactionId oldest_datfrozenxid);
@@ -344,6 +345,21 @@ TransactionIdOlder(TransactionId a, TransactionId b)
 	return b;
 }
 
+/* return the newer of the two IDs */
+static inline TransactionId
+TransactionIdNewer(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdPrecedes(a, b))
+		return b;
+	return a;
+}
+
 /* return the older of the two IDs, assuming they're both normal */
 static inline TransactionId
 NormalTransactionIdOlder(TransactionId a, TransactionId b)
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 9398a84051c..5e6f3a7e76a 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -36,9 +36,13 @@ extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
  * We don't provide a static SnapshotDirty variable because it would be
  * non-reentrant.  Instead, users of that snapshot type should declare a
  * local variable of type SnapshotData, and initialize it with this macro.
+ * pxid is optional and can be NULL. If it is not NULL, pxid[0] will be set
+ * to the transaction ID of deleting transaction if the tuple is deleted
+ * and it newer than pxid[0].
  */
-#define InitDirtySnapshot(snapshotdata)  \
-	((snapshotdata).snapshot_type = SNAPSHOT_DIRTY)
+#define InitDirtySnapshot(snapshotdata, pxid)  \
+	((snapshotdata).snapshot_type = SNAPSHOT_DIRTY, \
+	 (snapshotdata).xip = (pxid))
 
 /*
  * Similarly, some initialization is required for a NonVacuumable snapshot.
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 8d1e31e888e..a68114e500f 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -96,7 +96,10 @@ typedef enum SnapshotType
 	 * xmax.  If the tuple was inserted speculatively, meaning that the
 	 * inserter might still back down on the insertion without aborting the
 	 * whole transaction, the associated token is also returned in
-	 * snapshot->speculativeToken.  See also InitDirtySnapshot().
+	 * snapshot->speculativeToken. If xip is non-NULL, the xid of the
+	 * deleting transaction is stored into xip[0] if it newer than existing
+	 * xip[0] value.
+	 * See also InitDirtySnapshot().
 	 * -------------------------------------------------------------------------
 	 */
 	SNAPSHOT_DIRTY,
diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build
index 283ffa751aa..68d291336e1 100644
--- a/src/test/modules/test_misc/meson.build
+++ b/src/test/modules/test_misc/meson.build
@@ -15,6 +15,7 @@ tests += {
       't/004_io_direct.pl',
       't/005_timeouts.pl',
       't/006_signal_autovacuum.pl',
+      't/007_dirty_index_scan.pl',
     ],
   },
 }
diff --git a/src/test/modules/test_misc/t/007_dirty_index_scan.pl b/src/test/modules/test_misc/t/007_dirty_index_scan.pl
new file mode 100644
index 00000000000..4d116e659e7
--- /dev/null
+++ b/src/test/modules/test_misc/t/007_dirty_index_scan.pl
@@ -0,0 +1,47 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test issue with lost tuple in case of DirtySnapshot index scans
+use strict;
+use warnings;
+
+use Config;
+use Errno;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+$node = PostgreSQL::Test::Cluster->new('DirtyScan_test');
+$node->init;
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf', 'autovacuum = off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+$node->safe_psql('postgres', q(CREATE TABLE tbl(i int primary key, n int)));
+
+$node->safe_psql('postgres', q(INSERT INTO tbl VALUES(42,1)));
+$node->safe_psql('postgres', q(SELECT injection_points_attach('check_exclusion_or_unique_constraint_no_conflict', 'error')));
+
+$node->pgbench(
+	'--no-vacuum --client=40 --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent UPSERT',
+	{
+		'on_conflicts' => q(
+			INSERT INTO tbl VALUES(42,1) on conflict(i) do update set n = EXCLUDED.n + 1;
+		)
+	});
+
+$node->safe_psql('postgres', q(SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict')));
+
+$node->stop;
+done_testing();
\ No newline at end of file
-- 
2.43.0



  [application/octet-stream] v3-0002-Add-isolation-test-to-reproduce-dirty-snapshot-sc.patch (6.7K, 4-v3-0002-Add-isolation-test-to-reproduce-dirty-snapshot-sc.patch)
  download | inline diff:
From 3c663bfd3fcb62b5a4d4726919be67becc7c1ba5 Mon Sep 17 00:00:00 2001
From: nkey <[email protected]>
Date: Sat, 23 Nov 2024 13:25:11 +0100
Subject: [PATCH v3 2/2] Add isolation test to reproduce dirty snapshot scan
 issue

This commit introduces an isolation test to reliably reproduce the issue where non-MVCC index scans using SnapshotDirty can miss tuples due to concurrent modifications. This situation can lead to incorrect results.

To facilitate this test, new injection points are added in the index_getnext_slot and check_exclusion_or_unique_constraint functions. These injection points allow the test to control the timing of operations, ensuring the race condition is triggered consistently.

Changes include:

* Added injection points in src/backend/access/index/indexam.c and src/backend/executor/execIndexing.c.
* Updated Makefile and meson.build to include the new dirty_index_scan isolation test.
* Created a new isolation spec dirty_index_scan.spec and its expected output to define and verify the test steps.
* This test complements the previous fix by demonstrating the issue and verifying that the fix effectively addresses it.
---
 src/backend/access/index/indexam.c            |  8 ++++
 src/backend/executor/execIndexing.c           |  1 +
 src/test/modules/injection_points/Makefile    |  2 +-
 .../expected/dirty_index_scan.out             | 39 +++++++++++++++++
 src/test/modules/injection_points/meson.build |  1 +
 .../specs/dirty_index_scan.spec               | 43 +++++++++++++++++++
 6 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/injection_points/expected/dirty_index_scan.out
 create mode 100644 src/test/modules/injection_points/specs/dirty_index_scan.spec

diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 1859be614c0..1a70de2f470 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -57,6 +57,7 @@
 #include "utils/ruleutils.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/injection_point.h"
 
 
 /* ----------------------------------------------------------------
@@ -696,6 +697,13 @@ index_getnext_slot(IndexScanDesc scan, ScanDirection direction, TupleTableSlot *
 		 * the index.
 		 */
 		Assert(ItemPointerIsValid(&scan->xs_heaptid));
+#ifdef USE_INJECTION_POINTS
+		if (!IsCatalogRelationOid(scan->indexRelation->rd_id))
+		{
+			INJECTION_POINT("index_getnext_slot_before_fetch");
+		}
+#endif
+
 		if (index_fetch_heap(scan, slot))
 			return true;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 34e09dee17f..2cf5bd236d4 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -810,6 +810,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 	 * May have to restart scan from this point if a potential conflict is
 	 * found.
 	 */
+	INJECTION_POINT("check_exclusion_or_unique_constraint_before_index_scan");
 retry:
 	conflict = false;
 	found_self = false;
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index 0753a9df58c..11a9bacc750 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -13,7 +13,7 @@ PGFILEDESC = "injection_points - facility for injection points"
 REGRESS = injection_points reindex_conc
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace
+ISOLATION = basic inplace dirty_index_scan
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/dirty_index_scan.out b/src/test/modules/injection_points/expected/dirty_index_scan.out
new file mode 100644
index 00000000000..0451c7513b6
--- /dev/null
+++ b/src/test/modules/injection_points/expected/dirty_index_scan.out
@@ -0,0 +1,39 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s1_s1 s3_s1 s2_s1 s3_s2
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s1_s1: INSERT INTO test.tbl VALUES(42, 1) on conflict(i) do update set n = EXCLUDED.n + 1; <waiting ...>
+step s3_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_before_index_scan');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_before_index_scan');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_s1: UPDATE test.tbl SET n = n + 1 WHERE i = 42;
+step s3_s2: 
+	SELECT injection_points_detach('index_getnext_slot_before_fetch');
+	SELECT injection_points_wakeup('index_getnext_slot_before_fetch');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_s1: <... completed>
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 58f19001157..26a98bfa148 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -44,6 +44,7 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'dirty_index_scan',
     ],
   },
   'tap': {
diff --git a/src/test/modules/injection_points/specs/dirty_index_scan.spec b/src/test/modules/injection_points/specs/dirty_index_scan.spec
new file mode 100644
index 00000000000..6fb5b985431
--- /dev/null
+++ b/src/test/modules/injection_points/specs/dirty_index_scan.spec
@@ -0,0 +1,43 @@
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA test;
+	CREATE UNLOGGED TABLE test.tbl(i int primary key, n int);
+	CREATE INDEX tbl_n_idx ON test.tbl(n);
+	INSERT INTO test.tbl VALUES(42,1);
+}
+
+teardown
+{
+	DROP SCHEMA test CASCADE;
+	DROP EXTENSION injection_points;
+}
+
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('check_exclusion_or_unique_constraint_no_conflict', 'error');
+	SELECT injection_points_attach('check_exclusion_or_unique_constraint_before_index_scan', 'wait');
+	SELECT injection_points_attach('index_getnext_slot_before_fetch', 'wait');
+}
+
+step s1_s1	{ INSERT INTO test.tbl VALUES(42, 1) on conflict(i) do update set n = EXCLUDED.n + 1; }
+
+session s2
+step s2_s1	{ UPDATE test.tbl SET n = n + 1 WHERE i = 42; }
+
+session s3
+step s3_s1		{
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_before_index_scan');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_before_index_scan');
+}
+step s3_s2		{
+	SELECT injection_points_detach('index_getnext_slot_before_fetch');
+	SELECT injection_points_wakeup('index_getnext_slot_before_fetch');
+}
+
+permutation
+	s1_s1
+	s3_s1
+	s2_s1
+	s3_s2
\ No newline at end of file
-- 
2.43.0



view thread (37+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: [BUG?] check_exclusion_or_unique_constraint false negative
  In-Reply-To: <CANtu0oj8LzpjCvF1zSwdLJxPddhbdY0=uh=7-wT6vwhmju_-PQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox