public inbox for [email protected]  
help / color / mirror / Atom feed
From: Michail Nikolaev <[email protected]>
To: Matthias van de Meent <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: Melanie Plageman <[email protected]>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: Sat, 21 Dec 2024 19:00:01 +0100
Message-ID: <CANtu0oiuUF7L0wTGxOHfumyoVge3n7C4rAjdmFo=efeEwobXbg@mail.gmail.com> (raw)
In-Reply-To: <CANtu0oi+nbipJUsMZcoUfodCyuTN_DAXD22UstjMTYWG=tJ4jw@mail.gmail.com>
References: <CANtu0oiLc-+7h9zfzOVy2cv2UuYk_5MUReVLnVbOay6OgD_KGg@mail.gmail.com>
	<CAEze2WgW6pj48xJhG_YLUE1QS+n9Yv0AZQwaWeb-r+X=HAxU_g@mail.gmail.com>
	<CANtu0oizNtPUrPB0Mh+2vyjdijTX=LZvO5_dZN3+NqvE-CFPtw@mail.gmail.com>
	<CAEze2Wi3BFLkFBcZ+Brfbr-mGBCcWXcWuHucnCnw5ZOQotc6Eg@mail.gmail.com>
	<CANtu0ojRX=osoiXL9JJG6g6qOowXVbVYX+mDsN+2jmFVe=eG7w@mail.gmail.com>
	<CAEze2Wg03Ps_StwEhgCdSn7VXY9ZUM=zCrf-m1dRZpTWv6wD_A@mail.gmail.com>
	<CANtu0oj66JjAq8xyRSeO=MuRHYS2XsYbhHRRESHtOcLJs=3+Sw@mail.gmail.com>
	<CANtu0ogT2Qn7-q_jK6+DqBQvFoTt69eQJDKxJARXV9pdWjd0Gg@mail.gmail.com>
	<CANtu0ogXgNkEuxbDRwznAZpxEXRmj3NzOen3y-RGHDwig0YBRw@mail.gmail.com>
	<CANtu0oi+FTMqDb+6Bv8w7VHiTFVMB1uAAip_P841WQH+ktPixw@mail.gmail.com>
	<CAEze2WgeyVnDb_j4gJQYC4+HcSsYQAdeRA1-F0KDnJ=Y0A_TzA@mail.gmail.com>
	<CANtu0oga9zqqEFhdmcWyJTK4d6EGMJsMB_LMgVSE8ar0xVm7Ew@mail.gmail.com>
	<CANtu0oirtBK_g4jxtw3jehSop3b0WSQaek5Sv5OGSXwxgcHwZQ@mail.gmail.com>
	<CANtu0oijWPRGRpaRR_OvT2R5YALzscvcOTFh-=uZKUpNJmuZtw@mail.gmail.com>
	<CAEze2WgHFnYdxkNUmvqxOc-cFUNEYaTqL7+Pei=CtA-ZrTOFyw@mail.gmail.com>
	<CANtu0oipL3e8fLnejbH4HnByMW6G_auR4v+ns8j-UHhuPW=9og@mail.gmail.com>
	<CANtu0ojmVw8GW5bJknnqSp7Dp1xEuoBewdu2imtQ2tGnWpiWEg@mail.gmail.com>
	<CAEze2WgNHTWfw_bP6O0zW_=vi1D-yi1nh6-JDj9kd=8UaB-zLA@mail.gmail.com>
	<CANtu0ojA5=rT8BN5==OAiQJZh8CAxD_U8thFhZ3mwrZQ6roNOA@mail.gmail.com>
	<CAEze2Wh3eSAnXFdY_6roNPb3WD-YsKbNLiKf=cPmAGHkPUd22w@mail.gmail.com>
	<CANtu0og_=ypCbH2ZFayn44i=CL0HAXKW390LfZhQ1F56HoFXtQ@mail.gmail.com>
	<CAEze2WghpUS29bJJh5GCZ+WtpO4qWmxiFF-CTWFiP4Qq62G58w@mail.gmail.com>
	<CANtu0oiuuGRvYRsH-y0iQjfc+JpT9o4mPUXVkz97+sW9BXA+FA@mail.gmail.com>
	<CANtu0og6P+O10XLm-AsoqOhZYEjr8SEHFETadSJ8ifO01YP1qg@mail.gmail.com>
	<CANtu0oj0pakvxXhHJhsiKgk=ywY57m623G=OvhJnLVFWe9JCpg@mail.gmail.com>
	<CANtu0oiOj65kZqP8ngnsc9O+gywUJATgOjOP6pUARXWsmS9cBQ@mail.gmail.com>
	<CANtu0oiT9SPFhs=h8DR4YVPox7TJ6jkfR9JgqT-0L+=uy=Lxng@mail.gmail.com>
	<CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A@mail.gmail.com>
	<CAEze2Wj9SgwOpe_1CWnS_D-txQaQyXArR=dm4DTnha93=yua4g@mail.gmail.com>
	<CANtu0ohFr7OzNSbxqBhUpR0mXDYyt0Xt6+=Tbq0EC7as7kr+Lg@mail.gmail.com>
	<CANtu0oh4PwBn_h+4p_MxFigRAyJvF-0nA9Tm5NFRwfsWWjZQiA@mail.gmail.com>
	<CANtu0ojHEVU9U_bxgViRmtqNTJ92LnF+76-yzn4axYjGsK2kqQ@mail.gmail.com>
	<CANtu0ogS871NkdUnZW9P_LVpLzhSJ1+cETK0b55cYjs=v2qbPA@mail.gmail.com>
	<CANtu0ohRVBDf4x7Ge3oVzgf4NzMb_DhmTM1ae0u1WUA+CD0UqA@mail.gmail.com>
	<CANtu0ogTfyng-H4yWr3Pm_+PXX+XvDx1AM1sXTy1V7DM6jJ+Bw@mail.gmail.com>
	<CANtu0oi+nbipJUsMZcoUfodCyuTN_DAXD22UstjMTYWG=tJ4jw@mail.gmail.com>

Hello!

Added STIR access method, next step is validating indexes using it.

Best regards,
Mikhail.

>


Attachments:

  [application/octet-stream] v7-0001-this-is-https-commitfest.postgresql.org-50-5160-m.patch (61.5K, 3-v7-0001-this-is-https-commitfest.postgresql.org-50-5160-m.patch)
  download | inline diff:
From 12efb82206cee7843bf17ccabacc91435d0bac5a Mon Sep 17 00:00:00 2001
From: nkey <[email protected]>
Date: Sat, 30 Nov 2024 11:36:28 +0100
Subject: [PATCH v7 1/6] this is https://commitfest.postgresql.org/50/5160/
 merged in single commit. it is required for stability of stress tests.

---
 src/backend/commands/indexcmds.c              |   4 +-
 src/backend/executor/execIndexing.c           |   3 +
 src/backend/executor/execPartition.c          | 119 ++++++++-
 src/backend/executor/nodeModifyTable.c        |   2 +
 src/backend/optimizer/util/plancat.c          | 135 +++++++---
 src/backend/utils/time/snapmgr.c              |   2 +
 src/test/modules/injection_points/Makefile    |   7 +-
 .../expected/index_concurrently_upsert.out    |  80 ++++++
 .../index_concurrently_upsert_predicate.out   |  80 ++++++
 .../expected/reindex_concurrently_upsert.out  | 238 ++++++++++++++++++
 ...ndex_concurrently_upsert_on_constraint.out | 238 ++++++++++++++++++
 ...eindex_concurrently_upsert_partitioned.out | 238 ++++++++++++++++++
 src/test/modules/injection_points/meson.build |  11 +
 .../specs/index_concurrently_upsert.spec      |  68 +++++
 .../index_concurrently_upsert_predicate.spec  |  70 ++++++
 .../specs/reindex_concurrently_upsert.spec    |  86 +++++++
 ...dex_concurrently_upsert_on_constraint.spec |  86 +++++++
 ...index_concurrently_upsert_partitioned.spec |  88 +++++++
 18 files changed, 1505 insertions(+), 50 deletions(-)
 create mode 100644 src/test/modules/injection_points/expected/index_concurrently_upsert.out
 create mode 100644 src/test/modules/injection_points/expected/index_concurrently_upsert_predicate.out
 create mode 100644 src/test/modules/injection_points/expected/reindex_concurrently_upsert.out
 create mode 100644 src/test/modules/injection_points/expected/reindex_concurrently_upsert_on_constraint.out
 create mode 100644 src/test/modules/injection_points/expected/reindex_concurrently_upsert_partitioned.out
 create mode 100644 src/test/modules/injection_points/specs/index_concurrently_upsert.spec
 create mode 100644 src/test/modules/injection_points/specs/index_concurrently_upsert_predicate.spec
 create mode 100644 src/test/modules/injection_points/specs/reindex_concurrently_upsert.spec
 create mode 100644 src/test/modules/injection_points/specs/reindex_concurrently_upsert_on_constraint.spec
 create mode 100644 src/test/modules/injection_points/specs/reindex_concurrently_upsert_partitioned.spec

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 4049ce1a10f..932854d6c60 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1766,6 +1766,7 @@ DefineIndex(Oid tableId,
 	 * before the reference snap was taken, we have to wait out any
 	 * transactions that might have older snapshots.
 	 */
+	INJECTION_POINT("define_index_before_set_valid");
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForOlderSnapshots(limitXmin, true);
@@ -4206,7 +4207,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * the same time to make sure we only get constraint violations from the
 	 * indexes with the correct names.
 	 */
-
+	INJECTION_POINT("reindex_relation_concurrently_before_swap");
 	StartTransactionCommand();
 
 	/*
@@ -4285,6 +4286,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * index_drop() for more details.
 	 */
 
+	INJECTION_POINT("reindex_relation_concurrently_before_set_dead");
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index f0a5f8879a9..820749239ca 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -117,6 +117,7 @@
 #include "utils/multirangetypes.h"
 #include "utils/rangetypes.h"
 #include "utils/snapmgr.h"
+#include "utils/injection_point.h"
 
 /* waitMode argument to check_exclusion_or_unique_constraint() */
 typedef enum
@@ -936,6 +937,8 @@ retry:
 	econtext->ecxt_scantuple = save_scantuple;
 
 	ExecDropSingleTupleTableSlot(existing_slot);
+	if (!conflict)
+		INJECTION_POINT("check_exclusion_or_unique_constraint_no_conflict");
 
 	return !conflict;
 }
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 76518862291..aeeee41d5f1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -483,6 +483,48 @@ ExecFindPartition(ModifyTableState *mtstate,
 	return rri;
 }
 
+/*
+ * IsIndexCompatibleAsArbiter
+ * 		Checks if the indexes are identical in terms of being used
+ * 		as arbiters for the INSERT ON CONFLICT operation by comparing
+ * 		them to the provided arbiter index.
+ *
+ * Returns the true if indexes are compatible.
+ */
+static bool
+IsIndexCompatibleAsArbiter(Relation	arbiterIndexRelation,
+						   IndexInfo  *arbiterIndexInfo,
+						   Relation	indexRelation,
+						   IndexInfo  *indexInfo)
+{
+	int i;
+
+	if (arbiterIndexInfo->ii_Unique != indexInfo->ii_Unique)
+		return false;
+	/* it is not supported for cases of exclusion constraints. */
+	if (arbiterIndexInfo->ii_ExclusionOps != NULL || indexInfo->ii_ExclusionOps != NULL)
+		return false;
+	if (arbiterIndexRelation->rd_index->indnkeyatts != indexRelation->rd_index->indnkeyatts)
+		return false;
+
+	for (i = 0; i < indexRelation->rd_index->indnkeyatts; i++)
+	{
+		int			arbiterAttoNo = arbiterIndexRelation->rd_index->indkey.values[i];
+		int			attoNo = indexRelation->rd_index->indkey.values[i];
+		if (arbiterAttoNo != attoNo)
+			return false;
+	}
+
+	if (list_difference(RelationGetIndexExpressions(arbiterIndexRelation),
+						RelationGetIndexExpressions(indexRelation)) != NIL)
+		return false;
+
+	if (list_difference(RelationGetIndexPredicate(arbiterIndexRelation),
+						RelationGetIndexPredicate(indexRelation)) != NIL)
+		return false;
+	return true;
+}
+
 /*
  * ExecInitPartitionInfo
  *		Lock the partition and initialize ResultRelInfo.  Also setup other
@@ -693,6 +735,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 		if (rootResultRelInfo->ri_onConflictArbiterIndexes != NIL)
 		{
 			List	   *childIdxs;
+			List 	   *nonAncestorIdxs = NIL;
+			int		   i, j, additional_arbiters = 0;
 
 			childIdxs = RelationGetIndexList(leaf_part_rri->ri_RelationDesc);
 
@@ -703,23 +747,74 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 				ListCell   *lc2;
 
 				ancestors = get_partition_ancestors(childIdx);
-				foreach(lc2, rootResultRelInfo->ri_onConflictArbiterIndexes)
+				if (ancestors)
 				{
-					if (list_member_oid(ancestors, lfirst_oid(lc2)))
-						arbiterIndexes = lappend_oid(arbiterIndexes, childIdx);
+					foreach(lc2, rootResultRelInfo->ri_onConflictArbiterIndexes)
+					{
+						if (list_member_oid(ancestors, lfirst_oid(lc2)))
+							arbiterIndexes = lappend_oid(arbiterIndexes, childIdx);
+					}
 				}
+				else /* No ancestor was found for that index. Save it for rechecking later. */
+					nonAncestorIdxs = lappend_oid(nonAncestorIdxs, childIdx);
 				list_free(ancestors);
 			}
+
+			/*
+			 * If any non-ancestor indexes are found, we need to compare them with other
+			 * indexes of the relation that will be used as arbiters. This is necessary
+			 * when a partitioned index is processed by REINDEX CONCURRENTLY. Both indexes
+			 * must be considered as arbiters to ensure that all concurrent transactions
+			 * use the same set of arbiters.
+			 */
+			if (nonAncestorIdxs)
+			{
+				for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+				{
+					if (list_member_oid(nonAncestorIdxs, leaf_part_rri->ri_IndexRelationDescs[i]->rd_index->indexrelid))
+					{
+						Relation nonAncestorIndexRelation = leaf_part_rri->ri_IndexRelationDescs[i];
+						IndexInfo *nonAncestorIndexInfo = leaf_part_rri->ri_IndexRelationInfo[i];
+						Assert(!list_member_oid(arbiterIndexes, nonAncestorIndexRelation->rd_index->indexrelid));
+
+						/* It is too early to us non-ready indexes as arbiters */
+						if (!nonAncestorIndexInfo->ii_ReadyForInserts)
+							continue;
+
+						for (j = 0; j < leaf_part_rri->ri_NumIndices; j++)
+						{
+							if (list_member_oid(arbiterIndexes,
+												leaf_part_rri->ri_IndexRelationDescs[j]->rd_index->indexrelid))
+							{
+								Relation arbiterIndexRelation = leaf_part_rri->ri_IndexRelationDescs[j];
+								IndexInfo *arbiterIndexInfo = leaf_part_rri->ri_IndexRelationInfo[j];
+
+								/* If non-ancestor index are compatible to arbiter - use it as arbiter too. */
+								if (IsIndexCompatibleAsArbiter(arbiterIndexRelation, arbiterIndexInfo,
+															   nonAncestorIndexRelation, nonAncestorIndexInfo))
+								{
+									arbiterIndexes = lappend_oid(arbiterIndexes,
+																 nonAncestorIndexRelation->rd_index->indexrelid);
+									additional_arbiters++;
+								}
+							}
+						}
+					}
+				}
+			}
+			list_free(nonAncestorIdxs);
+
+			/*
+			 * If the resulting lists are of inequal length, something is wrong.
+			 * (This shouldn't happen, since arbiter index selection should not
+			 * pick up a non-ready index.)
+			 *
+			 * But we need to consider an additional arbiter indexes also.
+			 */
+			if (list_length(rootResultRelInfo->ri_onConflictArbiterIndexes) !=
+				list_length(arbiterIndexes) - additional_arbiters)
+				elog(ERROR, "invalid arbiter index list");
 		}
-
-		/*
-		 * If the resulting lists are of inequal length, something is wrong.
-		 * (This shouldn't happen, since arbiter index selection should not
-		 * pick up an invalid index.)
-		 */
-		if (list_length(rootResultRelInfo->ri_onConflictArbiterIndexes) !=
-			list_length(arbiterIndexes))
-			elog(ERROR, "invalid arbiter index list");
 		leaf_part_rri->ri_onConflictArbiterIndexes = arbiterIndexes;
 
 		/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1161520f76b..23cf4c6b540 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -69,6 +69,7 @@
 #include "utils/datum.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
+#include "utils/injection_point.h"
 
 
 typedef struct MTTargetRelLookup
@@ -1087,6 +1088,7 @@ ExecInsert(ModifyTableContext *context,
 					return NULL;
 				}
 			}
+			INJECTION_POINT("exec_insert_before_insert_speculative");
 
 			/*
 			 * Before we start insertion proper, acquire our "speculative
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 153390f2dc9..56b58d1ed74 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -714,12 +714,14 @@ infer_arbiter_indexes(PlannerInfo *root)
 	List	   *indexList;
 	ListCell   *l;
 
-	/* Normalized inference attributes and inference expressions: */
-	Bitmapset  *inferAttrs = NULL;
-	List	   *inferElems = NIL;
+	/* Normalized required attributes and expressions: */
+	Bitmapset  *requiredArbiterAttrs = NULL;
+	List	   *requiredArbiterElems = NIL;
+	List	   *requiredIndexPredExprs = (List *) onconflict->arbiterWhere;
 
 	/* Results */
 	List	   *results = NIL;
+	bool	   foundValid = false;
 
 	/*
 	 * Quickly return NIL for ON CONFLICT DO NOTHING without an inference
@@ -754,8 +756,8 @@ infer_arbiter_indexes(PlannerInfo *root)
 
 		if (!IsA(elem->expr, Var))
 		{
-			/* If not a plain Var, just shove it in inferElems for now */
-			inferElems = lappend(inferElems, elem->expr);
+			/* If not a plain Var, just shove it in requiredArbiterElems for now */
+			requiredArbiterElems = lappend(requiredArbiterElems, elem->expr);
 			continue;
 		}
 
@@ -767,30 +769,76 @@ infer_arbiter_indexes(PlannerInfo *root)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("whole row unique index inference specifications are not supported")));
 
-		inferAttrs = bms_add_member(inferAttrs,
+		requiredArbiterAttrs = bms_add_member(requiredArbiterAttrs,
 									attno - FirstLowInvalidHeapAttributeNumber);
 	}
 
+	indexList = RelationGetIndexList(relation);
+
 	/*
 	 * Lookup named constraint's index.  This is not immediately returned
-	 * because some additional sanity checks are required.
+	 * because some additional sanity checks are required. Additionally, we
+	 * need to process other indexes as potential arbiters to account for
+	 * cases where REINDEX CONCURRENTLY is processing an index used as a
+	 * named constraint.
 	 */
 	if (onconflict->constraint != InvalidOid)
 	{
 		indexOidFromConstraint = get_constraint_index(onconflict->constraint);
 
 		if (indexOidFromConstraint == InvalidOid)
+		{
 			ereport(ERROR,
 					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-					 errmsg("constraint in ON CONFLICT clause has no associated index")));
+					errmsg("constraint in ON CONFLICT clause has no associated index")));
+		}
+
+		/*
+		 * Find the named constraint index to extract its attributes and predicates.
+		 * We open all indexes in the loop to avoid deadlock of changed order of locks.
+		 * */
+		foreach(l, indexList)
+		{
+			Oid			indexoid = lfirst_oid(l);
+			Relation	idxRel;
+			Form_pg_index idxForm;
+			AttrNumber	natt;
+
+			idxRel = index_open(indexoid, rte->rellockmode);
+			idxForm = idxRel->rd_index;
+
+			if (idxForm->indisready)
+			{
+				if (indexOidFromConstraint == idxForm->indexrelid)
+				{
+					/*
+					 * Prepare requirements for other indexes to be used as arbiter together
+					 * with indexOidFromConstraint. It is required to involve both equals indexes
+					 * in case of REINDEX CONCURRENTLY.
+					 */
+					for (natt = 0; natt < idxForm->indnkeyatts; natt++)
+					{
+						int			attno = idxRel->rd_index->indkey.values[natt];
+
+						if (attno != 0)
+							requiredArbiterAttrs = bms_add_member(requiredArbiterAttrs,
+														  attno - FirstLowInvalidHeapAttributeNumber);
+					}
+					requiredArbiterElems = RelationGetIndexExpressions(idxRel);
+					requiredIndexPredExprs = RelationGetIndexPredicate(idxRel);
+					/* We are done, so, quite the loop. */
+					index_close(idxRel, NoLock);
+					break;
+				}
+			}
+			index_close(idxRel, NoLock);
+		}
 	}
 
 	/*
 	 * Using that representation, iterate through the list of indexes on the
 	 * target relation to try and find a match
 	 */
-	indexList = RelationGetIndexList(relation);
-
 	foreach(l, indexList)
 	{
 		Oid			indexoid = lfirst_oid(l);
@@ -813,7 +861,13 @@ infer_arbiter_indexes(PlannerInfo *root)
 		idxRel = index_open(indexoid, rte->rellockmode);
 		idxForm = idxRel->rd_index;
 
-		if (!idxForm->indisvalid)
+		/*
+		 * We need to consider both indisvalid and indisready indexes because
+		 * them may become indisvalid before execution phase. It is required
+		 * to keep set of indexes used as arbiter to be the same for all
+		 * concurrent transactions.
+		 */
+		if (!idxForm->indisready)
 			goto next;
 
 		/*
@@ -833,27 +887,23 @@ infer_arbiter_indexes(PlannerInfo *root)
 				ereport(ERROR,
 						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 						 errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints")));
-
-			results = lappend_oid(results, idxForm->indexrelid);
-			list_free(indexList);
-			index_close(idxRel, NoLock);
-			table_close(relation, NoLock);
-			return results;
+			goto found;
 		}
 		else if (indexOidFromConstraint != InvalidOid)
 		{
-			/* No point in further work for index in named constraint case */
-			goto next;
+			/* In the case of "ON constraint_name DO UPDATE" we need to skip non-unique candidates. */
+			if (!idxForm->indisunique && onconflict->action == ONCONFLICT_UPDATE)
+				goto next;
+		}  else {
+			/*
+			 * Only considering conventional inference at this point (not named
+			 * constraints), so index under consideration can be immediately
+			 * skipped if it's not unique
+			 */
+			if (!idxForm->indisunique)
+				goto next;
 		}
 
-		/*
-		 * Only considering conventional inference at this point (not named
-		 * constraints), so index under consideration can be immediately
-		 * skipped if it's not unique
-		 */
-		if (!idxForm->indisunique)
-			goto next;
-
 		/*
 		 * So-called unique constraints with WITHOUT OVERLAPS are really
 		 * exclusion constraints, so skip those too.
@@ -873,7 +923,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 		}
 
 		/* Non-expression attributes (if any) must match */
-		if (!bms_equal(indexedAttrs, inferAttrs))
+		if (!bms_equal(indexedAttrs, requiredArbiterAttrs))
 			goto next;
 
 		/* Expression attributes (if any) must match */
@@ -881,6 +931,10 @@ infer_arbiter_indexes(PlannerInfo *root)
 		if (idxExprs && varno != 1)
 			ChangeVarNodes((Node *) idxExprs, 1, varno, 0);
 
+		/*
+		 * If arbiterElems are present, check them. If name >constraint is
+		 * present arbiterElems == NIL.
+		 */
 		foreach(el, onconflict->arbiterElems)
 		{
 			InferenceElem *elem = (InferenceElem *) lfirst(el);
@@ -918,27 +972,35 @@ infer_arbiter_indexes(PlannerInfo *root)
 		}
 
 		/*
-		 * Now that all inference elements were matched, ensure that the
+		 * In case of the conventional inference involved ensure that the
 		 * expression elements from inference clause are not missing any
 		 * cataloged expressions.  This does the right thing when unique
 		 * indexes redundantly repeat the same attribute, or if attributes
 		 * redundantly appear multiple times within an inference clause.
+		 *
+		 * In the case of named constraint ensure candidate has equal set
+		 * of expressions as the named constraint index.
 		 */
-		if (list_difference(idxExprs, inferElems) != NIL)
+		if (list_difference(idxExprs, requiredArbiterElems) != NIL)
 			goto next;
 
-		/*
-		 * If it's a partial index, its predicate must be implied by the ON
-		 * CONFLICT's WHERE clause.
-		 */
 		predExprs = RelationGetIndexPredicate(idxRel);
 		if (predExprs && varno != 1)
 			ChangeVarNodes((Node *) predExprs, 1, varno, 0);
 
-		if (!predicate_implied_by(predExprs, (List *) onconflict->arbiterWhere, false))
+		/*
+		 * If it's a partial index and conventional inference, its predicate must be implied
+		 * by the ON CONFLICT's WHERE clause.
+		 */
+		if (indexOidFromConstraint == InvalidOid && !predicate_implied_by(predExprs, requiredIndexPredExprs, false))
+			goto next;
+		/* If it's a partial index and named constraint predicates must be equal. */
+		if (indexOidFromConstraint != InvalidOid && list_difference(predExprs, requiredIndexPredExprs) != NIL)
 			goto next;
 
+found:
 		results = lappend_oid(results, idxForm->indexrelid);
+		foundValid |= idxForm->indisvalid;
 next:
 		index_close(idxRel, NoLock);
 	}
@@ -946,7 +1008,8 @@ next:
 	list_free(indexList);
 	table_close(relation, NoLock);
 
-	if (results == NIL)
+	/* It is required to have at least one indisvalid index during the planning. */
+	if (results == NIL || !foundValid)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
 				 errmsg("there is no unique or exclusion constraint matching the ON CONFLICT specification")));
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index a1a0c2adeb6..2189bf0d9ae 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -64,6 +64,7 @@
 #include "utils/resowner.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/injection_point.h"
 
 
 /*
@@ -392,6 +393,7 @@ InvalidateCatalogSnapshot(void)
 		pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
 		CatalogSnapshot = NULL;
 		SnapshotResetXmin();
+		INJECTION_POINT("invalidate_catalog_snapshot_end");
 	}
 }
 
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index 0753a9df58c..f8f86e8f3b6 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -13,7 +13,12 @@ PGFILEDESC = "injection_points - facility for injection points"
 REGRESS = injection_points reindex_conc
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace
+ISOLATION = basic inplace \
+			reindex_concurrently_upsert \
+			index_concurrently_upsert \
+			reindex_concurrently_upsert_partitioned \
+			reindex_concurrently_upsert_on_constraint \
+			index_concurrently_upsert_predicate
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/index_concurrently_upsert.out b/src/test/modules/injection_points/expected/index_concurrently_upsert.out
new file mode 100644
index 00000000000..7f0659e8369
--- /dev/null
+++ b/src/test/modules/injection_points/expected/index_concurrently_upsert.out
@@ -0,0 +1,80 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s3_start_create_index s1_start_upsert s4_wakeup_define_index_before_set_valid s2_start_upsert s4_wakeup_s1_from_invalidate_catalog_snapshot s4_wakeup_s2 s4_wakeup_s1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_create_index: CREATE UNIQUE INDEX CONCURRENTLY tbl_pkey_duplicate ON test.tbl(i); <waiting ...>
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_define_index_before_set_valid: 
+	SELECT injection_points_detach('define_index_before_set_valid');
+	SELECT injection_points_wakeup('define_index_before_set_valid');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_create_index: <... completed>
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1_from_invalidate_catalog_snapshot: 
+	SELECT injection_points_detach('invalidate_catalog_snapshot_end');
+	SELECT injection_points_wakeup('invalidate_catalog_snapshot_end');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: <... completed>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
diff --git a/src/test/modules/injection_points/expected/index_concurrently_upsert_predicate.out b/src/test/modules/injection_points/expected/index_concurrently_upsert_predicate.out
new file mode 100644
index 00000000000..2300d5165e9
--- /dev/null
+++ b/src/test/modules/injection_points/expected/index_concurrently_upsert_predicate.out
@@ -0,0 +1,80 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s3_start_create_index s1_start_upsert s4_wakeup_define_index_before_set_valid s2_start_upsert s4_wakeup_s1_from_invalidate_catalog_snapshot s4_wakeup_s2 s4_wakeup_s1
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_create_index: CREATE UNIQUE INDEX CONCURRENTLY tbl_pkey_special_duplicate ON test.tbl(abs(i)) WHERE i < 10000; <waiting ...>
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(abs(i)) where i < 100 do update set updated_at = now(); <waiting ...>
+step s4_wakeup_define_index_before_set_valid: 
+	SELECT injection_points_detach('define_index_before_set_valid');
+	SELECT injection_points_wakeup('define_index_before_set_valid');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_create_index: <... completed>
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now())  on conflict(abs(i)) where i < 100 do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1_from_invalidate_catalog_snapshot: 
+	SELECT injection_points_detach('invalidate_catalog_snapshot_end');
+	SELECT injection_points_wakeup('invalidate_catalog_snapshot_end');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: <... completed>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
diff --git a/src/test/modules/injection_points/expected/reindex_concurrently_upsert.out b/src/test/modules/injection_points/expected/reindex_concurrently_upsert.out
new file mode 100644
index 00000000000..24bbbcbdd88
--- /dev/null
+++ b/src/test/modules/injection_points/expected/reindex_concurrently_upsert.out
@@ -0,0 +1,238 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s3_start_reindex s1_start_upsert s4_wakeup_to_swap s2_start_upsert s4_wakeup_s1 s4_wakeup_s2 s4_wakeup_to_set_dead
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_pkey; <waiting ...>
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+
+starting permutation: s3_start_reindex s2_start_upsert s4_wakeup_to_swap s1_start_upsert s4_wakeup_s1 s4_wakeup_s2 s4_wakeup_to_set_dead
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_pkey; <waiting ...>
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+
+starting permutation: s3_start_reindex s4_wakeup_to_swap s1_start_upsert s2_start_upsert s4_wakeup_s1 s4_wakeup_to_set_dead s4_wakeup_s2
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_pkey; <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+step s2_start_upsert: <... completed>
diff --git a/src/test/modules/injection_points/expected/reindex_concurrently_upsert_on_constraint.out b/src/test/modules/injection_points/expected/reindex_concurrently_upsert_on_constraint.out
new file mode 100644
index 00000000000..d1cfd1731c8
--- /dev/null
+++ b/src/test/modules/injection_points/expected/reindex_concurrently_upsert_on_constraint.out
@@ -0,0 +1,238 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s3_start_reindex s1_start_upsert s4_wakeup_to_swap s2_start_upsert s4_wakeup_s1 s4_wakeup_s2 s4_wakeup_to_set_dead
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_pkey; <waiting ...>
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict on constraint tbl_pkey do update set updated_at = now(); <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict on constraint tbl_pkey do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+
+starting permutation: s3_start_reindex s2_start_upsert s4_wakeup_to_swap s1_start_upsert s4_wakeup_s1 s4_wakeup_s2 s4_wakeup_to_set_dead
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_pkey; <waiting ...>
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict on constraint tbl_pkey do update set updated_at = now(); <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict on constraint tbl_pkey do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+
+starting permutation: s3_start_reindex s4_wakeup_to_swap s1_start_upsert s2_start_upsert s4_wakeup_s1 s4_wakeup_to_set_dead s4_wakeup_s2
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_pkey; <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict on constraint tbl_pkey do update set updated_at = now(); <waiting ...>
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict on constraint tbl_pkey do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+step s2_start_upsert: <... completed>
diff --git a/src/test/modules/injection_points/expected/reindex_concurrently_upsert_partitioned.out b/src/test/modules/injection_points/expected/reindex_concurrently_upsert_partitioned.out
new file mode 100644
index 00000000000..c95ff264f12
--- /dev/null
+++ b/src/test/modules/injection_points/expected/reindex_concurrently_upsert_partitioned.out
@@ -0,0 +1,238 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s3_start_reindex s1_start_upsert s4_wakeup_to_swap s2_start_upsert s4_wakeup_s1 s4_wakeup_s2 s4_wakeup_to_set_dead
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_partition_pkey; <waiting ...>
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+
+starting permutation: s3_start_reindex s2_start_upsert s4_wakeup_to_swap s1_start_upsert s4_wakeup_s1 s4_wakeup_s2 s4_wakeup_to_set_dead
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_partition_pkey; <waiting ...>
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s2_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+
+starting permutation: s3_start_reindex s4_wakeup_to_swap s1_start_upsert s2_start_upsert s4_wakeup_s1 s4_wakeup_to_set_dead s4_wakeup_s2
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+injection_points_attach
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: REINDEX INDEX CONCURRENTLY test.tbl_partition_pkey; <waiting ...>
+step s4_wakeup_to_swap: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s2_start_upsert: INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); <waiting ...>
+step s4_wakeup_s1: 
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s1_start_upsert: <... completed>
+step s4_wakeup_to_set_dead: 
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s4_wakeup_s2: 
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+
+injection_points_detach
+-----------------------
+                       
+(1 row)
+
+injection_points_wakeup
+-----------------------
+                       
+(1 row)
+
+step s3_start_reindex: <... completed>
+step s2_start_upsert: <... completed>
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 58f19001157..91fc8ce687f 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -44,7 +44,16 @@ tests += {
     'specs': [
       'basic',
       'inplace',
+      'reindex_concurrently_upsert',
+      'index_concurrently_upsert',
+      'reindex_concurrently_upsert_partitioned',
+      'reindex_concurrently_upsert_on_constraint',
+      'index_concurrently_upsert_predicate',
     ],
+    # The injection points are cluster-wide, so disable installcheck
+    'runningcheck': false,
+    # We waiting for all snapshots, so, avoid parallel test executions
+    'runningcheck-parallel': false,
   },
   'tap': {
     'env': {
@@ -53,5 +62,7 @@ tests += {
     'tests': [
       't/001_stats.pl',
     ],
+    # The injection points are cluster-wide, so disable installcheck
+    'runningcheck': false,
   },
 }
diff --git a/src/test/modules/injection_points/specs/index_concurrently_upsert.spec b/src/test/modules/injection_points/specs/index_concurrently_upsert.spec
new file mode 100644
index 00000000000..075450935b6
--- /dev/null
+++ b/src/test/modules/injection_points/specs/index_concurrently_upsert.spec
@@ -0,0 +1,68 @@
+# Test race conditions involving:
+# - s1: UPSERT a tuple
+# - s2: UPSERT the same tuple
+# - s3: CREATE UNIQUE INDEX CONCURRENTLY
+# - s4: operations with injection points
+
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA test;
+	CREATE UNLOGGED TABLE test.tbl(i int primary key, updated_at timestamp);
+	ALTER TABLE test.tbl SET (parallel_workers=0);
+}
+
+teardown
+{
+	DROP SCHEMA test CASCADE;
+	DROP EXTENSION injection_points;
+}
+
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('check_exclusion_or_unique_constraint_no_conflict', 'wait');
+	SELECT injection_points_attach('invalidate_catalog_snapshot_end', 'wait');
+}
+step s1_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); }
+
+session s2
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('exec_insert_before_insert_speculative', 'wait');
+}
+step s2_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); }
+
+session s3
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('define_index_before_set_valid', 'wait');
+}
+step s3_start_create_index		{ CREATE UNIQUE INDEX CONCURRENTLY tbl_pkey_duplicate ON test.tbl(i); }
+
+session s4
+step s4_wakeup_s1		{
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+}
+step s4_wakeup_s1_from_invalidate_catalog_snapshot	{
+	SELECT injection_points_detach('invalidate_catalog_snapshot_end');
+	SELECT injection_points_wakeup('invalidate_catalog_snapshot_end');
+}
+step s4_wakeup_s2		{
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+}
+step s4_wakeup_define_index_before_set_valid	{
+	SELECT injection_points_detach('define_index_before_set_valid');
+	SELECT injection_points_wakeup('define_index_before_set_valid');
+}
+
+permutation
+	s3_start_create_index
+	s1_start_upsert
+	s4_wakeup_define_index_before_set_valid
+	s2_start_upsert
+	s4_wakeup_s1_from_invalidate_catalog_snapshot
+	s4_wakeup_s2
+	s4_wakeup_s1
\ No newline at end of file
diff --git a/src/test/modules/injection_points/specs/index_concurrently_upsert_predicate.spec b/src/test/modules/injection_points/specs/index_concurrently_upsert_predicate.spec
new file mode 100644
index 00000000000..70a27475e10
--- /dev/null
+++ b/src/test/modules/injection_points/specs/index_concurrently_upsert_predicate.spec
@@ -0,0 +1,70 @@
+# Test race conditions involving:
+# - s1: UPSERT a tuple
+# - s2: UPSERT the same tuple
+# - s3: CREATE UNIQUE INDEX CONCURRENTLY
+# - s4: operations with injection points
+
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA test;
+	CREATE UNLOGGED TABLE test.tbl(i int, updated_at timestamp);
+
+	CREATE UNIQUE INDEX tbl_pkey_special ON test.tbl(abs(i)) WHERE i < 1000;
+	ALTER TABLE test.tbl SET (parallel_workers=0);
+}
+
+teardown
+{
+	DROP SCHEMA test CASCADE;
+	DROP EXTENSION injection_points;
+}
+
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('check_exclusion_or_unique_constraint_no_conflict', 'wait');
+	SELECT injection_points_attach('invalidate_catalog_snapshot_end', 'wait');
+}
+step s1_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict(abs(i)) where i < 100 do update set updated_at = now(); }
+
+session s2
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('exec_insert_before_insert_speculative', 'wait');
+}
+step s2_start_upsert	{ INSERT INTO test.tbl VALUES(13,now())  on conflict(abs(i)) where i < 100 do update set updated_at = now(); }
+
+session s3
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('define_index_before_set_valid', 'wait');
+}
+step s3_start_create_index		{ CREATE UNIQUE INDEX CONCURRENTLY tbl_pkey_special_duplicate ON test.tbl(abs(i)) WHERE i < 10000;}
+
+session s4
+step s4_wakeup_s1		{
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+}
+step s4_wakeup_s1_from_invalidate_catalog_snapshot	{
+	SELECT injection_points_detach('invalidate_catalog_snapshot_end');
+	SELECT injection_points_wakeup('invalidate_catalog_snapshot_end');
+}
+step s4_wakeup_s2		{
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+}
+step s4_wakeup_define_index_before_set_valid	{
+	SELECT injection_points_detach('define_index_before_set_valid');
+	SELECT injection_points_wakeup('define_index_before_set_valid');
+}
+
+permutation
+	s3_start_create_index
+	s1_start_upsert
+	s4_wakeup_define_index_before_set_valid
+	s2_start_upsert
+	s4_wakeup_s1_from_invalidate_catalog_snapshot
+	s4_wakeup_s2
+	s4_wakeup_s1
\ No newline at end of file
diff --git a/src/test/modules/injection_points/specs/reindex_concurrently_upsert.spec b/src/test/modules/injection_points/specs/reindex_concurrently_upsert.spec
new file mode 100644
index 00000000000..38b86d84345
--- /dev/null
+++ b/src/test/modules/injection_points/specs/reindex_concurrently_upsert.spec
@@ -0,0 +1,86 @@
+# Test race conditions involving:
+# - s1: UPSERT a tuple
+# - s2: UPSERT the same tuple
+# - s3: REINDEX concurrent primary key index
+# - s4: operations with injection points
+
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA test;
+	CREATE UNLOGGED TABLE test.tbl(i int primary key, updated_at timestamp);
+	ALTER TABLE test.tbl SET (parallel_workers=0);
+}
+
+teardown
+{
+	DROP SCHEMA test CASCADE;
+	DROP EXTENSION injection_points;
+}
+
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('check_exclusion_or_unique_constraint_no_conflict', 'wait');
+}
+step s1_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); }
+
+session s2
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('exec_insert_before_insert_speculative', 'wait');
+}
+step s2_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); }
+
+session s3
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('reindex_relation_concurrently_before_set_dead', 'wait');
+	SELECT injection_points_attach('reindex_relation_concurrently_before_swap', 'wait');
+}
+step s3_start_reindex			{ REINDEX INDEX CONCURRENTLY test.tbl_pkey; }
+
+session s4
+step s4_wakeup_to_swap		{
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+}
+step s4_wakeup_s1		{
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+}
+step s4_wakeup_s2		{
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+}
+step s4_wakeup_to_set_dead		{
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+}
+
+permutation
+	s3_start_reindex
+	s1_start_upsert
+	s4_wakeup_to_swap
+	s2_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_s2
+	s4_wakeup_to_set_dead
+
+permutation
+	s3_start_reindex
+	s2_start_upsert
+	s4_wakeup_to_swap
+	s1_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_s2
+	s4_wakeup_to_set_dead
+
+permutation
+	s3_start_reindex
+	s4_wakeup_to_swap
+	s1_start_upsert
+	s2_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_to_set_dead
+	s4_wakeup_s2
\ No newline at end of file
diff --git a/src/test/modules/injection_points/specs/reindex_concurrently_upsert_on_constraint.spec b/src/test/modules/injection_points/specs/reindex_concurrently_upsert_on_constraint.spec
new file mode 100644
index 00000000000..7d8e371bb0a
--- /dev/null
+++ b/src/test/modules/injection_points/specs/reindex_concurrently_upsert_on_constraint.spec
@@ -0,0 +1,86 @@
+# Test race conditions involving:
+# - s1: UPSERT a tuple
+# - s2: UPSERT the same tuple
+# - s3: REINDEX concurrent primary key index
+# - s4: operations with injection points
+
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA test;
+	CREATE UNLOGGED TABLE test.tbl(i int primary key, updated_at timestamp);
+	ALTER TABLE test.tbl SET (parallel_workers=0);
+}
+
+teardown
+{
+	DROP SCHEMA test CASCADE;
+	DROP EXTENSION injection_points;
+}
+
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('check_exclusion_or_unique_constraint_no_conflict', 'wait');
+}
+step s1_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict on constraint tbl_pkey do update set updated_at = now(); }
+
+session s2
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('exec_insert_before_insert_speculative', 'wait');
+}
+step s2_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict on constraint tbl_pkey do update set updated_at = now(); }
+
+session s3
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('reindex_relation_concurrently_before_set_dead', 'wait');
+	SELECT injection_points_attach('reindex_relation_concurrently_before_swap', 'wait');
+}
+step s3_start_reindex			{ REINDEX INDEX CONCURRENTLY test.tbl_pkey; }
+
+session s4
+step s4_wakeup_to_swap		{
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+}
+step s4_wakeup_s1		{
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+}
+step s4_wakeup_s2		{
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+}
+step s4_wakeup_to_set_dead		{
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+}
+
+permutation
+	s3_start_reindex
+	s1_start_upsert
+	s4_wakeup_to_swap
+	s2_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_s2
+	s4_wakeup_to_set_dead
+
+permutation
+	s3_start_reindex
+	s2_start_upsert
+	s4_wakeup_to_swap
+	s1_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_s2
+	s4_wakeup_to_set_dead
+
+permutation
+	s3_start_reindex
+	s4_wakeup_to_swap
+	s1_start_upsert
+	s2_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_to_set_dead
+	s4_wakeup_s2
\ No newline at end of file
diff --git a/src/test/modules/injection_points/specs/reindex_concurrently_upsert_partitioned.spec b/src/test/modules/injection_points/specs/reindex_concurrently_upsert_partitioned.spec
new file mode 100644
index 00000000000..b9253463039
--- /dev/null
+++ b/src/test/modules/injection_points/specs/reindex_concurrently_upsert_partitioned.spec
@@ -0,0 +1,88 @@
+# Test race conditions involving:
+# - s1: UPSERT a tuple
+# - s2: UPSERT the same tuple
+# - s3: REINDEX concurrent primary key index
+# - s4: operations with injection points
+
+setup
+{
+	CREATE EXTENSION injection_points;
+	CREATE SCHEMA test;
+	CREATE TABLE test.tbl(i int primary key, updated_at timestamp) PARTITION BY RANGE (i);
+	CREATE TABLE test.tbl_partition PARTITION OF test.tbl
+		FOR VALUES FROM (0) TO (10000)
+		WITH (parallel_workers = 0);
+}
+
+teardown
+{
+	DROP SCHEMA test CASCADE;
+	DROP EXTENSION injection_points;
+}
+
+session s1
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('check_exclusion_or_unique_constraint_no_conflict', 'wait');
+}
+step s1_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); }
+
+session s2
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('exec_insert_before_insert_speculative', 'wait');
+}
+step s2_start_upsert	{ INSERT INTO test.tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); }
+
+session s3
+setup	{
+	SELECT injection_points_set_local();
+	SELECT injection_points_attach('reindex_relation_concurrently_before_set_dead', 'wait');
+	SELECT injection_points_attach('reindex_relation_concurrently_before_swap', 'wait');
+}
+step s3_start_reindex			{ REINDEX INDEX CONCURRENTLY test.tbl_partition_pkey; }
+
+session s4
+step s4_wakeup_to_swap		{
+	SELECT injection_points_detach('reindex_relation_concurrently_before_swap');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_swap');
+}
+step s4_wakeup_s1		{
+	SELECT injection_points_detach('check_exclusion_or_unique_constraint_no_conflict');
+	SELECT injection_points_wakeup('check_exclusion_or_unique_constraint_no_conflict');
+}
+step s4_wakeup_s2		{
+	SELECT injection_points_detach('exec_insert_before_insert_speculative');
+	SELECT injection_points_wakeup('exec_insert_before_insert_speculative');
+}
+step s4_wakeup_to_set_dead		{
+	SELECT injection_points_detach('reindex_relation_concurrently_before_set_dead');
+	SELECT injection_points_wakeup('reindex_relation_concurrently_before_set_dead');
+}
+
+permutation
+	s3_start_reindex
+	s1_start_upsert
+	s4_wakeup_to_swap
+	s2_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_s2
+	s4_wakeup_to_set_dead
+
+permutation
+	s3_start_reindex
+	s2_start_upsert
+	s4_wakeup_to_swap
+	s1_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_s2
+	s4_wakeup_to_set_dead
+
+permutation
+	s3_start_reindex
+	s4_wakeup_to_swap
+	s1_start_upsert
+	s2_start_upsert
+	s4_wakeup_s1
+	s4_wakeup_to_set_dead
+	s4_wakeup_s2
\ No newline at end of file
-- 
2.43.0



  [application/octet-stream] v7-0003-Allow-advancing-xmin-during-non-unique-non-parall.patch (36.3K, 4-v7-0003-Allow-advancing-xmin-during-non-unique-non-parall.patch)
  download | inline diff:
From 452ef7089db779a08421a1084584c13c599d1320 Mon Sep 17 00:00:00 2001
From: nkey <[email protected]>
Date: Sat, 30 Nov 2024 17:41:29 +0100
Subject: [PATCH v7 3/6] Allow advancing xmin during non-unique, non-parallel 
 concurrent index builds by periodically resetting snapshots

Long-running transactions like those used by CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY can hold back the global xmin horizon, preventing VACUUM from cleaning up dead tuples and potentially leading to transaction ID wraparound issues. In PostgreSQL 14, commit d9d076222f5b attempted to allow VACUUM to ignore indexing transactions with CONCURRENTLY to mitigate this problem. However, this was reverted in commit e28bb8851969 because it could cause indexes to miss heap tuples that were HOT-updated and HOT-pruned during the index creation, leading to index corruption.

This patch introduces a safe alternative by periodically resetting the snapshot used during non-unique, non-parallel concurrent index builds. By resetting the snapshot every N pages during the heap scan, we allow the xmin horizon to advance without risking index corruption. This approach is safe for non-unique index builds because they do not enforce uniqueness constraints that require a consistent snapshot across the entire scan.

Currently, this technique is applied to:

Non-parallel index builds: Parallel index builds are not yet supported and will be addressed in a future commit.
Non-unique indexes: Unique index builds still require a consistent snapshot to enforce uniqueness constraints, and support for them may be added in the future.
Only during the first scan of the heap: The second scan during index validation still uses a single snapshot to ensure index correctness.

To implement this, a new scan option SO_RESET_SNAPSHOT is introduced. When set, it causes the snapshot to be reset every SO_RESET_SNAPSHOT_EACH_N_PAGE pages during the scan. The heap scan code is adjusted to support this option, and the index build code is modified to use it for applicable concurrent index builds that are not on system catalogs and not using parallel workers.

This addresses the issues that led to the reversion of commit d9d076222f5b, providing a safe way to allow xmin advancement during long-running non-unique, non-parallel concurrent index builds while ensuring index correctness.

Regression tests are added to verify the behavior.
---
 contrib/amcheck/verify_nbtree.c               |   3 +-
 contrib/pgstattuple/pgstattuple.c             |   2 +-
 src/backend/access/brin/brin.c                |  14 +++
 src/backend/access/heap/heapam.c              |  46 ++++++++
 src/backend/access/heap/heapam_handler.c      |  57 ++++++++--
 src/backend/access/index/genam.c              |   2 +-
 src/backend/access/nbtree/nbtsort.c           |  14 +++
 src/backend/catalog/index.c                   |  30 ++++-
 src/backend/commands/indexcmds.c              |  14 +--
 src/backend/optimizer/plan/planner.c          |   9 ++
 src/include/access/tableam.h                  |  28 ++++-
 src/test/modules/injection_points/Makefile    |   2 +-
 .../expected/cic_reset_snapshots.out          | 107 ++++++++++++++++++
 src/test/modules/injection_points/meson.build |   1 +
 .../sql/cic_reset_snapshots.sql               |  86 ++++++++++++++
 15 files changed, 384 insertions(+), 31 deletions(-)
 create mode 100644 src/test/modules/injection_points/expected/cic_reset_snapshots.out
 create mode 100644 src/test/modules/injection_points/sql/cic_reset_snapshots.sql

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index ffe4f721672..7fb052ce3de 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -689,7 +689,8 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 									 0, /* number of keys */
 									 NULL,	/* scan key */
 									 true,	/* buffer access strategy OK */
-									 true); /* syncscan OK? */
+									 true, /* syncscan OK? */
+									 false);
 
 		/*
 		 * Scan will behave as the first scan of a CREATE INDEX CONCURRENTLY
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 48cb8f59c4f..ff7cc07df99 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -332,7 +332,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 				 errmsg("only heap AM is supported")));
 
 	/* Disable syncscan because we assume we scan from block zero upwards */
-	scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false);
+	scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false, false);
 	hscan = (HeapScanDesc) scan;
 
 	InitDirtySnapshot(SnapshotDirty);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 3aedec882cd..d69859ac4df 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2366,6 +2366,7 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	WalUsage   *walusage;
 	BufferUsage *bufferusage;
 	bool		leaderparticipates = true;
+	bool		need_pop_active_snapshot = true;
 	int			querylen;
 
 #ifdef DISABLE_LEADER_PARTICIPATION
@@ -2391,9 +2392,16 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	 * live according to that.
 	 */
 	if (!isconcurrent)
+	{
+		Assert(ActiveSnapshotSet());
 		snapshot = SnapshotAny;
+		need_pop_active_snapshot = false;
+	}
 	else
+	{
 		snapshot = RegisterSnapshot(GetTransactionSnapshot());
+		PushActiveSnapshot(GetTransactionSnapshot());
+	}
 
 	/*
 	 * Estimate size for our own PARALLEL_KEY_BRIN_SHARED workspace.
@@ -2436,6 +2444,8 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	/* If no DSM segment was available, back out (do serial build) */
 	if (pcxt->seg == NULL)
 	{
+		if (need_pop_active_snapshot)
+			PopActiveSnapshot();
 		if (IsMVCCSnapshot(snapshot))
 			UnregisterSnapshot(snapshot);
 		DestroyParallelContext(pcxt);
@@ -2515,6 +2525,8 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	/* If no workers were successfully launched, back out (do serial build) */
 	if (pcxt->nworkers_launched == 0)
 	{
+		if (need_pop_active_snapshot)
+			PopActiveSnapshot();
 		_brin_end_parallel(brinleader, NULL);
 		return;
 	}
@@ -2531,6 +2543,8 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	 * sure that the failure-to-start case will not hang forever.
 	 */
 	WaitForParallelWorkersToAttach(pcxt);
+	if (need_pop_active_snapshot)
+		PopActiveSnapshot();
 }
 
 /*
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d00300c5dcb..1fdfdf96482 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -51,6 +51,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/spccache.h"
+#include "utils/injection_point.h"
 
 
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
@@ -566,6 +567,36 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 }
 
+/*
+ * Reset the active snapshot during a scan.
+ * This ensures the xmin horizon can advance while maintaining safe tuple visibility.
+ * Note: No other snapshot should be active during this operation.
+ */
+static inline void
+heap_reset_scan_snapshot(TableScanDesc sscan)
+{
+	/* Make sure no other snapshot was set as active. */
+	Assert(GetActiveSnapshot() == sscan->rs_snapshot);
+	/* And make sure active snapshot is not registered. */
+	Assert(GetActiveSnapshot()->regd_count == 0);
+	PopActiveSnapshot();
+
+	sscan->rs_snapshot = InvalidSnapshot; /* just ot be tidy */
+	Assert(!HaveRegisteredOrActiveSnapshot());
+	InvalidateCatalogSnapshot();
+
+	/* Goal of snapshot reset is to allow horizon to advance. */
+	Assert(!TransactionIdIsValid(MyProc->xmin));
+#if USE_INJECTION_POINTS
+	/* In some cases it is still not possible due xid assign. */
+	if (!TransactionIdIsValid(MyProc->xid))
+		INJECTION_POINT("heap_reset_scan_snapshot_effective");
+#endif
+
+	PushActiveSnapshot(GetLatestSnapshot());
+	sscan->rs_snapshot = GetActiveSnapshot();
+}
+
 /*
  * heap_fetch_next_buffer - read and pin the next block from MAIN_FORKNUM.
  *
@@ -607,7 +638,13 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 
 	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
 	if (BufferIsValid(scan->rs_cbuf))
+	{
 		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
+#define SO_RESET_SNAPSHOT_EACH_N_PAGE 64
+		if ((scan->rs_base.rs_flags & SO_RESET_SNAPSHOT) &&
+			(scan->rs_cblock % SO_RESET_SNAPSHOT_EACH_N_PAGE == 0))
+			heap_reset_scan_snapshot((TableScanDesc) scan);
+	}
 }
 
 /*
@@ -1233,6 +1270,15 @@ heap_endscan(TableScanDesc sscan)
 	if (scan->rs_parallelworkerdata != NULL)
 		pfree(scan->rs_parallelworkerdata);
 
+	if (scan->rs_base.rs_flags & SO_RESET_SNAPSHOT)
+	{
+		Assert(!(scan->rs_base.rs_flags & SO_TEMP_SNAPSHOT));
+		/* Make sure no other snapshot was set as active. */
+		Assert(GetActiveSnapshot() == sscan->rs_snapshot);
+		/* And make sure snapshot is not registered. */
+		Assert(GetActiveSnapshot()->regd_count == 0);
+	}
+
 	if (scan->rs_base.rs_flags & SO_TEMP_SNAPSHOT)
 		UnregisterSnapshot(scan->rs_base.rs_snapshot);
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index a8d95e0f1c1..980c51e32b9 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1190,6 +1190,8 @@ heapam_index_build_range_scan(Relation heapRelation,
 	ExprContext *econtext;
 	Snapshot	snapshot;
 	bool		need_unregister_snapshot = false;
+	bool		need_pop_active_snapshot = false;
+	bool		reset_snapshots = false;
 	TransactionId OldestXmin;
 	BlockNumber previous_blkno = InvalidBlockNumber;
 	BlockNumber root_blkno = InvalidBlockNumber;
@@ -1224,9 +1226,6 @@ heapam_index_build_range_scan(Relation heapRelation,
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
 
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
-
 	/*
 	 * Prepare for scan of the base relation.  In a normal index build, we use
 	 * SnapshotAny because we must retrieve all tuples and do our own time
@@ -1236,6 +1235,15 @@ heapam_index_build_range_scan(Relation heapRelation,
 	 */
 	OldestXmin = InvalidTransactionId;
 
+	/*
+	 * For unique index we need consistent snapshot for the whole scan.
+	 * In case of parallel scan some additional infrastructure required
+	 * to perform scan with SO_RESET_SNAPSHOT which is not yet ready.
+	 */
+	reset_snapshots = indexInfo->ii_Concurrent &&
+					  !indexInfo->ii_Unique &&
+					  !is_system_catalog; /* just for the case */
+
 	/* okay to ignore lazy VACUUMs here */
 	if (!IsBootstrapProcessingMode() && !indexInfo->ii_Concurrent)
 		OldestXmin = GetOldestNonRemovableTransactionId(heapRelation);
@@ -1244,24 +1252,41 @@ heapam_index_build_range_scan(Relation heapRelation,
 	{
 		/*
 		 * Serial index build.
-		 *
-		 * Must begin our own heap scan in this case.  We may also need to
-		 * register a snapshot whose lifetime is under our direct control.
 		 */
 		if (!TransactionIdIsValid(OldestXmin))
 		{
-			snapshot = RegisterSnapshot(GetTransactionSnapshot());
-			need_unregister_snapshot = true;
+			snapshot = GetTransactionSnapshot();
+			/*
+			 * Must begin our own heap scan in this case.  We may also need to
+			 * register a snapshot whose lifetime is under our direct control.
+			 * In case of resetting of snapshot during the scan registration is
+			 * not allowed because snapshot is going to be changed every so
+			 * often.
+			 */
+			if (!reset_snapshots)
+			{
+				snapshot = RegisterSnapshot(snapshot);
+				need_unregister_snapshot = true;
+			}
+			Assert(!ActiveSnapshotSet());
+			PushActiveSnapshot(snapshot);
+			/* store link to snapshot because it may be copied */
+			snapshot = GetActiveSnapshot();
+			need_pop_active_snapshot = true;
 		}
 		else
+		{
+			Assert(!indexInfo->ii_Concurrent);
 			snapshot = SnapshotAny;
+		}
 
 		scan = table_beginscan_strat(heapRelation,	/* relation */
 									 snapshot,	/* snapshot */
 									 0, /* number of keys */
 									 NULL,	/* scan key */
 									 true,	/* buffer access strategy OK */
-									 allow_sync);	/* syncscan OK? */
+									 allow_sync,	/* syncscan OK? */
+									 reset_snapshots /* reset snapshots? */);
 	}
 	else
 	{
@@ -1275,6 +1300,8 @@ heapam_index_build_range_scan(Relation heapRelation,
 		Assert(!IsBootstrapProcessingMode());
 		Assert(allow_sync);
 		snapshot = scan->rs_snapshot;
+		PushActiveSnapshot(snapshot);
+		need_pop_active_snapshot = true;
 	}
 
 	hscan = (HeapScanDesc) scan;
@@ -1289,6 +1316,13 @@ heapam_index_build_range_scan(Relation heapRelation,
 	Assert(snapshot == SnapshotAny ? TransactionIdIsValid(OldestXmin) :
 		   !TransactionIdIsValid(OldestXmin));
 	Assert(snapshot == SnapshotAny || !anyvisible);
+	Assert(snapshot == SnapshotAny || ActiveSnapshotSet());
+
+	/* Set up execution state for predicate, if any. */
+	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+	/* Clear reference to snapshot since it may be changed by the scan itself. */
+	if (reset_snapshots)
+		snapshot = InvalidSnapshot;
 
 	/* Publish number of blocks to scan */
 	if (progress)
@@ -1724,6 +1758,8 @@ heapam_index_build_range_scan(Relation heapRelation,
 
 	table_endscan(scan);
 
+	if (need_pop_active_snapshot)
+		PopActiveSnapshot();
 	/* we can now forget our snapshot, if set and registered by us */
 	if (need_unregister_snapshot)
 		UnregisterSnapshot(snapshot);
@@ -1796,7 +1832,8 @@ heapam_index_validate_scan(Relation heapRelation,
 								 0, /* number of keys */
 								 NULL,	/* scan key */
 								 true,	/* buffer access strategy OK */
-								 false);	/* syncscan not OK */
+								 false,	/* syncscan not OK */
+								 false);
 	hscan = (HeapScanDesc) scan;
 
 	pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_TOTAL,
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 4b4ebff6a17..a104ba9df74 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -463,7 +463,7 @@ systable_beginscan(Relation heapRelation,
 		 */
 		sysscan->scan = table_beginscan_strat(heapRelation, snapshot,
 											  nkeys, key,
-											  true, false);
+											  true, false, false);
 		sysscan->iscan = NULL;
 	}
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 17a352d040c..5c4581afb1a 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1410,6 +1410,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	WalUsage   *walusage;
 	BufferUsage *bufferusage;
 	bool		leaderparticipates = true;
+	bool		need_pop_active_snapshot = true;
 	int			querylen;
 
 #ifdef DISABLE_LEADER_PARTICIPATION
@@ -1435,9 +1436,16 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	 * live according to that.
 	 */
 	if (!isconcurrent)
+	{
+		Assert(ActiveSnapshotSet());
 		snapshot = SnapshotAny;
+		need_pop_active_snapshot = false;
+	}
 	else
+	{
 		snapshot = RegisterSnapshot(GetTransactionSnapshot());
+		PushActiveSnapshot(snapshot);
+	}
 
 	/*
 	 * Estimate size for our own PARALLEL_KEY_BTREE_SHARED workspace, and
@@ -1491,6 +1499,8 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	/* If no DSM segment was available, back out (do serial build) */
 	if (pcxt->seg == NULL)
 	{
+		if (need_pop_active_snapshot)
+			PopActiveSnapshot();
 		if (IsMVCCSnapshot(snapshot))
 			UnregisterSnapshot(snapshot);
 		DestroyParallelContext(pcxt);
@@ -1585,6 +1595,8 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	/* If no workers were successfully launched, back out (do serial build) */
 	if (pcxt->nworkers_launched == 0)
 	{
+		if (need_pop_active_snapshot)
+			PopActiveSnapshot();
 		_bt_end_parallel(btleader);
 		return;
 	}
@@ -1601,6 +1613,8 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	 * sure that the failure-to-start case will not hang forever.
 	 */
 	WaitForParallelWorkersToAttach(pcxt);
+	if (need_pop_active_snapshot)
+		PopActiveSnapshot();
 }
 
 /*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 05dc6add7eb..e0ada5ce159 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -79,6 +79,7 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
+#include "storage/proc.h"
 
 /* Potentially set by pg_upgrade_support functions */
 Oid			binary_upgrade_next_index_pg_class_oid = InvalidOid;
@@ -1490,8 +1491,8 @@ index_concurrently_build(Oid heapRelationId,
 	Relation	indexRelation;
 	IndexInfo  *indexInfo;
 
-	/* This had better make sure that a snapshot is active */
-	Assert(ActiveSnapshotSet());
+	Assert(!TransactionIdIsValid(MyProc->xmin));
+	Assert(!TransactionIdIsValid(MyProc->xid));
 
 	/* Open and lock the parent heap relation */
 	heapRel = table_open(heapRelationId, ShareUpdateExclusiveLock);
@@ -1509,19 +1510,28 @@ index_concurrently_build(Oid heapRelationId,
 
 	indexRelation = index_open(indexRelationId, RowExclusiveLock);
 
+	/* BuildIndexInfo may require as snapshot for expressions and predicates */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	/*
 	 * We have to re-build the IndexInfo struct, since it was lost in the
 	 * commit of the transaction where this concurrent index was created at
 	 * the catalog level.
 	 */
 	indexInfo = BuildIndexInfo(indexRelation);
+	/* Done with snapshot */
+	PopActiveSnapshot();
 	Assert(!indexInfo->ii_ReadyForInserts);
 	indexInfo->ii_Concurrent = true;
 	indexInfo->ii_BrokenHotChain = false;
+	Assert(!TransactionIdIsValid(MyProc->xmin));
 
 	/* Now build the index */
 	index_build(heapRel, indexRelation, indexInfo, false, true);
 
+	/* Invalidate catalog snapshot just for assert */
+	InvalidateCatalogSnapshot();
+	Assert((indexInfo->ii_ParallelWorkers || indexInfo->ii_Unique) || !TransactionIdIsValid(MyProc->xmin));
+
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
 
@@ -1532,12 +1542,19 @@ index_concurrently_build(Oid heapRelationId,
 	table_close(heapRel, NoLock);
 	index_close(indexRelation, NoLock);
 
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	/*
 	 * Update the pg_index row to mark the index as ready for inserts. Once we
 	 * commit this transaction, any new transactions that open the table must
 	 * insert new entries into the index for insertions and non-HOT updates.
 	 */
 	index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+	/* we can do away with our snapshot */
+	PopActiveSnapshot();
 }
 
 /*
@@ -3205,7 +3222,8 @@ IndexCheckExclusion(Relation heapRelation,
 								 0, /* number of keys */
 								 NULL,	/* scan key */
 								 true,	/* buffer access strategy OK */
-								 true); /* syncscan OK */
+								 true, /* syncscan OK */
+								 false);
 
 	while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 	{
@@ -3268,12 +3286,16 @@ IndexCheckExclusion(Relation heapRelation,
  * as of the start of the scan (see table_index_build_scan), whereas a normal
  * build takes care to include recently-dead tuples.  This is OK because
  * we won't mark the index valid until all transactions that might be able
- * to see those tuples are gone.  The reason for doing that is to avoid
+ * to see those tuples are gone.  One of reasons for doing that is to avoid
  * bogus unique-index failures due to concurrent UPDATEs (we might see
  * different versions of the same row as being valid when we pass over them,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
  * does not contain any tuples added to the table while we built the index.
  *
+ * Furthermore, in case of non-unique index we set SO_RESET_SNAPSHOT for the
+ * scan, which causes new snapshot to be set as active every so often. The reason
+ * for that is to propagate the xmin horizon forward.
+ *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
  * commit the second transaction and start a third.  Again we wait for all
  * transactions that could have been modifying the table to terminate.  Now
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 932854d6c60..6c1fce8ed25 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1670,23 +1670,17 @@ DefineIndex(Oid tableId,
 	 * chains can be created where the new tuple and the old tuple in the
 	 * chain have different index keys.
 	 *
-	 * We now take a new snapshot, and build the index using all tuples that
-	 * are visible in this snapshot.  We can be sure that any HOT updates to
+	 * We build the index using all tuples that are visible using single or
+	 * multiple refreshing snapshots. We can be sure that any HOT updates to
 	 * these tuples will be compatible with the index, since any updates made
 	 * by transactions that didn't know about the index are now committed or
 	 * rolled back.  Thus, each visible tuple is either the end of its
 	 * HOT-chain or the extension of the chain is HOT-safe for this index.
 	 */
 
-	/* Set ActiveSnapshot since functions in the indexes may need it */
-	PushActiveSnapshot(GetTransactionSnapshot());
-
 	/* Perform concurrent build of index */
 	index_concurrently_build(tableId, indexRelationId);
 
-	/* we can do away with our snapshot */
-	PopActiveSnapshot();
-
 	/*
 	 * Commit this transaction to make the indisready update visible.
 	 */
@@ -4084,9 +4078,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		if (newidx->safe)
 			set_indexsafe_procflags();
 
-		/* Set ActiveSnapshot since functions in the indexes may need it */
-		PushActiveSnapshot(GetTransactionSnapshot());
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
@@ -4101,7 +4092,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		/* Perform concurrent build of new index */
 		index_concurrently_build(newidx->tableId, newidx->indexId);
 
-		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f3856c519f6..5c7514c96ac 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -61,6 +61,7 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/selfuncs.h"
+#include "utils/snapmgr.h"
 
 /* GUC parameters */
 double		cursor_tuple_fraction = DEFAULT_CURSOR_TUPLE_FRACTION;
@@ -6779,6 +6780,7 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 	Relation	heap;
 	Relation	index;
 	RelOptInfo *rel;
+	bool		need_pop_active_snapshot = false;
 	int			parallel_workers;
 	BlockNumber heap_blocks;
 	double		reltuples;
@@ -6834,6 +6836,11 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 	heap = table_open(tableOid, NoLock);
 	index = index_open(indexOid, NoLock);
 
+	/* Set ActiveSnapshot since functions in the indexes may need it */
+	if (!ActiveSnapshotSet()) {
+		PushActiveSnapshot(GetTransactionSnapshot());
+		need_pop_active_snapshot = true;
+	}
 	/*
 	 * Determine if it's safe to proceed.
 	 *
@@ -6891,6 +6898,8 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 		parallel_workers--;
 
 done:
+	if (need_pop_active_snapshot)
+		PopActiveSnapshot();
 	index_close(index, NoLock);
 	table_close(heap, NoLock);
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index adb478a93ca..f4c7d2a92bf 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -24,6 +24,7 @@
 #include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
+#include "utils/injection_point.h"
 
 
 #define DEFAULT_TABLE_ACCESS_METHOD	"heap"
@@ -69,6 +70,17 @@ typedef enum ScanOptions
 	 * needed. If table data may be needed, set SO_NEED_TUPLES.
 	 */
 	SO_NEED_TUPLES = 1 << 10,
+	/*
+	 * Reset scan and catalog snapshot every so often? If so, each
+	 * SO_RESET_SNAPSHOT_EACH_N_PAGE pages active snapshot is popped,
+	 * catalog snapshot invalidated, latest snapshot pushed as active.
+	 *
+	 * At the end of the scan snapshot is not popped.
+	 * Goal of such mode is keep xmin propagating horizon forward.
+	 *
+	 * see heap_reset_scan_snapshot for details.
+	 */
+	SO_RESET_SNAPSHOT = 1 << 11,
 }			ScanOptions;
 
 /*
@@ -935,7 +947,8 @@ extern TableScanDesc table_beginscan_catalog(Relation relation, int nkeys,
 static inline TableScanDesc
 table_beginscan_strat(Relation rel, Snapshot snapshot,
 					  int nkeys, struct ScanKeyData *key,
-					  bool allow_strat, bool allow_sync)
+					  bool allow_strat, bool allow_sync,
+					  bool reset_snapshot)
 {
 	uint32		flags = SO_TYPE_SEQSCAN | SO_ALLOW_PAGEMODE;
 
@@ -943,6 +956,15 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 		flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
+	if (reset_snapshot)
+	{
+		INJECTION_POINT("table_beginscan_strat_reset_snapshots");
+		/* Active snapshot is required on start. */
+		Assert(GetActiveSnapshot() == snapshot);
+		/* Active snapshot should not be registered to keep xmin propagating. */
+		Assert(GetActiveSnapshot()->regd_count == 0);
+		flags |= (SO_RESET_SNAPSHOT);
+	}
 
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -1779,6 +1801,10 @@ table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
  * very hard to detect whether they're really incompatible with the chain tip.
  * This only really makes sense for heap AM, it might need to be generalized
  * for other AMs later.
+ *
+ * In case of non-unique index and non-parallel concurrent build
+ * SO_RESET_SNAPSHOT is applied for the scan. That leads for changing snapshots
+ * on the fly to allow xmin horizon propagate.
  */
 static inline double
 table_index_build_scan(Relation table_rel,
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index f8f86e8f3b6..73893d351bb 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -10,7 +10,7 @@ EXTENSION = injection_points
 DATA = injection_points--1.0.sql
 PGFILEDESC = "injection_points - facility for injection points"
 
-REGRESS = injection_points reindex_conc
+REGRESS = injection_points reindex_conc cic_reset_snapshots
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
 ISOLATION = basic inplace \
diff --git a/src/test/modules/injection_points/expected/cic_reset_snapshots.out b/src/test/modules/injection_points/expected/cic_reset_snapshots.out
new file mode 100644
index 00000000000..5db54530f17
--- /dev/null
+++ b/src/test/modules/injection_points/expected/cic_reset_snapshots.out
@@ -0,0 +1,107 @@
+CREATE EXTENSION injection_points;
+SELECT injection_points_set_local();
+ injection_points_set_local 
+----------------------------
+ 
+(1 row)
+
+SELECT injection_points_attach('heap_reset_scan_snapshot_effective', 'notice');
+ injection_points_attach 
+-------------------------
+ 
+(1 row)
+
+SELECT injection_points_attach('table_beginscan_strat_reset_snapshots', 'notice');
+ injection_points_attach 
+-------------------------
+ 
+(1 row)
+
+CREATE SCHEMA cic_reset_snap;
+CREATE TABLE cic_reset_snap.tbl(i int primary key, j int);
+INSERT INTO cic_reset_snap.tbl SELECT i, i * I FROM generate_series(1, 200) s(i);
+CREATE FUNCTION cic_reset_snap.predicate_stable(integer) RETURNS bool IMMUTABLE
+									  LANGUAGE plpgsql AS $$
+BEGIN
+    EXECUTE 'SELECT txid_current()';
+    RETURN MOD($1, 2) = 0;
+END; $$;
+CREATE FUNCTION cic_reset_snap.predicate_stable_no_param() RETURNS bool IMMUTABLE
+									  LANGUAGE plpgsql AS $$
+BEGIN
+    EXECUTE 'SELECT txid_current()';
+    RETURN false;
+END; $$;
+----------------
+ALTER TABLE cic_reset_snap.tbl SET (parallel_workers=0);
+CREATE UNIQUE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(MOD(i, 2), j) WHERE MOD(i, 2) = 0;
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable(i);
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable_no_param();
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl USING BRIN(i);
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+-- The same in parallel mode
+ALTER TABLE cic_reset_snap.tbl SET (parallel_workers=2);
+CREATE UNIQUE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(MOD(i, 2), j) WHERE MOD(i, 2) = 0;
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable(i);
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable_no_param();
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i DESC NULLS LAST);
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl USING BRIN(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP SCHEMA cic_reset_snap CASCADE;
+NOTICE:  drop cascades to 3 other objects
+DETAIL:  drop cascades to table cic_reset_snap.tbl
+drop cascades to function cic_reset_snap.predicate_stable(integer)
+drop cascades to function cic_reset_snap.predicate_stable_no_param()
+DROP EXTENSION injection_points;
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 91fc8ce687f..f288633da4f 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -35,6 +35,7 @@ tests += {
     'sql': [
       'injection_points',
       'reindex_conc',
+      'cic_reset_snapshots',
     ],
     'regress_args': ['--dlpath', meson.build_root() / 'src/test/regress'],
     # The injection points are cluster-wide, so disable installcheck
diff --git a/src/test/modules/injection_points/sql/cic_reset_snapshots.sql b/src/test/modules/injection_points/sql/cic_reset_snapshots.sql
new file mode 100644
index 00000000000..5072535b355
--- /dev/null
+++ b/src/test/modules/injection_points/sql/cic_reset_snapshots.sql
@@ -0,0 +1,86 @@
+CREATE EXTENSION injection_points;
+
+SELECT injection_points_set_local();
+SELECT injection_points_attach('heap_reset_scan_snapshot_effective', 'notice');
+SELECT injection_points_attach('table_beginscan_strat_reset_snapshots', 'notice');
+
+
+CREATE SCHEMA cic_reset_snap;
+CREATE TABLE cic_reset_snap.tbl(i int primary key, j int);
+INSERT INTO cic_reset_snap.tbl SELECT i, i * I FROM generate_series(1, 200) s(i);
+
+CREATE FUNCTION cic_reset_snap.predicate_stable(integer) RETURNS bool IMMUTABLE
+									  LANGUAGE plpgsql AS $$
+BEGIN
+    EXECUTE 'SELECT txid_current()';
+    RETURN MOD($1, 2) = 0;
+END; $$;
+
+CREATE FUNCTION cic_reset_snap.predicate_stable_no_param() RETURNS bool IMMUTABLE
+									  LANGUAGE plpgsql AS $$
+BEGIN
+    EXECUTE 'SELECT txid_current()';
+    RETURN false;
+END; $$;
+
+----------------
+ALTER TABLE cic_reset_snap.tbl SET (parallel_workers=0);
+
+CREATE UNIQUE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(MOD(i, 2), j) WHERE MOD(i, 2) = 0;
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable_no_param();
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl USING BRIN(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+-- The same in parallel mode
+ALTER TABLE cic_reset_snap.tbl SET (parallel_workers=2);
+
+CREATE UNIQUE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(MOD(i, 2), j) WHERE MOD(i, 2) = 0;
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable_no_param();
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i DESC NULLS LAST);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl USING BRIN(i);
+REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+DROP INDEX CONCURRENTLY cic_reset_snap.idx;
+
+DROP SCHEMA cic_reset_snap CASCADE;
+
+DROP EXTENSION injection_points;
-- 
2.43.0



  [application/octet-stream] v7-0002-Add-stress-tests-for-concurrent-index-operations.patch (6.5K, 5-v7-0002-Add-stress-tests-for-concurrent-index-operations.patch)
  download | inline diff:
From b4f22a1da4bbbff6a268c0f62196a264cb126896 Mon Sep 17 00:00:00 2001
From: nkey <[email protected]>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v7 2/6] Add stress tests for concurrent index operations

Add comprehensive stress tests for concurrent index operations, focusing on:
* Testing CREATE/REINDEX/DROP INDEX CONCURRENTLY under heavy write load
* Verifying index integrity during concurrent HOT updates
* Testing various index types including unique and partial indexes
* Validating index correctness using amcheck
* Exercising parallel worker configurations

These stress tests help ensure reliability of concurrent index operations
under heavy load conditions.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 144 ++++++++++++++++++++++++++++++++
 2 files changed, 145 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 292b33eb094..4a8f4fbc8b0 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..142e8fb845e
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,144 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp)));
+$node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);));
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC in different options concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=2500',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY idx_2 ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY idx_2 ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY idx_2 ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY idx_2 ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY idx_2 ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY idx_2 ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\sleep 10 ms
+					SELECT bt_index_check('idx_2', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY idx_2;
+					\sleep 10 ms
+					SELECT bt_index_check('idx_2', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY idx_2;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=2500',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops_unique_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE UNIQUE INDEX CONCURRENTLY idx_2 ON tbl(i);
+					\sleep 10 ms
+					SELECT bt_index_check('idx_2', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY idx_2;
+					\sleep 10 ms
+					SELECT bt_index_check('idx_2', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY idx_2;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->stop;
+done_testing();
\ No newline at end of file
-- 
2.43.0



  [application/octet-stream] v7-0004-Allow-snapshot-resets-during-parallel-concurrent-.patch (29.5K, 6-v7-0004-Allow-snapshot-resets-during-parallel-concurrent-.patch)
  download | inline diff:
From 1a2a8cc969011974913c22604d608a0d9c4ffa78 Mon Sep 17 00:00:00 2001
From: nkey <[email protected]>
Date: Mon, 2 Dec 2024 01:33:21 +0100
Subject: [PATCH v7 4/6] Allow snapshot resets during parallel concurrent index
 builds

Previously, non-unique concurrent index builds in parallel mode required a
consistent MVCC snapshot throughout the build, which could hold back the xmin
horizon and prevent dead tuple cleanup. This patch extends the previous work
on snapshot resets (introduced for non-parallel builds) to also support
parallel builds.

Key changes:
- Add infrastructure to track snapshot restoration in parallel workers
- Extend parallel scan initialization to support periodic snapshot resets
- Wait for parallel workers to restore their initial snapshots before
  proceeding with scan
- Add regression tests to verify behavior with various index types

The snapshot reset approach is safe for non-unique indexes since they don't
need snapshot consistency across the entire scan. For unique indexes, we
continue to maintain a consistent snapshot to properly enforce uniqueness
constraints.

This helps reduce the xmin horizon impact of long-running concurrent index
builds in parallel mode, improving VACUUM's ability to clean up dead tuples.
---
 src/backend/access/brin/brin.c                | 43 +++++++++-------
 src/backend/access/heap/heapam_handler.c      | 12 +++--
 src/backend/access/nbtree/nbtsort.c           | 38 ++++++++++++--
 src/backend/access/table/tableam.c            | 37 ++++++++++++--
 src/backend/access/transam/parallel.c         | 50 +++++++++++++++++--
 src/backend/executor/nodeSeqscan.c            |  3 +-
 src/backend/utils/time/snapmgr.c              |  8 ---
 src/include/access/parallel.h                 |  3 +-
 src/include/access/relscan.h                  |  1 +
 src/include/access/tableam.h                  |  9 ++--
 .../expected/cic_reset_snapshots.out          | 23 ++++++++-
 .../sql/cic_reset_snapshots.sql               |  7 ++-
 12 files changed, 178 insertions(+), 56 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index d69859ac4df..0782bd64a6a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -143,7 +143,6 @@ typedef struct BrinLeader
 	 */
 	BrinShared *brinshared;
 	Sharedsort *sharedsort;
-	Snapshot	snapshot;
 	WalUsage   *walusage;
 	BufferUsage *bufferusage;
 } BrinLeader;
@@ -231,7 +230,7 @@ static void brin_fill_empty_ranges(BrinBuildState *state,
 static void _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 								 bool isconcurrent, int request);
 static void _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state);
-static Size _brin_parallel_estimate_shared(Relation heap, Snapshot snapshot);
+static Size _brin_parallel_estimate_shared(Relation heap);
 static double _brin_parallel_heapscan(BrinBuildState *state);
 static double _brin_parallel_merge(BrinBuildState *state);
 static void _brin_leader_participate_as_worker(BrinBuildState *buildstate,
@@ -2357,7 +2356,6 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 {
 	ParallelContext *pcxt;
 	int			scantuplesortstates;
-	Snapshot	snapshot;
 	Size		estbrinshared;
 	Size		estsort;
 	BrinShared *brinshared;
@@ -2367,6 +2365,7 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	BufferUsage *bufferusage;
 	bool		leaderparticipates = true;
 	bool		need_pop_active_snapshot = true;
+	bool		wait_for_snapshot_attach;
 	int			querylen;
 
 #ifdef DISABLE_LEADER_PARTICIPATION
@@ -2388,25 +2387,25 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	 * Prepare for scan of the base relation.  In a normal index build, we use
 	 * SnapshotAny because we must retrieve all tuples and do our own time
 	 * qual checks (because we have to index RECENTLY_DEAD tuples).  In a
-	 * concurrent build, we take a regular MVCC snapshot and index whatever's
-	 * live according to that.
+	 * concurrent build, we take a regular MVCC snapshot and push it as active.
+	 * Later we index whatever's live according to that snapshot while that
+	 * snapshot is reset periodically.
 	 */
 	if (!isconcurrent)
 	{
 		Assert(ActiveSnapshotSet());
-		snapshot = SnapshotAny;
 		need_pop_active_snapshot = false;
 	}
 	else
 	{
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
+		Assert(!ActiveSnapshotSet());
 		PushActiveSnapshot(GetTransactionSnapshot());
 	}
 
 	/*
 	 * Estimate size for our own PARALLEL_KEY_BRIN_SHARED workspace.
 	 */
-	estbrinshared = _brin_parallel_estimate_shared(heap, snapshot);
+	estbrinshared = _brin_parallel_estimate_shared(heap);
 	shm_toc_estimate_chunk(&pcxt->estimator, estbrinshared);
 	estsort = tuplesort_estimate_shared(scantuplesortstates);
 	shm_toc_estimate_chunk(&pcxt->estimator, estsort);
@@ -2446,8 +2445,6 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	{
 		if (need_pop_active_snapshot)
 			PopActiveSnapshot();
-		if (IsMVCCSnapshot(snapshot))
-			UnregisterSnapshot(snapshot);
 		DestroyParallelContext(pcxt);
 		ExitParallelMode();
 		return;
@@ -2472,7 +2469,8 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 
 	table_parallelscan_initialize(heap,
 								  ParallelTableScanFromBrinShared(brinshared),
-								  snapshot);
+								  isconcurrent ? InvalidSnapshot : SnapshotAny,
+								  isconcurrent);
 
 	/*
 	 * Store shared tuplesort-private state, for which we reserved space.
@@ -2518,7 +2516,6 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 		brinleader->nparticipanttuplesorts++;
 	brinleader->brinshared = brinshared;
 	brinleader->sharedsort = sharedsort;
-	brinleader->snapshot = snapshot;
 	brinleader->walusage = walusage;
 	brinleader->bufferusage = bufferusage;
 
@@ -2534,6 +2531,16 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	/* Save leader state now that it's clear build will be parallel */
 	buildstate->bs_leader = brinleader;
 
+	/*
+	 * In case of concurrent build snapshots are going to be reset periodically.
+	 * In case when leader going to reset own active snapshot as well - we need to
+	 * wait until all workers imported initial snapshot.
+	 */
+	wait_for_snapshot_attach = isconcurrent && leaderparticipates;
+
+	if (wait_for_snapshot_attach)
+		WaitForParallelWorkersToAttach(pcxt, true);
+
 	/* Join heap scan ourselves */
 	if (leaderparticipates)
 		_brin_leader_participate_as_worker(buildstate, heap, index);
@@ -2542,7 +2549,8 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	 * Caller needs to wait for all launched workers when we return.  Make
 	 * sure that the failure-to-start case will not hang forever.
 	 */
-	WaitForParallelWorkersToAttach(pcxt);
+	if (!wait_for_snapshot_attach)
+		WaitForParallelWorkersToAttach(pcxt, false);
 	if (need_pop_active_snapshot)
 		PopActiveSnapshot();
 }
@@ -2565,9 +2573,6 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 	for (i = 0; i < brinleader->pcxt->nworkers_launched; i++)
 		InstrAccumParallelQuery(&brinleader->bufferusage[i], &brinleader->walusage[i]);
 
-	/* Free last reference to MVCC snapshot, if one was used */
-	if (IsMVCCSnapshot(brinleader->snapshot))
-		UnregisterSnapshot(brinleader->snapshot);
 	DestroyParallelContext(brinleader->pcxt);
 	ExitParallelMode();
 }
@@ -2767,14 +2772,14 @@ _brin_parallel_merge(BrinBuildState *state)
 
 /*
  * Returns size of shared memory required to store state for a parallel
- * brin index build based on the snapshot its parallel scan will use.
+ * brin index build.
  */
 static Size
-_brin_parallel_estimate_shared(Relation heap, Snapshot snapshot)
+_brin_parallel_estimate_shared(Relation heap)
 {
 	/* c.f. shm_toc_allocate as to why BUFFERALIGN is used */
 	return add_size(BUFFERALIGN(sizeof(BrinShared)),
-					table_parallelscan_estimate(heap, snapshot));
+					table_parallelscan_estimate(heap, InvalidSnapshot));
 }
 
 /*
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 980c51e32b9..2e5163609c1 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1231,14 +1231,13 @@ heapam_index_build_range_scan(Relation heapRelation,
 	 * SnapshotAny because we must retrieve all tuples and do our own time
 	 * qual checks (because we have to index RECENTLY_DEAD tuples). In a
 	 * concurrent build, or during bootstrap, we take a regular MVCC snapshot
-	 * and index whatever's live according to that.
+	 * and index whatever's live according to that while that snapshot is reset
+	 * every so often (in case of non-unique index).
 	 */
 	OldestXmin = InvalidTransactionId;
 
 	/*
 	 * For unique index we need consistent snapshot for the whole scan.
-	 * In case of parallel scan some additional infrastructure required
-	 * to perform scan with SO_RESET_SNAPSHOT which is not yet ready.
 	 */
 	reset_snapshots = indexInfo->ii_Concurrent &&
 					  !indexInfo->ii_Unique &&
@@ -1300,8 +1299,11 @@ heapam_index_build_range_scan(Relation heapRelation,
 		Assert(!IsBootstrapProcessingMode());
 		Assert(allow_sync);
 		snapshot = scan->rs_snapshot;
-		PushActiveSnapshot(snapshot);
-		need_pop_active_snapshot = true;
+		if (!reset_snapshots)
+		{
+			PushActiveSnapshot(snapshot);
+			need_pop_active_snapshot = true;
+		}
 	}
 
 	hscan = (HeapScanDesc) scan;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 5c4581afb1a..2acbf121745 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1411,6 +1411,8 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	BufferUsage *bufferusage;
 	bool		leaderparticipates = true;
 	bool		need_pop_active_snapshot = true;
+	bool		reset_snapshot;
+	bool		wait_for_snapshot_attach;
 	int			querylen;
 
 #ifdef DISABLE_LEADER_PARTICIPATION
@@ -1428,12 +1430,21 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 
 	scantuplesortstates = leaderparticipates ? request + 1 : request;
 
+    /*
+	 * For concurrent non-unique index builds, we can periodically reset snapshots
+	 * to allow the xmin horizon to advance. This is safe since these builds don't
+	 * require a consistent view across the entire scan. Unique indexes still need
+	 * a stable snapshot to properly enforce uniqueness constraints.
+     */
+	reset_snapshot = isconcurrent && !btspool->isunique;
+
 	/*
 	 * Prepare for scan of the base relation.  In a normal index build, we use
 	 * SnapshotAny because we must retrieve all tuples and do our own time
 	 * qual checks (because we have to index RECENTLY_DEAD tuples).  In a
 	 * concurrent build, we take a regular MVCC snapshot and index whatever's
-	 * live according to that.
+	 * live according to that, while that snapshot may be reset periodically in
+	 * case of non-unique index.
 	 */
 	if (!isconcurrent)
 	{
@@ -1441,6 +1452,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 		snapshot = SnapshotAny;
 		need_pop_active_snapshot = false;
 	}
+	else if (reset_snapshot)
+	{
+		snapshot = InvalidSnapshot;
+		PushActiveSnapshot(GetTransactionSnapshot());
+	}
 	else
 	{
 		snapshot = RegisterSnapshot(GetTransactionSnapshot());
@@ -1501,7 +1517,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	{
 		if (need_pop_active_snapshot)
 			PopActiveSnapshot();
-		if (IsMVCCSnapshot(snapshot))
+		if (snapshot != InvalidSnapshot && IsMVCCSnapshot(snapshot))
 			UnregisterSnapshot(snapshot);
 		DestroyParallelContext(pcxt);
 		ExitParallelMode();
@@ -1528,7 +1544,8 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	btshared->brokenhotchain = false;
 	table_parallelscan_initialize(btspool->heap,
 								  ParallelTableScanFromBTShared(btshared),
-								  snapshot);
+								  snapshot,
+								  reset_snapshot);
 
 	/*
 	 * Store shared tuplesort-private state, for which we reserved space.
@@ -1604,6 +1621,16 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	/* Save leader state now that it's clear build will be parallel */
 	buildstate->btleader = btleader;
 
+	/*
+	 * In case of concurrent build snapshots are going to be reset periodically.
+	 * In case when leader going to reset own active snapshot as well - we need to
+	 * wait until all workers imported initial snapshot.
+	 */
+	wait_for_snapshot_attach = reset_snapshot && leaderparticipates;
+
+	if (wait_for_snapshot_attach)
+		WaitForParallelWorkersToAttach(pcxt, true);
+
 	/* Join heap scan ourselves */
 	if (leaderparticipates)
 		_bt_leader_participate_as_worker(buildstate);
@@ -1612,7 +1639,8 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	 * Caller needs to wait for all launched workers when we return.  Make
 	 * sure that the failure-to-start case will not hang forever.
 	 */
-	WaitForParallelWorkersToAttach(pcxt);
+	if (!wait_for_snapshot_attach)
+		WaitForParallelWorkersToAttach(pcxt, false);
 	if (need_pop_active_snapshot)
 		PopActiveSnapshot();
 }
@@ -1636,7 +1664,7 @@ _bt_end_parallel(BTLeader *btleader)
 		InstrAccumParallelQuery(&btleader->bufferusage[i], &btleader->walusage[i]);
 
 	/* Free last reference to MVCC snapshot, if one was used */
-	if (IsMVCCSnapshot(btleader->snapshot))
+	if (btleader->snapshot != InvalidSnapshot && IsMVCCSnapshot(btleader->snapshot))
 		UnregisterSnapshot(btleader->snapshot);
 	DestroyParallelContext(btleader->pcxt);
 	ExitParallelMode();
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index bd8715b6797..cac7a9ea88a 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -131,10 +131,10 @@ table_parallelscan_estimate(Relation rel, Snapshot snapshot)
 {
 	Size		sz = 0;
 
-	if (IsMVCCSnapshot(snapshot))
+	if (snapshot != InvalidSnapshot && IsMVCCSnapshot(snapshot))
 		sz = add_size(sz, EstimateSnapshotSpace(snapshot));
 	else
-		Assert(snapshot == SnapshotAny);
+		Assert(snapshot == SnapshotAny || snapshot == InvalidSnapshot);
 
 	sz = add_size(sz, rel->rd_tableam->parallelscan_estimate(rel));
 
@@ -143,21 +143,36 @@ table_parallelscan_estimate(Relation rel, Snapshot snapshot)
 
 void
 table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
-							  Snapshot snapshot)
+							  Snapshot snapshot, bool reset_snapshot)
 {
 	Size		snapshot_off = rel->rd_tableam->parallelscan_initialize(rel, pscan);
 
 	pscan->phs_snapshot_off = snapshot_off;
 
-	if (IsMVCCSnapshot(snapshot))
+	/*
+	 * Initialize parallel scan description. For normal scans with a regular
+	 * MVCC snapshot, serialize the snapshot info. For scans that use periodic
+	 * snapshot resets, mark the scan accordingly.
+	 */
+	if (reset_snapshot)
+	{
+		Assert(snapshot == InvalidSnapshot);
+		pscan->phs_snapshot_any = false;
+		pscan->phs_reset_snapshot = true;
+		INJECTION_POINT("table_parallelscan_initialize");
+	}
+	else if (IsMVCCSnapshot(snapshot))
 	{
 		SerializeSnapshot(snapshot, (char *) pscan + pscan->phs_snapshot_off);
 		pscan->phs_snapshot_any = false;
+		pscan->phs_reset_snapshot = false;
 	}
 	else
 	{
 		Assert(snapshot == SnapshotAny);
+		Assert(!reset_snapshot);
 		pscan->phs_snapshot_any = true;
+		pscan->phs_reset_snapshot = false;
 	}
 }
 
@@ -170,7 +185,19 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
-	if (!pscan->phs_snapshot_any)
+	/*
+	 * For scans that
+	 * use periodic snapshot resets, mark the scan accordingly and use the active
+	 * snapshot as the initial state.
+	 */
+	if (pscan->phs_reset_snapshot)
+	{
+		Assert(ActiveSnapshotSet());
+		flags |= SO_RESET_SNAPSHOT;
+		/* Start with current active snapshot. */
+		snapshot = GetActiveSnapshot();
+	}
+	else if (!pscan->phs_snapshot_any)
 	{
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 0a1e089ec1d..d49c6ee410f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -76,6 +76,7 @@
 #define PARALLEL_KEY_RELMAPPER_STATE		UINT64CONST(0xFFFFFFFFFFFF000D)
 #define PARALLEL_KEY_UNCOMMITTEDENUMS		UINT64CONST(0xFFFFFFFFFFFF000E)
 #define PARALLEL_KEY_CLIENTCONNINFO			UINT64CONST(0xFFFFFFFFFFFF000F)
+#define PARALLEL_KEY_SNAPSHOT_RESTORED		UINT64CONST(0xFFFFFFFFFFFF0010)
 
 /* Fixed-size parallel state. */
 typedef struct FixedParallelState
@@ -301,6 +302,10 @@ InitializeParallelDSM(ParallelContext *pcxt)
 										pcxt->nworkers));
 		shm_toc_estimate_keys(&pcxt->estimator, 1);
 
+		shm_toc_estimate_chunk(&pcxt->estimator, mul_size(sizeof(bool),
+							   pcxt->nworkers));
+		shm_toc_estimate_keys(&pcxt->estimator, 1);
+
 		/* Estimate how much we'll need for the entrypoint info. */
 		shm_toc_estimate_chunk(&pcxt->estimator, strlen(pcxt->library_name) +
 							   strlen(pcxt->function_name) + 2);
@@ -372,6 +377,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		char	   *entrypointstate;
 		char	   *uncommittedenumsspace;
 		char	   *clientconninfospace;
+		bool	   *snapshot_set_flag_space;
 		Size		lnamelen;
 
 		/* Serialize shared libraries we have loaded. */
@@ -487,6 +493,19 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		strcpy(entrypointstate, pcxt->library_name);
 		strcpy(entrypointstate + lnamelen + 1, pcxt->function_name);
 		shm_toc_insert(pcxt->toc, PARALLEL_KEY_ENTRYPOINT, entrypointstate);
+
+		/*
+		 * Establish dynamic shared memory to pass information about importing
+		 * of snapshot.
+		 */
+		snapshot_set_flag_space =
+				shm_toc_allocate(pcxt->toc, mul_size(sizeof(bool), pcxt->nworkers));
+		for (i = 0; i < pcxt->nworkers; ++i)
+		{
+			pcxt->worker[i].snapshot_restored = snapshot_set_flag_space + i * sizeof(bool);
+			*pcxt->worker[i].snapshot_restored = false;
+		}
+		shm_toc_insert(pcxt->toc, PARALLEL_KEY_SNAPSHOT_RESTORED, snapshot_set_flag_space);
 	}
 
 	/* Update nworkers_to_launch, in case we changed nworkers above. */
@@ -542,6 +561,17 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 			pcxt->worker[i].error_mqh = shm_mq_attach(mq, pcxt->seg, NULL);
 		}
 	}
+
+	/* Set snapshot restored flag to false. */
+	if (pcxt->nworkers > 0)
+	{
+		bool	   *snapshot_restored_space;
+		int			i;
+		snapshot_restored_space =
+				shm_toc_lookup(pcxt->toc, PARALLEL_KEY_SNAPSHOT_RESTORED, false);
+		for (i = 0; i < pcxt->nworkers; ++i)
+			snapshot_restored_space[i] = false;
+	}
 }
 
 /*
@@ -657,6 +687,10 @@ LaunchParallelWorkers(ParallelContext *pcxt)
  * Wait for all workers to attach to their error queues, and throw an error if
  * any worker fails to do this.
  *
+ * wait_for_snapshot: track whether each parallel worker has successfully restored
+ * its snapshot. This is needed when using periodic snapshot resets to ensure all
+ * workers have a valid initial snapshot before proceeding with the scan.
+ *
  * Callers can assume that if this function returns successfully, then the
  * number of workers given by pcxt->nworkers_launched have initialized and
  * attached to their error queues.  Whether or not these workers are guaranteed
@@ -686,7 +720,7 @@ LaunchParallelWorkers(ParallelContext *pcxt)
  * call this function at all.
  */
 void
-WaitForParallelWorkersToAttach(ParallelContext *pcxt)
+WaitForParallelWorkersToAttach(ParallelContext *pcxt, bool wait_for_snapshot)
 {
 	int			i;
 
@@ -730,9 +764,12 @@ WaitForParallelWorkersToAttach(ParallelContext *pcxt)
 				mq = shm_mq_get_queue(pcxt->worker[i].error_mqh);
 				if (shm_mq_get_sender(mq) != NULL)
 				{
-					/* Yes, so it is known to be attached. */
-					pcxt->known_attached_workers[i] = true;
-					++pcxt->nknown_attached_workers;
+					if (!wait_for_snapshot || *(pcxt->worker[i].snapshot_restored))
+					{
+						/* Yes, so it is known to be attached. */
+						pcxt->known_attached_workers[i] = true;
+						++pcxt->nknown_attached_workers;
+					}
 				}
 			}
 			else if (status == BGWH_STOPPED)
@@ -1291,6 +1328,7 @@ ParallelWorkerMain(Datum main_arg)
 	shm_toc    *toc;
 	FixedParallelState *fps;
 	char	   *error_queue_space;
+	bool	   *snapshot_restored_space;
 	shm_mq	   *mq;
 	shm_mq_handle *mqh;
 	char	   *libraryspace;
@@ -1489,6 +1527,10 @@ ParallelWorkerMain(Datum main_arg)
 							   fps->parallel_leader_pgproc);
 	PushActiveSnapshot(asnapshot);
 
+	/* Snapshot is restored, set flag to make leader know about it. */
+	snapshot_restored_space = shm_toc_lookup(toc, PARALLEL_KEY_SNAPSHOT_RESTORED, false);
+	snapshot_restored_space[ParallelWorkerNumber] = true;
+
 	/*
 	 * We've changed which tuples we can see, and must therefore invalidate
 	 * system caches.
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 7cb12a11c2d..2907b366791 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -262,7 +262,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
-								  estate->es_snapshot);
+								  estate->es_snapshot,
+								  false);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 2189bf0d9ae..b3cc7a2c150 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -287,14 +287,6 @@ GetTransactionSnapshot(void)
 Snapshot
 GetLatestSnapshot(void)
 {
-	/*
-	 * We might be able to relax this, but nothing that could otherwise work
-	 * needs it.
-	 */
-	if (IsInParallelMode())
-		elog(ERROR,
-			 "cannot update SecondarySnapshot during a parallel operation");
-
 	/*
 	 * So far there are no cases requiring support for GetLatestSnapshot()
 	 * during logical decoding, but it wouldn't be hard to add if required.
diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
index 69ffe5498f9..964a7e945be 100644
--- a/src/include/access/parallel.h
+++ b/src/include/access/parallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelWorkerInfo
 {
 	BackgroundWorkerHandle *bgwhandle;
 	shm_mq_handle *error_mqh;
+	bool		  *snapshot_restored;
 } ParallelWorkerInfo;
 
 typedef struct ParallelContext
@@ -65,7 +66,7 @@ extern void InitializeParallelDSM(ParallelContext *pcxt);
 extern void ReinitializeParallelDSM(ParallelContext *pcxt);
 extern void ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch);
 extern void LaunchParallelWorkers(ParallelContext *pcxt);
-extern void WaitForParallelWorkersToAttach(ParallelContext *pcxt);
+extern void WaitForParallelWorkersToAttach(ParallelContext *pcxt, bool wait_for_snapshot);
 extern void WaitForParallelWorkersToFinish(ParallelContext *pcxt);
 extern void DestroyParallelContext(ParallelContext *pcxt);
 extern bool ParallelContextActive(void);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index e1884acf493..a9603084aeb 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -88,6 +88,7 @@ typedef struct ParallelTableScanDescData
 	RelFileLocator phs_locator; /* physical relation to scan */
 	bool		phs_syncscan;	/* report location to syncscan logic? */
 	bool		phs_snapshot_any;	/* SnapshotAny, not phs_snapshot_data? */
+	bool		phs_reset_snapshot; /* use SO_RESET_SNAPSHOT? */
 	Size		phs_snapshot_off;	/* data for snapshot */
 } ParallelTableScanDescData;
 typedef struct ParallelTableScanDescData *ParallelTableScanDesc;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f4c7d2a92bf..9ee5ea15fd4 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -1184,7 +1184,8 @@ extern Size table_parallelscan_estimate(Relation rel, Snapshot snapshot);
  */
 extern void table_parallelscan_initialize(Relation rel,
 										  ParallelTableScanDesc pscan,
-										  Snapshot snapshot);
+										  Snapshot snapshot,
+										  bool reset_snapshot);
 
 /*
  * Begin a parallel scan. `pscan` needs to have been initialized with
@@ -1802,9 +1803,9 @@ table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
  * This only really makes sense for heap AM, it might need to be generalized
  * for other AMs later.
  *
- * In case of non-unique index and non-parallel concurrent build
- * SO_RESET_SNAPSHOT is applied for the scan. That leads for changing snapshots
- * on the fly to allow xmin horizon propagate.
+ * In case of non-unique concurrent  index build SO_RESET_SNAPSHOT is applied
+ * for the scan. That leads for changing snapshots on the fly to allow xmin
+ * horizon propagate.
  */
 static inline double
 table_index_build_scan(Relation table_rel,
diff --git a/src/test/modules/injection_points/expected/cic_reset_snapshots.out b/src/test/modules/injection_points/expected/cic_reset_snapshots.out
index 5db54530f17..595a4000ce0 100644
--- a/src/test/modules/injection_points/expected/cic_reset_snapshots.out
+++ b/src/test/modules/injection_points/expected/cic_reset_snapshots.out
@@ -17,6 +17,12 @@ SELECT injection_points_attach('table_beginscan_strat_reset_snapshots', 'notice'
  
 (1 row)
 
+SELECT injection_points_attach('table_parallelscan_initialize', 'notice');
+ injection_points_attach 
+-------------------------
+ 
+(1 row)
+
 CREATE SCHEMA cic_reset_snap;
 CREATE TABLE cic_reset_snap.tbl(i int primary key, j int);
 INSERT INTO cic_reset_snap.tbl SELECT i, i * I FROM generate_series(1, 200) s(i);
@@ -72,24 +78,35 @@ NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 -- The same in parallel mode
 ALTER TABLE cic_reset_snap.tbl SET (parallel_workers=2);
+-- Detach to keep test stable, since parallel worker may complete scan before leader
+SELECT injection_points_detach('heap_reset_scan_snapshot_effective');
+ injection_points_detach 
+-------------------------
+ 
+(1 row)
+
 CREATE UNIQUE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(MOD(i, 2), j) WHERE MOD(i, 2) = 0;
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable(i);
 NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
-NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
 NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
-NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i, j) WHERE cic_reset_snap.predicate_stable_no_param();
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i DESC NULLS LAST);
 NOTICE:  notice triggered for injection point table_parallelscan_initialize
@@ -97,7 +114,9 @@ REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
 NOTICE:  notice triggered for injection point table_parallelscan_initialize
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl USING BRIN(i);
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 DROP SCHEMA cic_reset_snap CASCADE;
 NOTICE:  drop cascades to 3 other objects
diff --git a/src/test/modules/injection_points/sql/cic_reset_snapshots.sql b/src/test/modules/injection_points/sql/cic_reset_snapshots.sql
index 5072535b355..2941aa7ae38 100644
--- a/src/test/modules/injection_points/sql/cic_reset_snapshots.sql
+++ b/src/test/modules/injection_points/sql/cic_reset_snapshots.sql
@@ -3,7 +3,7 @@ CREATE EXTENSION injection_points;
 SELECT injection_points_set_local();
 SELECT injection_points_attach('heap_reset_scan_snapshot_effective', 'notice');
 SELECT injection_points_attach('table_beginscan_strat_reset_snapshots', 'notice');
-
+SELECT injection_points_attach('table_parallelscan_initialize', 'notice');
 
 CREATE SCHEMA cic_reset_snap;
 CREATE TABLE cic_reset_snap.tbl(i int primary key, j int);
@@ -53,6 +53,9 @@ DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 -- The same in parallel mode
 ALTER TABLE cic_reset_snap.tbl SET (parallel_workers=2);
 
+-- Detach to keep test stable, since parallel worker may complete scan before leader
+SELECT injection_points_detach('heap_reset_scan_snapshot_effective');
+
 CREATE UNIQUE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
@@ -83,4 +86,4 @@ DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 
 DROP SCHEMA cic_reset_snap CASCADE;
 
-DROP EXTENSION injection_points;
+DROP EXTENSION injection_points;
\ No newline at end of file
-- 
2.43.0



  [application/octet-stream] v7-0005-Allow-snapshot-resets-in-concurrent-unique-index-.patch (32.5K, 7-v7-0005-Allow-snapshot-resets-in-concurrent-unique-index-.patch)
  download | inline diff:
From f48e59a4b33a4b05e2f08dedadfce8628a8ae094 Mon Sep 17 00:00:00 2001
From: nkey <[email protected]>
Date: Sat, 7 Dec 2024 23:27:34 +0100
Subject: [PATCH v7 5/6] Allow snapshot resets in concurrent unique index
 builds

Previously, concurrent unique index builds used a fixed snapshot for the entire
scan to ensure proper uniqueness checks. This could delay vacuum's ability to
clean up dead tuples.

Now reset snapshots periodically during concurrent unique index builds, while
still maintaining uniqueness by:

1. Ignoring dead tuples during uniqueness checks in tuplesort
2. Adding a uniqueness check in _bt_load that detects multiple alive tuples with the same key values

This improves vacuum effectiveness during long-running index builds without
compromising index uniqueness enforcement.
---
 src/backend/access/heap/heapam_handler.c      |   6 +-
 src/backend/access/nbtree/nbtdedup.c          |   8 +-
 src/backend/access/nbtree/nbtsort.c           | 173 ++++++++++++++----
 src/backend/access/nbtree/nbtsplitloc.c       |  12 +-
 src/backend/access/nbtree/nbtutils.c          |  29 ++-
 src/backend/catalog/index.c                   |   6 +-
 src/backend/utils/sort/tuplesortvariants.c    |  67 +++++--
 src/include/access/nbtree.h                   |   4 +-
 src/include/access/tableam.h                  |   5 +-
 src/include/utils/tuplesort.h                 |   1 +
 .../expected/cic_reset_snapshots.out          |   6 +
 11 files changed, 242 insertions(+), 75 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2e5163609c1..921b806642a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1232,15 +1232,15 @@ heapam_index_build_range_scan(Relation heapRelation,
 	 * qual checks (because we have to index RECENTLY_DEAD tuples). In a
 	 * concurrent build, or during bootstrap, we take a regular MVCC snapshot
 	 * and index whatever's live according to that while that snapshot is reset
-	 * every so often (in case of non-unique index).
+	 * every so often.
 	 */
 	OldestXmin = InvalidTransactionId;
 
 	/*
-	 * For unique index we need consistent snapshot for the whole scan.
+	 * For concurrent builds of non-system indexes, we may want to periodically
+	 * reset snapshots to allow vacuum to clean up tuples.
 	 */
 	reset_snapshots = indexInfo->ii_Concurrent &&
-					  !indexInfo->ii_Unique &&
 					  !is_system_catalog; /* just for the case */
 
 	/* okay to ignore lazy VACUUMs here */
diff --git a/src/backend/access/nbtree/nbtdedup.c b/src/backend/access/nbtree/nbtdedup.c
index 456d86b51c9..31b59265a29 100644
--- a/src/backend/access/nbtree/nbtdedup.c
+++ b/src/backend/access/nbtree/nbtdedup.c
@@ -148,7 +148,7 @@ _bt_dedup_pass(Relation rel, Buffer buf, IndexTuple newitem, Size newitemsz,
 			_bt_dedup_start_pending(state, itup, offnum);
 		}
 		else if (state->deduplicate &&
-				 _bt_keep_natts_fast(rel, state->base, itup) > nkeyatts &&
+				 _bt_keep_natts_fast(rel, state->base, itup, NULL) > nkeyatts &&
 				 _bt_dedup_save_htid(state, itup))
 		{
 			/*
@@ -374,7 +374,7 @@ _bt_bottomupdel_pass(Relation rel, Buffer buf, Relation heapRel,
 			/* itup starts first pending interval */
 			_bt_dedup_start_pending(state, itup, offnum);
 		}
-		else if (_bt_keep_natts_fast(rel, state->base, itup) > nkeyatts &&
+		else if (_bt_keep_natts_fast(rel, state->base, itup, NULL) > nkeyatts &&
 				 _bt_dedup_save_htid(state, itup))
 		{
 			/* Tuple is equal; just added its TIDs to pending interval */
@@ -789,12 +789,12 @@ _bt_do_singleval(Relation rel, Page page, BTDedupState state,
 	itemid = PageGetItemId(page, minoff);
 	itup = (IndexTuple) PageGetItem(page, itemid);
 
-	if (_bt_keep_natts_fast(rel, newitem, itup) > nkeyatts)
+	if (_bt_keep_natts_fast(rel, newitem, itup, NULL) > nkeyatts)
 	{
 		itemid = PageGetItemId(page, PageGetMaxOffsetNumber(page));
 		itup = (IndexTuple) PageGetItem(page, itemid);
 
-		if (_bt_keep_natts_fast(rel, newitem, itup) > nkeyatts)
+		if (_bt_keep_natts_fast(rel, newitem, itup, NULL) > nkeyatts)
 			return true;
 	}
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 2acbf121745..ac9e5acfc53 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -83,6 +83,7 @@ typedef struct BTSpool
 	Relation	index;
 	bool		isunique;
 	bool		nulls_not_distinct;
+	bool		unique_dead_ignored;
 } BTSpool;
 
 /*
@@ -101,6 +102,7 @@ typedef struct BTShared
 	Oid			indexrelid;
 	bool		isunique;
 	bool		nulls_not_distinct;
+	bool		unique_dead_ignored;
 	bool		isconcurrent;
 	int			scantuplesortstates;
 
@@ -203,15 +205,13 @@ typedef struct BTLeader
  */
 typedef struct BTBuildState
 {
-	bool		isunique;
-	bool		nulls_not_distinct;
 	bool		havedead;
 	Relation	heap;
 	BTSpool    *spool;
 
 	/*
-	 * spool2 is needed only when the index is a unique index. Dead tuples are
-	 * put into spool2 instead of spool in order to avoid uniqueness check.
+	 * spool2 is needed only when the index is a unique index and build non-concurrently.
+	 * Dead tuples are put into spool2 instead of spool in order to avoid uniqueness check.
 	 */
 	BTSpool    *spool2;
 	double		indtuples;
@@ -303,8 +303,6 @@ btbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 		ResetUsage();
 #endif							/* BTREE_BUILD_STATS */
 
-	buildstate.isunique = indexInfo->ii_Unique;
-	buildstate.nulls_not_distinct = indexInfo->ii_NullsNotDistinct;
 	buildstate.havedead = false;
 	buildstate.heap = heap;
 	buildstate.spool = NULL;
@@ -379,6 +377,11 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
 	btspool->index = index;
 	btspool->isunique = indexInfo->ii_Unique;
 	btspool->nulls_not_distinct = indexInfo->ii_NullsNotDistinct;
+    /*
+     * We need to ignore dead tuples for unique checks in case of concurrent build.
+     * It is required because or periodic reset of snapshot.
+     */
+	btspool->unique_dead_ignored = indexInfo->ii_Concurrent && indexInfo->ii_Unique;
 
 	/* Save as primary spool */
 	buildstate->spool = btspool;
@@ -427,8 +430,9 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
 	 * the use of parallelism or any other factor.
 	 */
 	buildstate->spool->sortstate =
-		tuplesort_begin_index_btree(heap, index, buildstate->isunique,
-									buildstate->nulls_not_distinct,
+		tuplesort_begin_index_btree(heap, index, btspool->isunique,
+									btspool->nulls_not_distinct,
+									btspool->unique_dead_ignored,
 									maintenance_work_mem, coordinate,
 									TUPLESORT_NONE);
 
@@ -436,8 +440,12 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
 	 * If building a unique index, put dead tuples in a second spool to keep
 	 * them out of the uniqueness check.  We expect that the second spool (for
 	 * dead tuples) won't get very full, so we give it only work_mem.
+	 *
+	 * In case of concurrent build dead tuples are not need to be put into index
+	 * since we wait for all snapshots older than reference snapshot during the
+	 * validation phase.
 	 */
-	if (indexInfo->ii_Unique)
+	if (indexInfo->ii_Unique && !indexInfo->ii_Concurrent)
 	{
 		BTSpool    *btspool2 = (BTSpool *) palloc0(sizeof(BTSpool));
 		SortCoordinate coordinate2 = NULL;
@@ -468,7 +476,7 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
 		 * full, so we give it only work_mem
 		 */
 		buildstate->spool2->sortstate =
-			tuplesort_begin_index_btree(heap, index, false, false, work_mem,
+			tuplesort_begin_index_btree(heap, index, false, false, false, work_mem,
 										coordinate2, TUPLESORT_NONE);
 	}
 
@@ -1147,13 +1155,116 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
 	SortSupport sortKeys;
 	int64		tuples_done = 0;
 	bool		deduplicate;
+	bool		fail_on_alive_duplicate;
 
 	wstate->bulkstate = smgr_bulk_start_rel(wstate->index, MAIN_FORKNUM);
 
 	deduplicate = wstate->inskey->allequalimage && !btspool->isunique &&
 		BTGetDeduplicateItems(wstate->index);
+	/*
+	 * The unique_dead_ignored does not guarantee absence of multiple alive
+	 * tuples with same values exists in the spool. Such thing may happen if
+	 * alive tuples are located between a few dead tuples, like this: addda.
+	 */
+	fail_on_alive_duplicate = btspool->unique_dead_ignored;
 
-	if (merge)
+	if (fail_on_alive_duplicate)
+	{
+		bool	seen_alive = false,
+				prev_tested = false;
+		IndexTuple prev = NULL;
+		TupleTableSlot 		*slot = MakeSingleTupleTableSlot(RelationGetDescr(wstate->heap),
+															   &TTSOpsBufferHeapTuple);
+		IndexFetchTableData *fetch = table_index_fetch_begin(wstate->heap);
+
+		Assert(btspool->isunique);
+		Assert(!btspool2);
+
+		while ((itup = tuplesort_getindextuple(btspool->sortstate, true)) != NULL)
+		{
+			bool	tuples_equal = false;
+
+			/* When we see first tuple, create first index page */
+			if (state == NULL)
+				state = _bt_pagestate(wstate, 0);
+
+			if (prev != NULL) /* if is not the first tuple */
+			{
+				bool	has_nulls = false,
+						call_again, /* just to pass something */
+						ignored,  /* just to pass something */
+						now_alive;
+				ItemPointerData tid;
+
+				/* if this tuples equal to previouse one? */
+				if (wstate->inskey->allequalimage)
+					tuples_equal = _bt_keep_natts_fast(wstate->index, prev, itup, &has_nulls) > keysz;
+				else
+					tuples_equal = _bt_keep_natts(wstate->index, prev, itup,wstate->inskey, &has_nulls) > keysz;
+
+				/* handle null values correctly */
+				if (has_nulls && !btspool->nulls_not_distinct)
+					tuples_equal = false;
+
+				if (tuples_equal)
+				{
+					/* check previous tuple if not yet */
+					if (!prev_tested)
+					{
+						call_again = false;
+						tid = prev->t_tid;
+						seen_alive = table_index_fetch_tuple(fetch, &tid, SnapshotSelf, slot, &call_again, &ignored);
+						prev_tested = true;
+					}
+
+					call_again = false;
+					tid = itup->t_tid;
+					now_alive = table_index_fetch_tuple(fetch, &tid, SnapshotSelf, slot, &call_again, &ignored);
+
+					/* are multiple alive tuples detected in equal group? */
+					if (seen_alive && now_alive)
+					{
+						char *key_desc;
+						TupleDesc tupDes = RelationGetDescr(wstate->index);
+						bool isnull[INDEX_MAX_KEYS];
+						Datum values[INDEX_MAX_KEYS];
+
+						index_deform_tuple(itup, tupDes, values, isnull);
+
+						key_desc = BuildIndexValueDescription(wstate->index, values, isnull);
+
+						/* keep this message in sync with the same in comparetup_index_btree_tiebreak */
+						ereport(ERROR,
+								(errcode(ERRCODE_UNIQUE_VIOLATION),
+										errmsg("could not create unique index \"%s\"",
+											   RelationGetRelationName(wstate->index)),
+										key_desc ? errdetail("Key %s is duplicated.", key_desc) :
+										errdetail("Duplicate keys exist."),
+										errtableconstraint(wstate->heap,
+														   RelationGetRelationName(wstate->index))));
+					}
+					seen_alive |= now_alive;
+				}
+			}
+
+			if (!tuples_equal)
+			{
+				seen_alive = false;
+				prev_tested = false;
+			}
+
+			_bt_buildadd(wstate, state, itup, 0);
+			if (prev) pfree(prev);
+			prev = CopyIndexTuple(itup);
+
+			/* Report progress */
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE,
+										 ++tuples_done);
+		}
+		ExecDropSingleTupleTableSlot(slot);
+		table_index_fetch_end(fetch);
+	}
+	else if (merge)
 	{
 		/*
 		 * Another BTSpool for dead tuples exists. Now we have to merge
@@ -1314,7 +1425,7 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
 										InvalidOffsetNumber);
 			}
 			else if (_bt_keep_natts_fast(wstate->index, dstate->base,
-										 itup) > keysz &&
+										 itup, NULL) > keysz &&
 					 _bt_dedup_save_htid(dstate, itup))
 			{
 				/*
@@ -1411,7 +1522,6 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	BufferUsage *bufferusage;
 	bool		leaderparticipates = true;
 	bool		need_pop_active_snapshot = true;
-	bool		reset_snapshot;
 	bool		wait_for_snapshot_attach;
 	int			querylen;
 
@@ -1430,21 +1540,12 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 
 	scantuplesortstates = leaderparticipates ? request + 1 : request;
 
-    /*
-	 * For concurrent non-unique index builds, we can periodically reset snapshots
-	 * to allow the xmin horizon to advance. This is safe since these builds don't
-	 * require a consistent view across the entire scan. Unique indexes still need
-	 * a stable snapshot to properly enforce uniqueness constraints.
-     */
-	reset_snapshot = isconcurrent && !btspool->isunique;
-
 	/*
 	 * Prepare for scan of the base relation.  In a normal index build, we use
 	 * SnapshotAny because we must retrieve all tuples and do our own time
 	 * qual checks (because we have to index RECENTLY_DEAD tuples).  In a
 	 * concurrent build, we take a regular MVCC snapshot and index whatever's
-	 * live according to that, while that snapshot may be reset periodically in
-	 * case of non-unique index.
+	 * live according to that, while that snapshot may be reset periodically.
 	 */
 	if (!isconcurrent)
 	{
@@ -1452,16 +1553,16 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 		snapshot = SnapshotAny;
 		need_pop_active_snapshot = false;
 	}
-	else if (reset_snapshot)
+	else
 	{
+		/*
+		 * For concurrent index builds, we can periodically reset snapshots to allow
+		 * the xmin horizon to advance. This is safe since these builds don't
+		 * require a consistent view across the entire scan.
+		 */
 		snapshot = InvalidSnapshot;
 		PushActiveSnapshot(GetTransactionSnapshot());
 	}
-	else
-	{
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
-		PushActiveSnapshot(snapshot);
-	}
 
 	/*
 	 * Estimate size for our own PARALLEL_KEY_BTREE_SHARED workspace, and
@@ -1531,6 +1632,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	btshared->indexrelid = RelationGetRelid(btspool->index);
 	btshared->isunique = btspool->isunique;
 	btshared->nulls_not_distinct = btspool->nulls_not_distinct;
+	btshared->unique_dead_ignored = btspool->unique_dead_ignored;
 	btshared->isconcurrent = isconcurrent;
 	btshared->scantuplesortstates = scantuplesortstates;
 	btshared->queryid = pgstat_get_my_query_id();
@@ -1545,7 +1647,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	table_parallelscan_initialize(btspool->heap,
 								  ParallelTableScanFromBTShared(btshared),
 								  snapshot,
-								  reset_snapshot);
+								  isconcurrent);
 
 	/*
 	 * Store shared tuplesort-private state, for which we reserved space.
@@ -1626,7 +1728,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	 * In case when leader going to reset own active snapshot as well - we need to
 	 * wait until all workers imported initial snapshot.
 	 */
-	wait_for_snapshot_attach = reset_snapshot && leaderparticipates;
+	wait_for_snapshot_attach = isconcurrent && leaderparticipates;
 
 	if (wait_for_snapshot_attach)
 		WaitForParallelWorkersToAttach(pcxt, true);
@@ -1742,6 +1844,7 @@ _bt_leader_participate_as_worker(BTBuildState *buildstate)
 	leaderworker->index = buildstate->spool->index;
 	leaderworker->isunique = buildstate->spool->isunique;
 	leaderworker->nulls_not_distinct = buildstate->spool->nulls_not_distinct;
+	leaderworker->unique_dead_ignored = buildstate->spool->unique_dead_ignored;
 
 	/* Initialize second spool, if required */
 	if (!btleader->btshared->isunique)
@@ -1845,11 +1948,12 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	btspool->index = indexRel;
 	btspool->isunique = btshared->isunique;
 	btspool->nulls_not_distinct = btshared->nulls_not_distinct;
+	btspool->unique_dead_ignored = btshared->unique_dead_ignored;
 
 	/* Look up shared state private to tuplesort.c */
 	sharedsort = shm_toc_lookup(toc, PARALLEL_KEY_TUPLESORT, false);
 	tuplesort_attach_shared(sharedsort, seg);
-	if (!btshared->isunique)
+	if (!btshared->isunique || btshared->isconcurrent)
 	{
 		btspool2 = NULL;
 		sharedsort2 = NULL;
@@ -1928,6 +2032,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 													 btspool->index,
 													 btspool->isunique,
 													 btspool->nulls_not_distinct,
+													 btspool->unique_dead_ignored,
 													 sortmem, coordinate,
 													 TUPLESORT_NONE);
 
@@ -1950,14 +2055,12 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 		coordinate2->nParticipants = -1;
 		coordinate2->sharedsort = sharedsort2;
 		btspool2->sortstate =
-			tuplesort_begin_index_btree(btspool->heap, btspool->index, false, false,
+			tuplesort_begin_index_btree(btspool->heap, btspool->index, false, false, false,
 										Min(sortmem, work_mem), coordinate2,
 										false);
 	}
 
 	/* Fill in buildstate for _bt_build_callback() */
-	buildstate.isunique = btshared->isunique;
-	buildstate.nulls_not_distinct = btshared->nulls_not_distinct;
 	buildstate.havedead = false;
 	buildstate.heap = btspool->heap;
 	buildstate.spool = btspool;
diff --git a/src/backend/access/nbtree/nbtsplitloc.c b/src/backend/access/nbtree/nbtsplitloc.c
index 1f40d40263e..e2ed4537026 100644
--- a/src/backend/access/nbtree/nbtsplitloc.c
+++ b/src/backend/access/nbtree/nbtsplitloc.c
@@ -687,7 +687,7 @@ _bt_afternewitemoff(FindSplitData *state, OffsetNumber maxoff,
 	{
 		itemid = PageGetItemId(state->origpage, maxoff);
 		tup = (IndexTuple) PageGetItem(state->origpage, itemid);
-		keepnatts = _bt_keep_natts_fast(state->rel, tup, state->newitem);
+		keepnatts = _bt_keep_natts_fast(state->rel, tup, state->newitem, NULL);
 
 		if (keepnatts > 1 && keepnatts <= nkeyatts)
 		{
@@ -718,7 +718,7 @@ _bt_afternewitemoff(FindSplitData *state, OffsetNumber maxoff,
 		!_bt_adjacenthtid(&tup->t_tid, &state->newitem->t_tid))
 		return false;
 	/* Check same conditions as rightmost item case, too */
-	keepnatts = _bt_keep_natts_fast(state->rel, tup, state->newitem);
+	keepnatts = _bt_keep_natts_fast(state->rel, tup, state->newitem, NULL);
 
 	if (keepnatts > 1 && keepnatts <= nkeyatts)
 	{
@@ -967,7 +967,7 @@ _bt_strategy(FindSplitData *state, SplitPoint *leftpage,
 	 * avoid appending a heap TID in new high key, we're done.  Finish split
 	 * with default strategy and initial split interval.
 	 */
-	perfectpenalty = _bt_keep_natts_fast(state->rel, leftmost, rightmost);
+	perfectpenalty = _bt_keep_natts_fast(state->rel, leftmost, rightmost, NULL);
 	if (perfectpenalty <= indnkeyatts)
 		return perfectpenalty;
 
@@ -988,7 +988,7 @@ _bt_strategy(FindSplitData *state, SplitPoint *leftpage,
 	 * If page is entirely full of duplicates, a single value strategy split
 	 * will be performed.
 	 */
-	perfectpenalty = _bt_keep_natts_fast(state->rel, leftmost, rightmost);
+	perfectpenalty = _bt_keep_natts_fast(state->rel, leftmost, rightmost, NULL);
 	if (perfectpenalty <= indnkeyatts)
 	{
 		*strategy = SPLIT_MANY_DUPLICATES;
@@ -1027,7 +1027,7 @@ _bt_strategy(FindSplitData *state, SplitPoint *leftpage,
 		itemid = PageGetItemId(state->origpage, P_HIKEY);
 		hikey = (IndexTuple) PageGetItem(state->origpage, itemid);
 		perfectpenalty = _bt_keep_natts_fast(state->rel, hikey,
-											 state->newitem);
+											 state->newitem, NULL);
 		if (perfectpenalty <= indnkeyatts)
 			*strategy = SPLIT_SINGLE_VALUE;
 		else
@@ -1149,7 +1149,7 @@ _bt_split_penalty(FindSplitData *state, SplitPoint *split)
 	lastleft = _bt_split_lastleft(state, split);
 	firstright = _bt_split_firstright(state, split);
 
-	return _bt_keep_natts_fast(state->rel, lastleft, firstright);
+	return _bt_keep_natts_fast(state->rel, lastleft, firstright, NULL);
 }
 
 /*
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 50cbf06cb45..3d6dda4ace8 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -100,8 +100,6 @@ static bool _bt_check_rowcompare(ScanKey skey,
 								 ScanDirection dir, bool *continuescan);
 static void _bt_checkkeys_look_ahead(IndexScanDesc scan, BTReadPageState *pstate,
 									 int tupnatts, TupleDesc tupdesc);
-static int	_bt_keep_natts(Relation rel, IndexTuple lastleft,
-						   IndexTuple firstright, BTScanInsert itup_key);
 
 
 /*
@@ -4672,7 +4670,7 @@ _bt_truncate(Relation rel, IndexTuple lastleft, IndexTuple firstright,
 	Assert(!BTreeTupleIsPivot(lastleft) && !BTreeTupleIsPivot(firstright));
 
 	/* Determine how many attributes must be kept in truncated tuple */
-	keepnatts = _bt_keep_natts(rel, lastleft, firstright, itup_key);
+	keepnatts = _bt_keep_natts(rel, lastleft, firstright, itup_key, NULL);
 
 #ifdef DEBUG_NO_TRUNCATE
 	/* Force truncation to be ineffective for testing purposes */
@@ -4790,17 +4788,24 @@ _bt_truncate(Relation rel, IndexTuple lastleft, IndexTuple firstright,
 /*
  * _bt_keep_natts - how many key attributes to keep when truncating.
  *
+ * This is exported to be used as comparison function during concurrent
+ * unique index build in case _bt_keep_natts_fast is not suitable because
+ * collation is not "allequalimage"/deduplication-safe.
+ *
  * Caller provides two tuples that enclose a split point.  Caller's insertion
  * scankey is used to compare the tuples; the scankey's argument values are
  * not considered here.
  *
+ * hasnulls value set to true in case of any null column in any tuple.
+ *
  * This can return a number of attributes that is one greater than the
  * number of key attributes for the index relation.  This indicates that the
  * caller must use a heap TID as a unique-ifier in new pivot tuple.
  */
-static int
+int
 _bt_keep_natts(Relation rel, IndexTuple lastleft, IndexTuple firstright,
-			   BTScanInsert itup_key)
+			   BTScanInsert itup_key,
+			   bool *hasnulls)
 {
 	int			nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
 	TupleDesc	itupdesc = RelationGetDescr(rel);
@@ -4826,6 +4831,8 @@ _bt_keep_natts(Relation rel, IndexTuple lastleft, IndexTuple firstright,
 
 		datum1 = index_getattr(lastleft, attnum, itupdesc, &isNull1);
 		datum2 = index_getattr(firstright, attnum, itupdesc, &isNull2);
+		if (hasnulls)
+			(*hasnulls) |= (isNull1 || isNull2);
 
 		if (isNull1 != isNull2)
 			break;
@@ -4845,7 +4852,7 @@ _bt_keep_natts(Relation rel, IndexTuple lastleft, IndexTuple firstright,
 	 * expected in an allequalimage index.
 	 */
 	Assert(!itup_key->allequalimage ||
-		   keepnatts == _bt_keep_natts_fast(rel, lastleft, firstright));
+		   keepnatts == _bt_keep_natts_fast(rel, lastleft, firstright, NULL));
 
 	return keepnatts;
 }
@@ -4856,7 +4863,8 @@ _bt_keep_natts(Relation rel, IndexTuple lastleft, IndexTuple firstright,
  * This is exported so that a candidate split point can have its effect on
  * suffix truncation inexpensively evaluated ahead of time when finding a
  * split location.  A naive bitwise approach to datum comparisons is used to
- * save cycles.
+ * save cycles. Also, it may be used as comparison function during concurrent
+ * build of unique index.
  *
  * The approach taken here usually provides the same answer as _bt_keep_natts
  * will (for the same pair of tuples from a heapkeyspace index), since the
@@ -4865,6 +4873,8 @@ _bt_keep_natts(Relation rel, IndexTuple lastleft, IndexTuple firstright,
  * "equal image" columns, routine is guaranteed to give the same result as
  * _bt_keep_natts would.
  *
+ * hasnulls value set to true in case of any null column in any tuple.
+ *
  * Callers can rely on the fact that attributes considered equal here are
  * definitely also equal according to _bt_keep_natts, even when the index uses
  * an opclass or collation that is not "allequalimage"/deduplication-safe.
@@ -4873,7 +4883,8 @@ _bt_keep_natts(Relation rel, IndexTuple lastleft, IndexTuple firstright,
  * more balanced split point.
  */
 int
-_bt_keep_natts_fast(Relation rel, IndexTuple lastleft, IndexTuple firstright)
+_bt_keep_natts_fast(Relation rel, IndexTuple lastleft, IndexTuple firstright,
+					bool *hasnulls)
 {
 	TupleDesc	itupdesc = RelationGetDescr(rel);
 	int			keysz = IndexRelationGetNumberOfKeyAttributes(rel);
@@ -4890,6 +4901,8 @@ _bt_keep_natts_fast(Relation rel, IndexTuple lastleft, IndexTuple firstright)
 
 		datum1 = index_getattr(lastleft, attnum, itupdesc, &isNull1);
 		datum2 = index_getattr(firstright, attnum, itupdesc, &isNull2);
+		if (hasnulls)
+			*hasnulls |= (isNull1 | isNull2);
 		att = TupleDescAttr(itupdesc, attnum - 1);
 
 		if (isNull1 != isNull2)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index e0ada5ce159..f6a1a2f3f90 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3292,9 +3292,9 @@ IndexCheckExclusion(Relation heapRelation,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
  * does not contain any tuples added to the table while we built the index.
  *
- * Furthermore, in case of non-unique index we set SO_RESET_SNAPSHOT for the
- * scan, which causes new snapshot to be set as active every so often. The reason
- * for that is to propagate the xmin horizon forward.
+ * Furthermore, we set SO_RESET_SNAPSHOT for the scan, which causes new
+ * snapshot to be set as active every so often. The reason  for that is to
+ * propagate the xmin horizon forward.
  *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
  * commit the second transaction and start a third.  Again we wait for all
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index e07ba4ea4b1..aa4fcaac9a0 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -123,6 +123,7 @@ typedef struct
 
 	bool		enforceUnique;	/* complain if we find duplicate tuples */
 	bool		uniqueNullsNotDistinct; /* unique constraint null treatment */
+	bool		uniqueDeadIgnored; /* ignore dead tuples in unique check */
 } TuplesortIndexBTreeArg;
 
 /*
@@ -349,6 +350,7 @@ tuplesort_begin_index_btree(Relation heapRel,
 							Relation indexRel,
 							bool enforceUnique,
 							bool uniqueNullsNotDistinct,
+							bool uniqueDeadIgnored,
 							int workMem,
 							SortCoordinate coordinate,
 							int sortopt)
@@ -391,6 +393,7 @@ tuplesort_begin_index_btree(Relation heapRel,
 	arg->index.indexRel = indexRel;
 	arg->enforceUnique = enforceUnique;
 	arg->uniqueNullsNotDistinct = uniqueNullsNotDistinct;
+	arg->uniqueDeadIgnored = uniqueDeadIgnored;
 
 	indexScanKey = _bt_mkscankey(indexRel, NULL);
 
@@ -1520,6 +1523,7 @@ comparetup_index_btree_tiebreak(const SortTuple *a, const SortTuple *b,
 		Datum		values[INDEX_MAX_KEYS];
 		bool		isnull[INDEX_MAX_KEYS];
 		char	   *key_desc;
+		bool		uniqueCheckFail = true;
 
 		/*
 		 * Some rather brain-dead implementations of qsort (such as the one in
@@ -1529,18 +1533,57 @@ comparetup_index_btree_tiebreak(const SortTuple *a, const SortTuple *b,
 		 */
 		Assert(tuple1 != tuple2);
 
-		index_deform_tuple(tuple1, tupDes, values, isnull);
-
-		key_desc = BuildIndexValueDescription(arg->index.indexRel, values, isnull);
-
-		ereport(ERROR,
-				(errcode(ERRCODE_UNIQUE_VIOLATION),
-				 errmsg("could not create unique index \"%s\"",
-						RelationGetRelationName(arg->index.indexRel)),
-				 key_desc ? errdetail("Key %s is duplicated.", key_desc) :
-				 errdetail("Duplicate keys exist."),
-				 errtableconstraint(arg->index.heapRel,
-									RelationGetRelationName(arg->index.indexRel))));
+		/* This is fail-fast check, see _bt_load for details. */
+		if (arg->uniqueDeadIgnored)
+		{
+			bool	any_tuple_dead,
+					call_again = false,
+					ignored;
+
+			TupleTableSlot	*slot = MakeSingleTupleTableSlot(RelationGetDescr(arg->index.heapRel),
+															 &TTSOpsBufferHeapTuple);
+			ItemPointerData tid = tuple1->t_tid;
+
+			IndexFetchTableData *fetch = table_index_fetch_begin(arg->index.heapRel);
+			any_tuple_dead = !table_index_fetch_tuple(fetch, &tid, SnapshotSelf, slot, &call_again, &ignored);
+
+			if (!any_tuple_dead)
+			{
+				call_again = false;
+				tid = tuple2->t_tid;
+				any_tuple_dead = !table_index_fetch_tuple(fetch, &tuple2->t_tid, SnapshotSelf, slot, &call_again,
+														  &ignored);
+			}
+
+			if (any_tuple_dead)
+			{
+				elog(DEBUG5, "skipping duplicate values because some of them are dead: (%u,%u) vs (%u,%u)",
+					 ItemPointerGetBlockNumber(&tuple1->t_tid),
+					 ItemPointerGetOffsetNumber(&tuple1->t_tid),
+					 ItemPointerGetBlockNumber(&tuple2->t_tid),
+					 ItemPointerGetOffsetNumber(&tuple2->t_tid));
+
+				uniqueCheckFail = false;
+			}
+			ExecDropSingleTupleTableSlot(slot);
+			table_index_fetch_end(fetch);
+		}
+		if (uniqueCheckFail)
+		{
+			index_deform_tuple(tuple1, tupDes, values, isnull);
+
+			key_desc = BuildIndexValueDescription(arg->index.indexRel, values, isnull);
+
+			/* keep this error message in sync with the same in _bt_load */
+			ereport(ERROR,
+					(errcode(ERRCODE_UNIQUE_VIOLATION),
+							errmsg("could not create unique index \"%s\"",
+								   RelationGetRelationName(arg->index.indexRel)),
+							key_desc ? errdetail("Key %s is duplicated.", key_desc) :
+							errdetail("Duplicate keys exist."),
+							errtableconstraint(arg->index.heapRel,
+											   RelationGetRelationName(arg->index.indexRel))));
+		}
 	}
 
 	/*
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 123fba624db..4200d2bd20e 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1297,8 +1297,10 @@ extern bool btproperty(Oid index_oid, int attno,
 extern char *btbuildphasename(int64 phasenum);
 extern IndexTuple _bt_truncate(Relation rel, IndexTuple lastleft,
 							   IndexTuple firstright, BTScanInsert itup_key);
+extern int	_bt_keep_natts(Relation rel, IndexTuple lastleft, IndexTuple firstright,
+						   BTScanInsert itup_key, bool *hasnulls);
 extern int	_bt_keep_natts_fast(Relation rel, IndexTuple lastleft,
-								IndexTuple firstright);
+								IndexTuple firstright, bool *hasnulls);
 extern bool _bt_check_natts(Relation rel, bool heapkeyspace, Page page,
 							OffsetNumber offnum);
 extern void _bt_check_third_page(Relation rel, Relation heap,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 9ee5ea15fd4..ec3769585c3 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -1803,9 +1803,8 @@ table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
  * This only really makes sense for heap AM, it might need to be generalized
  * for other AMs later.
  *
- * In case of non-unique concurrent  index build SO_RESET_SNAPSHOT is applied
- * for the scan. That leads for changing snapshots on the fly to allow xmin
- * horizon propagate.
+ * In case of concurrent index build SO_RESET_SNAPSHOT is applied for the scan.
+ * That leads for changing snapshots on the fly to allow xmin horizon propagate.
  */
 static inline double
 table_index_build_scan(Relation table_rel,
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index cde83f62015..ae5f4d28fdc 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -428,6 +428,7 @@ extern Tuplesortstate *tuplesort_begin_index_btree(Relation heapRel,
 												   Relation indexRel,
 												   bool enforceUnique,
 												   bool uniqueNullsNotDistinct,
+												   bool	uniqueDeadIgnored,
 												   int workMem, SortCoordinate coordinate,
 												   int sortopt);
 extern Tuplesortstate *tuplesort_begin_index_hash(Relation heapRel,
diff --git a/src/test/modules/injection_points/expected/cic_reset_snapshots.out b/src/test/modules/injection_points/expected/cic_reset_snapshots.out
index 595a4000ce0..9f03fa3033c 100644
--- a/src/test/modules/injection_points/expected/cic_reset_snapshots.out
+++ b/src/test/modules/injection_points/expected/cic_reset_snapshots.out
@@ -41,7 +41,11 @@ END; $$;
 ----------------
 ALTER TABLE cic_reset_snap.tbl SET (parallel_workers=0);
 CREATE UNIQUE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
+NOTICE:  notice triggered for injection point heap_reset_scan_snapshot_effective
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
 NOTICE:  notice triggered for injection point table_beginscan_strat_reset_snapshots
@@ -86,7 +90,9 @@ SELECT injection_points_detach('heap_reset_scan_snapshot_effective');
 (1 row)
 
 CREATE UNIQUE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 REINDEX INDEX CONCURRENTLY cic_reset_snap.idx;
+NOTICE:  notice triggered for injection point table_parallelscan_initialize
 DROP INDEX CONCURRENTLY cic_reset_snap.idx;
 CREATE INDEX CONCURRENTLY idx ON cic_reset_snap.tbl(i);
 NOTICE:  notice triggered for injection point table_parallelscan_initialize
-- 
2.43.0



  [application/octet-stream] v7-0006-Add-STIR-Short-Term-Index-Replacement-access-meth.patch (37.3K, 8-v7-0006-Add-STIR-Short-Term-Index-Replacement-access-meth.patch)
  download | inline diff:
From ccad95c4c080d0a73d7e5c1458fde825b559f9fe Mon Sep 17 00:00:00 2001
From: nkey <[email protected]>
Date: Sat, 21 Dec 2024 18:36:10 +0100
Subject: [PATCH v7 6/6] Add STIR (Short-Term Index Replacement) access method

This patch provides foundational infrastructure for upcoming enhancements to
concurrent index builds by introducing:

- **ii_Auxiliary** in `IndexInfo`: Indicates that an index is an auxiliary
  index, specifically for use during concurrent index builds.
- **validate_index** in `IndexVacuumInfo`: Signals when a vacuum or cleanup
  operation is validating a newly built index (e.g., during concurrent build).

Additionally, a new **STIR (Short-Term Index Replacement)** access method is
introduced, intended solely for short-lived, auxiliary usage. STIR functions
as an ephemeral helper during concurrent index builds, temporarily storing TIDs
without providing the full features of a typical index. As such, it raises
warnings or errors when accessed outside its specialized usage path.

These changes lay essential groundwork for further improvements to concurrent
index builds.
---
 contrib/pgstattuple/pgstattuple.c        |   3 +
 src/backend/access/Makefile              |   2 +-
 src/backend/access/heap/vacuumlazy.c     |   2 +
 src/backend/access/meson.build           |   1 +
 src/backend/access/stir/Makefile         |  18 +
 src/backend/access/stir/meson.build      |   5 +
 src/backend/access/stir/stir.c           | 576 +++++++++++++++++++++++
 src/backend/catalog/index.c              |   1 +
 src/backend/commands/analyze.c           |   1 +
 src/backend/commands/vacuumparallel.c    |   1 +
 src/backend/nodes/makefuncs.c            |   1 +
 src/include/access/genam.h               |   1 +
 src/include/access/reloptions.h          |   3 +-
 src/include/access/stir.h                | 117 +++++
 src/include/catalog/pg_am.dat            |   3 +
 src/include/catalog/pg_opclass.dat       |   3 +
 src/include/catalog/pg_opfamily.dat      |   2 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/nodes/execnodes.h            |   6 +-
 src/include/utils/index_selfuncs.h       |   8 +
 src/test/regress/expected/amutils.out    |   8 +-
 src/test/regress/expected/opr_sanity.out |   7 +-
 src/test/regress/expected/psql.out       |  24 +-
 23 files changed, 779 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/stir/Makefile
 create mode 100644 src/backend/access/stir/meson.build
 create mode 100644 src/backend/access/stir/stir.c
 create mode 100644 src/include/access/stir.h

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index ff7cc07df99..007efc4ed0c 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -282,6 +282,9 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 			case SPGIST_AM_OID:
 				err = "spgist index";
 				break;
+			case STIR_AM_OID:
+				err = "stir index";
+				break;
 			case BRIN_AM_OID:
 				err = "brin index";
 				break;
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 1932d11d154..cd6524a54ab 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -9,6 +9,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS	    = brin common gin gist hash heap index nbtree rmgrdesc spgist \
-			  sequence table tablesample transam
+			  stir sequence table tablesample transam
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f2ca9430581..bec79b48cb2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2538,6 +2538,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
@@ -2589,6 +2590,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
diff --git a/src/backend/access/meson.build b/src/backend/access/meson.build
index 62a371db7f7..63ee0ef134d 100644
--- a/src/backend/access/meson.build
+++ b/src/backend/access/meson.build
@@ -11,6 +11,7 @@ subdir('nbtree')
 subdir('rmgrdesc')
 subdir('sequence')
 subdir('spgist')
+subdir('stir')
 subdir('table')
 subdir('tablesample')
 subdir('transam')
diff --git a/src/backend/access/stir/Makefile b/src/backend/access/stir/Makefile
new file mode 100644
index 00000000000..fae5898b8d7
--- /dev/null
+++ b/src/backend/access/stir/Makefile
@@ -0,0 +1,18 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/stir
+#
+# IDENTIFICATION
+#    src/backend/access/stir/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/stir
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	stir.o
+
+include $(top_srcdir)/src/backend/common.mk
\ No newline at end of file
diff --git a/src/backend/access/stir/meson.build b/src/backend/access/stir/meson.build
new file mode 100644
index 00000000000..39c6eca848d
--- /dev/null
+++ b/src/backend/access/stir/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+backend_sources += files(
+	'stir.c',
+)
\ No newline at end of file
diff --git a/src/backend/access/stir/stir.c b/src/backend/access/stir/stir.c
new file mode 100644
index 00000000000..83aa255176f
--- /dev/null
+++ b/src/backend/access/stir/stir.c
@@ -0,0 +1,576 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.c
+ *	  Implementation of Short-Term Index Replacement.
+ *
+ * STIR is a specialized access method type designed for temporary storage
+ * of TID values during concurernt index build operations.
+ *
+ * The typical lifecycle of a STIR index is:
+ * 1. created as an auxiliary index for CIC/RIC
+ * 2. accepts inserts for a period
+ * 3. stirbulkdelete called during index validation phase
+ * 5. gets dropped
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/stir/stir.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/stir.h"
+#include "commands/vacuum.h"
+#include "utils/index_selfuncs.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_opfamily.h"
+#include "utils/catcache.h"
+#include "access/amvalidate.h"
+#include "utils/syscache.h"
+#include "access/htup_details.h"
+#include "catalog/pg_amproc.h"
+#include "catalog/index.h"
+#include "catalog/pg_amop.h"
+#include "utils/regproc.h"
+#include "storage/bufmgr.h"
+#include "access/tableam.h"
+#include "access/reloptions.h"
+#include "utils/memutils.h"
+#include "utils/fmgrprotos.h"
+
+/*
+ * Stir handler function: return IndexAmRoutine with access method parameters
+ * and callbacks.
+ */
+Datum
+stirhandler(PG_FUNCTION_ARGS)
+{
+	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
+
+	/* Set STIR-specific strategy and procedure numbers */
+	amroutine->amstrategies = STIR_NSTRATEGIES;
+	amroutine->amsupport = STIR_NPROC;
+	amroutine->amoptsprocnum = STIR_OPTIONS_PROC;
+
+	/* STIR doesn't support most index operations */
+	amroutine->amcanorder = false;
+	amroutine->amcanorderbyop = false;
+	amroutine->amcanbackward = false;
+	amroutine->amcanunique = false;
+	amroutine->amcanmulticol = true;
+	amroutine->amoptionalkey = true;
+	amroutine->amsearcharray = false;
+	amroutine->amsearchnulls = false;
+	amroutine->amstorage = false;
+	amroutine->amclusterable = false;
+	amroutine->ampredlocks = false;
+	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
+	amroutine->amcaninclude = true;
+	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amparallelvacuumoptions =
+			VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_CLEANUP;
+	amroutine->amkeytype = InvalidOid;
+
+	/* Set up function callbacks */
+	amroutine->ambuild = stirbuild;
+	amroutine->ambuildempty = stirbuildempty;
+	amroutine->aminsert = stirinsert;
+	amroutine->aminsertcleanup = NULL;
+	amroutine->ambulkdelete = stirbulkdelete;
+	amroutine->amvacuumcleanup = stirvacuumcleanup;
+	amroutine->amcanreturn = NULL;
+	amroutine->amcostestimate = stircostestimate;
+	amroutine->amoptions = stiroptions;
+	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
+	amroutine->amvalidate = stirvalidate;
+	amroutine->amadjustmembers = NULL;
+	amroutine->ambeginscan = stirbeginscan;
+	amroutine->amrescan = stirrescan;
+	amroutine->amgettuple = NULL;
+	amroutine->amgetbitmap = NULL;
+	amroutine->amendscan = stirendscan;
+	amroutine->ammarkpos = NULL;
+	amroutine->amrestrpos = NULL;
+	amroutine->amestimateparallelscan = NULL;
+	amroutine->aminitparallelscan = NULL;
+	amroutine->amparallelrescan = NULL;
+
+	PG_RETURN_POINTER(amroutine);
+}
+
+/*
+ * Validates operator class for STIR index.
+ *
+ * STIR is not an real index, so validatio may be skipped.
+ * But we do it just for consistency.
+ */
+bool
+stirvalidate(Oid opclassoid)
+{
+	bool result = true;
+	HeapTuple classtup;
+	Form_pg_opclass classform;
+	Oid opfamilyoid;
+	HeapTuple familytup;
+	Form_pg_opfamily familyform;
+	char *opfamilyname;
+	CatCList *proclist,
+			*oprlist;
+	int i;
+
+	/* Fetch opclass information */
+	classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
+	if (!HeapTupleIsValid(classtup))
+		elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
+	classform = (Form_pg_opclass) GETSTRUCT(classtup);
+
+	opfamilyoid = classform->opcfamily;
+
+
+	/* Fetch opfamily information */
+	familytup = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfamilyoid));
+	if (!HeapTupleIsValid(familytup))
+		elog(ERROR, "cache lookup failed for operator family %u", opfamilyoid);
+	familyform = (Form_pg_opfamily) GETSTRUCT(familytup);
+
+	opfamilyname = NameStr(familyform->opfname);
+
+	/* Fetch all operators and support functions of the opfamily */
+	oprlist = SearchSysCacheList1(AMOPSTRATEGY, ObjectIdGetDatum(opfamilyoid));
+	proclist = SearchSysCacheList1(AMPROCNUM, ObjectIdGetDatum(opfamilyoid));
+
+	/* Check individual operators */
+	for (i = 0; i < oprlist->n_members; i++)
+	{
+		HeapTuple oprtup = &oprlist->members[i]->tuple;
+		Form_pg_amop oprform = (Form_pg_amop) GETSTRUCT(oprtup);
+
+		/* Check it's allowed strategy for stir */
+		if (oprform->amopstrategy < 1 ||
+			oprform->amopstrategy > STIR_NSTRATEGIES)
+		{
+			ereport(INFO,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+							errmsg("stir opfamily %s contains operator %s with invalid strategy number %d",
+								   opfamilyname,
+								   format_operator(oprform->amopopr),
+								   oprform->amopstrategy)));
+			result = false;
+		}
+
+		/* stir doesn't support ORDER BY operators */
+		if (oprform->amoppurpose != AMOP_SEARCH ||
+			OidIsValid(oprform->amopsortfamily))
+		{
+			ereport(INFO,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+							errmsg("stir opfamily %s contains invalid ORDER BY specification for operator %s",
+								   opfamilyname,
+								   format_operator(oprform->amopopr))));
+			result = false;
+		}
+
+		/* Check operator signature --- same for all stir strategies */
+		if (!check_amop_signature(oprform->amopopr, BOOLOID,
+								  oprform->amoplefttype,
+								  oprform->amoprighttype))
+		{
+			ereport(INFO,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+							errmsg("stir opfamily %s contains operator %s with wrong signature",
+								   opfamilyname,
+								   format_operator(oprform->amopopr))));
+			result = false;
+		}
+	}
+
+
+	ReleaseCatCacheList(proclist);
+	ReleaseCatCacheList(oprlist);
+	ReleaseSysCache(familytup);
+	ReleaseSysCache(classtup);
+
+	return result;
+}
+
+
+/*
+ * Initialize metapage of a STIR index.
+ * The skipInserts flag determines if new inserts will be accepted or skipped.
+ */
+void
+StirFillMetapage(Relation index, Page metaPage, bool skipInserts)
+{
+	StirMetaPageData *metadata;
+
+	StirInitPage(metaPage, STIR_META);
+	metadata = StirPageGetMeta(metaPage);
+	memset(metadata, 0, sizeof(StirMetaPageData));
+	metadata->magickNumber = STIR_MAGICK_NUMBER;
+	metadata->skipInserts = skipInserts;
+	((PageHeader) metaPage)->pd_lower += sizeof(StirMetaPageData);
+}
+
+/*
+ * Create and initialize the metapage for a STIR index.
+ * This is called during index creation.
+ */
+void
+StirInitMetapage(Relation index, ForkNumber forknum)
+{
+	Buffer metaBuffer;
+	Page metaPage;
+	GenericXLogState *state;
+
+	/*
+	 * Make a new page; since it is first page it should be associated with
+	 * block number 0 (STIR_METAPAGE_BLKNO).  No need to hold the extension
+	 * lock because there cannot be concurrent inserters yet.
+	 */
+	metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	Assert(BufferGetBlockNumber(metaBuffer) == STIR_METAPAGE_BLKNO);
+
+	/* Initialize contents of meta page */
+	state = GenericXLogStart(index);
+	metaPage = GenericXLogRegisterBuffer(state, metaBuffer,
+										 GENERIC_XLOG_FULL_IMAGE);
+	StirFillMetapage(index, metaPage, forknum == INIT_FORKNUM);
+	GenericXLogFinish(state);
+
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * Initialize any page of a stir index.
+ */
+void
+StirInitPage(Page page, uint16 flags)
+{
+	StirPageOpaque opaque;
+
+	PageInit(page, BLCKSZ, sizeof(StirPageOpaqueData));
+
+	opaque = StirPageGetOpaque(page);
+	opaque->flags = flags;
+	opaque->stir_page_id = STIR_PAGE_ID;
+}
+
+/*
+ * Add a tuple to a STIR page. Returns false if tuple doesn't fit.
+ * The tuple is added to the end of the page.
+ */
+static bool
+StirPageAddItem(Page page, StirTuple *tuple)
+{
+	StirTuple *itup;
+	StirPageOpaque opaque;
+	Pointer ptr;
+
+	/* We shouldn't be pointed to an invalid page */
+	Assert(!PageIsNew(page));
+
+	/* Does new tuple fit on the page? */
+	if (StirPageGetFreeSpace(state, page) < sizeof(StirTuple))
+		return false;
+
+	/* Copy new tuple to the end of page */
+	opaque = StirPageGetOpaque(page);
+	itup = StirPageGetTuple(page, opaque->maxoff + 1);
+	memcpy((Pointer) itup, (Pointer) tuple, sizeof(StirTuple));
+
+	/* Adjust maxoff and pd_lower */
+	opaque->maxoff++;
+	ptr = (Pointer) StirPageGetTuple(page, opaque->maxoff + 1);
+	((PageHeader) page)->pd_lower = ptr - page;
+
+	/* Assert we didn't overrun available space */
+	Assert(((PageHeader) page)->pd_lower <= ((PageHeader) page)->pd_upper);
+	return true;
+}
+
+/*
+ * Insert a new tuple into a STIR index.
+ */
+bool
+stirinsert(Relation index, Datum *values, bool *isnull,
+		  ItemPointer ht_ctid, Relation heapRel,
+		  IndexUniqueCheck checkUnique,
+		  bool indexUnchanged,
+		  struct IndexInfo *indexInfo)
+{
+	StirTuple *itup;
+	MemoryContext oldCtx;
+	MemoryContext insertCtx;
+	StirMetaPageData *metaData;
+	Buffer buffer,
+			metaBuffer;
+	Page page;
+	GenericXLogState *state;
+	uint16 blkNo;
+
+	/* Create temporary context for insert operation */
+	insertCtx = AllocSetContextCreate(CurrentMemoryContext,
+									  "Stir insert temporary context",
+									  ALLOCSET_DEFAULT_SIZES);
+
+	oldCtx = MemoryContextSwitchTo(insertCtx);
+
+	/* Create new tuple with heap pointer */
+	itup = (StirTuple *) palloc0(sizeof(StirTuple));
+	itup->heapPtr = *ht_ctid;
+
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+
+	for (;;)
+	{
+		LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if inserts are allowed */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+		blkNo = metaData->lastBlkNo;
+		/* Don't hold metabuffer lock while doing insert */
+		LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+
+		if (blkNo > 0)
+		{
+			buffer = ReadBuffer(index, blkNo);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+			state = GenericXLogStart(index);
+			page = GenericXLogRegisterBuffer(state, buffer, 0);
+
+			Assert(!PageIsNew(page));
+
+			/* Try to add tuple to existing page */
+			if (StirPageAddItem(page, itup))
+			{
+				/* Success!  Apply the change, clean up, and exit */
+				GenericXLogFinish(state);
+				UnlockReleaseBuffer(buffer);
+				ReleaseBuffer(metaBuffer);
+				MemoryContextSwitchTo(oldCtx);
+				MemoryContextDelete(insertCtx);
+				return false;
+			}
+
+			/* Didn't fit, must try other pages */
+			GenericXLogAbort(state);
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Need to add new page - get exclusive lock on meta page */
+		LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		state = GenericXLogStart(index);
+		metaData = StirPageGetMeta(GenericXLogRegisterBuffer(state, metaBuffer, GENERIC_XLOG_FULL_IMAGE));
+		/* Check if another backend already extended the index */
+
+		if (blkNo != metaData->lastBlkNo)
+		{
+			Assert(blkNo < metaData->lastBlkNo);
+			/* Someone else inserted the new page into the index, lets try again /
+			 */
+			GenericXLogAbort(state);
+			LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+			continue;
+		}
+		else
+		{
+			/* Must extend the file */
+			buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
+									   EB_LOCK_FIRST);
+
+			page = GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE);
+			StirInitPage(page, 0);
+
+			if (!StirPageAddItem(page, itup))
+			{
+				/* We shouldn't be here since we're inserting to an empty page */
+				elog(ERROR, "could not add new stir tuple to empty page");
+			}
+
+			/* Update meta page with new last block number */
+			metaData->lastBlkNo = BufferGetBlockNumber(buffer);
+			GenericXLogFinish(state);
+
+			UnlockReleaseBuffer(buffer);
+			UnlockReleaseBuffer(metaBuffer);
+
+			MemoryContextSwitchTo(oldCtx);
+			MemoryContextDelete(insertCtx);
+
+			return false;
+		}
+	}
+}
+
+/*
+ * STIR doesn't support scans - these functions all error out
+ */
+IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not a not implemented", __func__)));
+}
+
+void
+stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+		  ScanKey orderbys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not a not implemented", __func__)));
+}
+
+void stirendscan(IndexScanDesc scan)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not a not implemented", __func__)));
+}
+
+/*
+ * Build a STIR index - only allowed for auxiliary indexes.
+ * Just initializes the meta page without any heap scans.
+ */
+IndexBuildResult *stirbuild(Relation heap, Relation index,
+						   struct IndexInfo *indexInfo)
+{
+	IndexBuildResult *result;
+
+	if (!indexInfo->ii_Auxiliary)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("STIR indexes are not supported to be built")));
+
+	StirInitMetapage(index, MAIN_FORKNUM);
+
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	result->heap_tuples = 0;
+	result->index_tuples = 0;
+	return result;
+}
+
+void stirbuildempty(Relation index)
+{
+	StirInitMetapage(index, INIT_FORKNUM);
+}
+
+IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+									 IndexBulkDeleteResult *stats,
+									 IndexBulkDeleteCallback callback,
+									 void *callback_state)
+{
+	Relation index = info->index;
+	BlockNumber blkno, npages;
+	Buffer buffer;
+	Page page;
+
+	/* For normal VACUUM, mark to skip inserts and warn about index drop needed */
+	if (!info->validate_index)
+	{
+		StirMarkAsSkipInserts(index);
+
+		ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" is not a not implemented, seems like this index need to be dropped", __func__)));
+		return NULL;
+	}
+
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	/*
+	 * Iterate over the pages. We don't care about concurrently added pages,
+	 * because index is marked as not-ready for that momment and index not
+	 * used for insert.
+	 */
+	npages = RelationGetNumberOfBlocks(index);
+	for (blkno = STIR_HEAD_BLKNO; blkno < npages; blkno++)
+	{
+		StirTuple *itup, *itupEnd;
+
+		vacuum_delay_point();
+
+		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
+									RBM_NORMAL, info->strategy);
+
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buffer);
+
+		if (PageIsNew(page))
+		{
+			UnlockReleaseBuffer(buffer);
+			continue;
+		}
+
+		itup = StirPageGetTuple(page, FirstOffsetNumber);
+		itupEnd = StirPageGetTuple(page, OffsetNumberNext(StirPageGetMaxOffset(page)));
+		while (itup < itupEnd)
+		{
+			/* Do we have to delete this tuple? */
+			if (callback(&itup->heapPtr, callback_state))
+			{
+				ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("we never delete in stir")));
+			}
+
+			itup = StirPageGetNextTuple(itup);
+		}
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	return stats;
+}
+
+/*
+ * Mark a STIR index to skip future inserts
+ */
+void StirMarkAsSkipInserts(Relation index)
+{
+	StirMetaPageData *metaData;
+	Buffer metaBuffer;
+	Page metaPage;
+	GenericXLogState *state;
+
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	state = GenericXLogStart(index);
+	metaPage = GenericXLogRegisterBuffer(state, metaBuffer,
+										 GENERIC_XLOG_FULL_IMAGE);
+	metaData = StirPageGetMeta(metaPage);
+	if (!metaData->skipInserts)
+	{
+		metaData->skipInserts = true;
+		GenericXLogFinish(state);
+	}
+	else
+	{
+		GenericXLogAbort(state);
+	}
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+										IndexBulkDeleteResult *stats)
+{
+	StirMarkAsSkipInserts(info->index);
+	ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("\"%s\" is not a not implemented, seems like this index need to be dropped", __func__)));
+	return NULL;
+}
+
+bytea *stiroptions(Datum reloptions, bool validate)
+{
+	return NULL;
+}
+
+void stircostestimate(PlannerInfo *root, IndexPath *path,
+					 double loop_count, Cost *indexStartupCost,
+					 Cost *indexTotalCost, Selectivity *indexSelectivity,
+					 double *indexCorrelation, double *indexPages)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not a not implemented", __func__)));
+}
\ No newline at end of file
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f6a1a2f3f90..82816580e3c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3402,6 +3402,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
+	ivinfo.validate_index = true;
 
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9a56de2282f..d54d310ba43 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -718,6 +718,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			ivinfo.message_level = elevel;
 			ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
 			ivinfo.strategy = vac_strategy;
+			ivinfo.validate_index = false;
 
 			stats = index_vacuum_cleanup(&ivinfo, NULL);
 
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 67cba17a564..e4327b4f7dc 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -884,6 +884,7 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
 	ivinfo.estimated_count = pvs->shared->estimated_count;
 	ivinfo.num_heap_tuples = pvs->shared->reltuples;
 	ivinfo.strategy = pvs->bstrategy;
+	ivinfo.validate_index = false;
 
 	/* Update error traceback information */
 	pvs->indname = pstrdup(RelationGetRelationName(indrel));
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 7e5df7bea4d..44a8a1f2875 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -825,6 +825,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
+	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 81653febc18..194dbbe1d0e 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -52,6 +52,7 @@ typedef struct IndexVacuumInfo
 	bool		estimated_count;	/* num_heap_tuples is an estimate */
 	int			message_level;	/* ereport level for progress messages */
 	double		num_heap_tuples;	/* tuples remaining in heap */
+	bool		validate_index; /* validating concurrently built index? */
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index df6923c9d50..0966397d344 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -51,8 +51,9 @@ typedef enum relopt_kind
 	RELOPT_KIND_VIEW = (1 << 9),
 	RELOPT_KIND_BRIN = (1 << 10),
 	RELOPT_KIND_PARTITIONED = (1 << 11),
+	RELOPT_KIND_STIR = (1 << 12),
 	/* if you add a new kind, make sure you update "last_default" too */
-	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_STIR,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/access/stir.h b/src/include/access/stir.h
new file mode 100644
index 00000000000..9943c42a97e
--- /dev/null
+++ b/src/include/access/stir.h
@@ -0,0 +1,117 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.h
+ *	  header file for postgres stir access method implementation.
+ *
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * src/include/access/stir.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef _STIR_H_
+#define _STIR_H_
+
+#include "amapi.h"
+#include "xlog.h"
+#include "generic_xlog.h"
+#include "itup.h"
+#include "fmgr.h"
+#include "nodes/pathnodes.h"
+
+/* Support procedures numbers */
+#define STIR_NPROC				0
+
+/* Scan strategies */
+#define STIR_NSTRATEGIES		1
+
+#define STIR_OPTIONS_PROC				0
+
+/* Macros for accessing stir page structures */
+#define StirPageGetOpaque(page) ((StirPageOpaque) PageGetSpecialPointer(page))
+#define StirPageGetMaxOffset(page) (StirPageGetOpaque(page)->maxoff)
+#define StirPageIsMeta(page) \
+	((StirPageGetOpaque(page)->flags & STIR_META) != 0)
+#define StirPageGetData(page)		((StirTuple *)PageGetContents(page))
+#define StirPageGetTuple(page, offset) \
+	((StirTuple *)(PageGetContents(page) \
+		+ sizeof(StirTuple) * ((offset) - 1)))
+#define StirPageGetNextTuple(tuple) \
+	((StirTuple *)((Pointer)(tuple) + sizeof(StirTuple)))
+
+
+
+/* Preserved page numbers */
+#define STIR_METAPAGE_BLKNO	(0)
+#define STIR_HEAD_BLKNO		(1) /* first data page */
+
+
+/* Opaque for stir pages */
+typedef struct StirPageOpaqueData
+{
+	OffsetNumber maxoff;		/* number of index tuples on page */
+	uint16		flags;			/* see bit definitions below */
+	uint16		unused;			/* placeholder to force maxaligning of size of
+								 * StirPageOpaqueData and to place
+								 * stir_page_id exactly at the end of page */
+	uint16		stir_page_id;	/* for identification of STIR indexes */
+} StirPageOpaqueData;
+
+/* Stir page flags */
+#define STIR_META		(1<<0)
+
+typedef StirPageOpaqueData *StirPageOpaque;
+
+#define STIR_PAGE_ID		0xFF84
+
+/* Metadata of stir index */
+typedef struct StirMetaPageData
+{
+	uint32		magickNumber;
+	uint16		lastBlkNo;
+	bool		skipInserts;	/* should we just exit without any inserts */
+} StirMetaPageData;
+
+/* Magic number to distinguish stir pages from others */
+#define STIR_MAGICK_NUMBER (0xDBAC0DEF)
+
+#define StirPageGetMeta(page)	((StirMetaPageData *) PageGetContents(page))
+
+typedef struct StirTuple
+{
+	ItemPointerData heapPtr;
+} StirTuple;
+
+#define StirPageGetFreeSpace(state, page) \
+	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+		- StirPageGetMaxOffset(page) * (sizeof(StirTuple)) \
+		- MAXALIGN(sizeof(StirPageOpaqueData)))
+
+extern void StirFillMetapage(Relation index, Page metaPage, bool skipInserts);
+extern void StirInitMetapage(Relation index, ForkNumber forknum);
+extern void StirInitPage(Page page, uint16 flags);
+extern void StirMarkAsSkipInserts(Relation index);
+
+/* index access method interface functions */
+extern bool stirvalidate(Oid opclassoid);
+extern bool stirinsert(Relation index, Datum *values, bool *isnull,
+					 ItemPointer ht_ctid, Relation heapRel,
+					 IndexUniqueCheck checkUnique,
+					 bool indexUnchanged,
+					 struct IndexInfo *indexInfo);
+extern IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys);
+extern void stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+					 ScanKey orderbys, int norderbys);
+extern void stirendscan(IndexScanDesc scan);
+extern IndexBuildResult *stirbuild(Relation heap, Relation index,
+								 struct IndexInfo *indexInfo);
+extern void stirbuildempty(Relation index);
+extern IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+										   IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+										   void *callback_state);
+extern IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+											  IndexBulkDeleteResult *stats);
+extern bytea *stiroptions(Datum reloptions, bool validate);
+
+#endif
\ No newline at end of file
diff --git a/src/include/catalog/pg_am.dat b/src/include/catalog/pg_am.dat
index db874902820..51350df0bf0 100644
--- a/src/include/catalog/pg_am.dat
+++ b/src/include/catalog/pg_am.dat
@@ -33,5 +33,8 @@
 { oid => '3580', oid_symbol => 'BRIN_AM_OID',
   descr => 'block range index (BRIN) access method',
   amname => 'brin', amhandler => 'brinhandler', amtype => 'i' },
+{ oid => '5555', oid_symbol => 'STIR_AM_OID',
+  descr => 'short term index replacement access method',
+  amname => 'stir', amhandler => 'stirhandler', amtype => 'i' },
 
 ]
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index f503c652ebc..7067452a035 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -488,4 +488,7 @@
 
 # no brin opclass for the geometric types except box
 
+# allow any types for STIR
+{ opcmethod => 'stir', opcname => 'stir_ops', opcfamily => 'stir/any_ops',
+  opcintype => 'any' },
 ]
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index c8ac8c73def..41ea0c3ca50 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -304,5 +304,7 @@
   opfmethod => 'hash', opfname => 'multirange_ops' },
 { oid => '6158',
   opfmethod => 'gist', opfname => 'multirange_ops' },
+{ oid => '5558',
+  opfmethod => 'stir', opfname => 'any_ops' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0f22c217235..59f50e2b027 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -935,6 +935,10 @@
   proname => 'brinhandler', provolatile => 'v',
   prorettype => 'index_am_handler', proargtypes => 'internal',
   prosrc => 'brinhandler' },
+{ oid => '5556', descr => 'short term index replacement access method handler',
+  proname => 'stirhandler', provolatile => 'v',
+  prorettype => 'index_am_handler', proargtypes => 'internal',
+  prosrc => 'stirhandler' },
 { oid => '3952', descr => 'brin: standalone scan new table pages',
   proname => 'brin_summarize_new_values', provolatile => 'v',
   proparallel => 'u', prorettype => 'int4', proargtypes => 'regclass',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7f71b7625df..748655fd0cf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -172,12 +172,13 @@ typedef struct ExprState
  *		BrokenHotChain		did we detect any broken HOT chains?
  *		Summarizing			is it a summarizing index?
  *		ParallelWorkers		# of workers requested (excludes leader)
+ *		Auxiliary			# index-helper for concurrent build?
  *		Am					Oid of index AM
  *		AmCache				private cache area for index AM
  *		Context				memory context holding this IndexInfo
  *
- * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
- * during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
+ * are used only during index build; they're conventionally zeroed otherwise.
  * ----------------
  */
 typedef struct IndexInfo
@@ -206,6 +207,7 @@ typedef struct IndexInfo
 	bool		ii_Summarizing;
 	bool		ii_WithoutOverlaps;
 	int			ii_ParallelWorkers;
+	bool		ii_Auxiliary;
 	Oid			ii_Am;
 	void	   *ii_AmCache;
 	MemoryContext ii_Context;
diff --git a/src/include/utils/index_selfuncs.h b/src/include/utils/index_selfuncs.h
index a41cd2b7fd9..61f3d3dea0c 100644
--- a/src/include/utils/index_selfuncs.h
+++ b/src/include/utils/index_selfuncs.h
@@ -62,6 +62,14 @@ extern void spgcostestimate(struct PlannerInfo *root,
 							Selectivity *indexSelectivity,
 							double *indexCorrelation,
 							double *indexPages);
+extern void stircostestimate(struct PlannerInfo *root,
+							struct IndexPath *path,
+							double loop_count,
+							Cost *indexStartupCost,
+							Cost *indexTotalCost,
+							Selectivity *indexSelectivity,
+							double *indexCorrelation,
+							double *indexPages);
 extern void gincostestimate(struct PlannerInfo *root,
 							struct IndexPath *path,
 							double loop_count,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index 7ab6113c619..92c033a2010 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -173,7 +173,13 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
  spgist | can_exclude   | t
  spgist | can_include   | t
  spgist | bogus         | 
-(36 rows)
+ stir   | can_order     | f
+ stir   | can_unique    | f
+ stir   | can_multi_col | t
+ stir   | can_exclude   | f
+ stir   | can_include   | t
+ stir   | bogus         | 
+(42 rows)
 
 --
 -- additional checks for pg_index_column_has_property
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index b673642ad1d..2645d970629 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2119,9 +2119,10 @@ FROM pg_opclass AS c1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS a1
                  WHERE a1.amopfamily = c1.opcfamily
                    AND binary_coercible(c1.opcintype, a1.amoplefttype));
- opcname | opcfamily 
----------+-----------
-(0 rows)
+ opcname  | opcfamily 
+----------+-----------
+ stir_ops |      5558
+(1 row)
 
 -- Check that each operator listed in pg_amop has an associated opclass,
 -- that is one whose opcintype matches oprleft (possibly by coercion).
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index 36dc31c16c4..a6d86cb4ca0 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5074,7 +5074,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA *
 List of access methods
@@ -5088,7 +5089,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA h*
 List of access methods
@@ -5113,9 +5115,9 @@ List of access methods
 
 \dA: extra argument "bar" ignored
 \dA+
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5124,12 +5126,13 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ *
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5138,7 +5141,8 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ h*
                      List of access methods
-- 
2.43.0



view thread (33+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
  In-Reply-To: <CANtu0oiuUF7L0wTGxOHfumyoVge3n7C4rAjdmFo=efeEwobXbg@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox