Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
10+ messages / 2 participants
[nested] [flat]

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-02-24 19:48  Mihail Nikalayeu <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Mihail Nikalayeu @ 2026-02-24 19:48 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; +Cc: Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Hello! Rebased.

Mikhail.


Attachments:

  [text/x-patch] v29-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch (30.9K, 2-v29-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch)
  download | inline diff:
From 2f6113d9602e22a171af9e2a3ae9bdf38d7d0d10 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 14:36:31 +0100
Subject: [PATCH v29 5/7] Track and drop auxiliary indexes in DROP/REINDEX

During concurrent index operations, auxiliary indexes may be left as orphaned objects when errors occur (junk auxiliary indexes).

This patch improves the handling of such auxiliary indexes:
- add auxiliaryForIndexId parameter to index_create() to track dependencies between main and auxiliary indexes
- automatically drop auxiliary indexes when the main index is dropped
- delete junk auxiliary indexes properly during REINDEX operations
---
 doc/src/sgml/ref/create_index.sgml         |  14 ++-
 doc/src/sgml/ref/reindex.sgml              |   8 +-
 src/backend/catalog/dependency.c           |   2 +-
 src/backend/catalog/index.c                |  71 ++++++++++----
 src/backend/catalog/pg_depend.c            |  58 ++++++++++++
 src/backend/catalog/toasting.c             |   1 +
 src/backend/commands/indexcmds.c           |  37 +++++++-
 src/backend/commands/tablecmds.c           |  52 +++++++++-
 src/backend/nodes/makefuncs.c              |   3 +-
 src/include/catalog/dependency.h           |   1 +
 src/include/nodes/execnodes.h              |   2 +
 src/include/nodes/makefuncs.h              |   2 +-
 src/test/regress/expected/create_index.out | 105 +++++++++++++++++++--
 src/test/regress/sql/create_index.sql      |  57 ++++++++++-
 14 files changed, 371 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 12c88587a79..7f751453317 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -668,10 +668,16 @@ Indexes:
     "idx_ccaux" stir (col) INVALID
 </programlisting>
 
-    The recommended recovery
-    method in such cases is to drop these indexes and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
-    to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
+    The recommended recovery method in such cases is to drop the index with
+    <command>DROP INDEX</command>. The auxiliary index (suffixed with
+    <literal>_ccaux</literal>) will be automatically dropped when the main
+    index is dropped. After dropping the indexes, you can try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to
+    rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>,
+    which will also handle cleanup of any invalid auxiliary indexes.)
+    If the only invalid index is one suffixed <literal>_ccaux</literal>,
+    recommended recovery method is just <literal>DROP INDEX</literal>
+    for that index.
    </para>
 
    <para>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 1c3c7a97f6a..384c5fc8b3f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -476,11 +476,15 @@ Indexes:
     <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
     recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
-    then attempt <command>REINDEX CONCURRENTLY</command> again.
+    then attempt <command>REINDEX CONCURRENTLY</command> again. The auxiliary index
+    (suffixed with <literal>_ccaux</literal>) will be automatically dropped
+    along with its main index.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
     the recommended recovery method is to just drop said index, since the
-    rebuild proper has been successful.
+    rebuild proper has been successful. If the only
+    invalid index is one suffixed <literal>_ccaux</literal>, recommended
+    recovery method is just <literal>DROP INDEX</literal> for that index.
     A nonzero number may be appended to the suffix of the invalid index
     names to keep them unique, like <literal>_ccnew1</literal>,
     <literal>_ccold2</literal>, etc.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 570c434ede8..c1c0d730815 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -287,7 +287,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.  (Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object, 0);
+	AcquireDeletionLock(object, flags);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4f77627fb3b..91125d37150 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -773,6 +773,8 @@ index_create(Relation heapRelation,
 		   ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
 	/* partitioned indexes must never be "built" by themselves */
 	Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
+	/* ii_AuxiliaryForIndexId and INDEX_CREATE_AUXILIARY are required both or neither */
+	Assert(OidIsValid(indexInfo->ii_AuxiliaryForIndexId) == auxiliary);
 
 	relkind = partitioned ? RELKIND_PARTITIONED_INDEX : RELKIND_INDEX;
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -1178,6 +1180,15 @@ index_create(Relation heapRelation,
 			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_SEC);
 		}
 
+		/*
+		 * Record dependency on the main index in case of auxiliary index.
+		 */
+		if (OidIsValid(indexInfo->ii_AuxiliaryForIndexId))
+		{
+			ObjectAddressSet(referenced, RelationRelationId, indexInfo->ii_AuxiliaryForIndexId);
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+		}
+
 		/* placeholder for normal dependencies */
 		addrs = new_object_addresses();
 
@@ -1410,7 +1421,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							true,
 							indexRelation->rd_indam->amsummarizing,
 							oldInfo->ii_WithoutOverlaps,
-							false);
+							false,
+							InvalidOid);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1578,7 +1590,8 @@ index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
 							true,
 							false,	/* aux are not summarizing */
 							false,	/* aux are not without overlaps */
-							true	/* auxiliary */);
+							true	/* auxiliary */,
+							mainIndexId /* auxiliaryForIndexId */);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -2614,7 +2627,8 @@ BuildIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid /* auxiliary_for_index_id is set only during build */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2675,7 +2689,8 @@ BuildDummyIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3840,6 +3855,7 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 				heapRelation;
 	Oid			heapId;
 	Oid			save_userid;
+	Oid			junkAuxIndexId;
 	int			save_sec_context;
 	int			save_nestlevel;
 	IndexInfo  *indexInfo;
@@ -3896,6 +3912,19 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
 	}
 
+	/* Check for the auxiliary index for that index, it needs to be dropped */
+	junkAuxIndexId = get_auxiliary_index(indexId);
+	if (OidIsValid(junkAuxIndexId))
+	{
+		ObjectAddress object;
+		object.classId = RelationRelationId;
+		object.objectId = junkAuxIndexId;
+		object.objectSubId = 0;
+		performDeletion(&object, DROP_RESTRICT,
+								 PERFORM_DELETION_INTERNAL |
+								 PERFORM_DELETION_QUIETLY);
+	}
+
 	/*
 	 * Open the target index relation and get an exclusive lock on it, to
 	 * ensure that no one else is touching this particular index.
@@ -4184,7 +4213,8 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 {
 	Relation	rel;
 	Oid			toast_relid;
-	List	   *indexIds;
+	List	   *indexIds,
+			   *auxIndexIds = NIL;
 	char		persistence;
 	bool		result = false;
 	ListCell   *indexId;
@@ -4273,13 +4303,30 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	else
 		persistence = rel->rd_rel->relpersistence;
 
+	foreach(indexId, indexIds)
+	{
+		Oid			indexOid = lfirst_oid(indexId);
+		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* All STIR indexes are auxiliary indexes */
+		if (indexAm == STIR_AM_OID)
+		{
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			auxIndexIds = lappend_oid(auxIndexIds, indexOid);
+		}
+	}
+
 	/* Reindex all the indexes. */
 	i = 1;
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
-		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* Auxiliary indexes are going to be dropped during main index rebuild */
+		if (list_member_oid(auxIndexIds, indexOid))
+			continue;
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4305,18 +4352,6 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
-		if (indexAm == STIR_AM_OID)
-		{
-			ereport(WARNING,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
-							get_namespace_name(indexNamespaceId),
-							get_rel_name(indexOid))));
-			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
-				RemoveReindexPending(indexOid);
-			continue;
-		}
-
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 07c2d41c189..7e0e29bdb5b 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -20,6 +20,7 @@
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/pg_am_d.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_extension.h"
@@ -1108,6 +1109,63 @@ get_index_constraint(Oid indexId)
 	return constraintId;
 }
 
+/*
+ * get_auxiliary_index
+ *		Given the OID of an index, return the OID of its auxiliary
+ *		index, or InvalidOid if there is no auxiliary index.
+ */
+Oid
+get_auxiliary_index(Oid indexId)
+{
+	Oid			auxiliaryIndexOid = InvalidOid;
+	Relation	depRel;
+	ScanKeyData key[3];
+	SysScanDesc scan;
+	HeapTuple	tup;
+
+	/* Search the dependency table for the index */
+	depRel = table_open(DependRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+				Anum_pg_depend_refclassid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationRelationId));
+	ScanKeyInit(&key[1],
+				Anum_pg_depend_refobjid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indexId));
+	ScanKeyInit(&key[2],
+				Anum_pg_depend_refobjsubid,
+				BTEqualStrategyNumber, F_INT4EQ,
+				Int32GetDatum(0));
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+							  NULL, 3, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend deprec = (Form_pg_depend) GETSTRUCT(tup);
+
+		/*
+		 * We assume AUTO dependency on index with rel_kind
+		 * of RELKIND_INDEX and AM eq STIR is that we are looking for.
+		 */
+		if (deprec->classid == RelationRelationId &&
+			(deprec->deptype == DEPENDENCY_AUTO) &&
+			get_rel_relkind(deprec->objid) == RELKIND_INDEX &&
+			get_rel_relam(deprec->objid) == STIR_AM_OID)
+		{
+			auxiliaryIndexOid = deprec->objid;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(depRel, AccessShareLock);
+
+	return auxiliaryIndexOid;
+}
+
 /*
  * get_index_ref_constraints
  *		Given the OID of an index, return the OID of all foreign key
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 87e01e74ad7..c511563f3ff 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -308,6 +308,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
 	indexInfo->ii_Auxiliary = false;
+	indexInfo->ii_AuxiliaryForIndexId = InvalidOid;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f583239e091..599e3375833 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -246,7 +246,7 @@ CheckIndexCompatible(Oid oldId,
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
 							  false, false, amsummarizing,
-							  isWithoutOverlaps, isauxiliary);
+							  isWithoutOverlaps, isauxiliary, InvalidOid);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -946,7 +946,8 @@ DefineIndex(ParseState *pstate,
 							  concurrent,
 							  amissummarizing,
 							  stmt->iswithoutoverlaps,
-							  false);
+							  false,
+							  InvalidOid);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -3708,6 +3709,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		Oid			indexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -4057,6 +4059,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
@@ -4064,6 +4067,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		int			save_nestlevel;
 		Relation	newIndexRel;
 		Relation	auxIndexRel;
+		Relation	junkAuxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -4137,12 +4141,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 												   tablespaceid,
 												   auxConcurrentName);
 
+		/* Search for auxiliary index for reindexed index, to drop it */
+		junkAuxIndexId = get_auxiliary_index(idx->indexId);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
+		if (OidIsValid(junkAuxIndexId))
+			junkAuxIndexRel = index_open(junkAuxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4152,6 +4161,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
 		newidx->auxIndexId = auxIndexId;
+		newidx->junkAuxIndexId = junkAuxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4173,10 +4183,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		if (OidIsValid(junkAuxIndexId))
+		{
+			lockrelid = palloc_object(LockRelId);
+			*lockrelid = junkAuxIndexRel->rd_lockInfo.lockRelId;
+			relationLocks = lappend(relationLocks, lockrelid);
+		}
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		if (OidIsValid(junkAuxIndexId))
+			index_close(junkAuxIndexRel, NoLock);
 		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
@@ -4365,7 +4383,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 	/*
 	 * At this moment all target indexes are marked as "ready-to-insert". So,
-	 * we are free to start process of dropping auxiliary indexes.
+	 * we are free to start process of dropping auxiliary indexes - including
+	 * junk indexes detected earlier.
 	 */
 	foreach(lc, newIndexIds)
 	{
@@ -4388,6 +4407,9 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 */
 		PushActiveSnapshot(GetTransactionSnapshot());
 		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		/* Ensure the junk index is marked as non-ready */
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_set_state_flags(newidx->junkAuxIndexId, INDEX_DROP_CLEAR_READY);
 		PopActiveSnapshot();
 
 		CommitTransactionCommand();
@@ -4607,6 +4629,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_concurrently_set_dead(newidx->tableId, newidx->junkAuxIndexId);
 
 		PopActiveSnapshot();
 	}
@@ -4658,6 +4682,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			object.objectSubId = 0;
 
 			add_exact_object_address(&object, objects);
+
+			if (OidIsValid(idx->junkAuxIndexId))
+			{
+				object.objectId = idx->junkAuxIndexId;
+				object.objectSubId = 0;
+				add_exact_object_address(&object, objects);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b04b0dbd2a0..2227cb2c7a1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1548,6 +1548,8 @@ RemoveRelations(DropStmt *drop)
 	ListCell   *cell;
 	int			flags = 0;
 	LOCKMODE	lockmode = AccessExclusiveLock;
+	MemoryContext private_context,
+				  oldcontext;
 
 	/* DROP CONCURRENTLY uses a weaker lock, and has some restrictions */
 	if (drop->concurrent)
@@ -1608,9 +1610,20 @@ RemoveRelations(DropStmt *drop)
 			relkind = 0;		/* keep compiler quiet */
 			break;
 	}
+	/*
+	 * Create a memory context that will survive forced transaction commits we
+	 * may need to do below (in case of concurrent index drop).
+	 * Since it is a child of PortalContext, it will go away eventually even if
+	 * we suffer an error; there's no need for special abort cleanup logic.
+	 */
+	private_context = AllocSetContextCreate(PortalContext,
+											"RemoveRelations",
+											ALLOCSET_SMALL_SIZES);
 
+	oldcontext = MemoryContextSwitchTo(private_context);
 	/* Lock and validate each relation; build a list of object addresses */
 	objects = new_object_addresses();
+	MemoryContextSwitchTo(oldcontext);
 
 	foreach(cell, drop->objects)
 	{
@@ -1662,6 +1675,38 @@ RemoveRelations(DropStmt *drop)
 			flags |= PERFORM_DELETION_CONCURRENTLY;
 		}
 
+		/*
+		 * Concurrent index drop requires it to be the first transaction. But in
+		 * case we have junk auxiliary index - we want to drop it too (and also
+		 * in a concurrent way). In this case perform silent internal deletion
+		 * of auxiliary index, and restore transaction state. It is fine to do it
+		 * in the loop because there is only single element in drop->objects.
+		 */
+		if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
+			state.actual_relkind == RELKIND_INDEX)
+		{
+			Oid junkAuxIndexOid = get_auxiliary_index(relOid);
+			if (OidIsValid(junkAuxIndexOid))
+			{
+				ObjectAddress object;
+				object.classId = RelationRelationId;
+				object.objectId = junkAuxIndexOid;
+				object.objectSubId = 0;
+				performDeletion(&object, DROP_RESTRICT,
+										 PERFORM_DELETION_CONCURRENTLY |
+										 PERFORM_DELETION_INTERNAL |
+										 PERFORM_DELETION_QUIETLY);
+				CommitTransactionCommand();
+				MemoryContextDelete(private_context);
+
+				/* And start again - now without auxiliary index. */
+				StartTransactionCommand();
+				PushActiveSnapshot(GetTransactionSnapshot());
+				RemoveRelations(drop);
+				return;
+			}
+		}
+
 		/*
 		 * Concurrent index drop cannot be used with partitioned indexes,
 		 * either.
@@ -1690,12 +1735,17 @@ RemoveRelations(DropStmt *drop)
 		obj.objectId = relOid;
 		obj.objectSubId = 0;
 
+		oldcontext = MemoryContextSwitchTo(private_context);
 		add_exact_object_address(&obj, objects);
+		MemoryContextSwitchTo(oldcontext);
 	}
 
+	/* Deletion may involve multiple commits, so, switch to memory context */
+	oldcontext = MemoryContextSwitchTo(private_context);
 	performMultipleDeletions(objects, drop->behavior, flags);
+	MemoryContextSwitchTo(oldcontext);
 
-	free_object_addresses(objects);
+	MemoryContextDelete(private_context);
 }
 
 /*
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 8cb2231a7a8..ccc1294e730 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps, bool auxiliary)
+			  bool withoutoverlaps, bool auxiliary, Oid auxiliary_for_index_id)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -851,6 +851,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
 	n->ii_Auxiliary = auxiliary;
+	n->ii_AuxiliaryForIndexId = auxiliary_for_index_id;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 2f3c1eae3c7..6ae210c584e 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -193,6 +193,7 @@ extern List *getOwnedSequences(Oid relid);
 extern Oid	getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok);
 
 extern Oid	get_index_constraint(Oid indexId);
+extern Oid	get_auxiliary_index(Oid indexId);
 
 extern List *get_index_ref_constraints(Oid indexId);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6ce9154b28d..e8236eede00 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -220,6 +220,8 @@ typedef struct IndexInfo
 	int			ii_ParallelWorkers;
 	/* is auxiliary for concurrent index build? */
 	bool		ii_Auxiliary;
+	/* if creating an auxiliary index, the OID of the main index; otherwise InvalidOid. */
+	Oid			ii_AuxiliaryForIndexId;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 6cf45a68cbe..92dff90c3de 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -100,7 +100,7 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
 								bool summarizing, bool withoutoverlaps,
-								bool auxiliary);
+								bool auxiliary, Oid auxiliary_for_index_id);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index aa4fa76358a..3ed8999d74f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3265,20 +3265,109 @@ ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-ERROR:  relation "concur_reindex_tab4" does not exist
-LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-                    ^
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
-ERROR:  could not create unique index "aux_index_ind6"
-DETAIL:  Key (c1)=(1) is duplicated.
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
 WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
 HINT:  Use DROP INDEX or REINDEX INDEX.
 WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
 NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+-- Make sure it is still exists
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
+DROP INDEX aux_index_ind6;
+ERROR:  index "aux_index_ind6" does not exist
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 7ae8e44019b..6d597790b56 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1340,11 +1340,62 @@ REINDEX INDEX aux_index_ind6_ccaux;
 -- Concurrently also
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
+-- Make sure it is still exists
+\d aux_index_tab5
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.53.0



  [text/x-patch] v29-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch (36.2K, 3-v29-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch)
  download | inline diff:
From 0b112374e1eb9e4fae4a91eed2326cf049a3003a Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sun, 11 Jan 2026 19:27:52 +0300
Subject: [PATCH v29 2/7] Add STIR access method and flags related to auxiliary
 indexes

This patch provides infrastructure for following enhancements to concurrent index builds by:
- ii_Auxiliary in IndexInfo: indicates that an index is an auxiliary index used during concurrent index build
- validate_index in IndexVacuumInfo: set if index_bulk_delete called during the validation phase of concurrent index build
- STIR (Short-Term Index Replacement) access method is introduced, intended solely for short-lived, auxiliary usage

STIR functions are designed as an ephemeral helper during concurrent index builds, temporarily storing TIDs without providing the full features of a typical access method. As such, it raises warnings or errors when accessed outside its specialized usage path.

Planned to be used in following commits.
---
 contrib/pgstattuple/pgstattuple.c        |   3 +
 src/backend/access/Makefile              |   1 +
 src/backend/access/heap/vacuumlazy.c     |   2 +
 src/backend/access/meson.build           |   1 +
 src/backend/access/stir/Makefile         |  18 +
 src/backend/access/stir/meson.build      |   5 +
 src/backend/access/stir/stir.c           | 559 +++++++++++++++++++++++
 src/backend/catalog/index.c              |   1 +
 src/backend/catalog/toasting.c           |   1 +
 src/backend/commands/analyze.c           |   1 +
 src/backend/commands/vacuumparallel.c    |   1 +
 src/backend/nodes/makefuncs.c            |   1 +
 src/include/access/genam.h               |   1 +
 src/include/access/reloptions.h          |   3 +-
 src/include/access/stir.h                | 113 +++++
 src/include/catalog/pg_am.dat            |   3 +
 src/include/catalog/pg_opclass.dat       |   4 +
 src/include/catalog/pg_opfamily.dat      |   2 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/nodes/execnodes.h            |   7 +-
 src/include/utils/index_selfuncs.h       |   8 +
 src/test/regress/expected/amutils.out    |   8 +-
 src/test/regress/expected/opr_sanity.out |   7 +-
 src/test/regress/expected/psql.out       |  24 +-
 24 files changed, 760 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/stir/Makefile
 create mode 100644 src/backend/access/stir/meson.build
 create mode 100644 src/backend/access/stir/stir.c
 create mode 100644 src/include/access/stir.h

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 6a7f8cb4a7c..5b5984e3aa2 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -285,6 +285,9 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 			case SPGIST_AM_OID:
 				err = "spgist index";
 				break;
+			case STIR_AM_OID:
+				err = "stir index";
+				break;
 			case BRIN_AM_OID:
 				err = "brin index";
 				break;
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index e88d72ea039..ebbcfa90715 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -19,6 +19,7 @@ SUBDIRS	    = \
 	nbtree \
 	rmgrdesc \
 	spgist \
+	stir \
 	sequence \
 	table \
 	tablesample \
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4be267ff657..42b1af94c8d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3132,6 +3132,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
@@ -3183,6 +3184,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
diff --git a/src/backend/access/meson.build b/src/backend/access/meson.build
index 5fd18de74f9..7219c65f365 100644
--- a/src/backend/access/meson.build
+++ b/src/backend/access/meson.build
@@ -11,6 +11,7 @@ subdir('nbtree')
 subdir('rmgrdesc')
 subdir('sequence')
 subdir('spgist')
+subdir('stir')
 subdir('table')
 subdir('tablesample')
 subdir('transam')
diff --git a/src/backend/access/stir/Makefile b/src/backend/access/stir/Makefile
new file mode 100644
index 00000000000..8785dab37bd
--- /dev/null
+++ b/src/backend/access/stir/Makefile
@@ -0,0 +1,18 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/stir
+#
+# IDENTIFICATION
+#    src/backend/access/stir/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/stir
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	stir.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/stir/meson.build b/src/backend/access/stir/meson.build
new file mode 100644
index 00000000000..4b7ad15346c
--- /dev/null
+++ b/src/backend/access/stir/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+backend_sources += files(
+	'stir.c',
+)
diff --git a/src/backend/access/stir/stir.c b/src/backend/access/stir/stir.c
new file mode 100644
index 00000000000..7ebf422c93b
--- /dev/null
+++ b/src/backend/access/stir/stir.c
@@ -0,0 +1,559 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.c
+ *	  Implementation of Short-Term Index Replacement.
+ *
+ * STIR is a specialized access method type designed for temporary storage
+ * of TID values during concurrent index build operations.
+ *
+ * The typical lifecycle of a STIR index is:
+ * 1. created as an auxiliary index for CIC/RIC
+ * 2. accepts inserts for a period
+ * 3. stirbulkdelete called during index validation phase
+ * 4. gets dropped
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/stir/stir.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/stir.h"
+#include "miscadmin.h"
+#include "access/amvalidate.h"
+#include "access/htup_details.h"
+#include "access/tableam.h"
+#include "catalog/index.h"
+#include "catalog/pg_amop.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_opfamily.h"
+#include "commands/vacuum.h"
+#include "storage/bufmgr.h"
+#include "utils/catcache.h"
+#include "utils/fmgrprotos.h"
+#include "utils/index_selfuncs.h"
+#include "utils/memutils.h"
+#include "utils/regproc.h"
+#include "utils/syscache.h"
+
+/*
+ * Stir handler function: return IndexAmRoutine with access method parameters
+ * and callbacks.
+ */
+Datum
+stirhandler(PG_FUNCTION_ARGS)
+{
+	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
+
+	/* Set STIR-specific strategy and procedure numbers */
+	amroutine->amstrategies = STIR_NSTRATEGIES;
+	amroutine->amsupport = STIR_NPROC;
+	amroutine->amoptsprocnum = STIR_OPTIONS_PROC;
+
+	/* STIR doesn't support most index operations */
+	amroutine->amcanorder = false;
+	amroutine->amcanorderbyop = false;
+	amroutine->amcanbackward = false;
+	amroutine->amcanunique = false;
+	amroutine->amcanmulticol = true;
+	amroutine->amoptionalkey = true;
+	amroutine->amsearcharray = false;
+	amroutine->amsearchnulls = false;
+	amroutine->amstorage = false;
+	amroutine->amclusterable = false;
+	amroutine->ampredlocks = false;
+	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
+	amroutine->amcaninclude = true;
+	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amparallelvacuumoptions =
+			VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_CLEANUP;
+	amroutine->amkeytype = InvalidOid;
+
+	/* Set up function callbacks */
+	amroutine->ambuild = stirbuild;
+	amroutine->ambuildempty = stirbuildempty;
+	amroutine->aminsert = stirinsert;
+	amroutine->aminsertcleanup = NULL;
+	amroutine->ambulkdelete = stirbulkdelete;
+	amroutine->amvacuumcleanup = stirvacuumcleanup;
+	amroutine->amcanreturn = NULL;
+	amroutine->amcostestimate = stircostestimate;
+	amroutine->amoptions = stiroptions;
+	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
+	amroutine->amvalidate = stirvalidate;
+	amroutine->amadjustmembers = NULL;
+	amroutine->ambeginscan = stirbeginscan;
+	amroutine->amrescan = stirrescan;
+	amroutine->amgettuple = NULL;
+	amroutine->amgetbitmap = NULL;
+	amroutine->amendscan = stirendscan;
+	amroutine->ammarkpos = NULL;
+	amroutine->amrestrpos = NULL;
+	amroutine->amestimateparallelscan = NULL;
+	amroutine->aminitparallelscan = NULL;
+	amroutine->amparallelrescan = NULL;
+
+	PG_RETURN_POINTER(amroutine);
+}
+
+/*
+ * Validates operator class for STIR index.
+ *
+ * STIR is not a real index, so validate may be skipped.
+ * But we do it just for consistency.
+ */
+bool
+stirvalidate(Oid opclassoid)
+{
+	bool result = true;
+	HeapTuple classtup;
+	Form_pg_opclass classform;
+	Oid opfamilyoid;
+	HeapTuple familytup;
+	Form_pg_opfamily familyform;
+	char *opfamilyname;
+	CatCList *oprlist;
+	int i;
+
+	/* Fetch opclass information */
+	classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
+	if (!HeapTupleIsValid(classtup))
+		elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
+	classform = (Form_pg_opclass) GETSTRUCT(classtup);
+
+	opfamilyoid = classform->opcfamily;
+
+	/* Fetch opfamily information */
+	familytup = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfamilyoid));
+	if (!HeapTupleIsValid(familytup))
+		elog(ERROR, "cache lookup failed for operator family %u", opfamilyoid);
+	familyform = (Form_pg_opfamily) GETSTRUCT(familytup);
+
+	opfamilyname = NameStr(familyform->opfname);
+
+	/* Fetch all operators and support functions of the opfamily */
+	oprlist = SearchSysCacheList1(AMOPSTRATEGY, ObjectIdGetDatum(opfamilyoid));
+
+	/* Check individual operators */
+	for (i = 0; i < oprlist->n_members; i++)
+	{
+		HeapTuple oprtup = &oprlist->members[i]->tuple;
+		Form_pg_amop oprform = (Form_pg_amop) GETSTRUCT(oprtup);
+
+		/* Check it's allowed strategy for stir */
+		if (oprform->amopstrategy < 1 ||
+			oprform->amopstrategy > STIR_NSTRATEGIES)
+		{
+			ereport(INFO,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+						errmsg("stir opfamily %s contains operator %s with invalid strategy number %d",
+							opfamilyname,
+							format_operator(oprform->amopopr),
+							oprform->amopstrategy)));
+			result = false;
+		}
+
+		/* stir doesn't support ORDER BY operators */
+		if (oprform->amoppurpose != AMOP_SEARCH ||
+			OidIsValid(oprform->amopsortfamily))
+		{
+			ereport(INFO,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+						errmsg("stir opfamily %s contains invalid ORDER BY specification for operator %s",
+							opfamilyname,
+							format_operator(oprform->amopopr))));
+			result = false;
+		}
+
+		/* Check operator signature --- same for all stir strategies */
+		if (!check_amop_signature(oprform->amopopr, BOOLOID,
+								  oprform->amoplefttype,
+								  oprform->amoprighttype))
+		{
+			ereport(INFO,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+						errmsg("stir opfamily %s contains operator %s with wrong signature",
+							opfamilyname,
+							format_operator(oprform->amopopr))));
+			result = false;
+		}
+	}
+
+	ReleaseCatCacheList(oprlist);
+	ReleaseSysCache(familytup);
+	ReleaseSysCache(classtup);
+
+	return result;
+}
+
+/*
+ * Initialize meta-page of a STIR index.
+ * The skipInserts flag determines if new inserts will be accepted or skipped.
+ */
+void
+StirFillMetapage(Relation index, Page metaPage, bool skipInserts)
+{
+	StirMetaPageData *metadata;
+
+	StirInitPage(metaPage, STIR_META);
+	metadata = StirPageGetMeta(metaPage);
+	memset(metadata, 0, sizeof(StirMetaPageData));
+	metadata->magicNumber = STIR_MAGIC_NUMBER;
+	metadata->skipInserts = skipInserts;
+	((PageHeader) metaPage)->pd_lower = ((char *) metadata + sizeof(StirMetaPageData)) - (char *) metaPage;
+}
+
+/*
+ * Create and initialize the metapage for a STIR index.
+ * This is called during index creation.
+ */
+void
+StirInitMetapage(Relation index, ForkNumber forknum)
+{
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	/*
+	 * Make a new page; since it is the first page it should be associated with
+	 * block number 0 (STIR_METAPAGE_BLKNO).  No need to hold the extension
+	 * lock because there cannot be concurrent inserters yet.
+	 */
+	metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	START_CRIT_SECTION();
+	Assert(BufferGetBlockNumber(metaBuffer) == STIR_METAPAGE_BLKNO);
+
+	metaPage = BufferGetPage(metaBuffer);
+	StirFillMetapage(index, metaPage, forknum == INIT_FORKNUM);
+
+	MarkBufferDirty(metaBuffer);
+	END_CRIT_SECTION();
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * Initialize any page of a stir index.
+ */
+void
+StirInitPage(Page page, uint16 flags)
+{
+	StirPageOpaque opaque;
+
+	PageInit(page, BLCKSZ, sizeof(StirPageOpaqueData));
+
+	opaque = StirPageGetOpaque(page);
+	opaque->flags = flags;
+	opaque->stir_page_id = STIR_PAGE_ID;
+}
+
+/*
+ * Add a tuple to a STIR page. Returns false if the tuple doesn't fit.
+ * The tuple is added to the end of the page.
+ */
+static bool
+StirPageAddItem(Page page, StirTuple *tuple)
+{
+	StirTuple *itup;
+	StirPageOpaque opaque;
+	char *ptr;
+
+	/* We shouldn't be pointed to an invalid page */
+	Assert(!PageIsNew(page));
+
+	/* Does the new tuple fit on the page? */
+	if (StirPageGetFreeSpace(page) < sizeof(StirTuple))
+		return false;
+
+	/* Copy a new tuple to the end of the page */
+	opaque = StirPageGetOpaque(page);
+	itup = StirPageGetTuple(page, opaque->maxoff + 1);
+	memcpy(itup, tuple, sizeof(StirTuple));
+
+	/* Adjust maxoff and pd_lower */
+	opaque->maxoff++;
+	ptr = (char *) StirPageGetTuple(page, opaque->maxoff + 1);
+	((PageHeader) page)->pd_lower = ptr - page;
+
+	/* Assert we didn't overrun available space */
+	Assert(((PageHeader) page)->pd_lower <= ((PageHeader) page)->pd_upper);
+	return true;
+}
+
+/*
+ * Insert a new tuple into a STIR index.
+ */
+bool
+stirinsert(Relation index, Datum *values, bool *isnull,
+		  ItemPointer ht_ctid, Relation heapRel,
+		  IndexUniqueCheck checkUnique,
+		  bool indexUnchanged,
+		  struct IndexInfo *indexInfo)
+{
+	StirTuple itup;
+	StirMetaPageData *metaData;
+	Buffer buffer,
+			metaBuffer;
+	Page page;
+	BlockNumber blkNo;
+
+	itup.heapPtr = *ht_ctid;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+
+	for (;;)
+	{
+		LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if inserts are allowed */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+		blkNo = metaData->lastBlkNo;
+		/* Don't hold metabuffer lock while doing insert */
+		LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+
+		if (blkNo > 0)
+		{
+			buffer = ReadBuffer(index, blkNo);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+			START_CRIT_SECTION();
+
+			page = BufferGetPage(buffer);
+
+			Assert(!PageIsNew(page));
+
+			/* Try to add tuple to the existing page */
+			if (StirPageAddItem(page, &itup))
+			{
+				/* Success!  Apply the change, clean up, and exit */
+				MarkBufferDirty(buffer);
+				END_CRIT_SECTION();
+
+				UnlockReleaseBuffer(buffer);
+				ReleaseBuffer(metaBuffer);
+				return false;
+			}
+
+			END_CRIT_SECTION();
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Need to add a new page - get exclusive lock on meta-page */
+		LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if another backend already extended the index */
+
+		if (blkNo != metaData->lastBlkNo)
+		{
+			Assert(blkNo < metaData->lastBlkNo);
+			/* Someone else inserted the new page into the index, let's try again */
+			LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+			continue;
+		}
+		else
+		{
+			/* Must extend the file */
+			buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
+									   EB_LOCK_FIRST);
+			page = BufferGetPage(buffer);
+			START_CRIT_SECTION();
+
+			StirInitPage(page, 0);
+
+			if (!StirPageAddItem(page, &itup))
+			{
+				/* We shouldn't be here since we're inserting to an empty page */
+				elog(ERROR, "could not add new stir tuple to empty page");
+			}
+
+			/* Update meta-page with new last block number */
+			metaData->lastBlkNo = BufferGetBlockNumber(buffer);
+
+			MarkBufferDirty(metaBuffer);
+			MarkBufferDirty(buffer);
+
+			END_CRIT_SECTION();
+
+			UnlockReleaseBuffer(buffer);
+			UnlockReleaseBuffer(metaBuffer);
+
+			return false;
+		}
+	}
+}
+
+/*
+ * STIR doesn't support scans - these functions all error out
+ */
+IndexScanDesc
+stirbeginscan(Relation r, int nkeys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void
+stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+		  ScanKey orderbys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void stirendscan(IndexScanDesc scan)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+/*
+ * Build a STIR index - only allowed for auxiliary indexes.
+ * Just initializes the meta-page without any heap scans.
+ */
+IndexBuildResult *
+stirbuild(Relation heap, Relation index,
+						   struct IndexInfo *indexInfo)
+{
+	IndexBuildResult *result;
+
+	if (!indexInfo->ii_Auxiliary)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("Building STIR indexes is not supported")));
+
+	StirInitMetapage(index, MAIN_FORKNUM);
+
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	result->heap_tuples = 0;
+	result->index_tuples = 0;
+	return result;
+}
+
+void stirbuildempty(Relation index)
+{
+	StirInitMetapage(index, INIT_FORKNUM);
+}
+
+IndexBulkDeleteResult *
+stirbulkdelete(IndexVacuumInfo *info,
+									 IndexBulkDeleteResult *stats,
+									 IndexBulkDeleteCallback callback,
+									 void *callback_state)
+{
+	Relation index = info->index;
+	BlockNumber blkno, npages;
+	Buffer buffer;
+	Page page;
+
+	/* For normal VACUUM, mark to skip inserts and warn about an index drop needed */
+	if (!info->validate_index)
+	{
+		StirMarkAsSkipInserts(index);
+
+		ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+		return NULL;
+	}
+
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	/*
+	 * Iterate over the pages. We don't care about concurrently added pages,
+	 * because the index is marked as not-ready for that moment and the index is not
+	 * used for insert.
+	 */
+	npages = RelationGetNumberOfBlocks(index);
+	for (blkno = STIR_HEAD_BLKNO; blkno < npages; blkno++)
+	{
+		StirTuple *itup, *itupEnd;
+
+		vacuum_delay_point(false);
+
+		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
+									RBM_NORMAL, info->strategy);
+
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buffer);
+
+		if (PageIsNew(page))
+		{
+			UnlockReleaseBuffer(buffer);
+			continue;
+		}
+
+		itup = StirPageGetTuple(page, FirstOffsetNumber);
+		itupEnd = StirPageGetTuple(page, OffsetNumberNext(StirPageGetMaxOffset(page)));
+		while (itup < itupEnd)
+		{
+			/* Do we have to delete this tuple? */
+			if (callback(&itup->heapPtr, callback_state))
+			{
+				ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("we never delete in stir")));
+			}
+
+			itup = StirPageGetNextTuple(itup);
+		}
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	return stats;
+}
+
+/*
+ * Mark a STIR index to skip future inserts
+ */
+void
+StirMarkAsSkipInserts(Relation index)
+{
+	StirMetaPageData *metaData;
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	START_CRIT_SECTION();
+
+	metaPage = BufferGetPage(metaBuffer);
+	metaData = StirPageGetMeta(metaPage);
+
+	if (!metaData->skipInserts)
+	{
+		metaData->skipInserts = true;
+		MarkBufferDirty(metaBuffer);
+	}
+	END_CRIT_SECTION();
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+IndexBulkDeleteResult *
+stirvacuumcleanup(IndexVacuumInfo *info,
+				  IndexBulkDeleteResult *stats)
+{
+	StirMarkAsSkipInserts(info->index);
+	ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+	return NULL;
+}
+
+bytea *
+stiroptions(Datum reloptions, bool validate)
+{
+	return NULL;
+}
+
+void
+stircostestimate(PlannerInfo *root, IndexPath *path,
+					 double loop_count, Cost *indexStartupCost,
+					 Cost *indexTotalCost, Selectivity *indexSelectivity,
+					 double *indexCorrelation, double *indexPages)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 43de42ce39e..1325f3d9700 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3409,6 +3409,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
+	ivinfo.validate_index = true;
 
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index c78dcea98c1..87e01e74ad7 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -307,6 +307,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_ParallelWorkers = 0;
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
+	indexInfo->ii_Auxiliary = false;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a483424152c..c44a87904a4 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -725,6 +725,7 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 			ivinfo.message_level = elevel;
 			ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
 			ivinfo.strategy = vac_strategy;
+			ivinfo.validate_index = false;
 
 			stats = index_vacuum_cleanup(&ivinfo, NULL);
 
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index c3b3c9ea21a..4c727389d42 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -884,6 +884,7 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
 	ivinfo.estimated_count = pvs->shared->estimated_count;
 	ivinfo.num_heap_tuples = pvs->shared->reltuples;
 	ivinfo.strategy = pvs->bstrategy;
+	ivinfo.validate_index = false;
 
 	/* Update error traceback information */
 	pvs->indname = pstrdup(RelationGetRelationName(indrel));
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 2caec621d73..09ae445694d 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -875,6 +875,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
+	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..cd467582731 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -56,6 +56,7 @@ typedef struct IndexVacuumInfo
 	bool		estimated_count;	/* num_heap_tuples is an estimate */
 	int			message_level;	/* ereport level for progress messages */
 	double		num_heap_tuples;	/* tuples remaining in heap */
+	bool		validate_index; /* validating concurrently built index? */
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index 0bd17b30ca7..e2966165e6f 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -52,8 +52,9 @@ typedef enum relopt_kind
 	RELOPT_KIND_VIEW = (1 << 9),
 	RELOPT_KIND_BRIN = (1 << 10),
 	RELOPT_KIND_PARTITIONED = (1 << 11),
+	RELOPT_KIND_STIR = (1 << 12),
 	/* if you add a new kind, make sure you update "last_default" too */
-	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_STIR,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/access/stir.h b/src/include/access/stir.h
new file mode 100644
index 00000000000..18ee36506fd
--- /dev/null
+++ b/src/include/access/stir.h
@@ -0,0 +1,113 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.h
+ *	  header file for postgres stir access method implementation.
+ *
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * src/include/access/stir.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef STIR_H
+#define STIR_H
+
+#include "access/amapi.h"
+#include "access/xlog.h"
+#include "access/generic_xlog.h"
+#include "access/itup.h"
+#include "nodes/pathnodes.h"
+
+/* Support procedures numbers */
+#define STIR_NPROC				0
+
+/* Scan strategies */
+#define STIR_NSTRATEGIES		1
+
+#define STIR_OPTIONS_PROC				0
+
+/* Macros for accessing stir page structures */
+#define StirPageGetOpaque(page) ((StirPageOpaque) PageGetSpecialPointer(page))
+#define StirPageGetMaxOffset(page) (StirPageGetOpaque(page)->maxoff)
+#define StirPageIsMeta(page) \
+	((StirPageGetOpaque(page)->flags & STIR_META) != 0)
+#define StirPageGetData(page)		((StirTuple *)PageGetContents(page))
+#define StirPageGetTuple(page, offset) \
+	((StirTuple *)(PageGetContents(page) \
+		+ sizeof(StirTuple) * ((offset) - 1)))
+#define StirPageGetNextTuple(tuple) \
+	((StirTuple *)((char *)(tuple) + sizeof(StirTuple)))
+
+
+
+/* Preserved page numbers */
+#define STIR_METAPAGE_BLKNO	(0)
+#define STIR_HEAD_BLKNO		(1) /* first data page */
+
+
+/* Opaque for stir pages */
+typedef struct StirPageOpaqueData
+{
+	OffsetNumber maxoff;		/* number of index tuples on the page */
+	uint16		flags;			/* see bit definitions below */
+	uint16		stir_page_id;	/* for identification of STIR indexes */
+} StirPageOpaqueData;
+
+/* Stir page flags */
+#define STIR_META		(1<<0)
+
+typedef StirPageOpaqueData *StirPageOpaque;
+
+#define STIR_PAGE_ID		0xFF84
+
+/* Metadata of stir index */
+typedef struct StirMetaPageData
+{
+	uint32		magicNumber;
+	BlockNumber	lastBlkNo;
+	bool		skipInserts;	/* should we just exit without any inserts? */
+} StirMetaPageData;
+
+/* Magic number to distinguish stir pages from others */
+#define STIR_MAGIC_NUMBER (0xDBAC0DEF)
+
+#define StirPageGetMeta(page)	((StirMetaPageData *) PageGetContents(page))
+
+typedef struct StirTuple
+{
+	ItemPointerData heapPtr;
+} StirTuple;
+
+#define StirPageGetFreeSpace(page) \
+	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+		- StirPageGetMaxOffset(page) * (sizeof(StirTuple)) \
+		- MAXALIGN(sizeof(StirPageOpaqueData)))
+
+extern void StirFillMetapage(Relation index, Page metaPage, bool skipInserts);
+extern void StirInitMetapage(Relation index, ForkNumber forknum);
+extern void StirInitPage(Page page, uint16 flags);
+extern void StirMarkAsSkipInserts(Relation index);
+
+/* index access method interface functions */
+extern bool stirvalidate(Oid opclassoid);
+extern bool stirinsert(Relation index, Datum *values, bool *isnull,
+					 ItemPointer ht_ctid, Relation heapRel,
+					 IndexUniqueCheck checkUnique,
+					 bool indexUnchanged,
+					 struct IndexInfo *indexInfo);
+extern IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys);
+extern void stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+					 ScanKey orderbys, int norderbys);
+extern void stirendscan(IndexScanDesc scan);
+extern IndexBuildResult *stirbuild(Relation heap, Relation index,
+								 struct IndexInfo *indexInfo);
+extern void stirbuildempty(Relation index);
+extern IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+										   IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+										   void *callback_state);
+extern IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+											  IndexBulkDeleteResult *stats);
+extern bytea *stiroptions(Datum reloptions, bool validate);
+
+#endif			/* STIR_H */
diff --git a/src/include/catalog/pg_am.dat b/src/include/catalog/pg_am.dat
index 46d361047fe..8bd2c2b46ba 100644
--- a/src/include/catalog/pg_am.dat
+++ b/src/include/catalog/pg_am.dat
@@ -33,5 +33,8 @@
 { oid => '3580', oid_symbol => 'BRIN_AM_OID',
   descr => 'block range index (BRIN) access method',
   amname => 'brin', amhandler => 'brinhandler', amtype => 'i' },
+{ oid => '5555', oid_symbol => 'STIR_AM_OID',
+  descr => 'short term index replacement access method',
+  amname => 'stir', amhandler => 'stirhandler', amtype => 'i' },
 
 ]
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index df170b80840..a3457e749db 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -492,4 +492,8 @@
 
 # no brin opclass for the geometric types except box
 
+# allow any types for STIR
+{ opcmethod => 'stir', oid_symbol => 'ANY_STIR_OPS_OID', opcname => 'stir_ops',
+  opcfamily => 'stir/any_ops', opcintype => 'any'},
+
 ]
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 7a027c4810e..6ffc20a061c 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -308,5 +308,7 @@
   opfmethod => 'hash', opfname => 'multirange_ops' },
 { oid => '6158',
   opfmethod => 'gist', opfname => 'multirange_ops' },
+{ oid => '5558',
+  opfmethod => 'stir', opfname => 'any_ops' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index dac40992cbc..5fbf77387c3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -935,6 +935,10 @@
   proname => 'brinhandler', provolatile => 'v',
   prorettype => 'index_am_handler', proargtypes => 'internal',
   prosrc => 'brinhandler' },
+{ oid => '5556', descr => 'short term index replacement access method handler',
+  proname => 'stirhandler', provolatile => 'v',
+  prorettype => 'index_am_handler', proargtypes => 'internal',
+  prosrc => 'stirhandler' },
 { oid => '3952', descr => 'brin: standalone scan new table pages',
   proname => 'brin_summarize_new_values', provolatile => 'v',
   proparallel => 'u', prorettype => 'int4', proargtypes => 'regclass',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..6ce9154b28d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -157,8 +157,8 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
- * during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
+ * are used only during index build; they're conventionally zeroed otherwise.
  * ----------------
  */
 typedef struct IndexInfo
@@ -218,7 +218,8 @@ typedef struct IndexInfo
 	bool		ii_WithoutOverlaps;
 	/* # of workers requested (excludes leader) */
 	int			ii_ParallelWorkers;
-
+	/* is auxiliary for concurrent index build? */
+	bool		ii_Auxiliary;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/utils/index_selfuncs.h b/src/include/utils/index_selfuncs.h
index 74793a1a19d..bf0e30dabe9 100644
--- a/src/include/utils/index_selfuncs.h
+++ b/src/include/utils/index_selfuncs.h
@@ -62,6 +62,14 @@ extern void spgcostestimate(struct PlannerInfo *root,
 							Selectivity *indexSelectivity,
 							double *indexCorrelation,
 							double *indexPages);
+extern void stircostestimate(struct PlannerInfo *root,
+							struct IndexPath *path,
+							double loop_count,
+							Cost *indexStartupCost,
+							Cost *indexTotalCost,
+							Selectivity *indexSelectivity,
+							double *indexCorrelation,
+							double *indexPages);
 extern void gincostestimate(struct PlannerInfo *root,
 							struct IndexPath *path,
 							double loop_count,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index 7ab6113c619..92c033a2010 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -173,7 +173,13 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
  spgist | can_exclude   | t
  spgist | can_include   | t
  spgist | bogus         | 
-(36 rows)
+ stir   | can_order     | f
+ stir   | can_unique    | f
+ stir   | can_multi_col | t
+ stir   | can_exclude   | f
+ stir   | can_include   | t
+ stir   | bogus         | 
+(42 rows)
 
 --
 -- additional checks for pg_index_column_has_property
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 6ff4d7ee901..9259679eea2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2129,9 +2129,10 @@ FROM pg_opclass AS c1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS a1
                  WHERE a1.amopfamily = c1.opcfamily
                    AND binary_coercible(c1.opcintype, a1.amoplefttype));
- opcname | opcfamily 
----------+-----------
-(0 rows)
+ opcname  | opcfamily 
+----------+-----------
+ stir_ops |      5558
+(1 row)
 
 -- Check that each operator listed in pg_amop has an associated opclass,
 -- that is one whose opcintype matches oprleft (possibly by coercion).
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index c8f3932edf0..ecc2c2a6049 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5171,7 +5171,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA *
 List of access methods
@@ -5185,7 +5186,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA h*
 List of access methods
@@ -5210,9 +5212,9 @@ List of access methods
 
 \dA: extra argument "bar" ignored
 \dA+
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5221,12 +5223,13 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ *
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5235,7 +5238,8 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ h*
                      List of access methods
-- 
2.53.0



  [text/x-patch] v29-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch (94.9K, 4-v29-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch)
  download | inline diff:
From 8e6192fb55044f45fdddaeb65b31ec9c03c9c343 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 15:03:10 +0100
Subject: [PATCH v29 4/7] Use auxiliary indexes for concurrent index operations

Replace the second table full scan in concurrent index builds with an auxiliary index approach:
- create a STIR auxiliary index with the same predicate (if exists) as in main index
- use it to track tuples inserted during the first phase
- merge auxiliary index with main index during validation to catch up new index with any tuples missed during the first phase
- automatically drop auxiliary when main index is ready

To merge main and auxiliary indexes:
- index_bulk_delete called for both, TIDs put into tuplesort
- both tuplesort are being sorted
- both tuplesort scanned with two pointers looking for the TIDs present in auxiliary index, but absent in main one
- all such TIDs are put into tuplestore
- all TIDs in tuplestore are fetched using the stream, tuplestore used in heapam_index_validate_scan_read_stream_next to provide the next page to prefetch
- if fetched tuple is alive - it is inserted into the main index

This eliminates the need for a second full table scan during validation, improving performance, especially for large tables. Affects both CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY operations.
---
 doc/src/sgml/monitoring.sgml               |  26 +-
 doc/src/sgml/ref/create_index.sgml         |  34 +-
 doc/src/sgml/ref/reindex.sgml              |  42 +-
 src/backend/access/heap/README.HOT         |  13 +-
 src/backend/access/heap/heapam_handler.c   | 548 ++++++++++++++-------
 src/backend/catalog/index.c                | 308 ++++++++++--
 src/backend/catalog/system_views.sql       |  17 +-
 src/backend/commands/indexcmds.c           | 344 +++++++++++--
 src/backend/nodes/makefuncs.c              |   4 +-
 src/include/access/tableam.h               |  12 +-
 src/include/catalog/index.h                |   9 +-
 src/include/commands/progress.h            |  13 +-
 src/include/nodes/makefuncs.h              |   3 +-
 src/test/regress/expected/create_index.out |  42 ++
 src/test/regress/expected/indexing.out     |   3 +-
 src/test/regress/expected/rules.out        |  17 +-
 src/test/regress/sql/create_index.sql      |  21 +
 17 files changed, 1121 insertions(+), 335 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b77d189a500..99c5e78c828 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6492,6 +6492,18 @@ FROM pg_stat_get_backend_idset() AS backendid;
        information for this phase.
       </entry>
      </row>
+     <row>
+      <entry><literal>waiting for writers to use auxiliary index</literal></entry>
+      <entry>
+       <command>CREATE INDEX CONCURRENTLY</command> or <command>REINDEX CONCURRENTLY</command> is waiting for transactions
+       with write locks that can potentially see the table to finish, to ensure use of auxiliary index for new tuples in
+       future transactions.
+       This phase is skipped when not in concurrent mode.
+       Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
+       and <structname>current_locker_pid</structname> contain the progress
+       information for this phase.
+      </entry>
+     </row>
      <row>
       <entry><literal>building index</literal></entry>
       <entry>
@@ -6532,13 +6544,12 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </entry>
      </row>
      <row>
-      <entry><literal>index validation: scanning table</literal></entry>
+      <entry><literal>index validation: merging indexes</literal></entry>
       <entry>
-       <command>CREATE INDEX CONCURRENTLY</command> is scanning the table
-       to validate the index tuples collected in the previous two phases.
+       <command>CREATE INDEX CONCURRENTLY</command> is merging content of auxiliary index with the target index.
        This phase is skipped when not in concurrent mode.
-       Columns <structname>blocks_total</structname> (set to the total size of the table)
-       and <structname>blocks_done</structname> contain the progress information for this phase.
+       Columns <structname>tuples_total</structname> (set to the number of tuples to be merged)
+       and <structname>tuples_done</structname> contain the progress information for this phase.
       </entry>
      </row>
      <row>
@@ -6555,8 +6566,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
      <row>
       <entry><literal>waiting for readers before marking dead</literal></entry>
       <entry>
-       <command>REINDEX CONCURRENTLY</command> is waiting for transactions
-       with read locks on the table to finish, before marking the old index dead.
+       <command>CREATE INDEX CONCURRENTLY</command> is waiting for transactions
+        with read locks on the table to finish, before marking the auxiliary index as dead.
+       <command>REINDEX CONCURRENTLY</command> is also waiting before marking the old index as dead.
        This phase is skipped when not in concurrent mode.
        Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
        and <structname>current_locker_pid</structname> contain the progress
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index bb7505d171b..12c88587a79 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -620,10 +620,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     out writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>CREATE INDEX</command>.
     When this option is used,
-    <productname>PostgreSQL</productname> must perform two scans of the table, and in
-    addition it must wait for all existing transactions that could potentially
-    modify or use the index to terminate.  Thus
-    this method requires more total work than a standard index build and takes
+    <productname>PostgreSQL</productname> must perform table scan followed by
+    validation phase, and in addition it must wait for all existing transactions
+    that could potentially modify or use the index to terminate.  Thus
+    this method requires more total work than a standard index build and may take
     significantly longer to complete.  However, since it allows normal
     operations to continue while the index is built, this method is useful for
     adding new indexes in a production environment.  Of course, the extra CPU
@@ -631,14 +631,14 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </para>
 
    <para>
-    In a concurrent index build, the index is actually entered as an
-    <quote>invalid</quote> index into
-    the system catalogs in one transaction, then two table scans occur in
-    two more transactions.  Before each table scan, the index build must
+    In a concurrent index build, the main and auxiliary indexes are actually
+    entered as an <quote>invalid</quote> index into
+    the system catalogs in one transaction, then two phases occur in
+    multiple transactions.  Before each phase, the index build must
     wait for existing transactions that have modified the table to terminate.
-    After the second scan, the index build must wait for any transactions
+    After the second phase, the index build must wait for any transactions
     that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-    scan to terminate, including transactions used by any phase of concurrent
+    phase to terminate, including transactions used by any phase of concurrent
     index builds on other tables, if the indexes involved are partial or have
     columns that are not simple column references.
     Then finally the index can be marked <quote>valid</quote> and ready for use,
@@ -651,10 +651,11 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    <para>
     If a problem arises while scanning the table, such as a deadlock or a
     uniqueness violation in a unique index, the <command>CREATE INDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> index. This index
-    will be ignored for querying purposes because it might be incomplete;
-    however it will still consume update overhead. The <application>psql</application>
-    <command>\d</command> command will report such an index as <literal>INVALID</literal>:
+    command will fail but leave behind an <quote>invalid</quote> index and its
+    associated auxiliary index. These indexes
+    will be ignored for querying purposes because they might be incomplete;
+    however they will still consume update overhead. The <application>psql</application>
+    <command>\d</command> command will report such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -664,11 +665,12 @@ postgres=# \d tab
  col    | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     The recommended recovery
-    method in such cases is to drop the index and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
+    method in such cases is to drop these indexes and try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
     to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
    </para>
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 185cd75ca30..1c3c7a97f6a 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -76,7 +76,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
       this index is left as <quote>invalid</quote>. Such indexes are useless
       but it can be convenient to use <command>REINDEX</command> to rebuild
       them. Note that only <command>REINDEX INDEX</command> is able
-      to perform a concurrent build on an invalid index.
+      to perform a concurrent build on a invalid index.
      </para>
     </listitem>
 
@@ -368,9 +368,8 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <productname>PostgreSQL</productname> supports rebuilding indexes with minimum locking
     of writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>REINDEX</command>. When this option
-    is used, <productname>PostgreSQL</productname> must perform two scans of the table
-    for each index that needs to be rebuilt and wait for termination of
-    all existing transactions that could potentially use the index.
+    is used, <productname>PostgreSQL</productname> must perform several steps to ensure data
+    consistency while allowing normal operations to continue.
     This method requires more total work than a standard index
     rebuild and takes significantly longer to complete as it needs to wait
     for unfinished transactions that might modify the index. However, since
@@ -388,7 +387,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <orderedlist>
      <listitem>
       <para>
-       A new transient index definition is added to the catalog
+       A new transient index definition and an auxiliary index are added to the catalog
        <literal>pg_index</literal>.  This definition will be used to replace
        the old index.  A <literal>SHARE UPDATE EXCLUSIVE</literal> lock at
        session level is taken on the indexes being reindexed as well as their
@@ -398,7 +397,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       A first pass to build the index is done for each new index.  Once the
+       The auxiliary index is marked as "ready for inserts", making
+       it visible to other sessions. This index efficiently tracks all new
+       tuples during the reindex process.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The new main index is built by scanning the table.  Once the
        index is built, its flag <literal>pg_index.indisready</literal> is
        switched to <quote>true</quote> to make it ready for inserts, making it
        visible to other sessions once the transaction that performed the build
@@ -409,9 +416,9 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       Then a second pass is performed to add tuples that were added while the
-       first pass was running.  This step is also done in a separate
-       transaction for each index.
+       A validation phase merges any missing entries from the auxiliary index
+       into the main index, ensuring all concurrent changes are captured.
+       This step is also done in a separate transaction for each index.
       </para>
      </listitem>
 
@@ -428,7 +435,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes have <literal>pg_index.indisready</literal> switched to
+       The old and auxiliary indexes have <literal>pg_index.indisready</literal> switched to
        <quote>false</quote> to prevent any new tuple insertions, after waiting
        for running queries that might reference the old index to complete.
       </para>
@@ -436,7 +443,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes are dropped.  The <literal>SHARE UPDATE
+       The old and auxiliary indexes are dropped.  The <literal>SHARE UPDATE
        EXCLUSIVE</literal> session locks for the indexes and the table are
        released.
       </para>
@@ -447,11 +454,11 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
    <para>
     If a problem arises while rebuilding the indexes, such as a
     uniqueness violation in a unique index, the <command>REINDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> new index in addition to
-    the pre-existing one. This index will be ignored for querying purposes
-    because it might be incomplete; however it will still consume update
+    command will fail but leave behind an <quote>invalid</quote> new index and its auxiliary index in addition to
+    the pre-existing one. These indexes will be ignored for querying purposes
+    because they might be incomplete; however they will still consume update
     overhead. The <application>psql</application> <command>\d</command> command will report
-    such an index as <literal>INVALID</literal>:
+    such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -462,12 +469,13 @@ postgres=# \d tab
 Indexes:
     "idx" btree (col)
     "idx_ccnew" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     If the index marked <literal>INVALID</literal> is suffixed
-    <literal>_ccnew</literal>, then it corresponds to the transient
+    <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
-    recovery method is to drop it using <literal>DROP INDEX</literal>,
+    recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
     then attempt <command>REINDEX CONCURRENTLY</command> again.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 74e407f375a..b1c797517ee 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -375,6 +375,11 @@ constraint on which updates can be HOT.  Other transactions must include
 such an index when determining HOT-safety of updates, even though they
 must ignore it for both insertion and searching purposes.
 
+Also, special auxiliary index is created the same way. It is marked as
+"ready for inserts" without any actual table scan. Its purpose is to collect
+new tuples inserted into table while our target index is still "not ready
+for inserts".
+
 We must do this to avoid making incorrect index entries.  For example,
 suppose we are building an index on column X and we make an index entry for
 a non-HOT tuple with X=1.  Then some other backend, unaware that X is an
@@ -394,10 +399,10 @@ entry at the root of the HOT-update chain but we use the key value from the
 live tuple.
 
 We mark the index open for inserts (but still not ready for reads) then
-we again wait for transactions which have the table open.  Then we take
-a second reference snapshot and validate the index.  This searches for
-tuples missing from the index, and inserts any missing ones.  Again,
-the index entries have to have TIDs equal to HOT-chain root TIDs, but
+we again wait for transactions which have the table open.  Then validate
+the index.  This searches for tuples missing from the index in auxiliary
+index, and inserts any missing ones if they are visible to reference snapshot.
+Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b83e2013d50..2d60a65ac98 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -41,6 +41,7 @@
 #include "storage/bufpage.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -1757,242 +1758,409 @@ heapam_index_build_range_scan(Relation heapRelation,
 	return reltuples;
 }
 
+/*
+ * Calculate set difference (relative complement) of main and aux
+ * sets.
+ *
+ * All records which are present in auxiliary tuplesort but not in
+ * main are added to the store.
+ *
+ * In set theory notation store = aux - main or store = aux / main.
+ *
+ * returns number of items added to store
+ */
+static int
+heapam_index_validate_tuplesort_difference(Tuplesortstate  *main,
+										   Tuplesortstate  *aux,
+										   Tuplestorestate *store)
+{
+	int				num = 0;
+	/* state variables for the merge */
+	ItemPointer 	indexcursor = NULL,
+					auxindexcursor = NULL;
+	ItemPointerData decoded,
+					auxdecoded;
+	bool			tuplesort_empty = false,
+					auxtuplesort_empty = false;
+
+	/* Initialize pointers. */
+	ItemPointerSetInvalid(&decoded);
+	ItemPointerSetInvalid(&auxdecoded);
+
+	/*
+	 * Main loop: we step through the auxiliary sort (auxState->tuplesort),
+	 * which holds TIDs that must compared to those from the "main" sort
+	 * (state->tuplesort).
+	 */
+	while (!auxtuplesort_empty)
+	{
+		Datum		ts_val;
+		bool		ts_isnull;
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		* Attempt to fetch the next TID from the auxiliary sort. If it's
+		* empty, we set auxindexcursor to NULL.
+		*/
+		auxtuplesort_empty = !tuplesort_getdatum(aux, true,
+												 false, &ts_val, &ts_isnull,
+												 NULL);
+		Assert(auxtuplesort_empty || !ts_isnull);
+		if (!auxtuplesort_empty)
+		{
+			itemptr_decode(&auxdecoded, DatumGetInt64(ts_val));
+			auxindexcursor = &auxdecoded;
+		}
+		else
+		{
+			auxindexcursor = NULL;
+		}
+
+		/*
+		* If the auxiliary sort is not yet empty, we now try to synchronize
+		* the "main" sort cursor (indexcursor) with auxindexcursor. We advance
+		* the main sort cursor until we've reached or passed the auxiliary TID.
+		*/
+		if (!auxtuplesort_empty)
+		{
+			/*
+			 * Move the main sort forward while:
+			 *   (1) It's not exhausted (tuplesort_empty == false), and
+			 *   (2) Either indexcursor is NULL (first iteration) or
+			 *       indexcursor < auxindexcursor in TID order.
+			 */
+			while (!tuplesort_empty && (indexcursor == NULL || /* null on first time here */
+						ItemPointerCompare(indexcursor, auxindexcursor) < 0))
+			{
+				/*
+				 * Get the next TID from the main sort. If it's empty,
+				 * we set indexcursor to NULL.
+				 */
+				tuplesort_empty = !tuplesort_getdatum(main, true,
+													  false, &ts_val, &ts_isnull,
+													  NULL);
+				Assert(tuplesort_empty || !ts_isnull);
+
+				if (!tuplesort_empty)
+				{
+					itemptr_decode(&decoded, DatumGetInt64(ts_val));
+					indexcursor = &decoded;
+				}
+				else
+				{
+					indexcursor = NULL;
+				}
+
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * Now, if either:
+			 *  - the main sort is empty, or
+			 *  - indexcursor > auxindexcursor,
+			 *
+			 * then auxindexcursor identifies a TID that doesn't appear in
+			 * the main sort. We likely need to insert it
+			 * into the target index if it’s visible in the heap.
+			 */
+			if (tuplesort_empty || ItemPointerCompare(indexcursor, auxindexcursor) > 0)
+			{
+				tuplestore_putdatum(store, Int64GetDatum(itemptr_encode(auxindexcursor)));
+				num++;
+			}
+		}
+	}
+
+	return num;
+}
+
+typedef struct ValidateIndexScanState
+{
+	Tuplestorestate		*store;
+	BlockNumber			prev_block_number;
+	OffsetNumber		prev_offset_number;
+} ValidateIndexScanState;
+
+/*
+ * This is ReadStreamBlockNumberCB implementation which works as follows:
+ *
+ * 1) It iterates over a sorted tuplestore, where each element is an encoded
+ *    ItemPointer
+ *
+ * 2) It returns the current BlockNumber and collects all OffsetNumbers
+ *    for that block in per_buffer_data.
+ *
+ * 3) Once the code encounters a new BlockNumber, it stops reading more
+ *    offsets and saves the OffsetNumber of the new block for the next call.
+ *
+ * 4) The list of offsets for a block is always terminated with InvalidOffsetNumber.
+ *
+ * This function is intended to be repeatedly called, each time returning
+ * the next block and its corresponding set of offsets.
+ */
+static BlockNumber
+heapam_index_validate_scan_read_stream_next(
+								  ReadStream *stream,
+								  void *void_callback_private_data,
+								  void *void_per_buffer_data
+								  )
+{
+	bool should_free;
+	Datum datum;
+	BlockNumber result = InvalidBlockNumber;
+	int i = 0;
+
+	/*
+	 * Retrieve the specialized callback state and the output buffer.
+	 * callback_private_data keeps track of the previous block and offset
+	 * from a prior invocation, if any.
+	 */
+	ValidateIndexScanState *callback_private_data = void_callback_private_data;
+	OffsetNumber *per_buffer_data = void_per_buffer_data;
+
+	/*
+	 * If there is a "leftover" offset number from the previous invocation,
+	 * it means we had switched to a new block in the middle of the last call.
+	 * We place that leftover offset number into the buffer first.
+	 */
+	if (callback_private_data->prev_offset_number != InvalidOffsetNumber)
+	{
+		Assert(callback_private_data->prev_block_number != InvalidBlockNumber);
+		/*
+		 * 'result' is the block number to return. We set it to the block
+		 * from the previous leftover offset.
+		 */
+		result = callback_private_data->prev_block_number;
+		/* Place leftover offset number in the output buffer. */
+		per_buffer_data[i++] = callback_private_data->prev_offset_number;
+		/*
+		 * Clear the leftover offset number so it won't be reused unless
+		 * we encounter another block change.
+		 */
+		callback_private_data->prev_offset_number = InvalidOffsetNumber;
+	}
+
+	/*
+	 * Read from the tuplestore until we either run out of tuples or we
+	 * encounter a block change. For each tuple:
+	 *
+	 *   1) Decode its block/offset from the Datum.
+	 *   2) If it's the first time in this call (prev_block_number == InvalidBlockNumber),
+	 *      initialize prev_block_number.
+	 *   3) If the block number matches the current block, collect the offset.
+	 *   4) If the block number differs, save that offset as leftover and break
+	 *      so that the next call can handle the new block.
+	 */
+	while (tuplestore_getdatum(callback_private_data->store, true, &should_free, &datum))
+	{
+		BlockNumber next_block_number;
+		ItemPointerData next_data;
+
+		/* Decode the datum into an ItemPointer (block + offset). */
+		itemptr_decode(&next_data, DatumGetInt64(datum));
+		next_block_number = ItemPointerGetBlockNumber(&next_data);
+
+		/*
+		 * If we haven't set a block number yet this round, initialize it
+		 * using the first tuple we read.
+		 */
+		if (callback_private_data->prev_block_number == InvalidBlockNumber)
+			callback_private_data->prev_block_number = next_block_number;
+
+		/*
+		 * Always set the result to be the "current" block number
+		 * we are filling offsets for.
+		 */
+		result = callback_private_data->prev_block_number;
+
+		/*
+		 * If this tuple is from the same block, just store its offset
+		 * in our per_buffer_data array.
+		 */
+		if (next_block_number == callback_private_data->prev_block_number)
+		{
+			per_buffer_data[i++] = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+		}
+		else
+		{
+			/*
+			 * If the block just changed, store the offset of the new block
+			 * as leftover for the next invocation and break out.
+			 */
+			callback_private_data->prev_block_number = next_block_number;
+			callback_private_data->prev_offset_number = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+
+			/* Break to let the next call handle the new block. */
+			break;
+		}
+	}
+
+	/*
+	 * Terminate the list of offsets for this block with an InvalidOffsetNumber.
+	 */
+	per_buffer_data[i] = InvalidOffsetNumber;
+	return result;
+}
+
 static void
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
 						   Snapshot snapshot,
-						   ValidateIndexState *state)
+						   ValidateIndexState *state,
+						   ValidateIndexState *auxState)
 {
-	TableScanDesc scan;
-	HeapScanDesc hscan;
-	HeapTuple	heapTuple;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-	BlockNumber previous_blkno = InvalidBlockNumber;
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
+
+	TupleTableSlot  *slot;
+	EState			*estate;
+	ExprContext		*econtext;
+	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	int				num_to_check;
+	Tuplestorestate *tuples_for_check;
+	ValidateIndexScanState callback_private_data;
+
+	Buffer buf;
+	OffsetNumber* tuples;
+	ReadStream *read_stream;
+
+	/* Use 10% of memory for tuple store. */
+	int		store_work_mem_part = maintenance_work_mem / 10;
+
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	tuples_for_check =  tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
 	/*
 	 * sanity checks
 	 */
 	Assert(OidIsValid(indexRelation->rd_rel->relam));
 
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
+	num_to_check = heapam_index_validate_tuplesort_difference(state->tuplesort,
+														 auxState->tuplesort,
+														 tuples_for_check);
+
+	/* It is our responsibility to close tuple sort as fast as we can */
+	tuplesort_end(state->tuplesort);
+	tuplesort_end(auxState->tuplesort);
+
+	state->tuplesort = auxState->tuplesort = NULL;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+									&TTSOpsBufferHeapTuple);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
 
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+	callback_private_data.prev_block_number = InvalidBlockNumber;
+	callback_private_data.store = tuples_for_check;
+	callback_private_data.prev_offset_number = InvalidOffsetNumber;
 
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	scan = table_beginscan_strat(heapRelation,	/* relation */
-								 snapshot,	/* snapshot */
-								 0, /* number of keys */
-								 NULL,	/* scan key */
-								 true,	/* buffer access strategy OK */
-								 false);	/* syncscan not OK */
-	hscan = (HeapScanDesc) scan;
+	read_stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING,
+														 bstrategy,
+														 heapRelation, MAIN_FORKNUM,
+														 heapam_index_validate_scan_read_stream_next,
+														 &callback_private_data,
+														 (MaxHeapTuplesPerPage + 1) * sizeof(OffsetNumber));
 
-	pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_TOTAL,
-								 hscan->rs_nblocks);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_TOTAL, num_to_check);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE, 0);
 
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while ((buf = read_stream_next_buffer(read_stream, (void*) &tuples)) != InvalidBuffer)
 	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
+		HeapTupleData	heap_tuple_data[MaxHeapTuplesPerPage];
+		int i;
+		OffsetNumber off;
+		BlockNumber block_number;
 
 		CHECK_FOR_INTERRUPTS();
 
-		state->htups += 1;
+		LockBuffer(buf, BUFFER_LOCK_SHARE);
+		block_number = BufferGetBlockNumber(buf);
 
-		if ((previous_blkno == InvalidBlockNumber) ||
-			(hscan->rs_cblock != previous_blkno))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
-										 hscan->rs_cblock);
-			previous_blkno = hscan->rs_cblock;
+			ItemPointerData tid;
+			bool		all_dead, found;
+			ItemPointerSet(&tid, block_number, off);
+
+			found = heap_hot_search_buffer(&tid, heapRelation, buf, snapshot,
+										   &heap_tuple_data[i], &all_dead, true);
+			if (!found)
+				ItemPointerSetInvalid(&heap_tuple_data[i].t_self);
+			i++;
+			state->htups += 1;
 		}
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		/*
-		 * As commented in table_index_build_scan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
-		 */
-		if (hscan->rs_cblock != root_blkno)
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
-		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
-		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
-		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
+			if (ItemPointerIsValid(&heap_tuple_data[i].t_self))
 			{
+				ItemPointerData root_tid;
+				ItemPointerSet(&root_tid, block_number, off);
+
+				/* Reset the per-tuple memory context for the next fetch. */
+				MemoryContextReset(econtext->ecxt_per_tuple_memory);
+				ExecStoreBufferHeapTuple(&heap_tuple_data[i], slot, buf);
+
+				/* Compute the key values and null flags for this tuple. */
+				FormIndexDatum(indexInfo,
+							   slot,
+							   estate,
+							   values,
+							   isnull);
+
 				/*
-				 * Remember index items seen earlier on the current heap page
+				 * Insert the tuple into the target index.
 				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+				index_insert(indexRelation,
+							 values,
+							 isnull,
+							 &root_tid, /* insert root tuple */
+							 heapRelation,
+							 indexInfo->ii_Unique ?
+							 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+							 false,
+							 indexInfo);
+
+				state->tups_inserted += 1;
 			}
 
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  false, &ts_val, &ts_isnull,
-												  NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_TUPLES_DONE, 1);
+			i++;
 		}
 
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 false,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
+		ReleaseBuffer(buf);
 	}
 
-	table_endscan(scan);
-
 	ExecDropSingleTupleTableSlot(slot);
 
 	FreeExecutorState(estate);
 
+	read_stream_end(read_stream);
+	tuplestore_end(tuples_for_check);
+
+	FreeAccessStrategy(bstrategy);
+
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1325f3d9700..4f77627fb3b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -712,11 +712,16 @@ UpdateIndexRelation(Oid indexoid,
  *			already exists.
  *		INDEX_CREATE_PARTITIONED:
  *			create a partitioned index (table must be partitioned)
+ *		INDEX_CREATE_AUXILIARY:
+ *			mark index as auxiliary index
  * constr_flags: flags passed to index_constraint_create
  *		(only if INDEX_CREATE_ADD_CONSTRAINT is set)
  * allow_system_table_mods: allow table to be a system catalog
  * is_internal: if true, post creation hook for new index
  * constraintId: if not NULL, receives OID of created constraint
+ * relpersistence: persistence level to use for index. In most of the
+ *		cases it should be equal to the persistence level of the table,
+ *		auxiliary indexes are only exception here.
  *
  * Returns the OID of the created index.
  */
@@ -757,6 +762,7 @@ index_create(Relation heapRelation,
 	bool		invalid = (flags & INDEX_CREATE_INVALID) != 0;
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
+	bool		auxiliary = (flags & INDEX_CREATE_AUXILIARY) != 0;
 	char		relkind;
 	TransactionId relfrozenxid;
 	MultiXactId relminmxid;
@@ -782,7 +788,10 @@ index_create(Relation heapRelation,
 	namespaceId = RelationGetNamespace(heapRelation);
 	shared_relation = heapRelation->rd_rel->relisshared;
 	mapped_relation = RelationIsMapped(heapRelation);
-	relpersistence = heapRelation->rd_rel->relpersistence;
+	if (auxiliary)
+		relpersistence = RELPERSISTENCE_UNLOGGED; /* aux indexes are always unlogged */
+	else
+		relpersistence = heapRelation->rd_rel->relpersistence;
 
 	/*
 	 * check parameters
@@ -790,6 +799,11 @@ index_create(Relation heapRelation,
 	if (indexInfo->ii_NumIndexAttrs < 1)
 		elog(ERROR, "must index at least one column");
 
+	if (indexInfo->ii_Am == STIR_AM_OID && !auxiliary)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("user-defined indexes with STIR access method are not supported")));
+
 	if (!allow_system_table_mods &&
 		IsSystemRelation(heapRelation) &&
 		IsNormalProcessingMode())
@@ -1395,7 +1409,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							false,	/* not ready for inserts */
 							true,
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							false);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1470,6 +1485,154 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	return newIndexId;
 }
 
+/*
+ * index_concurrently_create_aux
+ *
+ * Create concurrently an auxiliary index based on the definition of the one
+ * provided by caller.  The index is inserted into catalogs and needs to be
+ * built later on. This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ */
+Oid
+index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
+							   Oid tablespaceOid, const char *newName)
+{
+	Relation	indexRelation;
+	IndexInfo  *oldInfo,
+			*newInfo;
+	Oid			newIndexId = InvalidOid;
+	HeapTuple	indexTuple;
+
+	List	   *indexColNames = NIL;
+	List	   *indexExprs = NIL;
+	List	   *indexPreds = NIL;
+
+	Oid *auxOpclassIds;
+	int16 *auxColoptions;
+
+	indexRelation = index_open(mainIndexId, RowExclusiveLock);
+
+	/* The new index needs some information from the old index */
+	oldInfo = BuildIndexInfo(indexRelation);
+
+	/*
+	 * Build of an auxiliary index with exclusion constraints is not
+	 * supported.
+	 */
+	if (oldInfo->ii_ExclusionOps != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("auxiliary index creation for exclusion constraints is not supported")));
+
+	/* Get the array of class and column options IDs from index info */
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(mainIndexId));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", mainIndexId);
+
+
+	/*
+	 * Fetch the list of expressions and predicates directly from the
+	 * catalogs.  This cannot rely on the information from IndexInfo of the
+	 * old index as these have been flattened for the planner.
+	 */
+	if (oldInfo->ii_Expressions != NIL)
+	{
+		Datum		exprDatum;
+		char	   *exprString;
+
+		exprDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indexprs);
+		exprString = TextDatumGetCString(exprDatum);
+		indexExprs = (List *) stringToNode(exprString);
+		pfree(exprString);
+	}
+	if (oldInfo->ii_Predicate != NIL)
+	{
+		Datum		predDatum;
+		char	   *predString;
+
+		predDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indpred);
+		predString = TextDatumGetCString(predDatum);
+		indexPreds = (List *) stringToNode(predString);
+
+		/* Also convert to implicit-AND format */
+		indexPreds = make_ands_implicit((Expr *) indexPreds);
+		pfree(predString);
+	}
+
+	/*
+	 * Build the index information for the new index.  Note that rebuild of
+	 * indexes with exclusion constraints is not supported, hence there is no
+	 * need to fill all the ii_Exclusion* fields.
+	 */
+	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
+							oldInfo->ii_NumIndexKeyAttrs,
+							STIR_AM_OID, /* special AM for aux indexes */
+							indexExprs,
+							indexPreds,
+							false,	/* aux index are not unique */
+							oldInfo->ii_NullsNotDistinct,
+							false,	/* not ready for inserts */
+							true,
+							false,	/* aux are not summarizing */
+							false,	/* aux are not without overlaps */
+							true	/* auxiliary */);
+
+	/*
+	 * Extract the list of column names and the column numbers for the new
+	 * index information.  All this information will be used for the index
+	 * creation.
+	 */
+	for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+	{
+		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
+		Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+		indexColNames = lappend(indexColNames, NameStr(att->attname));
+		newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+	}
+
+	auxOpclassIds = palloc0(sizeof(Oid) * newInfo->ii_NumIndexAttrs);
+	auxColoptions = palloc0(sizeof(int16) * newInfo->ii_NumIndexAttrs);
+
+	/* Fill with "any ops" */
+	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
+	{
+		auxOpclassIds[i] = ANY_STIR_OPS_OID;
+		auxColoptions[i] = 0;
+	}
+
+	newIndexId = index_create(heapRelation,
+							  newName,
+							  InvalidOid,    /* indexRelationId */
+							  InvalidOid,    /* parentIndexRelid */
+							  InvalidOid,    /* parentConstraintId */
+							  InvalidRelFileNumber, /* relFileNumber */
+							  newInfo,
+							  indexColNames,
+							  STIR_AM_OID,
+							  tablespaceOid,
+							  indexRelation->rd_indcollation,
+							  auxOpclassIds,
+							  NULL,
+							  auxColoptions,
+							  NULL,
+							  (Datum) 0,
+							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT | INDEX_CREATE_AUXILIARY,
+							  0,
+							  true, /* allow table to be a system catalog? */
+							  false,    /* is_internal? */
+							  NULL);
+
+	/* Close the relations used and clean up */
+	index_close(indexRelation, NoLock);
+	ReleaseSysCache(indexTuple);
+
+	return newIndexId;
+}
+
 /*
  * index_concurrently_build
  *
@@ -2450,7 +2613,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2510,7 +2674,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3286,12 +3451,21 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * We do a concurrent index build by first inserting the catalog entry for the
  * index via index_create(), marking it not indisready and not indisvalid.
+ * Then we create special auxiliary index the same way. It based on STIR AM.
  * Then we commit our transaction and start a new one, then we wait for all
  * transactions that could have been modifying the table to terminate.  Now
- * we know that any subsequently-started transactions will see the index and
+ * we know that any subsequently-started transactions will see indexes and
  * honor its constraints on HOT updates; so while existing HOT-chains might
  * be broken with respect to the index, no currently live tuple will have an
- * incompatible HOT update done to it.  We now build the index normally via
+ * incompatible HOT update done to it.
+ *
+ * After that, we build the auxiliary index. It is fast operation without any actual
+ * table scan. As result, we have empty STIR index. We commit transaction and
+ * again wait for all transactions that could have been modifying the table
+ * to terminate. At that moment all new tuples are going to be inserted into
+ * auxiliary index.
+ *
+ * We now build the index normally via
  * index_build(), while holding a weak lock that allows concurrent
  * insert/update/delete.  Also, we index only tuples that are valid
  * as of the start of the scan (see table_index_build_scan), whereas a normal
@@ -3301,14 +3475,17 @@ IndexCheckExclusion(Relation heapRelation,
  * bogus unique-index failures due to concurrent UPDATEs (we might see
  * different versions of the same row as being valid when we pass over them,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
- * does not contain any tuples added to the table while we built the index.
+ * does not contain any tuples added to the table while we built the index
+ * (but these tuples contained in auxiliary index).
  *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
- * commit the second transaction and start a third.  Again we wait for all
+ * commit the third transaction and start a fourth.  Again we wait for all
  * transactions that could have been modifying the table to terminate.  Now
  * we know that any subsequently-started transactions will see the index and
- * insert their new tuples into it.  We then take a new reference snapshot
- * which is passed to validate_index().  Any tuples that are valid according
+ * insert their new tuples into it. At the same moment we clear "indisready" for
+ * auxiliary index, since it is no more required to be updated.
+ *
+ * We then take a new reference snapshot, any tuples that are valid according
  * to this snap, but are not in the index, must be added to the index.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
@@ -3316,12 +3493,14 @@ IndexCheckExclusion(Relation heapRelation,
  * that might care about them before we mark the index valid.)
  *
  * validate_index() works by first gathering all the TIDs currently in the
- * index, using a bulkdelete callback that just stores the TIDs and doesn't
+ * indexes, using a bulkdelete callback that just stores the TIDs and doesn't
  * ever say "delete it".  (This should be faster than a plain indexscan;
  * also, not all index AMs support full-index indexscan.)  Then we sort the
- * TIDs, and finally scan the table doing a "merge join" against the TID list
- * to see which tuples are missing from the index.  Thus we will ensure that
- * all tuples valid according to the reference snapshot are in the index.
+ * TIDs of both auxiliary and target indexes, and doing a "merge join" against
+ * the TID lists to see which tuples from auxiliary index are missing from the
+ * target index.  Thus we will ensure that all tuples valid according to the
+ * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
  * tuple that is already dead or is in process of being deleted, and we
@@ -3339,22 +3518,26 @@ IndexCheckExclusion(Relation heapRelation,
  * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
  * transactions will be able to use it for queries.
  *
- * Doing two full table scans is a brute-force strategy.  We could try to be
- * cleverer, eg storing new tuples in a special area of the table (perhaps
- * making the table append-only by setting use_fsm).  However that would
- * add yet more locking issues.
+ * Also, some actions to concurrent drop the auxiliary index are performed.
  */
 void
-validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 {
 	Relation	heapRelation,
-				indexRelation;
+				indexRelation,
+				auxIndexRelation;
 	IndexInfo  *indexInfo;
-	IndexVacuumInfo ivinfo;
-	ValidateIndexState state;
+	IndexVacuumInfo ivinfo, auxivinfo;
+	ValidateIndexState state, auxState;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	/* Use 80% of maintenance_work_mem to target index sorting and
+	 * 10% rest for auxiliary.
+	 *
+	 * Rest 10% will be used for tuplestore later. */
+	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
+	int			aux_work_mem_part = maintenance_work_mem / 10;
 
 	{
 		const int	progress_index[] = {
@@ -3387,6 +3570,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	RestrictSearchPath();
 
 	indexRelation = index_open(indexId, RowExclusiveLock);
+	auxIndexRelation = index_open(auxIndexId, RowExclusiveLock);
 
 	/*
 	 * Fetch info needed for index_insert.  (You might think this should be
@@ -3411,15 +3595,49 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.strategy = NULL;
 	ivinfo.validate_index = true;
 
+	/*
+	 * Copy all info to auxiliary info, changing only relation.
+	 */
+	auxivinfo = ivinfo;
+	auxivinfo.index = auxIndexRelation;
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
 	 * is a pass-by-reference type on all platforms, whereas int8 is
 	 * pass-by-value on most platforms.
 	 */
+	auxState.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
+										   InvalidOid, false,
+										   aux_work_mem_part,
+										   NULL, TUPLESORT_NONE);
+	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
+
+	(void) index_bulk_delete(&auxivinfo, NULL,
+							 validate_index_callback, &auxState);
+	/* If aux index is empty, merge may be skipped */
+	if (auxState.itups == 0)
+	{
+		tuplesort_end(auxState.tuplesort);
+		auxState.tuplesort = NULL;
+
+		/* Roll back any GUC changes executed by index functions */
+		AtEOXact_GUC(false, save_nestlevel);
+
+		/* Restore userid and security context */
+		SetUserIdAndSecContext(save_userid, save_sec_context);
+
+		/* Close rels, but keep locks */
+		index_close(auxIndexRelation, NoLock);
+		index_close(indexRelation, NoLock);
+		table_close(heapRelation, NoLock);
+
+		return;
+	}
+
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 											InvalidOid, false,
-											maintenance_work_mem,
+											(int) main_work_mem_part,
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
@@ -3442,27 +3660,30 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	tuplesort_performsort(auxState.tuplesort);
 
 	/*
-	 * Now scan the heap and "merge" it with the index
+	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN);
+								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
 	table_index_validate_scan(heapRelation,
 							  indexRelation,
 							  indexInfo,
 							  snapshot,
-							  &state);
+							  &state,
+							  &auxState);
 
-	/* Done with tuplesort object */
-	tuplesort_end(state.tuplesort);
+	/* Tuple sort closed by table_index_validate_scan */
+	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
 
 	/* Make sure to release resources cached in indexInfo (if needed). */
 	index_insert_cleanup(indexRelation, indexInfo);
 
 	elog(DEBUG2,
-		 "validate_index found %.0f heap tuples, %.0f index tuples; inserted %.0f missing tuples",
-		 state.htups, state.itups, state.tups_inserted);
+		 "validate_index fetched %.0f heap tuples, %.0f index tuples;"
+						" %.0f aux index tuples; inserted %.0f missing tuples",
+		 state.htups, state.itups, auxState.itups, state.tups_inserted);
 
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
@@ -3471,6 +3692,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 
 	/* Close rels, but keep locks */
+	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
 }
@@ -3531,6 +3753,12 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			Assert(!indexForm->indisvalid);
 			indexForm->indisvalid = true;
 			break;
+		case INDEX_DROP_CLEAR_READY:
+			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
+			Assert(indexForm->indisready);
+			Assert(!indexForm->indisvalid);
+			indexForm->indisready = false;
+			break;
 		case INDEX_DROP_CLEAR_VALID:
 
 			/*
@@ -3802,6 +4030,13 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
+	/* Auxiliary indexes are not allowed to be rebuilt */
+	if (indexInfo->ii_Auxiliary)
+		ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("reindex of auxiliary index \"%s\" not supported",
+					RelationGetRelationName(iRel))));
+
 	/* Suppress use of the target index while rebuilding it */
 	SetReindexProcessing(heapId, indexId);
 
@@ -4044,6 +4279,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
+		Oid			indexAm = get_rel_relam(indexOid);
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4069,6 +4305,18 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
+		if (indexAm == STIR_AM_OID)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+							get_namespace_name(indexNamespaceId),
+							get_rel_name(indexOid))));
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			continue;
+		}
+
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1ea8f1faa9e..2e8dfddaad9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1343,16 +1343,17 @@ CREATE VIEW pg_stat_progress_create_index AS
                       END AS command,
         CASE S.param10 WHEN 0 THEN 'initializing'
                        WHEN 1 THEN 'waiting for writers before build'
-                       WHEN 2 THEN 'building index' ||
+                       WHEN 2 THEN 'waiting for writers to use auxiliary index'
+                       WHEN 3 THEN 'building index' ||
                            COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)),
                                     '')
-                       WHEN 3 THEN 'waiting for writers before validation'
-                       WHEN 4 THEN 'index validation: scanning index'
-                       WHEN 5 THEN 'index validation: sorting tuples'
-                       WHEN 6 THEN 'index validation: scanning table'
-                       WHEN 7 THEN 'waiting for old snapshots'
-                       WHEN 8 THEN 'waiting for readers before marking dead'
-                       WHEN 9 THEN 'waiting for readers before dropping'
+                       WHEN 4 THEN 'waiting for writers before validation'
+                       WHEN 5 THEN 'index validation: scanning index'
+                       WHEN 6 THEN 'index validation: sorting tuples'
+                       WHEN 7 THEN 'index validation: merging indexes'
+                       WHEN 8 THEN 'waiting for old snapshots'
+                       WHEN 9 THEN 'waiting for readers before marking dead'
+                       WHEN 10 THEN 'waiting for readers before dropping'
                        END as phase,
         S.param4 AS lockers_total,
         S.param5 AS lockers_done,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 635679cc1f2..f583239e091 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -182,6 +182,7 @@ CheckIndexCompatible(Oid oldId,
 					 bool isWithoutOverlaps)
 {
 	bool		isconstraint;
+	bool		isauxiliary;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
 	Oid		   *opclassIds;
@@ -232,6 +233,7 @@ CheckIndexCompatible(Oid oldId,
 
 	amcanorder = amRoutine->amcanorder;
 	amsummarizing = amRoutine->amsummarizing;
+	isauxiliary = accessMethodId == STIR_AM_OID;
 
 	/*
 	 * Compute the operator classes, collations, and exclusion operators for
@@ -243,7 +245,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing,
+							  isWithoutOverlaps, isauxiliary);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -556,6 +559,7 @@ DefineIndex(ParseState *pstate,
 {
 	bool		concurrent;
 	char	   *indexRelationName;
+	char	   *auxIndexRelationName = NULL;
 	char	   *accessMethodName;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
@@ -565,6 +569,7 @@ DefineIndex(ParseState *pstate,
 	Oid			namespaceId;
 	Oid			tablespaceId;
 	Oid			createdConstraintId = InvalidOid;
+	Oid			auxIndexRelationId = InvalidOid;
 	List	   *indexColNames;
 	List	   *allIndexParams;
 	Relation	rel;
@@ -586,6 +591,7 @@ DefineIndex(ParseState *pstate,
 	int			numberOfKeyAttributes;
 	TransactionId limitXmin;
 	ObjectAddress address;
+	ObjectAddress auxAddress;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
@@ -836,6 +842,15 @@ DefineIndex(ParseState *pstate,
 											stmt->excludeOpNames,
 											stmt->primary,
 											stmt->isconstraint);
+	/*
+	 * Select name for auxiliary index
+	 */
+	if (concurrent)
+		auxIndexRelationName = ChooseRelationName(indexRelationName,
+												  NULL,
+												  "ccaux",
+												  namespaceId,
+												  false);
 
 	/*
 	 * look up the access method, verify it can handle the requested features
@@ -930,7 +945,8 @@ DefineIndex(ParseState *pstate,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  false);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -1600,6 +1616,16 @@ DefineIndex(ParseState *pstate,
 		return address;
 	}
 
+	/*
+	 * In case of concurrent build - create auxiliary index record.
+	 */
+	if (concurrent)
+	{
+		auxIndexRelationId = index_concurrently_create_aux(rel, indexRelationId,
+											tablespaceId, auxIndexRelationName);
+		ObjectAddressSet(auxAddress, RelationRelationId, auxIndexRelationId);
+	}
+
 	AtEOXact_GUC(false, root_save_nestlevel);
 	SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 
@@ -1628,11 +1654,11 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * For a concurrent build, it's important to make the catalog entries
 	 * visible to other transactions before we start to build the index. That
-	 * will prevent them from making incompatible HOT updates.  The new index
-	 * will be marked not indisready and not indisvalid, so that no one else
-	 * tries to either insert into it or use it for queries.
+	 * will prevent them from making incompatible HOT updates. New indexes
+	 * (main and auxiliary) will be marked not indisready and not indisvalid,
+	 * so that no one else tries to either insert into it or use it for queries.
 	 *
-	 * We must commit our current transaction so that the index becomes
+	 * We must commit our current transaction so that the indexes becomes
 	 * visible; then start another.  Note that all the data structures we just
 	 * built are lost in the commit.  The only data we keep past here are the
 	 * relation IDs.
@@ -1642,7 +1668,7 @@ DefineIndex(ParseState *pstate,
 	 * cannot block, even if someone else is waiting for access, because we
 	 * already have the same lock within our transaction.
 	 *
-	 * Note: we don't currently bother with a session lock on the index,
+	 * Note: we don't currently bother with a session lock on the indexes,
 	 * because there are no operations that could change its state while we
 	 * hold lock on the parent table.  This might need to change later.
 	 */
@@ -1681,7 +1707,7 @@ DefineIndex(ParseState *pstate,
 	 * with the old list of indexes.  Use ShareLock to consider running
 	 * transactions that hold locks that permit writing to the table.  Note we
 	 * do not need to worry about xacts that open the table for writing after
-	 * this point; they will see the new index when they open it.
+	 * this point; they will see the new indexes when they open it.
 	 *
 	 * Note: the reason we use actual lock acquisition here, rather than just
 	 * checking the ProcArray and sleeping, is that deadlock is possible if
@@ -1693,14 +1719,44 @@ DefineIndex(ParseState *pstate,
 
 	/*
 	 * At this moment we are sure that there are no transactions with the
-	 * table open for write that don't have this new index in their list of
+	 * table open for write that don't have this new indexes in their list of
 	 * indexes.  We have waited out all the existing transactions and any new
-	 * transaction will have the new index in its list, but the index is still
-	 * marked as "not-ready-for-inserts".  The index is consulted while
+	 * transaction will have both new indexes in its list, but indexes are still
+	 * marked as "not-ready-for-inserts". The indexes are consulted while
 	 * deciding HOT-safety though.  This arrangement ensures that no new HOT
 	 * chains can be created where the new tuple and the old tuple in the
 	 * chain have different index keys.
 	 *
+	 * Now call build on auxiliary index. Index will be created empty without
+	 * any actual heap scan, but marked as "ready-for-inserts". The goal of
+	 * that index is accumulate new tuples while main index is actually built.
+	 */
+
+	/* Set ActiveSnapshot since functions in the indexes may need it */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	index_concurrently_build(tableId, auxIndexRelationId);
+	/* we can do away with our snapshot */
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Now we need to ensure are no transactions with the with auxiliary index
+	 * marked as "not-ready-for-inserts".
+	 */
+	WaitForLockers(heaplocktag, ShareLock, true);
+
+	/*
+	 * At this moment we are sure that all new tuples in table are inserted into
+	 * the auxiliary index. Now it is time to build the target index itself.
+	 *
 	 * We now take a new snapshot, and build the index using all tuples that
 	 * are visible in this snapshot.  We can be sure that any HOT updates to
 	 * these tuples will be compatible with the index, since any updates made
@@ -1735,9 +1791,28 @@ DefineIndex(ParseState *pstate,
 	 * the index marked as read-only for updates.
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForLockers(heaplocktag, ShareLock, true);
 
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/*
+	 * Now target index is marked as "ready" for all transactions. So, auxiliary
+	 * index is no longer needed. So, start removing process by reverting "ready"
+	 * flag.
+	 */
+	index_set_state_flags(auxIndexRelationId, INDEX_DROP_CLEAR_READY);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * Now take the "reference snapshot" that will be used by validate_index()
 	 * to filter candidate tuples.  Beware!  There might still be snapshots in
@@ -1755,24 +1830,14 @@ DefineIndex(ParseState *pstate,
 	 */
 	snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	PushActiveSnapshot(snapshot);
-
 	/*
-	 * Scan the index and the heap, insert any missing index entries.
-	 */
-	validate_index(tableId, indexRelationId, snapshot);
-
-	/*
-	 * Drop the reference snapshot.  We must do this before waiting out other
-	 * snapshot holders, else we will deadlock against other processes also
-	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
-	 * they must wait for.  But first, save the snapshot's xmin to use as
-	 * limitXmin for GetCurrentVirtualXIDs().
+	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
+	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
 	limitXmin = snapshot->xmin;
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
-
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1799,7 +1864,7 @@ DefineIndex(ParseState *pstate,
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForOlderSnapshots(limitXmin, true);
 
 	/*
@@ -1824,6 +1889,53 @@ DefineIndex(ParseState *pstate,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+	/* Now wait for all transaction to see auxiliary as "non-ready for inserts" */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/* Now it is time to mark auxiliary index as dead */
+	index_concurrently_set_dead(tableId, auxIndexRelationId);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_6);
+	/* Now wait for all transaction to ignore auxiliary because it is dead */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Drop auxiliary index.
+	 *
+	 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
+	 * right lock level.
+	 */
+	performDeletion(&auxAddress, DROP_RESTRICT,
+							 PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
 
 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
@@ -3595,6 +3707,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	typedef struct ReindexIndexInfo
 	{
 		Oid			indexId;
+		Oid			auxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -3700,8 +3813,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
 														   ShareUpdateExclusiveLock);
+					IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-					if (!indexRelation->rd_index->indisvalid)
+
+					if (indexInfo->ii_Auxiliary)
+						ereport(WARNING,(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+									get_namespace_name(get_rel_namespace(cellOid)),
+									get_rel_name(cellOid))));
+					else if (!indexRelation->rd_index->indisvalid)
 						ereport(WARNING,
 								(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 								 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3753,8 +3873,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
 															   ShareUpdateExclusiveLock);
+						IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-						if (!indexRelation->rd_index->indisvalid)
+						if (indexInfo->ii_Auxiliary)
+							ereport(WARNING,
+									(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+									 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+											get_namespace_name(get_rel_namespace(cellOid)),
+											get_rel_name(cellOid))));
+						else if (!indexRelation->rd_index->indisvalid)
 							ereport(WARNING,
 									(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 									 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3815,6 +3942,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot reindex invalid index on TOAST table")));
 
+				/* Auxiliary indexes are not allowed to be rebuilt */
+				if (get_rel_relam(relationOid) == STIR_AM_OID)
+					ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("reindex of auxiliary index \"%s\" not supported",
+								get_rel_name(relationOid))));
+
 				/*
 				 * Check if parent relation can be locked and if it exists,
 				 * this needs to be done at this stage as the list of indexes
@@ -3918,15 +4052,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
+		char	   *auxConcurrentName;
 		ReindexIndexInfo *idx = lfirst(lc);
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
+		Oid			auxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
 		int			save_sec_context;
 		int			save_nestlevel;
 		Relation	newIndexRel;
+		Relation	auxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -3977,6 +4114,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 											"ccnew",
 											get_rel_namespace(indexRel->rd_index->indrelid),
 											false);
+		auxConcurrentName = ChooseRelationName(get_rel_name(idx->indexId),
+											NULL,
+											"ccaux",
+											get_rel_namespace(indexRel->rd_index->indrelid),
+											false);
 
 		/* Choose the new tablespace, indexes of toast tables are not moved */
 		if (OidIsValid(params->tablespaceOid) &&
@@ -3990,12 +4132,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 													idx->indexId,
 													tablespaceid,
 													concurrentName);
+		auxIndexId = index_concurrently_create_aux(heapRel,
+												   newIndexId,
+												   tablespaceid,
+												   auxConcurrentName);
 
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
+		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4004,6 +4151,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
+		newidx->auxIndexId = auxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4022,10 +4170,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		lockrelid = palloc_object(LockRelId);
+		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
 		/* Roll back any GUC changes executed by index functions */
@@ -4106,13 +4258,60 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * doing that, wait until no running transactions could have the table of
 	 * the index open with the old list of indexes.  See "phase 2" in
 	 * DefineIndex() for more details.
+	*/
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * Now build all auxiliary indexes and mark them as "ready-for-inserts".
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		StartTransactionCommand();
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/* Set ActiveSnapshot since functions in the indexes may need it */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Build auxiliary index, it is fast - without any actual heap scan, just an empty index. */
+		index_concurrently_build(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+
+	StartTransactionCommand();
+
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Wait until all auxiliary indexes are taken into account by all
+	 * transactions.
+	 */
 	WaitForLockersMultiple(lockTags, ShareLock, true);
 	CommitTransactionCommand();
 
+	/* Now it is time to perform target index build. */
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4159,6 +4358,41 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * need to set the PROC_IN_SAFE_IC flag here.
 	 */
 
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * At this moment all target indexes are marked as "ready-to-insert". So,
+	 * we are free to start process of dropping auxiliary indexes.
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+		StartTransactionCommand();
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		PopActiveSnapshot();
+
+		CommitTransactionCommand();
+	}
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -4166,12 +4400,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * were created during the previous phase.  See "phase 3" in DefineIndex()
 	 * for more details.
 	 */
-
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
-	WaitForLockersMultiple(lockTags, ShareLock, true);
-	CommitTransactionCommand();
-
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4209,7 +4437,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -4238,7 +4466,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * there's no need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-									 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+									 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 		WaitForOlderSnapshots(limitXmin, true);
 
 		CommitTransactionCommand();
@@ -4329,14 +4557,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
-	 * Mark the old indexes as dead.  First we must wait until no running
-	 * transaction could be using the index for a query.  See also
+	 * Mark the old and auxiliary indexes as dead. First we must wait until no
+	 * running transaction could be using the index for a query.  See also
 	 * index_drop() for more details.
 	 */
 
 	INJECTION_POINT("reindex-relation-concurrently-before-set-dead", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	foreach(lc, indexIds)
@@ -4361,6 +4589,28 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PopActiveSnapshot();
 	}
 
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+	}
+
 	/* Commit this transaction to make the updates visible. */
 	CommitTransactionCommand();
 	StartTransactionCommand();
@@ -4374,11 +4624,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
-	 * Drop the old indexes.
+	 * Drop the old and auxiliary indexes.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_6);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -4398,6 +4648,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			add_exact_object_address(&object, objects);
 		}
 
+		foreach(lc, newIndexIds)
+		{
+			ReindexIndexInfo *idx = lfirst(lc);
+			ObjectAddress object;
+
+			object.classId = RelationRelationId;
+			object.objectId = idx->auxIndexId;
+			object.objectSubId = 0;
+
+			add_exact_object_address(&object, objects);
+		}
+
 		/*
 		 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
 		 * right lock level.
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 09ae445694d..8cb2231a7a8 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, bool auxiliary)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -850,6 +850,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Concurrent = concurrent;
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
+	n->ii_Auxiliary = auxiliary;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
@@ -875,7 +876,6 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
-	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 119593b7b46..7c37cb5a248 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -706,7 +706,8 @@ typedef struct TableAmRoutine
 										Relation index_rel,
 										IndexInfo *index_info,
 										Snapshot snapshot,
-										ValidateIndexState *state);
+										ValidateIndexState *state,
+										ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1825,19 +1826,24 @@ table_index_build_range_scan(Relation table_rel,
  * table_index_validate_scan - second table scan for concurrent index build
  *
  * See validate_index() for an explanation.
+ *
+ * Note: it is responsibility of that function to close sortstates in
+ * both `state` and `auxstate`.
  */
 static inline void
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
 						  Snapshot snapshot,
-						  ValidateIndexState *state)
+						  ValidateIndexState *state,
+						  ValidateIndexState *auxstate)
 {
 	table_rel->rd_tableam->index_validate_scan(table_rel,
 											   index_rel,
 											   index_info,
 											   snapshot,
-											   state);
+											   state,
+											   auxstate);
 }
 
 
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index b259c4141ed..37a390d33de 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -25,6 +25,7 @@ typedef enum
 {
 	INDEX_CREATE_SET_READY,
 	INDEX_CREATE_SET_VALID,
+	INDEX_DROP_CLEAR_READY,
 	INDEX_DROP_CLEAR_VALID,
 	INDEX_DROP_SET_DEAD,
 } IndexStateFlagsAction;
@@ -65,6 +66,7 @@ extern void index_check_primary_key(Relation heapRel,
 #define	INDEX_CREATE_IF_NOT_EXISTS			(1 << 4)
 #define	INDEX_CREATE_PARTITIONED			(1 << 5)
 #define INDEX_CREATE_INVALID				(1 << 6)
+#define INDEX_CREATE_AUXILIARY				(1 << 7)
 
 extern Oid	index_create(Relation heapRelation,
 						 const char *indexRelationName,
@@ -100,6 +102,11 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid tablespaceOid,
 										   const char *newName);
 
+extern Oid	index_concurrently_create_aux(Relation heapRelation,
+										  Oid mainIndexId,
+										  Oid tablespaceOid,
+										  const char *newName);
+
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
 
@@ -145,7 +152,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
+extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 359221dc296..841491a8511 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -111,14 +111,15 @@
 
 /* Phases of CREATE INDEX (as advertised via PROGRESS_CREATEIDX_PHASE) */
 #define PROGRESS_CREATEIDX_PHASE_WAIT_1			1
-#define PROGRESS_CREATEIDX_PHASE_BUILD			2
-#define PROGRESS_CREATEIDX_PHASE_WAIT_2			3
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	4
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		5
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN	6
-#define PROGRESS_CREATEIDX_PHASE_WAIT_3			7
+#define PROGRESS_CREATEIDX_PHASE_WAIT_2			2
+#define PROGRESS_CREATEIDX_PHASE_BUILD			3
+#define PROGRESS_CREATEIDX_PHASE_WAIT_3			4
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	5
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		6
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE	7
 #define PROGRESS_CREATEIDX_PHASE_WAIT_4			8
 #define PROGRESS_CREATEIDX_PHASE_WAIT_5			9
+#define PROGRESS_CREATEIDX_PHASE_WAIT_6			10
 
 /*
  * Subphases of CREATE INDEX, for index_build.
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 982ec25ae14..6cf45a68cbe 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,8 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								bool auxiliary);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index c743fc769cb..aa4fa76358a 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1423,6 +1423,7 @@ DETAIL:  Key (f1)=(b) already exists.
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
 ERROR:  could not create unique index "concur_index3"
 DETAIL:  Key (f2)=(b) is duplicated.
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -3197,6 +3198,7 @@ INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
 ERROR:  could not create unique index "concur_reindex_ind5"
 DETAIL:  Key (c1)=(1) is duplicated.
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
@@ -3209,8 +3211,10 @@ DETAIL:  Key (c1)=(1) is duplicated.
  c1     | integer |           |          | 
 Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1) INVALID
+    "concur_reindex_ind5_ccaux" stir (c1) INVALID
     "concur_reindex_ind5_ccnew" UNIQUE, btree (c1) INVALID
 
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -3238,6 +3242,44 @@ Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1)
 
 DROP TABLE concur_reindex_tab4;
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+ERROR:  relation "concur_reindex_tab4" does not exist
+LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+                    ^
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
+HINT:  Use DROP INDEX or REINDEX INDEX.
+WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index dc629928c8f..9b06ddc87a2 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1585,10 +1585,11 @@ select indexrelid::regclass, indisvalid,
 --------------------------------+------------+-----------------------+-------------------------------
  parted_isvalid_idx             | f          | parted_isvalid_tab    | 
  parted_isvalid_idx_11          | f          | parted_isvalid_tab_11 | parted_isvalid_tab_1_expr_idx
+ parted_isvalid_idx_11_ccaux    | f          | parted_isvalid_tab_11 | 
  parted_isvalid_tab_12_expr_idx | t          | parted_isvalid_tab_12 | parted_isvalid_tab_1_expr_idx
  parted_isvalid_tab_1_expr_idx  | f          | parted_isvalid_tab_1  | parted_isvalid_idx
  parted_isvalid_tab_2_expr_idx  | t          | parted_isvalid_tab_2  | parted_isvalid_idx
-(5 rows)
+(6 rows)
 
 drop table parted_isvalid_tab;
 -- Check state of replica indexes when attaching a partition.
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 78a37d9fc8f..9e3462d74f9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2068,14 +2068,15 @@ pg_stat_progress_create_index| SELECT s.pid,
         CASE s.param10
             WHEN 0 THEN 'initializing'::text
             WHEN 1 THEN 'waiting for writers before build'::text
-            WHEN 2 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
-            WHEN 3 THEN 'waiting for writers before validation'::text
-            WHEN 4 THEN 'index validation: scanning index'::text
-            WHEN 5 THEN 'index validation: sorting tuples'::text
-            WHEN 6 THEN 'index validation: scanning table'::text
-            WHEN 7 THEN 'waiting for old snapshots'::text
-            WHEN 8 THEN 'waiting for readers before marking dead'::text
-            WHEN 9 THEN 'waiting for readers before dropping'::text
+            WHEN 2 THEN 'waiting for writers to use auxiliary index'::text
+            WHEN 3 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
+            WHEN 4 THEN 'waiting for writers before validation'::text
+            WHEN 5 THEN 'index validation: scanning index'::text
+            WHEN 6 THEN 'index validation: sorting tuples'::text
+            WHEN 7 THEN 'index validation: merging indexes'::text
+            WHEN 8 THEN 'waiting for old snapshots'::text
+            WHEN 9 THEN 'waiting for readers before marking dead'::text
+            WHEN 10 THEN 'waiting for readers before dropping'::text
             ELSE NULL::text
         END AS phase,
     s.param4 AS lockers_total,
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index eabc9623b20..7ae8e44019b 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -499,6 +499,7 @@ CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS concur_index2 ON concur_heap(f1);
 INSERT INTO concur_heap VALUES ('b','x');
 -- check if constraint is enforced properly at build time
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -1311,10 +1312,12 @@ CREATE TABLE concur_reindex_tab4 (c1 int);
 INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 -- This trick creates an invalid index.
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -1326,6 +1329,24 @@ REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
 DROP TABLE concur_reindex_tab4;
 
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+\d aux_index_tab5
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+DROP TABLE aux_index_tab5;
+
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
-- 
2.53.0



  [text/x-patch] v29-0001-Add-stress-tests-for-concurrent-index-builds.patch (9.3K, 5-v29-0001-Add-stress-tests-for-concurrent-index-builds.patch)
  download | inline diff:
From ea7c3ac7ceff8a38ba6780d9d124233e0f985fd8 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v29 1/7] Add stress tests for concurrent index builds

Introduce stress tests for concurrent index operations:
- test concurrent inserts/updates during CREATE/REINDEX INDEX CONCURRENTLY
- cover various index types (btree, gin, gist, brin, hash, spgist)
- test unique and non-unique indexes
- test with expressions and predicates
- test both parallel and non-parallel operations

These tests verify the behavior of the following commits.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 225 ++++++++++++++++++++++++++++++++
 2 files changed, 226 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 592cef74ecb..51a62dccb7b 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..f160f9d18d7
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,225 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf', 'maintenance_work_mem = 32MB'); # to avoid OOM
+$node->append_conf('postgresql.conf', 'shared_buffers = 32MB'); # to avoid OOM
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp,
+								ia int4[], p point)));
+# uncomment to force non-HOT -> $node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);));
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC in different options concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=15 --jobs=4 --exit-on-abort --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => q(
+			SET debug_parallel_query = off; -- this is because predicate_stable implementation
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=15 --jobs=4 --exit-on-abort --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for unique BTREE',
+	{
+		'concurrent_ops_unique_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE UNIQUE INDEX CONCURRENTLY new_idx ON tbl(i);
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIN with upserts
+$node->pgbench(
+	'--no-vacuum --client=15 --jobs=4 --exit-on-abort --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN/GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_gin_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIN (ia);
+					\sleep 10 ms
+					SELECT gin_index_check('new_idx');
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT gin_index_check('new_idx');
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIST/BRIN/HASH/SPGIST index concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=15 --jobs=4 --exit-on-abort --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN/GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_other_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\set variant random(0, 3)
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIST (p);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING BRIN (updated_at);
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING HASH (updated_at);
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING SPGIST (p);
+					\endif
+					\sleep 10 ms
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->stop;
+done_testing();
\ No newline at end of file
-- 
2.53.0



  [text/x-patch] v29-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch (19.7K, 6-v29-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch)
  download | inline diff:
From 9d3c48caec41d649080acc7d37a4afda5cbf45cf Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 12 Jan 2026 00:57:56 +0300
Subject: [PATCH v29 3/7] Add Datum storage support to tuplestore Extend
 tuplestore to store individual Datum values: - fixed-length datatypes and
 variable-length datatypes: include a length header - by-value types: store
 inline with one extra byte (but without support of random access)

This support enables usages of tuplestore for non-tuple data (TIDs) in the next commit.
---
 src/backend/utils/sort/tuplestore.c | 330 +++++++++++++++++++++++-----
 src/include/utils/tuplestore.h      |  33 +--
 2 files changed, 293 insertions(+), 70 deletions(-)

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index afba82f28a2..692e325eafd 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1,16 +1,19 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.c
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
+ *
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
  * Also, it is possible to support multiple independent read pointers.
  *
  * A temporary file is used to handle the data if it exceeds the
@@ -61,6 +64,8 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "storage/buffile.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
 
@@ -115,16 +120,15 @@ struct Tuplestorestate
 	BufFile    *myfile;			/* underlying file, or NULL if none */
 	MemoryContext context;		/* memory context for holding tuples */
 	ResourceOwner resowner;		/* resowner for holding temp files */
+	Oid			datumType;		/* InvalidOid or oid of Datum's to be stored */
+	int16		datumTypeLen;	/* typelen of that Datum */
+	bool		datumTypeByVal; /* by-value of that Datum */
 
 	/*
 	 * These function pointers decouple the routines that must know what kind
 	 * of tuple we are handling from the routines that don't need to know it.
 	 * They are set up by the tuplestore_begin_xxx routines.
 	 *
-	 * (Although tuplestore.c currently only supports heap tuples, I've copied
-	 * this part of tuplesort.c so that extension to other kinds of objects
-	 * will be easy if it's ever needed.)
-	 *
 	 * Function to copy a supplied input tuple into palloc'd space. (NB: we
 	 * assume that a single pfree() is enough to release the tuple later, so
 	 * the representation must be "flat" in one palloc chunk.) state->availMem
@@ -149,6 +153,12 @@ struct Tuplestorestate
 	 */
 	void	   *(*readtup) (Tuplestorestate *state, unsigned int len);
 
+	/*
+	 * Function to get length of tuple from tape. Used to provide 'len' argument
+	 * for readtup (see above).
+	 */
+	unsigned int(*lentup) (Tuplestorestate *state, bool eofOK);
+
 	/*
 	 * This array holds pointers to tuples in memory if we are in state INMEM.
 	 * In states WRITEFILE and READFILE it's not used.
@@ -185,6 +195,7 @@ struct Tuplestorestate
 #define COPYTUP(state,tup)	((*(state)->copytup) (state, tup))
 #define WRITETUP(state,tup) ((*(state)->writetup) (state, tup))
 #define READTUP(state,len)	((*(state)->readtup) (state, len))
+#define LENTUP(state,eofOK)	((*(state)->lentup) (state, eofOK))
 #define LACKMEM(state)		((state)->availMem < 0)
 #define USEMEM(state,amt)	((state)->availMem -= (amt))
 #define FREEMEM(state,amt)	((state)->availMem += (amt))
@@ -193,9 +204,9 @@ struct Tuplestorestate
  *
  * NOTES about on-tape representation of tuples:
  *
- * We require the first "unsigned int" of a stored tuple to be the total size
- * on-tape of the tuple, including itself (so it is never zero).
- * The remainder of the stored tuple
+ * In case of tuples we use first "unsigned int" of a stored tuple
+ * to be the total size on-tape of the tuple, including itself
+ * (so it is never zero). The remainder of the stored tuple
  * may or may not match the in-memory representation of the tuple ---
  * any conversion needed is the job of the writetup and readtup routines.
  *
@@ -206,10 +217,13 @@ struct Tuplestorestate
  * state->backward is not set, the write/read routines may omit the extra
  * length word.
  *
- * writetup is expected to write both length words as well as the tuple
+ * In the case of Datum with constant length, both "unsigned int" are omitted.
+ *
+ * writetup is expected to write both length words and the tuple
  * data.  When readtup is called, the tape is positioned just after the
- * front length word; readtup must read the tuple data and advance past
- * the back length word (if present).
+ * front length word (if it is not omitted like in case of content-size Datum);
+ * readtup must read the tuple data and advance past the back length word
+ * (if present).
  *
  * The write/read routines can make use of the tuple description data
  * stored in the Tuplestorestate record, if needed. They are also expected
@@ -241,11 +255,16 @@ static Tuplestorestate *tuplestore_begin_common(int eflags,
 static void tuplestore_puttuple_common(Tuplestorestate *state, void *tuple);
 static void dumptuples(Tuplestorestate *state);
 static void tuplestore_updatemax(Tuplestorestate *state);
-static unsigned int getlen(Tuplestorestate *state, bool eofOK);
+
+static unsigned int lentup_heap(Tuplestorestate *state, bool eofOK);
 static void *copytup_heap(Tuplestorestate *state, void *tup);
 static void writetup_heap(Tuplestorestate *state, void *tup);
 static void *readtup_heap(Tuplestorestate *state, unsigned int len);
 
+static unsigned int lentup_datum(Tuplestorestate *state, bool eofOK);
+static void *copytup_datum(Tuplestorestate *state, void *datum);
+static void writetup_datum(Tuplestorestate *state, void *datum);
+static void *readtup_datum(Tuplestorestate *state, unsigned int len);
 
 /*
  *		tuplestore_begin_xxx
@@ -268,6 +287,12 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
 	state->allowedMem = maxKBytes * (int64) 1024;
 	state->availMem = state->allowedMem;
 	state->myfile = NULL;
+	/*
+	 * Set Datum related data to invalid by default.
+	 */
+	state->datumType = InvalidOid;
+	state->datumTypeLen = 0;
+	state->datumTypeByVal = false;
 
 	/*
 	 * The palloc/pfree pattern for tuple memory is in a FIFO pattern.  A
@@ -345,6 +370,37 @@ tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
 	state->copytup = copytup_heap;
 	state->writetup = writetup_heap;
 	state->readtup = readtup_heap;
+	state->lentup = lentup_heap;
+
+	return state;
+}
+
+/*
+ * The same as tuplestore_begin_heap but create store for Datum values.
+ */
+Tuplestorestate *
+tuplestore_begin_datum(Oid datumType, bool randomAccess, bool interXact, int maxKBytes)
+{
+	Tuplestorestate *state;
+	int			eflags;
+
+	/*
+	 * This interpretation of the meaning of randomAccess is compatible with
+	 * the pre-8.3 behavior of tuplestores.
+	 */
+	eflags = randomAccess ?
+		(EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :
+		(EXEC_FLAG_REWIND);
+
+	state = tuplestore_begin_common(eflags, interXact, maxKBytes);
+	state->datumType = datumType;
+	get_typlenbyval(state->datumType, &state->datumTypeLen, &state->datumTypeByVal);
+	Assert(!(state->datumTypeByVal && randomAccess));
+
+	state->copytup = copytup_datum;
+	state->writetup = writetup_datum;
+	state->readtup = readtup_datum;
+	state->lentup = lentup_datum;
 
 	return state;
 }
@@ -443,16 +499,19 @@ tuplestore_clear(Tuplestorestate *state)
 	{
 		int64		availMem = state->availMem;
 
-		/*
-		 * Below, we reset the memory context for storing tuples.  To save
-		 * from having to always call GetMemoryChunkSpace() on all stored
-		 * tuples, we adjust the availMem to forget all the tuples and just
-		 * recall USEMEM for the space used by the memtuples array.  Here we
-		 * just Assert that's correct and the memory tracking hasn't gone
-		 * wrong anywhere.
-		 */
-		for (i = state->memtupdeleted; i < state->memtupcount; i++)
-			availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			/*
+			 * Below, we reset the memory context for storing tuples.  To save
+			 * from having to always call GetMemoryChunkSpace() on all stored
+			 * tuples, we adjust the availMem to forget all the tuples and just
+			 * recall USEMEM for the space used by the memtuples array.  Here we
+			 * just Assert that's correct and the memory tracking hasn't gone
+			 * wrong anywhere.
+			 */
+			for (i = state->memtupdeleted; i < state->memtupcount; i++)
+				availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		}
 
 		availMem += GetMemoryChunkSpace(state->memtuples);
 
@@ -776,6 +835,25 @@ tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple)
 	MemoryContextSwitchTo(oldcxt);
 }
 
+/*
+ * Like tuplestore_puttupleslot but for single Datum.
+ */
+void
+tuplestore_putdatum(Tuplestorestate *state, Datum datum)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
+
+	/*
+	 * Copy the Datum.  (Must do this even in WRITEFILE case.  Note that
+	 * COPYTUP includes USEMEM, so we needn't do that here.)
+	 */
+	datum = PointerGetDatum(COPYTUP(state, DatumGetPointer(datum)));
+
+	tuplestore_puttuple_common(state, DatumGetPointer(datum));
+
+	MemoryContextSwitchTo(oldcxt);
+}
+
 /*
  * Similar to tuplestore_puttuple(), but work from values + nulls arrays.
  * This avoids an extra tuple-construction operation.
@@ -1027,10 +1105,10 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 			pg_fallthrough;
 
 		case TSS_READFILE:
-			*should_free = true;
+			*should_free = !state->datumTypeByVal;
 			if (forward)
 			{
-				if ((tuplen = getlen(state, true)) != 0)
+				if ((tuplen = LENTUP(state, true)) != 0)
 				{
 					tup = READTUP(state, tuplen);
 					return tup;
@@ -1042,6 +1120,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				}
 			}
 
+			Assert(!state->datumTypeByVal);
 			/*
 			 * Backward.
 			 *
@@ -1059,7 +1138,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				Assert(!state->truncated);
 				return NULL;
 			}
-			tuplen = getlen(state, false);
+			tuplen = LENTUP(state, false);
 
 			if (readptr->eof_reached)
 			{
@@ -1090,7 +1169,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 					Assert(!state->truncated);
 					return NULL;
 				}
-				tuplen = getlen(state, false);
+				tuplen = LENTUP(state, false);
 			}
 
 			/*
@@ -1152,6 +1231,41 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 	}
 }
 
+bool
+tuplestore_getdatum(Tuplestorestate *state, bool forward,
+					bool *should_free, Datum *result)
+{
+	Datum datum;
+	*should_free = false;
+
+	datum = (Datum) tuplestore_gettuple(state, forward, should_free);
+
+	/* For by-value datum we may receive zero as valid value. */
+	if (state->datumTypeByVal)
+	{
+		/* Return false only on EOF */
+		if (state->readptrs[state->activeptr].eof_reached)
+		{
+			*result = PointerGetDatum(NULL);
+			return false;
+		}
+
+		*result = datum;
+		return true;
+	}
+
+	if (datum)
+	{
+		*result = datum;
+		return true;
+	}
+	else
+	{
+		*result = PointerGetDatum(NULL);
+		return false;
+	}
+}
+
 /*
  * tuplestore_advance - exported function to adjust position without fetching
  *
@@ -1460,8 +1574,11 @@ tuplestore_trim(Tuplestorestate *state)
 	/* Release no-longer-needed tuples */
 	for (i = state->memtupdeleted; i < nremove; i++)
 	{
-		FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
-		pfree(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
+			pfree(state->memtuples[i]);
+		}
 		state->memtuples[i] = NULL;
 	}
 	state->memtupdeleted = nremove;
@@ -1556,25 +1673,6 @@ tuplestore_in_memory(Tuplestorestate *state)
 	return (state->status == TSS_INMEM);
 }
 
-
-/*
- * Tape interface routines
- */
-
-static unsigned int
-getlen(Tuplestorestate *state, bool eofOK)
-{
-	unsigned int len;
-	size_t		nbytes;
-
-	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
-	if (nbytes == 0)
-		return 0;
-	else
-		return len;
-}
-
-
 /*
  * Routines specialized for HeapTuple case
  *
@@ -1585,6 +1683,19 @@ getlen(Tuplestorestate *state, bool eofOK)
  * to write that separately.
  */
 
+static unsigned int
+lentup_heap(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	else
+		return len;
+}
+
 static void *
 copytup_heap(Tuplestorestate *state, void *tup)
 {
@@ -1631,3 +1742,112 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 		BufFileReadExact(state->myfile, &tuplen, sizeof(tuplen));
 	return tuple;
 }
+
+/*
+ * Routines specialized for Datum case.
+ *
+ * Handles both fixed and variable-length Datums efficiently:
+ * - Fixed-length and Variable-length includes length prefix (and suffix if backward scan)
+ * - By-value types handled inline without extra copying, storing single extra byte
+ *   XXX: consider refactoring to avoid it, currently need it for correct rewind logic
+ */
+
+static unsigned int
+lentup_datum(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	Assert(state->datumType != InvalidOid);
+
+	if (state->datumTypeByVal)
+	{
+		uint8_t	junk;
+		nbytes = BufFileReadMaybeEOF(state->myfile, &junk, sizeof(uint8_t), eofOK);
+		if (nbytes == 0)
+			return 0;
+		Assert(junk == (uint8_t) state->datumTypeLen);
+		return state->datumTypeLen;
+	}
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	return len;
+}
+
+static void *
+copytup_datum(Tuplestorestate *state, void *datum)
+{
+	Datum d;
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+		return DatumGetPointer(PointerGetDatum(datum));
+
+	d = datumCopy(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+	USEMEM(state, GetMemoryChunkSpace(DatumGetPointer(d)));
+	return DatumGetPointer(d);
+}
+
+static void
+writetup_datum(Tuplestorestate *state, void *datum)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		uint8_t junk = state->datumTypeLen; /* overflow is ok */
+		Datum v;
+		Assert(state->datumTypeLen > 0);
+
+		/* just marker byte used to track the end of data for rewind logic */
+		BufFileWrite(state->myfile, &junk, sizeof(junk));
+		store_att_byval(&v, PointerGetDatum(datum), state->datumTypeLen);
+		BufFileWrite(state->myfile, &v, state->datumTypeLen);
+		Assert(!state->backward);
+	}
+	else
+	{
+		unsigned int size = state->datumTypeLen;
+		if (state->datumTypeLen < 0)
+			size = datumGetSize(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+
+		BufFileWrite(state->myfile, &size, sizeof(size));
+
+		BufFileWrite(state->myfile, datum, size);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileWrite(state->myfile, &size, sizeof(size));
+
+		FREEMEM(state, GetMemoryChunkSpace(datum));
+		pfree(datum);
+	}
+}
+
+static void *
+readtup_datum(Tuplestorestate *state, unsigned int len)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		Datum datum;
+
+		Assert(state->datumTypeLen > 0);
+		Assert(len == state->datumTypeLen);
+		BufFileReadExact(state->myfile, &datum, state->datumTypeLen);
+
+		Assert(!state->backward);
+		return DatumGetPointer(fetch_att(&datum, true, state->datumTypeLen));
+	}
+	else
+	{
+		Datum *data = palloc(len);
+		BufFileReadExact(state->myfile, data, len);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileReadExact(state->myfile, &len, sizeof(len));
+
+		return data;
+	}
+}
diff --git a/src/include/utils/tuplestore.h b/src/include/utils/tuplestore.h
index 1c08e219e89..665d6d57635 100644
--- a/src/include/utils/tuplestore.h
+++ b/src/include/utils/tuplestore.h
@@ -1,17 +1,18 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.h
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
- * Also, it is possible to support multiple independent read pointers.
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
  * A temporary file is used to handle the data if it exceeds the
  * space limit specified by the caller.
@@ -39,14 +40,13 @@
  */
 typedef struct Tuplestorestate Tuplestorestate;
 
-/*
- * Currently we only need to store MinimalTuples, but it would be easy
- * to support the same behavior for IndexTuples and/or bare Datums.
- */
-
 extern Tuplestorestate *tuplestore_begin_heap(bool randomAccess,
 											  bool interXact,
 											  int maxKBytes);
+extern Tuplestorestate *tuplestore_begin_datum(Oid datumType,
+											   bool randomAccess,
+											   bool interXact,
+											   int maxKBytes);
 
 extern void tuplestore_set_eflags(Tuplestorestate *state, int eflags);
 
@@ -55,6 +55,7 @@ extern void tuplestore_puttupleslot(Tuplestorestate *state,
 extern void tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple);
 extern void tuplestore_putvalues(Tuplestorestate *state, TupleDesc tdesc,
 								 const Datum *values, const bool *isnull);
+extern void tuplestore_putdatum(Tuplestorestate *state, Datum datum);
 
 extern int	tuplestore_alloc_read_pointer(Tuplestorestate *state, int eflags);
 
@@ -72,6 +73,8 @@ extern bool tuplestore_in_memory(Tuplestorestate *state);
 
 extern bool tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 									bool copy, TupleTableSlot *slot);
+extern bool tuplestore_getdatum(Tuplestorestate *state, bool forward,
+								bool *should_free, Datum *result);
 
 extern bool tuplestore_advance(Tuplestorestate *state, bool forward);
 
-- 
2.53.0



  [text/x-patch] v29-0007-Refresh-snapshot-periodically-during-index-valid.patch (21.3K, 7-v29-0007-Refresh-snapshot-periodically-during-index-valid.patch)
  download | inline diff:
From b95bc24f025a08577451c60d9995f73ca1ea4c2e Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 21 Apr 2025 14:11:53 +0200
Subject: [PATCH v29 7/7] Refresh snapshot periodically during index validation

Enhances validation phase of concurrently built indexes by periodically refreshing snapshots rather than using a single reference snapshot. This addresses issues with xmin propagation during long-running validations.

The validation now takes a fresh snapshot every few pages, allowing the xmin horizon to advance. This restores feature of commit d9d076222f5b, which was reverted in commit e28bb8851969. New STIR-based approach does not depend on single reference snapshot anymore.
---
 src/backend/access/heap/README.HOT       |  4 +-
 src/backend/access/heap/heapam_handler.c | 65 +++++++++++++++++++++++-
 src/backend/access/spgist/spgvacuum.c    | 12 +++--
 src/backend/catalog/index.c              | 50 +++++++++++++-----
 src/backend/commands/indexcmds.c         | 50 +++---------------
 src/include/access/tableam.h             | 25 ++++-----
 src/include/access/transam.h             | 15 ++++++
 src/include/catalog/index.h              |  2 +-
 8 files changed, 145 insertions(+), 78 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index b1c797517ee..382fe1723a5 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -401,12 +401,12 @@ live tuple.
 We mark the index open for inserts (but still not ready for reads) then
 we again wait for transactions which have the table open.  Then validate
 the index.  This searches for tuples missing from the index in auxiliary
-index, and inserts any missing ones if they are visible to reference snapshot.
+index, and inserts any missing ones if they are visible to a fresh snapshot.
 Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
-the second reference snapshot is finished.  This ensures that nobody is
+the latest used snapshot is finished.  This ensures that nobody is
 alive any longer who could need to see any tuples that might be missing
 from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2d60a65ac98..e2d6455be2f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2010,23 +2010,26 @@ heapam_index_validate_scan_read_stream_next(
 	return result;
 }
 
-static void
+static TransactionId
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
-						   Snapshot snapshot,
 						   ValidateIndexState *state,
 						   ValidateIndexState *auxState)
 {
+	TransactionId limitXmin;
+
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Snapshot		snapshot;
 	TupleTableSlot  *slot;
 	EState			*estate;
 	ExprContext		*econtext;
 	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	int				num_to_check;
+	BlockNumber		page_read_counter = 1; /* set to 1 to skip snapshot reset at start */
 	Tuplestorestate *tuples_for_check;
 	ValidateIndexScanState callback_private_data;
 
@@ -2037,6 +2040,8 @@ heapam_index_validate_scan(Relation heapRelation,
 	/* Use 10% of memory for tuple store. */
 	int		store_work_mem_part = maintenance_work_mem / 10;
 
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
@@ -2045,6 +2050,12 @@ heapam_index_validate_scan(Relation heapRelation,
 	 */
 	tuples_for_check =  tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
+	PopActiveSnapshot();
+	InvalidateCatalogSnapshot();
+
+	Assert(!HaveRegisteredOrActiveSnapshot());
+	Assert(!TransactionIdIsValid(MyProc->xmin));
+
 	/*
 	 * sanity checks
 	 */
@@ -2060,6 +2071,29 @@ heapam_index_validate_scan(Relation heapRelation,
 
 	state->tuplesort = auxState->tuplesort = NULL;
 
+	/*
+	 * Now take the first snapshot that will be used to filter candidate
+	 * tuples. We are going to replace it by newer snapshot every so often
+	 * to propagate horizon.
+	 *
+	 * Beware!  There might still be snapshots in use that treat some transaction
+	 * as in-progress that our temporary snapshot treats as committed.
+	 *
+	 * If such a recently-committed transaction deleted tuples in the table,
+	 * we will not include them in the index; yet those transactions which
+	 * see the deleting one as still-in-progress will expect such tuples to
+	 * be there once we mark the index as valid.
+	 *
+	 * We solve this by waiting for all endangered transactions to exit before
+	 * we mark the index as valid, for that reason limitXmin is supported.
+	 *
+	 * We also set ActiveSnapshot to this snap, since functions in indexes may
+	 * need a snapshot.
+	 */
+	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	PushActiveSnapshot(snapshot);
+	limitXmin = snapshot->xmin;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
@@ -2093,6 +2127,7 @@ heapam_index_validate_scan(Relation heapRelation,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		block_number = BufferGetBlockNumber(buf);
+		page_read_counter++;
 
 		i = 0;
 		while ((off = tuples[i]) != InvalidOffsetNumber)
@@ -2150,6 +2185,20 @@ heapam_index_validate_scan(Relation heapRelation,
 		}
 
 		ReleaseBuffer(buf);
+#define VALIDATE_INDEX_RESET_SNAPSHOT_EACH_N_PAGE 4096
+		if (page_read_counter % VALIDATE_INDEX_RESET_SNAPSHOT_EACH_N_PAGE == 0)
+		{
+			PopActiveSnapshot();
+			UnregisterSnapshot(snapshot);
+			/* to make sure we propagate xmin */
+			InvalidateCatalogSnapshot();
+			Assert(!TransactionIdIsValid(MyProc->xmin));
+
+			snapshot = RegisterSnapshot(GetLatestSnapshot());
+			PushActiveSnapshot(snapshot);
+			/* xmin should not go backwards, but just in case */
+			limitXmin = TransactionIdNewer(limitXmin, snapshot->xmin);
+		}
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -2159,11 +2208,23 @@ heapam_index_validate_scan(Relation heapRelation,
 	read_stream_end(read_stream);
 	tuplestore_end(tuples_for_check);
 
+	/*
+	 * Drop the latest snapshot.  We must do this before waiting out other
+	 * snapshot holders, else we will deadlock against other processes also
+	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
+	 * they must wait for.
+	 */
+	PopActiveSnapshot();
+	UnregisterSnapshot(snapshot);
+	InvalidateCatalogSnapshot();
+	Assert(MyProc->xmin == InvalidTransactionId);
 	FreeAccessStrategy(bstrategy);
 
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
+
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..7ea60c18e6f 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -191,14 +191,16 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 			 * Add target TID to pending list if the redirection could have
 			 * happened since VACUUM started.  (If xid is invalid, assume it
 			 * must have happened before VACUUM started, since REINDEX
-			 * CONCURRENTLY locks out VACUUM.)
+			 * CONCURRENTLY locks out VACUUM, if myXmin is invalid it is
+			 * validation scan.)
 			 *
 			 * Note: we could make a tighter test by seeing if the xid is
 			 * "running" according to the active snapshot; but snapmgr.c
 			 * doesn't currently export a suitable API, and it's not entirely
 			 * clear that a tighter test is worth the cycles anyway.
 			 */
-			if (TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
+			if (!TransactionIdIsValid(bds->myXmin) ||
+					TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
 				spgAddPendingTID(bds, &dt->pointer);
 		}
 		else
@@ -808,7 +810,6 @@ spgvacuumscan(spgBulkDeleteState *bds)
 	/* Finish setting up spgBulkDeleteState */
 	initSpGistState(&bds->spgstate, index);
 	bds->pendingList = NULL;
-	bds->myXmin = GetActiveSnapshot()->xmin;
 	bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
 	/*
@@ -959,6 +960,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	bds.stats = stats;
 	bds.callback = callback;
 	bds.callback_state = callback_state;
+	if (info->validate_index)
+		bds.myXmin = InvalidTransactionId;
+	else
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 	spgvacuumscan(&bds);
 
@@ -999,6 +1004,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		bds.stats = stats;
 		bds.callback = dummy_callback;
 		bds.callback_state = NULL;
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 		spgvacuumscan(&bds);
 	}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ed563da5a32..222d0b1e77c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3511,8 +3511,9 @@ IndexCheckExclusion(Relation heapRelation,
  * insert their new tuples into it. At the same moment we clear "indisready" for
  * auxiliary index, since it is no more required to be updated.
  *
- * We then take a new reference snapshot, any tuples that are valid according
- * to this snap, but are not in the index, must be added to the index.
+ * We then take a new snapshot, any tuples that are valid according
+ * to this snap, but are not in the index, must be added to the index. In
+ * order to propagate xmin we reset that snapshot every so often.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
  * the snap need not be indexed, because we will wait out all transactions
@@ -3525,7 +3526,7 @@ IndexCheckExclusion(Relation heapRelation,
  * TIDs of both auxiliary and target indexes, and doing a "merge join" against
  * the TID lists to see which tuples from auxiliary index are missing from the
  * target index.  Thus we will ensure that all tuples valid according to the
- * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * latest snapshot are in the index. Notice we need to do bulkdelete in the
  * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
@@ -3546,13 +3547,14 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * Also, some actions to concurrent drop the auxiliary index are performed.
  */
-void
-validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
+TransactionId
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId)
 {
 	Relation	heapRelation,
 				indexRelation,
 				auxIndexRelation;
 	IndexInfo  *indexInfo;
+	TransactionId limitXmin;
 	IndexVacuumInfo ivinfo, auxivinfo;
 	ValidateIndexState state, auxState;
 	Oid			save_userid;
@@ -3602,8 +3604,12 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	 * Fetch info needed for index_insert.  (You might think this should be
 	 * passed in from DefineIndex, but its copy is long gone due to having
 	 * been built in a previous transaction.)
+	 *
+	 * We might need snapshot for index expressions or predicates.
 	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	indexInfo = BuildIndexInfo(indexRelation);
+	PopActiveSnapshot();
 
 	/* mark build is concurrent just for consistency */
 	indexInfo->ii_Concurrent = true;
@@ -3639,6 +3645,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 										   NULL, TUPLESORT_NONE);
 	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	(void) index_bulk_delete(&auxivinfo, NULL,
 							 validate_index_callback, &auxState);
 	/* If aux index is empty, merge may be skipped */
@@ -3658,7 +3667,13 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		index_close(indexRelation, NoLock);
 		table_close(heapRelation, NoLock);
 
-		return;
+		PushActiveSnapshot(GetTransactionSnapshot());
+		limitXmin = GetActiveSnapshot()->xmin;
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		Assert(!TransactionIdIsValid(MyProc->xmin));
+		return limitXmin;
 	}
 
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
@@ -3667,6 +3682,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	/* ambulkdelete updates progress metrics */
 	(void) index_bulk_delete(&ivinfo, NULL,
 							 validate_index_callback, &state);
@@ -3686,19 +3704,24 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	tuplesort_performsort(auxState.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+	Assert(!TransactionIdIsValid(MyProc->xmin));
 
 	/*
 	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
-	table_index_validate_scan(heapRelation,
-							  indexRelation,
-							  indexInfo,
-							  snapshot,
-							  &state,
-							  &auxState);
+	limitXmin = table_index_validate_scan(heapRelation,
+										  indexRelation,
+										  indexInfo,
+										  &state,
+										  &auxState);
 
 	/* Tuple sort closed by table_index_validate_scan */
 	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
@@ -3721,6 +3744,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
+
+	Assert(!TransactionIdIsValid(MyProc->xmin));
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 599e3375833..7f8adbdbda3 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -595,7 +595,6 @@ DefineIndex(ParseState *pstate,
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -1813,32 +1812,11 @@ DefineIndex(ParseState *pstate,
 	/* Tell concurrent index builds to ignore us, if index qualifies */
 	if (safe_index)
 		set_indexsafe_procflags();
-
-	/*
-	 * Now take the "reference snapshot" that will be used by validate_index()
-	 * to filter candidate tuples.  Beware!  There might still be snapshots in
-	 * use that treat some transaction as in-progress that our reference
-	 * snapshot treats as committed.  If such a recently-committed transaction
-	 * deleted tuples in the table, we will not include them in the index; yet
-	 * those transactions which see the deleting one as still-in-progress will
-	 * expect such tuples to be there once we mark the index as valid.
-	 *
-	 * We solve this by waiting for all endangered transactions to exit before
-	 * we mark the index as valid.
-	 *
-	 * We also set ActiveSnapshot to this snap, since functions in indexes may
-	 * need a snapshot.
-	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
-	PushActiveSnapshot(snapshot);
 	/*
 	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
-	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
-	limitXmin = snapshot->xmin;
+	limitXmin = validate_index(tableId, indexRelationId, auxIndexRelationId);
 
-	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1860,8 +1838,8 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * The index is now valid in the sense that it contains all currently
 	 * interesting tuples.  But since it might not contain tuples deleted just
-	 * before the reference snap was taken, we have to wait out any
-	 * transactions that might have older snapshots.
+	 * before the last snapshot during validating was taken, we have to wait
+	 * out any transactions that might have older snapshots.
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
@@ -4426,7 +4404,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
-		Snapshot	snapshot;
 
 		StartTransactionCommand();
 
@@ -4441,13 +4418,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		if (newidx->safe)
 			set_indexsafe_procflags();
 
-		/*
-		 * Take the "reference snapshot" that will be used by validate_index()
-		 * to filter candidate tuples.
-		 */
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
-		PushActiveSnapshot(snapshot);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
@@ -4459,16 +4429,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
-
-		/*
-		 * We can now do away with our active snapshot, we still need to save
-		 * the xmin limit to wait for older snapshots.
-		 */
-		limitXmin = snapshot->xmin;
-
-		PopActiveSnapshot();
-		UnregisterSnapshot(snapshot);
+		limitXmin = validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId);
+		Assert(!TransactionIdIsValid(MyProc->xmin));
 
 		/*
 		 * To ensure no deadlocks, we must commit and start yet another
@@ -4481,7 +4443,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		/*
 		 * The index is now valid in the sense that it contains all currently
 		 * interesting tuples.  But since it might not contain tuples deleted
-		 * just before the reference snap was taken, we have to wait out any
+		 * just before the latest snap was taken, we have to wait out any
 		 * transactions that might have older snapshots.
 		 *
 		 * Because we don't take a snapshot or Xid in this transaction,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7c37cb5a248..7347eb52153 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -702,12 +702,11 @@ typedef struct TableAmRoutine
 										   TableScanDesc scan);
 
 	/* see table_index_validate_scan for reference about parameters */
-	void		(*index_validate_scan) (Relation table_rel,
-										Relation index_rel,
-										IndexInfo *index_info,
-										Snapshot snapshot,
-										ValidateIndexState *state,
-										ValidateIndexState *aux_state);
+	TransactionId		(*index_validate_scan) (Relation table_rel,
+												Relation index_rel,
+												IndexInfo *index_info,
+												ValidateIndexState *state,
+												ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1830,20 +1829,18 @@ table_index_build_range_scan(Relation table_rel,
  * Note: it is responsibility of that function to close sortstates in
  * both `state` and `auxstate`.
  */
-static inline void
+static inline TransactionId
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
-						  Snapshot snapshot,
 						  ValidateIndexState *state,
 						  ValidateIndexState *auxstate)
 {
-	table_rel->rd_tableam->index_validate_scan(table_rel,
-											   index_rel,
-											   index_info,
-											   snapshot,
-											   state,
-											   auxstate);
+	return table_rel->rd_tableam->index_validate_scan(table_rel,
+													  index_rel,
+													  index_info,
+													  state,
+													  auxstate);
 }
 
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 6fa91bfcdc0..b33084cb91a 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -417,6 +417,21 @@ NormalTransactionIdOlder(TransactionId a, TransactionId b)
 	return b;
 }
 
+/* return the newer of the two IDs */
+static inline TransactionId
+TransactionIdNewer(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdFollows(a, b))
+		return a;
+	return b;
+}
+
 /* return the newer of the two IDs */
 static inline FullTransactionId
 FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 37a390d33de..d1f6411dd78 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -152,7 +152,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
+extern TransactionId validate_index(Oid heapId, Oid indexId, Oid auxIndexId);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
-- 
2.53.0



  [text/x-patch] v29-0006-Optimize-auxiliary-index-handling.patch (2.1K, 8-v29-0006-Optimize-auxiliary-index-handling.patch)
  download | inline diff:
From db97c52e46924bc7033b359a6ebb45fe74763b20 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 30 Dec 2024 16:37:12 +0100
Subject: [PATCH v29 6/7] Optimize auxiliary index handling

Skip unnecessary computations for auxiliary indices by:
- in the index-insert path, detect auxiliary indexes and bypass Datum value computation
- set indexUnchanged=false for auxiliary indices to avoid redundant checks

These optimizations reduce overhead during concurrent index operations.
---
 src/backend/catalog/index.c         | 11 +++++++++++
 src/backend/executor/execIndexing.c |  5 ++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 91125d37150..ed563da5a32 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2914,6 +2914,17 @@ FormIndexDatum(IndexInfo *indexInfo,
 	ListCell   *indexpr_item;
 	int			i;
 
+	/* Auxiliary index does not need any values to be computed */
+	if (unlikely(indexInfo->ii_Auxiliary))
+	{
+		for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+		{
+			values[i] = PointerGetDatum(NULL);
+			isnull[i] = true;
+		}
+		return;
+	}
+
 	if (indexInfo->ii_Expressions != NIL &&
 		indexInfo->ii_ExpressionsState == NIL)
 	{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..ce76a213556 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -438,8 +438,11 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		 * There's definitely going to be an index_insert() call for this
 		 * index.  If we're being called as part of an UPDATE statement,
 		 * consider if the 'indexUnchanged' = true hint should be passed.
+		 *
+		 * For auxiliary indexes, always pass false to skip value comparison checks,
+		 * since auxiliary indexes only store TIDs and don't track value changes.
 		 */
-		indexUnchanged = ((flags & EIIT_IS_UPDATE) &&
+		indexUnchanged = ((flags & EIIT_IS_UPDATE) && likely(!indexInfo->ii_Auxiliary) &&
 						  index_unchanged_by_update(resultRelInfo,
 													estate,
 													indexInfo,
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-03-09 00:09  Mihail Nikalayeu <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Mihail Nikalayeu @ 2026-03-09 00:09 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; +Cc: Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Rebased.


Attachments:

  [application/x-patch] v30-0006-Optimize-auxiliary-index-handling.patch (2.1K, 2-v30-0006-Optimize-auxiliary-index-handling.patch)
  download | inline diff:
From 056bb42e7c7d6f2cdd867a58ecba1ffb514d846e Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 30 Dec 2024 16:37:12 +0100
Subject: [PATCH v30 6/7] Optimize auxiliary index handling

Skip unnecessary computations for auxiliary indices by:
- in the index-insert path, detect auxiliary indexes and bypass Datum value computation
- set indexUnchanged=false for auxiliary indices to avoid redundant checks

These optimizations reduce overhead during concurrent index operations.
---
 src/backend/catalog/index.c         | 11 +++++++++++
 src/backend/executor/execIndexing.c |  5 ++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 91125d37150..ed563da5a32 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2914,6 +2914,17 @@ FormIndexDatum(IndexInfo *indexInfo,
 	ListCell   *indexpr_item;
 	int			i;
 
+	/* Auxiliary index does not need any values to be computed */
+	if (unlikely(indexInfo->ii_Auxiliary))
+	{
+		for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+		{
+			values[i] = PointerGetDatum(NULL);
+			isnull[i] = true;
+		}
+		return;
+	}
+
 	if (indexInfo->ii_Expressions != NIL &&
 		indexInfo->ii_ExpressionsState == NIL)
 	{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..ce76a213556 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -438,8 +438,11 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		 * There's definitely going to be an index_insert() call for this
 		 * index.  If we're being called as part of an UPDATE statement,
 		 * consider if the 'indexUnchanged' = true hint should be passed.
+		 *
+		 * For auxiliary indexes, always pass false to skip value comparison checks,
+		 * since auxiliary indexes only store TIDs and don't track value changes.
 		 */
-		indexUnchanged = ((flags & EIIT_IS_UPDATE) &&
+		indexUnchanged = ((flags & EIIT_IS_UPDATE) && likely(!indexInfo->ii_Auxiliary) &&
 						  index_unchanged_by_update(resultRelInfo,
 													estate,
 													indexInfo,
-- 
2.43.0



  [application/x-patch] v30-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch (36.3K, 3-v30-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch)
  download | inline diff:
From ff9edd2c5143d299879a1fb2aff3dab0c8b04420 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sun, 11 Jan 2026 19:27:52 +0300
Subject: [PATCH v30 2/7] Add STIR access method and flags related to auxiliary
 indexes

This patch provides infrastructure for following enhancements to concurrent index builds by:
- ii_Auxiliary in IndexInfo: indicates that an index is an auxiliary index used during concurrent index build
- validate_index in IndexVacuumInfo: set if index_bulk_delete called during the validation phase of concurrent index build
- STIR (Short-Term Index Replacement) access method is introduced, intended solely for short-lived, auxiliary usage

STIR functions are designed as an ephemeral helper during concurrent index builds, temporarily storing TIDs without providing the full features of a typical access method. As such, it raises warnings or errors when accessed outside its specialized usage path.

Planned to be used in following commits.
---
 contrib/pgstattuple/pgstattuple.c        |   3 +
 src/backend/access/Makefile              |   1 +
 src/backend/access/heap/vacuumlazy.c     |   2 +
 src/backend/access/meson.build           |   1 +
 src/backend/access/stir/Makefile         |  18 +
 src/backend/access/stir/meson.build      |   5 +
 src/backend/access/stir/stir.c           | 559 +++++++++++++++++++++++
 src/backend/catalog/index.c              |   1 +
 src/backend/catalog/toasting.c           |   1 +
 src/backend/commands/analyze.c           |   1 +
 src/backend/commands/vacuumparallel.c    |   1 +
 src/backend/nodes/makefuncs.c            |   1 +
 src/include/access/genam.h               |   1 +
 src/include/access/reloptions.h          |   3 +-
 src/include/access/stir.h                | 113 +++++
 src/include/catalog/pg_am.dat            |   3 +
 src/include/catalog/pg_opclass.dat       |   4 +
 src/include/catalog/pg_opfamily.dat      |   2 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/nodes/execnodes.h            |   7 +-
 src/include/utils/index_selfuncs.h       |   8 +
 src/test/regress/expected/amutils.out    |   8 +-
 src/test/regress/expected/opr_sanity.out |   7 +-
 src/test/regress/expected/psql.out       |  24 +-
 24 files changed, 760 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/stir/Makefile
 create mode 100644 src/backend/access/stir/meson.build
 create mode 100644 src/backend/access/stir/stir.c
 create mode 100644 src/include/access/stir.h

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 6a7f8cb4a7c..5b5984e3aa2 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -285,6 +285,9 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 			case SPGIST_AM_OID:
 				err = "spgist index";
 				break;
+			case STIR_AM_OID:
+				err = "stir index";
+				break;
 			case BRIN_AM_OID:
 				err = "brin index";
 				break;
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index e88d72ea039..ebbcfa90715 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -19,6 +19,7 @@ SUBDIRS	    = \
 	nbtree \
 	rmgrdesc \
 	spgist \
+	stir \
 	sequence \
 	table \
 	tablesample \
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..f1785b9a456 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3138,6 +3138,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
@@ -3189,6 +3190,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
diff --git a/src/backend/access/meson.build b/src/backend/access/meson.build
index 5fd18de74f9..7219c65f365 100644
--- a/src/backend/access/meson.build
+++ b/src/backend/access/meson.build
@@ -11,6 +11,7 @@ subdir('nbtree')
 subdir('rmgrdesc')
 subdir('sequence')
 subdir('spgist')
+subdir('stir')
 subdir('table')
 subdir('tablesample')
 subdir('transam')
diff --git a/src/backend/access/stir/Makefile b/src/backend/access/stir/Makefile
new file mode 100644
index 00000000000..8785dab37bd
--- /dev/null
+++ b/src/backend/access/stir/Makefile
@@ -0,0 +1,18 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/stir
+#
+# IDENTIFICATION
+#    src/backend/access/stir/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/stir
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	stir.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/stir/meson.build b/src/backend/access/stir/meson.build
new file mode 100644
index 00000000000..4b7ad15346c
--- /dev/null
+++ b/src/backend/access/stir/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+backend_sources += files(
+	'stir.c',
+)
diff --git a/src/backend/access/stir/stir.c b/src/backend/access/stir/stir.c
new file mode 100644
index 00000000000..e550d8892e6
--- /dev/null
+++ b/src/backend/access/stir/stir.c
@@ -0,0 +1,559 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.c
+ *	  Implementation of Short-Term Index Replacement.
+ *
+ * STIR is a specialized access method type designed for temporary storage
+ * of TID values during concurrent index build operations.
+ *
+ * The typical lifecycle of a STIR index is:
+ * 1. created as an auxiliary index for CIC/RIC
+ * 2. accepts inserts for a period
+ * 3. stirbulkdelete called during index validation phase
+ * 4. gets dropped
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/stir/stir.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/stir.h"
+#include "miscadmin.h"
+#include "access/amvalidate.h"
+#include "access/htup_details.h"
+#include "access/tableam.h"
+#include "catalog/index.h"
+#include "catalog/pg_amop.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_opfamily.h"
+#include "commands/vacuum.h"
+#include "storage/bufmgr.h"
+#include "utils/catcache.h"
+#include "utils/fmgrprotos.h"
+#include "utils/index_selfuncs.h"
+#include "utils/memutils.h"
+#include "utils/regproc.h"
+#include "utils/syscache.h"
+
+/*
+ * Stir handler function: return IndexAmRoutine with access method parameters
+ * and callbacks.
+ */
+Datum
+stirhandler(PG_FUNCTION_ARGS)
+{
+	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
+
+	/* Set STIR-specific strategy and procedure numbers */
+	amroutine->amstrategies = STIR_NSTRATEGIES;
+	amroutine->amsupport = STIR_NPROC;
+	amroutine->amoptsprocnum = STIR_OPTIONS_PROC;
+
+	/* STIR doesn't support most index operations */
+	amroutine->amcanorder = false;
+	amroutine->amcanorderbyop = false;
+	amroutine->amcanbackward = false;
+	amroutine->amcanunique = false;
+	amroutine->amcanmulticol = true;
+	amroutine->amoptionalkey = true;
+	amroutine->amsearcharray = false;
+	amroutine->amsearchnulls = false;
+	amroutine->amstorage = false;
+	amroutine->amclusterable = false;
+	amroutine->ampredlocks = false;
+	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
+	amroutine->amcaninclude = true;
+	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amparallelvacuumoptions =
+			VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_CLEANUP;
+	amroutine->amkeytype = InvalidOid;
+
+	/* Set up function callbacks */
+	amroutine->ambuild = stirbuild;
+	amroutine->ambuildempty = stirbuildempty;
+	amroutine->aminsert = stirinsert;
+	amroutine->aminsertcleanup = NULL;
+	amroutine->ambulkdelete = stirbulkdelete;
+	amroutine->amvacuumcleanup = stirvacuumcleanup;
+	amroutine->amcanreturn = NULL;
+	amroutine->amcostestimate = stircostestimate;
+	amroutine->amoptions = stiroptions;
+	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
+	amroutine->amvalidate = stirvalidate;
+	amroutine->amadjustmembers = NULL;
+	amroutine->ambeginscan = stirbeginscan;
+	amroutine->amrescan = stirrescan;
+	amroutine->amgettuple = NULL;
+	amroutine->amgetbitmap = NULL;
+	amroutine->amendscan = stirendscan;
+	amroutine->ammarkpos = NULL;
+	amroutine->amrestrpos = NULL;
+	amroutine->amestimateparallelscan = NULL;
+	amroutine->aminitparallelscan = NULL;
+	amroutine->amparallelrescan = NULL;
+
+	PG_RETURN_POINTER(amroutine);
+}
+
+/*
+ * Validates operator class for STIR index.
+ *
+ * STIR is not a real index, so validate may be skipped.
+ * But we do it just for consistency.
+ */
+bool
+stirvalidate(Oid opclassoid)
+{
+	bool result = true;
+	HeapTuple classtup;
+	Form_pg_opclass classform;
+	Oid opfamilyoid;
+	HeapTuple familytup;
+	Form_pg_opfamily familyform;
+	char *opfamilyname;
+	CatCList *oprlist;
+	int i;
+
+	/* Fetch opclass information */
+	classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
+	if (!HeapTupleIsValid(classtup))
+		elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
+	classform = (Form_pg_opclass) GETSTRUCT(classtup);
+
+	opfamilyoid = classform->opcfamily;
+
+	/* Fetch opfamily information */
+	familytup = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfamilyoid));
+	if (!HeapTupleIsValid(familytup))
+		elog(ERROR, "cache lookup failed for operator family %u", opfamilyoid);
+	familyform = (Form_pg_opfamily) GETSTRUCT(familytup);
+
+	opfamilyname = NameStr(familyform->opfname);
+
+	/* Fetch all operators and support functions of the opfamily */
+	oprlist = SearchSysCacheList1(AMOPSTRATEGY, ObjectIdGetDatum(opfamilyoid));
+
+	/* Check individual operators */
+	for (i = 0; i < oprlist->n_members; i++)
+	{
+		HeapTuple oprtup = &oprlist->members[i]->tuple;
+		Form_pg_amop oprform = (Form_pg_amop) GETSTRUCT(oprtup);
+
+		/* Check it's allowed strategy for stir */
+		if (oprform->amopstrategy < 1 ||
+			oprform->amopstrategy > STIR_NSTRATEGIES)
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with invalid strategy number %d",
+					        opfamilyname,
+					        format_operator(oprform->amopopr),
+					        oprform->amopstrategy)));
+			result = false;
+		}
+
+		/* stir doesn't support ORDER BY operators */
+		if (oprform->amoppurpose != AMOP_SEARCH ||
+			OidIsValid(oprform->amopsortfamily))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains invalid ORDER BY specification for operator %s",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+
+		/* Check operator signature --- same for all stir strategies */
+		if (!check_amop_signature(oprform->amopopr, BOOLOID,
+		                          oprform->amoplefttype,
+		                          oprform->amoprighttype))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with wrong signature",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+	}
+
+	ReleaseCatCacheList(oprlist);
+	ReleaseSysCache(familytup);
+	ReleaseSysCache(classtup);
+
+	return result;
+}
+
+/*
+ * Initialize meta-page of a STIR index.
+ * The skipInserts flag determines if new inserts will be accepted or skipped.
+ */
+void
+StirFillMetapage(Relation index, Page metaPage, bool skipInserts)
+{
+	StirMetaPageData *metadata;
+
+	StirInitPage(metaPage, STIR_META);
+	metadata = StirPageGetMeta(metaPage);
+	memset(metadata, 0, sizeof(StirMetaPageData));
+	metadata->magicNumber = STIR_MAGIC_NUMBER;
+	metadata->skipInserts = skipInserts;
+	((PageHeader) metaPage)->pd_lower = ((char *) metadata + sizeof(StirMetaPageData)) - (char *) metaPage;
+}
+
+/*
+ * Create and initialize the metapage for a STIR index.
+ * This is called during index creation.
+ */
+void
+StirInitMetapage(Relation index, ForkNumber forknum)
+{
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	/*
+	 * Make a new page; since it is the first page it should be associated with
+	 * block number 0 (STIR_METAPAGE_BLKNO).  No need to hold the extension
+	 * lock because there cannot be concurrent inserters yet.
+	 */
+	metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	START_CRIT_SECTION();
+	Assert(BufferGetBlockNumber(metaBuffer) == STIR_METAPAGE_BLKNO);
+
+	metaPage = BufferGetPage(metaBuffer);
+	StirFillMetapage(index, metaPage, forknum == INIT_FORKNUM);
+
+	MarkBufferDirty(metaBuffer);
+	END_CRIT_SECTION();
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * Initialize any page of a stir index.
+ */
+void
+StirInitPage(Page page, uint16 flags)
+{
+	StirPageOpaque opaque;
+
+	PageInit(page, BLCKSZ, sizeof(StirPageOpaqueData));
+
+	opaque = StirPageGetOpaque(page);
+	opaque->flags = flags;
+	opaque->stir_page_id = STIR_PAGE_ID;
+}
+
+/*
+ * Add a tuple to a STIR page. Returns false if the tuple doesn't fit.
+ * The tuple is added to the end of the page.
+ */
+static bool
+StirPageAddItem(Page page, StirTuple *tuple)
+{
+	StirTuple *itup;
+	StirPageOpaque opaque;
+	char *ptr;
+
+	/* We shouldn't be pointed to an invalid page */
+	Assert(!PageIsNew(page));
+
+	/* Does the new tuple fit on the page? */
+	if (StirPageGetFreeSpace(page) < sizeof(StirTuple))
+		return false;
+
+	/* Copy a new tuple to the end of the page */
+	opaque = StirPageGetOpaque(page);
+	itup = StirPageGetTuple(page, opaque->maxoff + 1);
+	memcpy(itup, tuple, sizeof(StirTuple));
+
+	/* Adjust maxoff and pd_lower */
+	opaque->maxoff++;
+	ptr = (char *) StirPageGetTuple(page, opaque->maxoff + 1);
+	((PageHeader) page)->pd_lower = ptr - page;
+
+	/* Assert we didn't overrun available space */
+	Assert(((PageHeader) page)->pd_lower <= ((PageHeader) page)->pd_upper);
+	return true;
+}
+
+/*
+ * Insert a new tuple into a STIR index.
+ */
+bool
+stirinsert(Relation index, Datum *values, bool *isnull,
+		  ItemPointer ht_ctid, Relation heapRel,
+		  IndexUniqueCheck checkUnique,
+		  bool indexUnchanged,
+		  struct IndexInfo *indexInfo)
+{
+	StirTuple itup;
+	StirMetaPageData *metaData;
+	Buffer buffer,
+			metaBuffer;
+	Page page;
+	BlockNumber blkNo;
+
+	itup.heapPtr = *ht_ctid;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+
+	for (;;)
+	{
+		LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if inserts are allowed */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+		blkNo = metaData->lastBlkNo;
+		/* Don't hold metabuffer lock while doing insert */
+		LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+
+		if (blkNo > 0)
+		{
+			buffer = ReadBuffer(index, blkNo);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+			START_CRIT_SECTION();
+
+			page = BufferGetPage(buffer);
+
+			Assert(!PageIsNew(page));
+
+			/* Try to add tuple to the existing page */
+			if (StirPageAddItem(page, &itup))
+			{
+				/* Success!  Apply the change, clean up, and exit */
+				MarkBufferDirty(buffer);
+				END_CRIT_SECTION();
+
+				UnlockReleaseBuffer(buffer);
+				ReleaseBuffer(metaBuffer);
+				return false;
+			}
+
+			END_CRIT_SECTION();
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Need to add a new page - get exclusive lock on meta-page */
+		LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if another backend already extended the index */
+
+		if (blkNo != metaData->lastBlkNo)
+		{
+			Assert(blkNo < metaData->lastBlkNo);
+			/* Someone else inserted the new page into the index, let's try again */
+			LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+			continue;
+		}
+		else
+		{
+			/* Must extend the file */
+			buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
+									   EB_LOCK_FIRST);
+			page = BufferGetPage(buffer);
+			START_CRIT_SECTION();
+
+			StirInitPage(page, 0);
+
+			if (!StirPageAddItem(page, &itup))
+			{
+				/* We shouldn't be here since we're inserting to an empty page */
+				elog(ERROR, "could not add new stir tuple to empty page");
+			}
+
+			/* Update meta-page with new last block number */
+			metaData->lastBlkNo = BufferGetBlockNumber(buffer);
+
+			MarkBufferDirty(metaBuffer);
+			MarkBufferDirty(buffer);
+
+			END_CRIT_SECTION();
+
+			UnlockReleaseBuffer(buffer);
+			UnlockReleaseBuffer(metaBuffer);
+
+			return false;
+		}
+	}
+}
+
+/*
+ * STIR doesn't support scans - these functions all error out
+ */
+IndexScanDesc
+stirbeginscan(Relation r, int nkeys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void
+stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+		  ScanKey orderbys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void stirendscan(IndexScanDesc scan)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+/*
+ * Build a STIR index - only allowed for auxiliary indexes.
+ * Just initializes the meta-page without any heap scans.
+ */
+IndexBuildResult *
+stirbuild(Relation heap, Relation index,
+						   struct IndexInfo *indexInfo)
+{
+	IndexBuildResult *result;
+
+	if (!indexInfo->ii_Auxiliary)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("Building STIR indexes is not supported")));
+
+	StirInitMetapage(index, MAIN_FORKNUM);
+
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	result->heap_tuples = 0;
+	result->index_tuples = 0;
+	return result;
+}
+
+void stirbuildempty(Relation index)
+{
+	StirInitMetapage(index, INIT_FORKNUM);
+}
+
+IndexBulkDeleteResult *
+stirbulkdelete(IndexVacuumInfo *info,
+									 IndexBulkDeleteResult *stats,
+									 IndexBulkDeleteCallback callback,
+									 void *callback_state)
+{
+	Relation index = info->index;
+	BlockNumber blkno, npages;
+	Buffer buffer;
+	Page page;
+
+	/* For normal VACUUM, mark to skip inserts and warn about an index drop needed */
+	if (!info->validate_index)
+	{
+		StirMarkAsSkipInserts(index);
+
+		ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+		return NULL;
+	}
+
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	/*
+	 * Iterate over the pages. We don't care about concurrently added pages,
+	 * because the index is marked as not-ready for that moment and the index is not
+	 * used for insert.
+	 */
+	npages = RelationGetNumberOfBlocks(index);
+	for (blkno = STIR_HEAD_BLKNO; blkno < npages; blkno++)
+	{
+		StirTuple *itup, *itupEnd;
+
+		vacuum_delay_point(false);
+
+		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
+									RBM_NORMAL, info->strategy);
+
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buffer);
+
+		if (PageIsNew(page))
+		{
+			UnlockReleaseBuffer(buffer);
+			continue;
+		}
+
+		itup = StirPageGetTuple(page, FirstOffsetNumber);
+		itupEnd = StirPageGetTuple(page, OffsetNumberNext(StirPageGetMaxOffset(page)));
+		while (itup < itupEnd)
+		{
+			/* Do we have to delete this tuple? */
+			if (callback(&itup->heapPtr, callback_state))
+			{
+				ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("we never delete in stir")));
+			}
+
+			itup = StirPageGetNextTuple(itup);
+		}
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	return stats;
+}
+
+/*
+ * Mark a STIR index to skip future inserts
+ */
+void
+StirMarkAsSkipInserts(Relation index)
+{
+	StirMetaPageData *metaData;
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	START_CRIT_SECTION();
+
+	metaPage = BufferGetPage(metaBuffer);
+	metaData = StirPageGetMeta(metaPage);
+
+	if (!metaData->skipInserts)
+	{
+		metaData->skipInserts = true;
+		MarkBufferDirty(metaBuffer);
+	}
+	END_CRIT_SECTION();
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+IndexBulkDeleteResult *
+stirvacuumcleanup(IndexVacuumInfo *info,
+				  IndexBulkDeleteResult *stats)
+{
+	StirMarkAsSkipInserts(info->index);
+	ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+	return NULL;
+}
+
+bytea *
+stiroptions(Datum reloptions, bool validate)
+{
+	return NULL;
+}
+
+void
+stircostestimate(PlannerInfo *root, IndexPath *path,
+					 double loop_count, Cost *indexStartupCost,
+					 Cost *indexTotalCost, Selectivity *indexSelectivity,
+					 double *indexCorrelation, double *indexPages)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 43de42ce39e..1325f3d9700 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3409,6 +3409,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
+	ivinfo.validate_index = true;
 
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index c78dcea98c1..87e01e74ad7 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -307,6 +307,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_ParallelWorkers = 0;
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
+	indexInfo->ii_Auxiliary = false;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 53adac9139b..f81dd30df24 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -725,6 +725,7 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 			ivinfo.message_level = elevel;
 			ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
 			ivinfo.strategy = vac_strategy;
+			ivinfo.validate_index = false;
 
 			stats = index_vacuum_cleanup(&ivinfo, NULL);
 
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 279108ca89f..dfdccfaf991 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -885,6 +885,7 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
 	ivinfo.estimated_count = pvs->shared->estimated_count;
 	ivinfo.num_heap_tuples = pvs->shared->reltuples;
 	ivinfo.strategy = pvs->bstrategy;
+	ivinfo.validate_index = false;
 
 	/* Update error traceback information */
 	pvs->indname = pstrdup(RelationGetRelationName(indrel));
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 2caec621d73..09ae445694d 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -875,6 +875,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
+	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..cd467582731 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -56,6 +56,7 @@ typedef struct IndexVacuumInfo
 	bool		estimated_count;	/* num_heap_tuples is an estimate */
 	int			message_level;	/* ereport level for progress messages */
 	double		num_heap_tuples;	/* tuples remaining in heap */
+	bool		validate_index; /* validating concurrently built index? */
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index 0bd17b30ca7..e2966165e6f 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -52,8 +52,9 @@ typedef enum relopt_kind
 	RELOPT_KIND_VIEW = (1 << 9),
 	RELOPT_KIND_BRIN = (1 << 10),
 	RELOPT_KIND_PARTITIONED = (1 << 11),
+	RELOPT_KIND_STIR = (1 << 12),
 	/* if you add a new kind, make sure you update "last_default" too */
-	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_STIR,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/access/stir.h b/src/include/access/stir.h
new file mode 100644
index 00000000000..18ee36506fd
--- /dev/null
+++ b/src/include/access/stir.h
@@ -0,0 +1,113 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.h
+ *	  header file for postgres stir access method implementation.
+ *
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * src/include/access/stir.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef STIR_H
+#define STIR_H
+
+#include "access/amapi.h"
+#include "access/xlog.h"
+#include "access/generic_xlog.h"
+#include "access/itup.h"
+#include "nodes/pathnodes.h"
+
+/* Support procedures numbers */
+#define STIR_NPROC				0
+
+/* Scan strategies */
+#define STIR_NSTRATEGIES		1
+
+#define STIR_OPTIONS_PROC				0
+
+/* Macros for accessing stir page structures */
+#define StirPageGetOpaque(page) ((StirPageOpaque) PageGetSpecialPointer(page))
+#define StirPageGetMaxOffset(page) (StirPageGetOpaque(page)->maxoff)
+#define StirPageIsMeta(page) \
+	((StirPageGetOpaque(page)->flags & STIR_META) != 0)
+#define StirPageGetData(page)		((StirTuple *)PageGetContents(page))
+#define StirPageGetTuple(page, offset) \
+	((StirTuple *)(PageGetContents(page) \
+		+ sizeof(StirTuple) * ((offset) - 1)))
+#define StirPageGetNextTuple(tuple) \
+	((StirTuple *)((char *)(tuple) + sizeof(StirTuple)))
+
+
+
+/* Preserved page numbers */
+#define STIR_METAPAGE_BLKNO	(0)
+#define STIR_HEAD_BLKNO		(1) /* first data page */
+
+
+/* Opaque for stir pages */
+typedef struct StirPageOpaqueData
+{
+	OffsetNumber maxoff;		/* number of index tuples on the page */
+	uint16		flags;			/* see bit definitions below */
+	uint16		stir_page_id;	/* for identification of STIR indexes */
+} StirPageOpaqueData;
+
+/* Stir page flags */
+#define STIR_META		(1<<0)
+
+typedef StirPageOpaqueData *StirPageOpaque;
+
+#define STIR_PAGE_ID		0xFF84
+
+/* Metadata of stir index */
+typedef struct StirMetaPageData
+{
+	uint32		magicNumber;
+	BlockNumber	lastBlkNo;
+	bool		skipInserts;	/* should we just exit without any inserts? */
+} StirMetaPageData;
+
+/* Magic number to distinguish stir pages from others */
+#define STIR_MAGIC_NUMBER (0xDBAC0DEF)
+
+#define StirPageGetMeta(page)	((StirMetaPageData *) PageGetContents(page))
+
+typedef struct StirTuple
+{
+	ItemPointerData heapPtr;
+} StirTuple;
+
+#define StirPageGetFreeSpace(page) \
+	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+		- StirPageGetMaxOffset(page) * (sizeof(StirTuple)) \
+		- MAXALIGN(sizeof(StirPageOpaqueData)))
+
+extern void StirFillMetapage(Relation index, Page metaPage, bool skipInserts);
+extern void StirInitMetapage(Relation index, ForkNumber forknum);
+extern void StirInitPage(Page page, uint16 flags);
+extern void StirMarkAsSkipInserts(Relation index);
+
+/* index access method interface functions */
+extern bool stirvalidate(Oid opclassoid);
+extern bool stirinsert(Relation index, Datum *values, bool *isnull,
+					 ItemPointer ht_ctid, Relation heapRel,
+					 IndexUniqueCheck checkUnique,
+					 bool indexUnchanged,
+					 struct IndexInfo *indexInfo);
+extern IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys);
+extern void stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+					 ScanKey orderbys, int norderbys);
+extern void stirendscan(IndexScanDesc scan);
+extern IndexBuildResult *stirbuild(Relation heap, Relation index,
+								 struct IndexInfo *indexInfo);
+extern void stirbuildempty(Relation index);
+extern IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+										   IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+										   void *callback_state);
+extern IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+											  IndexBulkDeleteResult *stats);
+extern bytea *stiroptions(Datum reloptions, bool validate);
+
+#endif			/* STIR_H */
diff --git a/src/include/catalog/pg_am.dat b/src/include/catalog/pg_am.dat
index 46d361047fe..8bd2c2b46ba 100644
--- a/src/include/catalog/pg_am.dat
+++ b/src/include/catalog/pg_am.dat
@@ -33,5 +33,8 @@
 { oid => '3580', oid_symbol => 'BRIN_AM_OID',
   descr => 'block range index (BRIN) access method',
   amname => 'brin', amhandler => 'brinhandler', amtype => 'i' },
+{ oid => '5555', oid_symbol => 'STIR_AM_OID',
+  descr => 'short term index replacement access method',
+  amname => 'stir', amhandler => 'stirhandler', amtype => 'i' },
 
 ]
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index df170b80840..a3457e749db 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -492,4 +492,8 @@
 
 # no brin opclass for the geometric types except box
 
+# allow any types for STIR
+{ opcmethod => 'stir', oid_symbol => 'ANY_STIR_OPS_OID', opcname => 'stir_ops',
+  opcfamily => 'stir/any_ops', opcintype => 'any'},
+
 ]
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 7a027c4810e..6ffc20a061c 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -308,5 +308,7 @@
   opfmethod => 'hash', opfname => 'multirange_ops' },
 { oid => '6158',
   opfmethod => 'gist', opfname => 'multirange_ops' },
+{ oid => '5558',
+  opfmethod => 'stir', opfname => 'any_ops' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 361e2cfffeb..f1475def487 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -935,6 +935,10 @@
   proname => 'brinhandler', provolatile => 'v',
   prorettype => 'index_am_handler', proargtypes => 'internal',
   prosrc => 'brinhandler' },
+{ oid => '5556', descr => 'short term index replacement access method handler',
+  proname => 'stirhandler', provolatile => 'v',
+  prorettype => 'index_am_handler', proargtypes => 'internal',
+  prosrc => 'stirhandler' },
 { oid => '3952', descr => 'brin: standalone scan new table pages',
   proname => 'brin_summarize_new_values', provolatile => 'v',
   proparallel => 'u', prorettype => 'int4', proargtypes => 'regclass',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..6ce9154b28d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -157,8 +157,8 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
- * during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
+ * are used only during index build; they're conventionally zeroed otherwise.
  * ----------------
  */
 typedef struct IndexInfo
@@ -218,7 +218,8 @@ typedef struct IndexInfo
 	bool		ii_WithoutOverlaps;
 	/* # of workers requested (excludes leader) */
 	int			ii_ParallelWorkers;
-
+	/* is auxiliary for concurrent index build? */
+	bool		ii_Auxiliary;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/utils/index_selfuncs.h b/src/include/utils/index_selfuncs.h
index 74793a1a19d..bf0e30dabe9 100644
--- a/src/include/utils/index_selfuncs.h
+++ b/src/include/utils/index_selfuncs.h
@@ -62,6 +62,14 @@ extern void spgcostestimate(struct PlannerInfo *root,
 							Selectivity *indexSelectivity,
 							double *indexCorrelation,
 							double *indexPages);
+extern void stircostestimate(struct PlannerInfo *root,
+							struct IndexPath *path,
+							double loop_count,
+							Cost *indexStartupCost,
+							Cost *indexTotalCost,
+							Selectivity *indexSelectivity,
+							double *indexCorrelation,
+							double *indexPages);
 extern void gincostestimate(struct PlannerInfo *root,
 							struct IndexPath *path,
 							double loop_count,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index 7ab6113c619..92c033a2010 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -173,7 +173,13 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
  spgist | can_exclude   | t
  spgist | can_include   | t
  spgist | bogus         | 
-(36 rows)
+ stir   | can_order     | f
+ stir   | can_unique    | f
+ stir   | can_multi_col | t
+ stir   | can_exclude   | f
+ stir   | can_include   | t
+ stir   | bogus         | 
+(42 rows)
 
 --
 -- additional checks for pg_index_column_has_property
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 6ff4d7ee901..9259679eea2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2129,9 +2129,10 @@ FROM pg_opclass AS c1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS a1
                  WHERE a1.amopfamily = c1.opcfamily
                    AND binary_coercible(c1.opcintype, a1.amoplefttype));
- opcname | opcfamily 
----------+-----------
-(0 rows)
+ opcname  | opcfamily 
+----------+-----------
+ stir_ops |      5558
+(1 row)
 
 -- Check that each operator listed in pg_amop has an associated opclass,
 -- that is one whose opcintype matches oprleft (possibly by coercion).
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index c8f3932edf0..ecc2c2a6049 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5171,7 +5171,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA *
 List of access methods
@@ -5185,7 +5186,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA h*
 List of access methods
@@ -5210,9 +5212,9 @@ List of access methods
 
 \dA: extra argument "bar" ignored
 \dA+
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5221,12 +5223,13 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ *
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5235,7 +5238,8 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ h*
                      List of access methods
-- 
2.43.0



  [application/x-patch] v30-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch (94.9K, 4-v30-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch)
  download | inline diff:
From caadba1e61ab6881831a64ce3d891829ddcb8208 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 15:03:10 +0100
Subject: [PATCH v30 4/7] Use auxiliary indexes for concurrent index operations

Replace the second table full scan in concurrent index builds with an auxiliary index approach:
- create a STIR auxiliary index with the same predicate (if exists) as in main index
- use it to track tuples inserted during the first phase
- merge auxiliary index with main index during validation to catch up new index with any tuples missed during the first phase
- automatically drop auxiliary when main index is ready

To merge main and auxiliary indexes:
- index_bulk_delete called for both, TIDs put into tuplesort
- both tuplesort are being sorted
- both tuplesort scanned with two pointers looking for the TIDs present in auxiliary index, but absent in main one
- all such TIDs are put into tuplestore
- all TIDs in tuplestore are fetched using the stream, tuplestore used in heapam_index_validate_scan_read_stream_next to provide the next page to prefetch
- if fetched tuple is alive - it is inserted into the main index

This eliminates the need for a second full table scan during validation, improving performance, especially for large tables. Affects both CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY operations.
---
 doc/src/sgml/monitoring.sgml               |  26 +-
 doc/src/sgml/ref/create_index.sgml         |  34 +-
 doc/src/sgml/ref/reindex.sgml              |  42 +-
 src/backend/access/heap/README.HOT         |  13 +-
 src/backend/access/heap/heapam_handler.c   | 548 ++++++++++++++-------
 src/backend/catalog/index.c                | 308 ++++++++++--
 src/backend/catalog/system_views.sql       |  17 +-
 src/backend/commands/indexcmds.c           | 344 +++++++++++--
 src/backend/nodes/makefuncs.c              |   4 +-
 src/include/access/tableam.h               |  12 +-
 src/include/catalog/index.h                |   9 +-
 src/include/commands/progress.h            |  13 +-
 src/include/nodes/makefuncs.h              |   3 +-
 src/test/regress/expected/create_index.out |  42 ++
 src/test/regress/expected/indexing.out     |   3 +-
 src/test/regress/expected/rules.out        |  17 +-
 src/test/regress/sql/create_index.sql      |  21 +
 17 files changed, 1121 insertions(+), 335 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b3d53550688..ce08e0d8b10 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6643,6 +6643,18 @@ FROM pg_stat_get_backend_idset() AS backendid;
        information for this phase.
       </entry>
      </row>
+     <row>
+      <entry><literal>waiting for writers to use auxiliary index</literal></entry>
+      <entry>
+       <command>CREATE INDEX CONCURRENTLY</command> or <command>REINDEX CONCURRENTLY</command> is waiting for transactions
+       with write locks that can potentially see the table to finish, to ensure use of auxiliary index for new tuples in
+       future transactions.
+       This phase is skipped when not in concurrent mode.
+       Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
+       and <structname>current_locker_pid</structname> contain the progress
+       information for this phase.
+      </entry>
+     </row>
      <row>
       <entry><literal>building index</literal></entry>
       <entry>
@@ -6683,13 +6695,12 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </entry>
      </row>
      <row>
-      <entry><literal>index validation: scanning table</literal></entry>
+      <entry><literal>index validation: merging indexes</literal></entry>
       <entry>
-       <command>CREATE INDEX CONCURRENTLY</command> is scanning the table
-       to validate the index tuples collected in the previous two phases.
+       <command>CREATE INDEX CONCURRENTLY</command> is merging content of auxiliary index with the target index.
        This phase is skipped when not in concurrent mode.
-       Columns <structname>blocks_total</structname> (set to the total size of the table)
-       and <structname>blocks_done</structname> contain the progress information for this phase.
+       Columns <structname>tuples_total</structname> (set to the number of tuples to be merged)
+       and <structname>tuples_done</structname> contain the progress information for this phase.
       </entry>
      </row>
      <row>
@@ -6706,8 +6717,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
      <row>
       <entry><literal>waiting for readers before marking dead</literal></entry>
       <entry>
-       <command>REINDEX CONCURRENTLY</command> is waiting for transactions
-       with read locks on the table to finish, before marking the old index dead.
+       <command>CREATE INDEX CONCURRENTLY</command> is waiting for transactions
+        with read locks on the table to finish, before marking the auxiliary index as dead.
+       <command>REINDEX CONCURRENTLY</command> is also waiting before marking the old index as dead.
        This phase is skipped when not in concurrent mode.
        Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
        and <structname>current_locker_pid</structname> contain the progress
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index bb7505d171b..12c88587a79 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -620,10 +620,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     out writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>CREATE INDEX</command>.
     When this option is used,
-    <productname>PostgreSQL</productname> must perform two scans of the table, and in
-    addition it must wait for all existing transactions that could potentially
-    modify or use the index to terminate.  Thus
-    this method requires more total work than a standard index build and takes
+    <productname>PostgreSQL</productname> must perform table scan followed by
+    validation phase, and in addition it must wait for all existing transactions
+    that could potentially modify or use the index to terminate.  Thus
+    this method requires more total work than a standard index build and may take
     significantly longer to complete.  However, since it allows normal
     operations to continue while the index is built, this method is useful for
     adding new indexes in a production environment.  Of course, the extra CPU
@@ -631,14 +631,14 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </para>
 
    <para>
-    In a concurrent index build, the index is actually entered as an
-    <quote>invalid</quote> index into
-    the system catalogs in one transaction, then two table scans occur in
-    two more transactions.  Before each table scan, the index build must
+    In a concurrent index build, the main and auxiliary indexes are actually
+    entered as an <quote>invalid</quote> index into
+    the system catalogs in one transaction, then two phases occur in
+    multiple transactions.  Before each phase, the index build must
     wait for existing transactions that have modified the table to terminate.
-    After the second scan, the index build must wait for any transactions
+    After the second phase, the index build must wait for any transactions
     that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-    scan to terminate, including transactions used by any phase of concurrent
+    phase to terminate, including transactions used by any phase of concurrent
     index builds on other tables, if the indexes involved are partial or have
     columns that are not simple column references.
     Then finally the index can be marked <quote>valid</quote> and ready for use,
@@ -651,10 +651,11 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    <para>
     If a problem arises while scanning the table, such as a deadlock or a
     uniqueness violation in a unique index, the <command>CREATE INDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> index. This index
-    will be ignored for querying purposes because it might be incomplete;
-    however it will still consume update overhead. The <application>psql</application>
-    <command>\d</command> command will report such an index as <literal>INVALID</literal>:
+    command will fail but leave behind an <quote>invalid</quote> index and its
+    associated auxiliary index. These indexes
+    will be ignored for querying purposes because they might be incomplete;
+    however they will still consume update overhead. The <application>psql</application>
+    <command>\d</command> command will report such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -664,11 +665,12 @@ postgres=# \d tab
  col    | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     The recommended recovery
-    method in such cases is to drop the index and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
+    method in such cases is to drop these indexes and try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
     to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
    </para>
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 185cd75ca30..1c3c7a97f6a 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -76,7 +76,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
       this index is left as <quote>invalid</quote>. Such indexes are useless
       but it can be convenient to use <command>REINDEX</command> to rebuild
       them. Note that only <command>REINDEX INDEX</command> is able
-      to perform a concurrent build on an invalid index.
+      to perform a concurrent build on a invalid index.
      </para>
     </listitem>
 
@@ -368,9 +368,8 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <productname>PostgreSQL</productname> supports rebuilding indexes with minimum locking
     of writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>REINDEX</command>. When this option
-    is used, <productname>PostgreSQL</productname> must perform two scans of the table
-    for each index that needs to be rebuilt and wait for termination of
-    all existing transactions that could potentially use the index.
+    is used, <productname>PostgreSQL</productname> must perform several steps to ensure data
+    consistency while allowing normal operations to continue.
     This method requires more total work than a standard index
     rebuild and takes significantly longer to complete as it needs to wait
     for unfinished transactions that might modify the index. However, since
@@ -388,7 +387,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <orderedlist>
      <listitem>
       <para>
-       A new transient index definition is added to the catalog
+       A new transient index definition and an auxiliary index are added to the catalog
        <literal>pg_index</literal>.  This definition will be used to replace
        the old index.  A <literal>SHARE UPDATE EXCLUSIVE</literal> lock at
        session level is taken on the indexes being reindexed as well as their
@@ -398,7 +397,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       A first pass to build the index is done for each new index.  Once the
+       The auxiliary index is marked as "ready for inserts", making
+       it visible to other sessions. This index efficiently tracks all new
+       tuples during the reindex process.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The new main index is built by scanning the table.  Once the
        index is built, its flag <literal>pg_index.indisready</literal> is
        switched to <quote>true</quote> to make it ready for inserts, making it
        visible to other sessions once the transaction that performed the build
@@ -409,9 +416,9 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       Then a second pass is performed to add tuples that were added while the
-       first pass was running.  This step is also done in a separate
-       transaction for each index.
+       A validation phase merges any missing entries from the auxiliary index
+       into the main index, ensuring all concurrent changes are captured.
+       This step is also done in a separate transaction for each index.
       </para>
      </listitem>
 
@@ -428,7 +435,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes have <literal>pg_index.indisready</literal> switched to
+       The old and auxiliary indexes have <literal>pg_index.indisready</literal> switched to
        <quote>false</quote> to prevent any new tuple insertions, after waiting
        for running queries that might reference the old index to complete.
       </para>
@@ -436,7 +443,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes are dropped.  The <literal>SHARE UPDATE
+       The old and auxiliary indexes are dropped.  The <literal>SHARE UPDATE
        EXCLUSIVE</literal> session locks for the indexes and the table are
        released.
       </para>
@@ -447,11 +454,11 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
    <para>
     If a problem arises while rebuilding the indexes, such as a
     uniqueness violation in a unique index, the <command>REINDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> new index in addition to
-    the pre-existing one. This index will be ignored for querying purposes
-    because it might be incomplete; however it will still consume update
+    command will fail but leave behind an <quote>invalid</quote> new index and its auxiliary index in addition to
+    the pre-existing one. These indexes will be ignored for querying purposes
+    because they might be incomplete; however they will still consume update
     overhead. The <application>psql</application> <command>\d</command> command will report
-    such an index as <literal>INVALID</literal>:
+    such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -462,12 +469,13 @@ postgres=# \d tab
 Indexes:
     "idx" btree (col)
     "idx_ccnew" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     If the index marked <literal>INVALID</literal> is suffixed
-    <literal>_ccnew</literal>, then it corresponds to the transient
+    <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
-    recovery method is to drop it using <literal>DROP INDEX</literal>,
+    recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
     then attempt <command>REINDEX CONCURRENTLY</command> again.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 74e407f375a..b1c797517ee 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -375,6 +375,11 @@ constraint on which updates can be HOT.  Other transactions must include
 such an index when determining HOT-safety of updates, even though they
 must ignore it for both insertion and searching purposes.
 
+Also, special auxiliary index is created the same way. It is marked as
+"ready for inserts" without any actual table scan. Its purpose is to collect
+new tuples inserted into table while our target index is still "not ready
+for inserts".
+
 We must do this to avoid making incorrect index entries.  For example,
 suppose we are building an index on column X and we make an index entry for
 a non-HOT tuple with X=1.  Then some other backend, unaware that X is an
@@ -394,10 +399,10 @@ entry at the root of the HOT-update chain but we use the key value from the
 live tuple.
 
 We mark the index open for inserts (but still not ready for reads) then
-we again wait for transactions which have the table open.  Then we take
-a second reference snapshot and validate the index.  This searches for
-tuples missing from the index, and inserts any missing ones.  Again,
-the index entries have to have TIDs equal to HOT-chain root TIDs, but
+we again wait for transactions which have the table open.  Then validate
+the index.  This searches for tuples missing from the index in auxiliary
+index, and inserts any missing ones if they are visible to reference snapshot.
+Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3ff36f59bf8..4ad8a2c0f81 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -41,6 +41,7 @@
 #include "storage/bufpage.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -1759,242 +1760,409 @@ heapam_index_build_range_scan(Relation heapRelation,
 	return reltuples;
 }
 
+/*
+ * Calculate set difference (relative complement) of main and aux
+ * sets.
+ *
+ * All records which are present in auxiliary tuplesort but not in
+ * main are added to the store.
+ *
+ * In set theory notation store = aux - main or store = aux / main.
+ *
+ * returns number of items added to store
+ */
+static int
+heapam_index_validate_tuplesort_difference(Tuplesortstate  *main,
+										   Tuplesortstate  *aux,
+										   Tuplestorestate *store)
+{
+	int				num = 0;
+	/* state variables for the merge */
+	ItemPointer 	indexcursor = NULL,
+					auxindexcursor = NULL;
+	ItemPointerData decoded,
+					auxdecoded;
+	bool			tuplesort_empty = false,
+					auxtuplesort_empty = false;
+
+	/* Initialize pointers. */
+	ItemPointerSetInvalid(&decoded);
+	ItemPointerSetInvalid(&auxdecoded);
+
+	/*
+	 * Main loop: we step through the auxiliary sort (auxState->tuplesort),
+	 * which holds TIDs that must compared to those from the "main" sort
+	 * (state->tuplesort).
+	 */
+	while (!auxtuplesort_empty)
+	{
+		Datum		ts_val;
+		bool		ts_isnull;
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		* Attempt to fetch the next TID from the auxiliary sort. If it's
+		* empty, we set auxindexcursor to NULL.
+		*/
+		auxtuplesort_empty = !tuplesort_getdatum(aux, true,
+												 false, &ts_val, &ts_isnull,
+												 NULL);
+		Assert(auxtuplesort_empty || !ts_isnull);
+		if (!auxtuplesort_empty)
+		{
+			itemptr_decode(&auxdecoded, DatumGetInt64(ts_val));
+			auxindexcursor = &auxdecoded;
+		}
+		else
+		{
+			auxindexcursor = NULL;
+		}
+
+		/*
+		* If the auxiliary sort is not yet empty, we now try to synchronize
+		* the "main" sort cursor (indexcursor) with auxindexcursor. We advance
+		* the main sort cursor until we've reached or passed the auxiliary TID.
+		*/
+		if (!auxtuplesort_empty)
+		{
+			/*
+			 * Move the main sort forward while:
+			 *   (1) It's not exhausted (tuplesort_empty == false), and
+			 *   (2) Either indexcursor is NULL (first iteration) or
+			 *       indexcursor < auxindexcursor in TID order.
+			 */
+			while (!tuplesort_empty && (indexcursor == NULL || /* null on first time here */
+						ItemPointerCompare(indexcursor, auxindexcursor) < 0))
+			{
+				/*
+				 * Get the next TID from the main sort. If it's empty,
+				 * we set indexcursor to NULL.
+				 */
+				tuplesort_empty = !tuplesort_getdatum(main, true,
+													  false, &ts_val, &ts_isnull,
+													  NULL);
+				Assert(tuplesort_empty || !ts_isnull);
+
+				if (!tuplesort_empty)
+				{
+					itemptr_decode(&decoded, DatumGetInt64(ts_val));
+					indexcursor = &decoded;
+				}
+				else
+				{
+					indexcursor = NULL;
+				}
+
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * Now, if either:
+			 *  - the main sort is empty, or
+			 *  - indexcursor > auxindexcursor,
+			 *
+			 * then auxindexcursor identifies a TID that doesn't appear in
+			 * the main sort. We likely need to insert it
+			 * into the target index if it’s visible in the heap.
+			 */
+			if (tuplesort_empty || ItemPointerCompare(indexcursor, auxindexcursor) > 0)
+			{
+				tuplestore_putdatum(store, Int64GetDatum(itemptr_encode(auxindexcursor)));
+				num++;
+			}
+		}
+	}
+
+	return num;
+}
+
+typedef struct ValidateIndexScanState
+{
+	Tuplestorestate		*store;
+	BlockNumber			prev_block_number;
+	OffsetNumber		prev_offset_number;
+} ValidateIndexScanState;
+
+/*
+ * This is ReadStreamBlockNumberCB implementation which works as follows:
+ *
+ * 1) It iterates over a sorted tuplestore, where each element is an encoded
+ *    ItemPointer
+ *
+ * 2) It returns the current BlockNumber and collects all OffsetNumbers
+ *    for that block in per_buffer_data.
+ *
+ * 3) Once the code encounters a new BlockNumber, it stops reading more
+ *    offsets and saves the OffsetNumber of the new block for the next call.
+ *
+ * 4) The list of offsets for a block is always terminated with InvalidOffsetNumber.
+ *
+ * This function is intended to be repeatedly called, each time returning
+ * the next block and its corresponding set of offsets.
+ */
+static BlockNumber
+heapam_index_validate_scan_read_stream_next(
+								  ReadStream *stream,
+								  void *void_callback_private_data,
+								  void *void_per_buffer_data
+								  )
+{
+	bool should_free;
+	Datum datum;
+	BlockNumber result = InvalidBlockNumber;
+	int i = 0;
+
+	/*
+	 * Retrieve the specialized callback state and the output buffer.
+	 * callback_private_data keeps track of the previous block and offset
+	 * from a prior invocation, if any.
+	 */
+	ValidateIndexScanState *callback_private_data = void_callback_private_data;
+	OffsetNumber *per_buffer_data = void_per_buffer_data;
+
+	/*
+	 * If there is a "leftover" offset number from the previous invocation,
+	 * it means we had switched to a new block in the middle of the last call.
+	 * We place that leftover offset number into the buffer first.
+	 */
+	if (callback_private_data->prev_offset_number != InvalidOffsetNumber)
+	{
+		Assert(callback_private_data->prev_block_number != InvalidBlockNumber);
+		/*
+		 * 'result' is the block number to return. We set it to the block
+		 * from the previous leftover offset.
+		 */
+		result = callback_private_data->prev_block_number;
+		/* Place leftover offset number in the output buffer. */
+		per_buffer_data[i++] = callback_private_data->prev_offset_number;
+		/*
+		 * Clear the leftover offset number so it won't be reused unless
+		 * we encounter another block change.
+		 */
+		callback_private_data->prev_offset_number = InvalidOffsetNumber;
+	}
+
+	/*
+	 * Read from the tuplestore until we either run out of tuples or we
+	 * encounter a block change. For each tuple:
+	 *
+	 *   1) Decode its block/offset from the Datum.
+	 *   2) If it's the first time in this call (prev_block_number == InvalidBlockNumber),
+	 *      initialize prev_block_number.
+	 *   3) If the block number matches the current block, collect the offset.
+	 *   4) If the block number differs, save that offset as leftover and break
+	 *      so that the next call can handle the new block.
+	 */
+	while (tuplestore_getdatum(callback_private_data->store, true, &should_free, &datum))
+	{
+		BlockNumber next_block_number;
+		ItemPointerData next_data;
+
+		/* Decode the datum into an ItemPointer (block + offset). */
+		itemptr_decode(&next_data, DatumGetInt64(datum));
+		next_block_number = ItemPointerGetBlockNumber(&next_data);
+
+		/*
+		 * If we haven't set a block number yet this round, initialize it
+		 * using the first tuple we read.
+		 */
+		if (callback_private_data->prev_block_number == InvalidBlockNumber)
+			callback_private_data->prev_block_number = next_block_number;
+
+		/*
+		 * Always set the result to be the "current" block number
+		 * we are filling offsets for.
+		 */
+		result = callback_private_data->prev_block_number;
+
+		/*
+		 * If this tuple is from the same block, just store its offset
+		 * in our per_buffer_data array.
+		 */
+		if (next_block_number == callback_private_data->prev_block_number)
+		{
+			per_buffer_data[i++] = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+		}
+		else
+		{
+			/*
+			 * If the block just changed, store the offset of the new block
+			 * as leftover for the next invocation and break out.
+			 */
+			callback_private_data->prev_block_number = next_block_number;
+			callback_private_data->prev_offset_number = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+
+			/* Break to let the next call handle the new block. */
+			break;
+		}
+	}
+
+	/*
+	 * Terminate the list of offsets for this block with an InvalidOffsetNumber.
+	 */
+	per_buffer_data[i] = InvalidOffsetNumber;
+	return result;
+}
+
 static void
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
 						   Snapshot snapshot,
-						   ValidateIndexState *state)
+						   ValidateIndexState *state,
+						   ValidateIndexState *auxState)
 {
-	TableScanDesc scan;
-	HeapScanDesc hscan;
-	HeapTuple	heapTuple;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-	BlockNumber previous_blkno = InvalidBlockNumber;
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
+
+	TupleTableSlot  *slot;
+	EState			*estate;
+	ExprContext		*econtext;
+	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	int				num_to_check;
+	Tuplestorestate *tuples_for_check;
+	ValidateIndexScanState callback_private_data;
+
+	Buffer buf;
+	OffsetNumber* tuples;
+	ReadStream *read_stream;
+
+	/* Use 10% of memory for tuple store. */
+	int		store_work_mem_part = maintenance_work_mem / 10;
+
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	tuples_for_check =  tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
 	/*
 	 * sanity checks
 	 */
 	Assert(OidIsValid(indexRelation->rd_rel->relam));
 
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
+	num_to_check = heapam_index_validate_tuplesort_difference(state->tuplesort,
+														 auxState->tuplesort,
+														 tuples_for_check);
+
+	/* It is our responsibility to close tuple sort as fast as we can */
+	tuplesort_end(state->tuplesort);
+	tuplesort_end(auxState->tuplesort);
+
+	state->tuplesort = auxState->tuplesort = NULL;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+									&TTSOpsBufferHeapTuple);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
 
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+	callback_private_data.prev_block_number = InvalidBlockNumber;
+	callback_private_data.store = tuples_for_check;
+	callback_private_data.prev_offset_number = InvalidOffsetNumber;
 
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	scan = table_beginscan_strat(heapRelation,	/* relation */
-								 snapshot,	/* snapshot */
-								 0, /* number of keys */
-								 NULL,	/* scan key */
-								 true,	/* buffer access strategy OK */
-								 false);	/* syncscan not OK */
-	hscan = (HeapScanDesc) scan;
+	read_stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING,
+														 bstrategy,
+														 heapRelation, MAIN_FORKNUM,
+														 heapam_index_validate_scan_read_stream_next,
+														 &callback_private_data,
+														 (MaxHeapTuplesPerPage + 1) * sizeof(OffsetNumber));
 
-	pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_TOTAL,
-								 hscan->rs_nblocks);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_TOTAL, num_to_check);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE, 0);
 
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while ((buf = read_stream_next_buffer(read_stream, (void*) &tuples)) != InvalidBuffer)
 	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
+		HeapTupleData	heap_tuple_data[MaxHeapTuplesPerPage];
+		int i;
+		OffsetNumber off;
+		BlockNumber block_number;
 
 		CHECK_FOR_INTERRUPTS();
 
-		state->htups += 1;
+		LockBuffer(buf, BUFFER_LOCK_SHARE);
+		block_number = BufferGetBlockNumber(buf);
 
-		if ((previous_blkno == InvalidBlockNumber) ||
-			(hscan->rs_cblock != previous_blkno))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
-										 hscan->rs_cblock);
-			previous_blkno = hscan->rs_cblock;
+			ItemPointerData tid;
+			bool		all_dead, found;
+			ItemPointerSet(&tid, block_number, off);
+
+			found = heap_hot_search_buffer(&tid, heapRelation, buf, snapshot,
+										   &heap_tuple_data[i], &all_dead, true);
+			if (!found)
+				ItemPointerSetInvalid(&heap_tuple_data[i].t_self);
+			i++;
+			state->htups += 1;
 		}
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		/*
-		 * As commented in table_index_build_scan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
-		 */
-		if (hscan->rs_cblock != root_blkno)
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
-		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
-		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
-		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
+			if (ItemPointerIsValid(&heap_tuple_data[i].t_self))
 			{
+				ItemPointerData root_tid;
+				ItemPointerSet(&root_tid, block_number, off);
+
+				/* Reset the per-tuple memory context for the next fetch. */
+				MemoryContextReset(econtext->ecxt_per_tuple_memory);
+				ExecStoreBufferHeapTuple(&heap_tuple_data[i], slot, buf);
+
+				/* Compute the key values and null flags for this tuple. */
+				FormIndexDatum(indexInfo,
+							   slot,
+							   estate,
+							   values,
+							   isnull);
+
 				/*
-				 * Remember index items seen earlier on the current heap page
+				 * Insert the tuple into the target index.
 				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+				index_insert(indexRelation,
+							 values,
+							 isnull,
+							 &root_tid, /* insert root tuple */
+							 heapRelation,
+							 indexInfo->ii_Unique ?
+							 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+							 false,
+							 indexInfo);
+
+				state->tups_inserted += 1;
 			}
 
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  false, &ts_val, &ts_isnull,
-												  NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_TUPLES_DONE, 1);
+			i++;
 		}
 
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 false,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
+		ReleaseBuffer(buf);
 	}
 
-	table_endscan(scan);
-
 	ExecDropSingleTupleTableSlot(slot);
 
 	FreeExecutorState(estate);
 
+	read_stream_end(read_stream);
+	tuplestore_end(tuples_for_check);
+
+	FreeAccessStrategy(bstrategy);
+
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1325f3d9700..4f77627fb3b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -712,11 +712,16 @@ UpdateIndexRelation(Oid indexoid,
  *			already exists.
  *		INDEX_CREATE_PARTITIONED:
  *			create a partitioned index (table must be partitioned)
+ *		INDEX_CREATE_AUXILIARY:
+ *			mark index as auxiliary index
  * constr_flags: flags passed to index_constraint_create
  *		(only if INDEX_CREATE_ADD_CONSTRAINT is set)
  * allow_system_table_mods: allow table to be a system catalog
  * is_internal: if true, post creation hook for new index
  * constraintId: if not NULL, receives OID of created constraint
+ * relpersistence: persistence level to use for index. In most of the
+ *		cases it should be equal to the persistence level of the table,
+ *		auxiliary indexes are only exception here.
  *
  * Returns the OID of the created index.
  */
@@ -757,6 +762,7 @@ index_create(Relation heapRelation,
 	bool		invalid = (flags & INDEX_CREATE_INVALID) != 0;
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
+	bool		auxiliary = (flags & INDEX_CREATE_AUXILIARY) != 0;
 	char		relkind;
 	TransactionId relfrozenxid;
 	MultiXactId relminmxid;
@@ -782,7 +788,10 @@ index_create(Relation heapRelation,
 	namespaceId = RelationGetNamespace(heapRelation);
 	shared_relation = heapRelation->rd_rel->relisshared;
 	mapped_relation = RelationIsMapped(heapRelation);
-	relpersistence = heapRelation->rd_rel->relpersistence;
+	if (auxiliary)
+		relpersistence = RELPERSISTENCE_UNLOGGED; /* aux indexes are always unlogged */
+	else
+		relpersistence = heapRelation->rd_rel->relpersistence;
 
 	/*
 	 * check parameters
@@ -790,6 +799,11 @@ index_create(Relation heapRelation,
 	if (indexInfo->ii_NumIndexAttrs < 1)
 		elog(ERROR, "must index at least one column");
 
+	if (indexInfo->ii_Am == STIR_AM_OID && !auxiliary)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("user-defined indexes with STIR access method are not supported")));
+
 	if (!allow_system_table_mods &&
 		IsSystemRelation(heapRelation) &&
 		IsNormalProcessingMode())
@@ -1395,7 +1409,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							false,	/* not ready for inserts */
 							true,
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							false);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1470,6 +1485,154 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	return newIndexId;
 }
 
+/*
+ * index_concurrently_create_aux
+ *
+ * Create concurrently an auxiliary index based on the definition of the one
+ * provided by caller.  The index is inserted into catalogs and needs to be
+ * built later on. This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ */
+Oid
+index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
+							   Oid tablespaceOid, const char *newName)
+{
+	Relation	indexRelation;
+	IndexInfo  *oldInfo,
+			*newInfo;
+	Oid			newIndexId = InvalidOid;
+	HeapTuple	indexTuple;
+
+	List	   *indexColNames = NIL;
+	List	   *indexExprs = NIL;
+	List	   *indexPreds = NIL;
+
+	Oid *auxOpclassIds;
+	int16 *auxColoptions;
+
+	indexRelation = index_open(mainIndexId, RowExclusiveLock);
+
+	/* The new index needs some information from the old index */
+	oldInfo = BuildIndexInfo(indexRelation);
+
+	/*
+	 * Build of an auxiliary index with exclusion constraints is not
+	 * supported.
+	 */
+	if (oldInfo->ii_ExclusionOps != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("auxiliary index creation for exclusion constraints is not supported")));
+
+	/* Get the array of class and column options IDs from index info */
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(mainIndexId));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", mainIndexId);
+
+
+	/*
+	 * Fetch the list of expressions and predicates directly from the
+	 * catalogs.  This cannot rely on the information from IndexInfo of the
+	 * old index as these have been flattened for the planner.
+	 */
+	if (oldInfo->ii_Expressions != NIL)
+	{
+		Datum		exprDatum;
+		char	   *exprString;
+
+		exprDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indexprs);
+		exprString = TextDatumGetCString(exprDatum);
+		indexExprs = (List *) stringToNode(exprString);
+		pfree(exprString);
+	}
+	if (oldInfo->ii_Predicate != NIL)
+	{
+		Datum		predDatum;
+		char	   *predString;
+
+		predDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indpred);
+		predString = TextDatumGetCString(predDatum);
+		indexPreds = (List *) stringToNode(predString);
+
+		/* Also convert to implicit-AND format */
+		indexPreds = make_ands_implicit((Expr *) indexPreds);
+		pfree(predString);
+	}
+
+	/*
+	 * Build the index information for the new index.  Note that rebuild of
+	 * indexes with exclusion constraints is not supported, hence there is no
+	 * need to fill all the ii_Exclusion* fields.
+	 */
+	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
+							oldInfo->ii_NumIndexKeyAttrs,
+							STIR_AM_OID, /* special AM for aux indexes */
+							indexExprs,
+							indexPreds,
+							false,	/* aux index are not unique */
+							oldInfo->ii_NullsNotDistinct,
+							false,	/* not ready for inserts */
+							true,
+							false,	/* aux are not summarizing */
+							false,	/* aux are not without overlaps */
+							true	/* auxiliary */);
+
+	/*
+	 * Extract the list of column names and the column numbers for the new
+	 * index information.  All this information will be used for the index
+	 * creation.
+	 */
+	for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+	{
+		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
+		Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+		indexColNames = lappend(indexColNames, NameStr(att->attname));
+		newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+	}
+
+	auxOpclassIds = palloc0(sizeof(Oid) * newInfo->ii_NumIndexAttrs);
+	auxColoptions = palloc0(sizeof(int16) * newInfo->ii_NumIndexAttrs);
+
+	/* Fill with "any ops" */
+	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
+	{
+		auxOpclassIds[i] = ANY_STIR_OPS_OID;
+		auxColoptions[i] = 0;
+	}
+
+	newIndexId = index_create(heapRelation,
+							  newName,
+							  InvalidOid,    /* indexRelationId */
+							  InvalidOid,    /* parentIndexRelid */
+							  InvalidOid,    /* parentConstraintId */
+							  InvalidRelFileNumber, /* relFileNumber */
+							  newInfo,
+							  indexColNames,
+							  STIR_AM_OID,
+							  tablespaceOid,
+							  indexRelation->rd_indcollation,
+							  auxOpclassIds,
+							  NULL,
+							  auxColoptions,
+							  NULL,
+							  (Datum) 0,
+							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT | INDEX_CREATE_AUXILIARY,
+							  0,
+							  true, /* allow table to be a system catalog? */
+							  false,    /* is_internal? */
+							  NULL);
+
+	/* Close the relations used and clean up */
+	index_close(indexRelation, NoLock);
+	ReleaseSysCache(indexTuple);
+
+	return newIndexId;
+}
+
 /*
  * index_concurrently_build
  *
@@ -2450,7 +2613,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2510,7 +2674,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3286,12 +3451,21 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * We do a concurrent index build by first inserting the catalog entry for the
  * index via index_create(), marking it not indisready and not indisvalid.
+ * Then we create special auxiliary index the same way. It based on STIR AM.
  * Then we commit our transaction and start a new one, then we wait for all
  * transactions that could have been modifying the table to terminate.  Now
- * we know that any subsequently-started transactions will see the index and
+ * we know that any subsequently-started transactions will see indexes and
  * honor its constraints on HOT updates; so while existing HOT-chains might
  * be broken with respect to the index, no currently live tuple will have an
- * incompatible HOT update done to it.  We now build the index normally via
+ * incompatible HOT update done to it.
+ *
+ * After that, we build the auxiliary index. It is fast operation without any actual
+ * table scan. As result, we have empty STIR index. We commit transaction and
+ * again wait for all transactions that could have been modifying the table
+ * to terminate. At that moment all new tuples are going to be inserted into
+ * auxiliary index.
+ *
+ * We now build the index normally via
  * index_build(), while holding a weak lock that allows concurrent
  * insert/update/delete.  Also, we index only tuples that are valid
  * as of the start of the scan (see table_index_build_scan), whereas a normal
@@ -3301,14 +3475,17 @@ IndexCheckExclusion(Relation heapRelation,
  * bogus unique-index failures due to concurrent UPDATEs (we might see
  * different versions of the same row as being valid when we pass over them,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
- * does not contain any tuples added to the table while we built the index.
+ * does not contain any tuples added to the table while we built the index
+ * (but these tuples contained in auxiliary index).
  *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
- * commit the second transaction and start a third.  Again we wait for all
+ * commit the third transaction and start a fourth.  Again we wait for all
  * transactions that could have been modifying the table to terminate.  Now
  * we know that any subsequently-started transactions will see the index and
- * insert their new tuples into it.  We then take a new reference snapshot
- * which is passed to validate_index().  Any tuples that are valid according
+ * insert their new tuples into it. At the same moment we clear "indisready" for
+ * auxiliary index, since it is no more required to be updated.
+ *
+ * We then take a new reference snapshot, any tuples that are valid according
  * to this snap, but are not in the index, must be added to the index.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
@@ -3316,12 +3493,14 @@ IndexCheckExclusion(Relation heapRelation,
  * that might care about them before we mark the index valid.)
  *
  * validate_index() works by first gathering all the TIDs currently in the
- * index, using a bulkdelete callback that just stores the TIDs and doesn't
+ * indexes, using a bulkdelete callback that just stores the TIDs and doesn't
  * ever say "delete it".  (This should be faster than a plain indexscan;
  * also, not all index AMs support full-index indexscan.)  Then we sort the
- * TIDs, and finally scan the table doing a "merge join" against the TID list
- * to see which tuples are missing from the index.  Thus we will ensure that
- * all tuples valid according to the reference snapshot are in the index.
+ * TIDs of both auxiliary and target indexes, and doing a "merge join" against
+ * the TID lists to see which tuples from auxiliary index are missing from the
+ * target index.  Thus we will ensure that all tuples valid according to the
+ * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
  * tuple that is already dead or is in process of being deleted, and we
@@ -3339,22 +3518,26 @@ IndexCheckExclusion(Relation heapRelation,
  * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
  * transactions will be able to use it for queries.
  *
- * Doing two full table scans is a brute-force strategy.  We could try to be
- * cleverer, eg storing new tuples in a special area of the table (perhaps
- * making the table append-only by setting use_fsm).  However that would
- * add yet more locking issues.
+ * Also, some actions to concurrent drop the auxiliary index are performed.
  */
 void
-validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 {
 	Relation	heapRelation,
-				indexRelation;
+				indexRelation,
+				auxIndexRelation;
 	IndexInfo  *indexInfo;
-	IndexVacuumInfo ivinfo;
-	ValidateIndexState state;
+	IndexVacuumInfo ivinfo, auxivinfo;
+	ValidateIndexState state, auxState;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	/* Use 80% of maintenance_work_mem to target index sorting and
+	 * 10% rest for auxiliary.
+	 *
+	 * Rest 10% will be used for tuplestore later. */
+	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
+	int			aux_work_mem_part = maintenance_work_mem / 10;
 
 	{
 		const int	progress_index[] = {
@@ -3387,6 +3570,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	RestrictSearchPath();
 
 	indexRelation = index_open(indexId, RowExclusiveLock);
+	auxIndexRelation = index_open(auxIndexId, RowExclusiveLock);
 
 	/*
 	 * Fetch info needed for index_insert.  (You might think this should be
@@ -3411,15 +3595,49 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.strategy = NULL;
 	ivinfo.validate_index = true;
 
+	/*
+	 * Copy all info to auxiliary info, changing only relation.
+	 */
+	auxivinfo = ivinfo;
+	auxivinfo.index = auxIndexRelation;
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
 	 * is a pass-by-reference type on all platforms, whereas int8 is
 	 * pass-by-value on most platforms.
 	 */
+	auxState.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
+										   InvalidOid, false,
+										   aux_work_mem_part,
+										   NULL, TUPLESORT_NONE);
+	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
+
+	(void) index_bulk_delete(&auxivinfo, NULL,
+							 validate_index_callback, &auxState);
+	/* If aux index is empty, merge may be skipped */
+	if (auxState.itups == 0)
+	{
+		tuplesort_end(auxState.tuplesort);
+		auxState.tuplesort = NULL;
+
+		/* Roll back any GUC changes executed by index functions */
+		AtEOXact_GUC(false, save_nestlevel);
+
+		/* Restore userid and security context */
+		SetUserIdAndSecContext(save_userid, save_sec_context);
+
+		/* Close rels, but keep locks */
+		index_close(auxIndexRelation, NoLock);
+		index_close(indexRelation, NoLock);
+		table_close(heapRelation, NoLock);
+
+		return;
+	}
+
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 											InvalidOid, false,
-											maintenance_work_mem,
+											(int) main_work_mem_part,
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
@@ -3442,27 +3660,30 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	tuplesort_performsort(auxState.tuplesort);
 
 	/*
-	 * Now scan the heap and "merge" it with the index
+	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN);
+								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
 	table_index_validate_scan(heapRelation,
 							  indexRelation,
 							  indexInfo,
 							  snapshot,
-							  &state);
+							  &state,
+							  &auxState);
 
-	/* Done with tuplesort object */
-	tuplesort_end(state.tuplesort);
+	/* Tuple sort closed by table_index_validate_scan */
+	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
 
 	/* Make sure to release resources cached in indexInfo (if needed). */
 	index_insert_cleanup(indexRelation, indexInfo);
 
 	elog(DEBUG2,
-		 "validate_index found %.0f heap tuples, %.0f index tuples; inserted %.0f missing tuples",
-		 state.htups, state.itups, state.tups_inserted);
+		 "validate_index fetched %.0f heap tuples, %.0f index tuples;"
+						" %.0f aux index tuples; inserted %.0f missing tuples",
+		 state.htups, state.itups, auxState.itups, state.tups_inserted);
 
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
@@ -3471,6 +3692,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 
 	/* Close rels, but keep locks */
+	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
 }
@@ -3531,6 +3753,12 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			Assert(!indexForm->indisvalid);
 			indexForm->indisvalid = true;
 			break;
+		case INDEX_DROP_CLEAR_READY:
+			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
+			Assert(indexForm->indisready);
+			Assert(!indexForm->indisvalid);
+			indexForm->indisready = false;
+			break;
 		case INDEX_DROP_CLEAR_VALID:
 
 			/*
@@ -3802,6 +4030,13 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
+	/* Auxiliary indexes are not allowed to be rebuilt */
+	if (indexInfo->ii_Auxiliary)
+		ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("reindex of auxiliary index \"%s\" not supported",
+					RelationGetRelationName(iRel))));
+
 	/* Suppress use of the target index while rebuilding it */
 	SetReindexProcessing(heapId, indexId);
 
@@ -4044,6 +4279,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
+		Oid			indexAm = get_rel_relam(indexOid);
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4069,6 +4305,18 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
+		if (indexAm == STIR_AM_OID)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+							get_namespace_name(indexNamespaceId),
+							get_rel_name(indexOid))));
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			continue;
+		}
+
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ecb7c996e86..3e40ed2f439 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1350,16 +1350,17 @@ CREATE VIEW pg_stat_progress_create_index AS
                       END AS command,
         CASE S.param10 WHEN 0 THEN 'initializing'
                        WHEN 1 THEN 'waiting for writers before build'
-                       WHEN 2 THEN 'building index' ||
+                       WHEN 2 THEN 'waiting for writers to use auxiliary index'
+                       WHEN 3 THEN 'building index' ||
                            COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)),
                                     '')
-                       WHEN 3 THEN 'waiting for writers before validation'
-                       WHEN 4 THEN 'index validation: scanning index'
-                       WHEN 5 THEN 'index validation: sorting tuples'
-                       WHEN 6 THEN 'index validation: scanning table'
-                       WHEN 7 THEN 'waiting for old snapshots'
-                       WHEN 8 THEN 'waiting for readers before marking dead'
-                       WHEN 9 THEN 'waiting for readers before dropping'
+                       WHEN 4 THEN 'waiting for writers before validation'
+                       WHEN 5 THEN 'index validation: scanning index'
+                       WHEN 6 THEN 'index validation: sorting tuples'
+                       WHEN 7 THEN 'index validation: merging indexes'
+                       WHEN 8 THEN 'waiting for old snapshots'
+                       WHEN 9 THEN 'waiting for readers before marking dead'
+                       WHEN 10 THEN 'waiting for readers before dropping'
                        END as phase,
         S.param4 AS lockers_total,
         S.param5 AS lockers_done,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 635679cc1f2..f583239e091 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -182,6 +182,7 @@ CheckIndexCompatible(Oid oldId,
 					 bool isWithoutOverlaps)
 {
 	bool		isconstraint;
+	bool		isauxiliary;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
 	Oid		   *opclassIds;
@@ -232,6 +233,7 @@ CheckIndexCompatible(Oid oldId,
 
 	amcanorder = amRoutine->amcanorder;
 	amsummarizing = amRoutine->amsummarizing;
+	isauxiliary = accessMethodId == STIR_AM_OID;
 
 	/*
 	 * Compute the operator classes, collations, and exclusion operators for
@@ -243,7 +245,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing,
+							  isWithoutOverlaps, isauxiliary);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -556,6 +559,7 @@ DefineIndex(ParseState *pstate,
 {
 	bool		concurrent;
 	char	   *indexRelationName;
+	char	   *auxIndexRelationName = NULL;
 	char	   *accessMethodName;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
@@ -565,6 +569,7 @@ DefineIndex(ParseState *pstate,
 	Oid			namespaceId;
 	Oid			tablespaceId;
 	Oid			createdConstraintId = InvalidOid;
+	Oid			auxIndexRelationId = InvalidOid;
 	List	   *indexColNames;
 	List	   *allIndexParams;
 	Relation	rel;
@@ -586,6 +591,7 @@ DefineIndex(ParseState *pstate,
 	int			numberOfKeyAttributes;
 	TransactionId limitXmin;
 	ObjectAddress address;
+	ObjectAddress auxAddress;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
@@ -836,6 +842,15 @@ DefineIndex(ParseState *pstate,
 											stmt->excludeOpNames,
 											stmt->primary,
 											stmt->isconstraint);
+	/*
+	 * Select name for auxiliary index
+	 */
+	if (concurrent)
+		auxIndexRelationName = ChooseRelationName(indexRelationName,
+												  NULL,
+												  "ccaux",
+												  namespaceId,
+												  false);
 
 	/*
 	 * look up the access method, verify it can handle the requested features
@@ -930,7 +945,8 @@ DefineIndex(ParseState *pstate,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  false);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -1600,6 +1616,16 @@ DefineIndex(ParseState *pstate,
 		return address;
 	}
 
+	/*
+	 * In case of concurrent build - create auxiliary index record.
+	 */
+	if (concurrent)
+	{
+		auxIndexRelationId = index_concurrently_create_aux(rel, indexRelationId,
+											tablespaceId, auxIndexRelationName);
+		ObjectAddressSet(auxAddress, RelationRelationId, auxIndexRelationId);
+	}
+
 	AtEOXact_GUC(false, root_save_nestlevel);
 	SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 
@@ -1628,11 +1654,11 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * For a concurrent build, it's important to make the catalog entries
 	 * visible to other transactions before we start to build the index. That
-	 * will prevent them from making incompatible HOT updates.  The new index
-	 * will be marked not indisready and not indisvalid, so that no one else
-	 * tries to either insert into it or use it for queries.
+	 * will prevent them from making incompatible HOT updates. New indexes
+	 * (main and auxiliary) will be marked not indisready and not indisvalid,
+	 * so that no one else tries to either insert into it or use it for queries.
 	 *
-	 * We must commit our current transaction so that the index becomes
+	 * We must commit our current transaction so that the indexes becomes
 	 * visible; then start another.  Note that all the data structures we just
 	 * built are lost in the commit.  The only data we keep past here are the
 	 * relation IDs.
@@ -1642,7 +1668,7 @@ DefineIndex(ParseState *pstate,
 	 * cannot block, even if someone else is waiting for access, because we
 	 * already have the same lock within our transaction.
 	 *
-	 * Note: we don't currently bother with a session lock on the index,
+	 * Note: we don't currently bother with a session lock on the indexes,
 	 * because there are no operations that could change its state while we
 	 * hold lock on the parent table.  This might need to change later.
 	 */
@@ -1681,7 +1707,7 @@ DefineIndex(ParseState *pstate,
 	 * with the old list of indexes.  Use ShareLock to consider running
 	 * transactions that hold locks that permit writing to the table.  Note we
 	 * do not need to worry about xacts that open the table for writing after
-	 * this point; they will see the new index when they open it.
+	 * this point; they will see the new indexes when they open it.
 	 *
 	 * Note: the reason we use actual lock acquisition here, rather than just
 	 * checking the ProcArray and sleeping, is that deadlock is possible if
@@ -1693,14 +1719,44 @@ DefineIndex(ParseState *pstate,
 
 	/*
 	 * At this moment we are sure that there are no transactions with the
-	 * table open for write that don't have this new index in their list of
+	 * table open for write that don't have this new indexes in their list of
 	 * indexes.  We have waited out all the existing transactions and any new
-	 * transaction will have the new index in its list, but the index is still
-	 * marked as "not-ready-for-inserts".  The index is consulted while
+	 * transaction will have both new indexes in its list, but indexes are still
+	 * marked as "not-ready-for-inserts". The indexes are consulted while
 	 * deciding HOT-safety though.  This arrangement ensures that no new HOT
 	 * chains can be created where the new tuple and the old tuple in the
 	 * chain have different index keys.
 	 *
+	 * Now call build on auxiliary index. Index will be created empty without
+	 * any actual heap scan, but marked as "ready-for-inserts". The goal of
+	 * that index is accumulate new tuples while main index is actually built.
+	 */
+
+	/* Set ActiveSnapshot since functions in the indexes may need it */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	index_concurrently_build(tableId, auxIndexRelationId);
+	/* we can do away with our snapshot */
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Now we need to ensure are no transactions with the with auxiliary index
+	 * marked as "not-ready-for-inserts".
+	 */
+	WaitForLockers(heaplocktag, ShareLock, true);
+
+	/*
+	 * At this moment we are sure that all new tuples in table are inserted into
+	 * the auxiliary index. Now it is time to build the target index itself.
+	 *
 	 * We now take a new snapshot, and build the index using all tuples that
 	 * are visible in this snapshot.  We can be sure that any HOT updates to
 	 * these tuples will be compatible with the index, since any updates made
@@ -1735,9 +1791,28 @@ DefineIndex(ParseState *pstate,
 	 * the index marked as read-only for updates.
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForLockers(heaplocktag, ShareLock, true);
 
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/*
+	 * Now target index is marked as "ready" for all transactions. So, auxiliary
+	 * index is no longer needed. So, start removing process by reverting "ready"
+	 * flag.
+	 */
+	index_set_state_flags(auxIndexRelationId, INDEX_DROP_CLEAR_READY);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * Now take the "reference snapshot" that will be used by validate_index()
 	 * to filter candidate tuples.  Beware!  There might still be snapshots in
@@ -1755,24 +1830,14 @@ DefineIndex(ParseState *pstate,
 	 */
 	snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	PushActiveSnapshot(snapshot);
-
 	/*
-	 * Scan the index and the heap, insert any missing index entries.
-	 */
-	validate_index(tableId, indexRelationId, snapshot);
-
-	/*
-	 * Drop the reference snapshot.  We must do this before waiting out other
-	 * snapshot holders, else we will deadlock against other processes also
-	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
-	 * they must wait for.  But first, save the snapshot's xmin to use as
-	 * limitXmin for GetCurrentVirtualXIDs().
+	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
+	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
 	limitXmin = snapshot->xmin;
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
-
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1799,7 +1864,7 @@ DefineIndex(ParseState *pstate,
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForOlderSnapshots(limitXmin, true);
 
 	/*
@@ -1824,6 +1889,53 @@ DefineIndex(ParseState *pstate,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+	/* Now wait for all transaction to see auxiliary as "non-ready for inserts" */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/* Now it is time to mark auxiliary index as dead */
+	index_concurrently_set_dead(tableId, auxIndexRelationId);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_6);
+	/* Now wait for all transaction to ignore auxiliary because it is dead */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Drop auxiliary index.
+	 *
+	 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
+	 * right lock level.
+	 */
+	performDeletion(&auxAddress, DROP_RESTRICT,
+							 PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
 
 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
@@ -3595,6 +3707,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	typedef struct ReindexIndexInfo
 	{
 		Oid			indexId;
+		Oid			auxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -3700,8 +3813,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
 														   ShareUpdateExclusiveLock);
+					IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-					if (!indexRelation->rd_index->indisvalid)
+
+					if (indexInfo->ii_Auxiliary)
+						ereport(WARNING,(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+									get_namespace_name(get_rel_namespace(cellOid)),
+									get_rel_name(cellOid))));
+					else if (!indexRelation->rd_index->indisvalid)
 						ereport(WARNING,
 								(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 								 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3753,8 +3873,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
 															   ShareUpdateExclusiveLock);
+						IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-						if (!indexRelation->rd_index->indisvalid)
+						if (indexInfo->ii_Auxiliary)
+							ereport(WARNING,
+									(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+									 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+											get_namespace_name(get_rel_namespace(cellOid)),
+											get_rel_name(cellOid))));
+						else if (!indexRelation->rd_index->indisvalid)
 							ereport(WARNING,
 									(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 									 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3815,6 +3942,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot reindex invalid index on TOAST table")));
 
+				/* Auxiliary indexes are not allowed to be rebuilt */
+				if (get_rel_relam(relationOid) == STIR_AM_OID)
+					ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("reindex of auxiliary index \"%s\" not supported",
+								get_rel_name(relationOid))));
+
 				/*
 				 * Check if parent relation can be locked and if it exists,
 				 * this needs to be done at this stage as the list of indexes
@@ -3918,15 +4052,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
+		char	   *auxConcurrentName;
 		ReindexIndexInfo *idx = lfirst(lc);
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
+		Oid			auxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
 		int			save_sec_context;
 		int			save_nestlevel;
 		Relation	newIndexRel;
+		Relation	auxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -3977,6 +4114,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 											"ccnew",
 											get_rel_namespace(indexRel->rd_index->indrelid),
 											false);
+		auxConcurrentName = ChooseRelationName(get_rel_name(idx->indexId),
+											NULL,
+											"ccaux",
+											get_rel_namespace(indexRel->rd_index->indrelid),
+											false);
 
 		/* Choose the new tablespace, indexes of toast tables are not moved */
 		if (OidIsValid(params->tablespaceOid) &&
@@ -3990,12 +4132,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 													idx->indexId,
 													tablespaceid,
 													concurrentName);
+		auxIndexId = index_concurrently_create_aux(heapRel,
+												   newIndexId,
+												   tablespaceid,
+												   auxConcurrentName);
 
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
+		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4004,6 +4151,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
+		newidx->auxIndexId = auxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4022,10 +4170,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		lockrelid = palloc_object(LockRelId);
+		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
 		/* Roll back any GUC changes executed by index functions */
@@ -4106,13 +4258,60 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * doing that, wait until no running transactions could have the table of
 	 * the index open with the old list of indexes.  See "phase 2" in
 	 * DefineIndex() for more details.
+	*/
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * Now build all auxiliary indexes and mark them as "ready-for-inserts".
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		StartTransactionCommand();
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/* Set ActiveSnapshot since functions in the indexes may need it */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Build auxiliary index, it is fast - without any actual heap scan, just an empty index. */
+		index_concurrently_build(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+
+	StartTransactionCommand();
+
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Wait until all auxiliary indexes are taken into account by all
+	 * transactions.
+	 */
 	WaitForLockersMultiple(lockTags, ShareLock, true);
 	CommitTransactionCommand();
 
+	/* Now it is time to perform target index build. */
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4159,6 +4358,41 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * need to set the PROC_IN_SAFE_IC flag here.
 	 */
 
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * At this moment all target indexes are marked as "ready-to-insert". So,
+	 * we are free to start process of dropping auxiliary indexes.
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+		StartTransactionCommand();
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		PopActiveSnapshot();
+
+		CommitTransactionCommand();
+	}
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -4166,12 +4400,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * were created during the previous phase.  See "phase 3" in DefineIndex()
 	 * for more details.
 	 */
-
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
-	WaitForLockersMultiple(lockTags, ShareLock, true);
-	CommitTransactionCommand();
-
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4209,7 +4437,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -4238,7 +4466,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * there's no need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-									 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+									 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 		WaitForOlderSnapshots(limitXmin, true);
 
 		CommitTransactionCommand();
@@ -4329,14 +4557,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
-	 * Mark the old indexes as dead.  First we must wait until no running
-	 * transaction could be using the index for a query.  See also
+	 * Mark the old and auxiliary indexes as dead. First we must wait until no
+	 * running transaction could be using the index for a query.  See also
 	 * index_drop() for more details.
 	 */
 
 	INJECTION_POINT("reindex-relation-concurrently-before-set-dead", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	foreach(lc, indexIds)
@@ -4361,6 +4589,28 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PopActiveSnapshot();
 	}
 
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+	}
+
 	/* Commit this transaction to make the updates visible. */
 	CommitTransactionCommand();
 	StartTransactionCommand();
@@ -4374,11 +4624,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
-	 * Drop the old indexes.
+	 * Drop the old and auxiliary indexes.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_6);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -4398,6 +4648,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			add_exact_object_address(&object, objects);
 		}
 
+		foreach(lc, newIndexIds)
+		{
+			ReindexIndexInfo *idx = lfirst(lc);
+			ObjectAddress object;
+
+			object.classId = RelationRelationId;
+			object.objectId = idx->auxIndexId;
+			object.objectSubId = 0;
+
+			add_exact_object_address(&object, objects);
+		}
+
 		/*
 		 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
 		 * right lock level.
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 09ae445694d..8cb2231a7a8 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, bool auxiliary)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -850,6 +850,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Concurrent = concurrent;
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
+	n->ii_Auxiliary = auxiliary;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
@@ -875,7 +876,6 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
-	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..1a997537800 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -705,7 +705,8 @@ typedef struct TableAmRoutine
 										Relation index_rel,
 										IndexInfo *index_info,
 										Snapshot snapshot,
-										ValidateIndexState *state);
+										ValidateIndexState *state,
+										ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1824,19 +1825,24 @@ table_index_build_range_scan(Relation table_rel,
  * table_index_validate_scan - second table scan for concurrent index build
  *
  * See validate_index() for an explanation.
+ *
+ * Note: it is responsibility of that function to close sortstates in
+ * both `state` and `auxstate`.
  */
 static inline void
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
 						  Snapshot snapshot,
-						  ValidateIndexState *state)
+						  ValidateIndexState *state,
+						  ValidateIndexState *auxstate)
 {
 	table_rel->rd_tableam->index_validate_scan(table_rel,
 											   index_rel,
 											   index_info,
 											   snapshot,
-											   state);
+											   state,
+											   auxstate);
 }
 
 
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index b259c4141ed..37a390d33de 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -25,6 +25,7 @@ typedef enum
 {
 	INDEX_CREATE_SET_READY,
 	INDEX_CREATE_SET_VALID,
+	INDEX_DROP_CLEAR_READY,
 	INDEX_DROP_CLEAR_VALID,
 	INDEX_DROP_SET_DEAD,
 } IndexStateFlagsAction;
@@ -65,6 +66,7 @@ extern void index_check_primary_key(Relation heapRel,
 #define	INDEX_CREATE_IF_NOT_EXISTS			(1 << 4)
 #define	INDEX_CREATE_PARTITIONED			(1 << 5)
 #define INDEX_CREATE_INVALID				(1 << 6)
+#define INDEX_CREATE_AUXILIARY				(1 << 7)
 
 extern Oid	index_create(Relation heapRelation,
 						 const char *indexRelationName,
@@ -100,6 +102,11 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid tablespaceOid,
 										   const char *newName);
 
+extern Oid	index_concurrently_create_aux(Relation heapRelation,
+										  Oid mainIndexId,
+										  Oid tablespaceOid,
+										  const char *newName);
+
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
 
@@ -145,7 +152,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
+extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 359221dc296..841491a8511 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -111,14 +111,15 @@
 
 /* Phases of CREATE INDEX (as advertised via PROGRESS_CREATEIDX_PHASE) */
 #define PROGRESS_CREATEIDX_PHASE_WAIT_1			1
-#define PROGRESS_CREATEIDX_PHASE_BUILD			2
-#define PROGRESS_CREATEIDX_PHASE_WAIT_2			3
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	4
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		5
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN	6
-#define PROGRESS_CREATEIDX_PHASE_WAIT_3			7
+#define PROGRESS_CREATEIDX_PHASE_WAIT_2			2
+#define PROGRESS_CREATEIDX_PHASE_BUILD			3
+#define PROGRESS_CREATEIDX_PHASE_WAIT_3			4
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	5
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		6
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE	7
 #define PROGRESS_CREATEIDX_PHASE_WAIT_4			8
 #define PROGRESS_CREATEIDX_PHASE_WAIT_5			9
+#define PROGRESS_CREATEIDX_PHASE_WAIT_6			10
 
 /*
  * Subphases of CREATE INDEX, for index_build.
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 982ec25ae14..6cf45a68cbe 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,8 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								bool auxiliary);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 55538c4c41e..d1723f47e89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1437,6 +1437,7 @@ DETAIL:  Key (f1)=(b) already exists.
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
 ERROR:  could not create unique index "concur_index3"
 DETAIL:  Key (f2)=(b) is duplicated.
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -3211,6 +3212,7 @@ INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
 ERROR:  could not create unique index "concur_reindex_ind5"
 DETAIL:  Key (c1)=(1) is duplicated.
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
@@ -3223,8 +3225,10 @@ DETAIL:  Key (c1)=(1) is duplicated.
  c1     | integer |           |          | 
 Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1) INVALID
+    "concur_reindex_ind5_ccaux" stir (c1) INVALID
     "concur_reindex_ind5_ccnew" UNIQUE, btree (c1) INVALID
 
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -3252,6 +3256,44 @@ Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1)
 
 DROP TABLE concur_reindex_tab4;
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+ERROR:  relation "concur_reindex_tab4" does not exist
+LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+                    ^
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
+HINT:  Use DROP INDEX or REINDEX INDEX.
+WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index dc629928c8f..9b06ddc87a2 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1585,10 +1585,11 @@ select indexrelid::regclass, indisvalid,
 --------------------------------+------------+-----------------------+-------------------------------
  parted_isvalid_idx             | f          | parted_isvalid_tab    | 
  parted_isvalid_idx_11          | f          | parted_isvalid_tab_11 | parted_isvalid_tab_1_expr_idx
+ parted_isvalid_idx_11_ccaux    | f          | parted_isvalid_tab_11 | 
  parted_isvalid_tab_12_expr_idx | t          | parted_isvalid_tab_12 | parted_isvalid_tab_1_expr_idx
  parted_isvalid_tab_1_expr_idx  | f          | parted_isvalid_tab_1  | parted_isvalid_idx
  parted_isvalid_tab_2_expr_idx  | t          | parted_isvalid_tab_2  | parted_isvalid_idx
-(5 rows)
+(6 rows)
 
 drop table parted_isvalid_tab;
 -- Check state of replica indexes when attaching a partition.
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index deb6e2ad6a9..11cb06cfb72 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2068,14 +2068,15 @@ pg_stat_progress_create_index| SELECT s.pid,
         CASE s.param10
             WHEN 0 THEN 'initializing'::text
             WHEN 1 THEN 'waiting for writers before build'::text
-            WHEN 2 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
-            WHEN 3 THEN 'waiting for writers before validation'::text
-            WHEN 4 THEN 'index validation: scanning index'::text
-            WHEN 5 THEN 'index validation: sorting tuples'::text
-            WHEN 6 THEN 'index validation: scanning table'::text
-            WHEN 7 THEN 'waiting for old snapshots'::text
-            WHEN 8 THEN 'waiting for readers before marking dead'::text
-            WHEN 9 THEN 'waiting for readers before dropping'::text
+            WHEN 2 THEN 'waiting for writers to use auxiliary index'::text
+            WHEN 3 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
+            WHEN 4 THEN 'waiting for writers before validation'::text
+            WHEN 5 THEN 'index validation: scanning index'::text
+            WHEN 6 THEN 'index validation: sorting tuples'::text
+            WHEN 7 THEN 'index validation: merging indexes'::text
+            WHEN 8 THEN 'waiting for old snapshots'::text
+            WHEN 9 THEN 'waiting for readers before marking dead'::text
+            WHEN 10 THEN 'waiting for readers before dropping'::text
             ELSE NULL::text
         END AS phase,
     s.param4 AS lockers_total,
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 82e4062a215..c2c1b031527 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -503,6 +503,7 @@ CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS concur_index2 ON concur_heap(f1);
 INSERT INTO concur_heap VALUES ('b','x');
 -- check if constraint is enforced properly at build time
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -1315,10 +1316,12 @@ CREATE TABLE concur_reindex_tab4 (c1 int);
 INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 -- This trick creates an invalid index.
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -1330,6 +1333,24 @@ REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
 DROP TABLE concur_reindex_tab4;
 
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+\d aux_index_tab5
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+DROP TABLE aux_index_tab5;
+
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
-- 
2.43.0



  [application/x-patch] v30-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch (30.9K, 5-v30-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch)
  download | inline diff:
From 8ff96b8f44ceccf429e3e635a806311b14fc26d7 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 14:36:31 +0100
Subject: [PATCH v30 5/7] Track and drop auxiliary indexes in DROP/REINDEX

During concurrent index operations, auxiliary indexes may be left as orphaned objects when errors occur (junk auxiliary indexes).

This patch improves the handling of such auxiliary indexes:
- add auxiliaryForIndexId parameter to index_create() to track dependencies between main and auxiliary indexes
- automatically drop auxiliary indexes when the main index is dropped
- delete junk auxiliary indexes properly during REINDEX operations
---
 doc/src/sgml/ref/create_index.sgml         |  14 ++-
 doc/src/sgml/ref/reindex.sgml              |   8 +-
 src/backend/catalog/dependency.c           |   2 +-
 src/backend/catalog/index.c                |  71 ++++++++++----
 src/backend/catalog/pg_depend.c            |  58 ++++++++++++
 src/backend/catalog/toasting.c             |   1 +
 src/backend/commands/indexcmds.c           |  37 +++++++-
 src/backend/commands/tablecmds.c           |  52 +++++++++-
 src/backend/nodes/makefuncs.c              |   3 +-
 src/include/catalog/dependency.h           |   1 +
 src/include/nodes/execnodes.h              |   2 +
 src/include/nodes/makefuncs.h              |   2 +-
 src/test/regress/expected/create_index.out | 105 +++++++++++++++++++--
 src/test/regress/sql/create_index.sql      |  57 ++++++++++-
 14 files changed, 371 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 12c88587a79..7f751453317 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -668,10 +668,16 @@ Indexes:
     "idx_ccaux" stir (col) INVALID
 </programlisting>
 
-    The recommended recovery
-    method in such cases is to drop these indexes and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
-    to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
+    The recommended recovery method in such cases is to drop the index with
+    <command>DROP INDEX</command>. The auxiliary index (suffixed with
+    <literal>_ccaux</literal>) will be automatically dropped when the main
+    index is dropped. After dropping the indexes, you can try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to
+    rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>,
+    which will also handle cleanup of any invalid auxiliary indexes.)
+    If the only invalid index is one suffixed <literal>_ccaux</literal>,
+    recommended recovery method is just <literal>DROP INDEX</literal>
+    for that index.
    </para>
 
    <para>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 1c3c7a97f6a..384c5fc8b3f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -476,11 +476,15 @@ Indexes:
     <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
     recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
-    then attempt <command>REINDEX CONCURRENTLY</command> again.
+    then attempt <command>REINDEX CONCURRENTLY</command> again. The auxiliary index
+    (suffixed with <literal>_ccaux</literal>) will be automatically dropped
+    along with its main index.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
     the recommended recovery method is to just drop said index, since the
-    rebuild proper has been successful.
+    rebuild proper has been successful. If the only
+    invalid index is one suffixed <literal>_ccaux</literal>, recommended
+    recovery method is just <literal>DROP INDEX</literal> for that index.
     A nonzero number may be appended to the suffix of the invalid index
     names to keep them unique, like <literal>_ccnew1</literal>,
     <literal>_ccold2</literal>, etc.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 09575278de3..74f8335888b 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -287,7 +287,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.  (Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object, 0);
+	AcquireDeletionLock(object, flags);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4f77627fb3b..91125d37150 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -773,6 +773,8 @@ index_create(Relation heapRelation,
 		   ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
 	/* partitioned indexes must never be "built" by themselves */
 	Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
+	/* ii_AuxiliaryForIndexId and INDEX_CREATE_AUXILIARY are required both or neither */
+	Assert(OidIsValid(indexInfo->ii_AuxiliaryForIndexId) == auxiliary);
 
 	relkind = partitioned ? RELKIND_PARTITIONED_INDEX : RELKIND_INDEX;
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -1178,6 +1180,15 @@ index_create(Relation heapRelation,
 			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_SEC);
 		}
 
+		/*
+		 * Record dependency on the main index in case of auxiliary index.
+		 */
+		if (OidIsValid(indexInfo->ii_AuxiliaryForIndexId))
+		{
+			ObjectAddressSet(referenced, RelationRelationId, indexInfo->ii_AuxiliaryForIndexId);
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+		}
+
 		/* placeholder for normal dependencies */
 		addrs = new_object_addresses();
 
@@ -1410,7 +1421,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							true,
 							indexRelation->rd_indam->amsummarizing,
 							oldInfo->ii_WithoutOverlaps,
-							false);
+							false,
+							InvalidOid);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1578,7 +1590,8 @@ index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
 							true,
 							false,	/* aux are not summarizing */
 							false,	/* aux are not without overlaps */
-							true	/* auxiliary */);
+							true	/* auxiliary */,
+							mainIndexId /* auxiliaryForIndexId */);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -2614,7 +2627,8 @@ BuildIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid /* auxiliary_for_index_id is set only during build */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2675,7 +2689,8 @@ BuildDummyIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3840,6 +3855,7 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 				heapRelation;
 	Oid			heapId;
 	Oid			save_userid;
+	Oid			junkAuxIndexId;
 	int			save_sec_context;
 	int			save_nestlevel;
 	IndexInfo  *indexInfo;
@@ -3896,6 +3912,19 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
 	}
 
+	/* Check for the auxiliary index for that index, it needs to be dropped */
+	junkAuxIndexId = get_auxiliary_index(indexId);
+	if (OidIsValid(junkAuxIndexId))
+	{
+		ObjectAddress object;
+		object.classId = RelationRelationId;
+		object.objectId = junkAuxIndexId;
+		object.objectSubId = 0;
+		performDeletion(&object, DROP_RESTRICT,
+								 PERFORM_DELETION_INTERNAL |
+								 PERFORM_DELETION_QUIETLY);
+	}
+
 	/*
 	 * Open the target index relation and get an exclusive lock on it, to
 	 * ensure that no one else is touching this particular index.
@@ -4184,7 +4213,8 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 {
 	Relation	rel;
 	Oid			toast_relid;
-	List	   *indexIds;
+	List	   *indexIds,
+			   *auxIndexIds = NIL;
 	char		persistence;
 	bool		result = false;
 	ListCell   *indexId;
@@ -4273,13 +4303,30 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	else
 		persistence = rel->rd_rel->relpersistence;
 
+	foreach(indexId, indexIds)
+	{
+		Oid			indexOid = lfirst_oid(indexId);
+		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* All STIR indexes are auxiliary indexes */
+		if (indexAm == STIR_AM_OID)
+		{
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			auxIndexIds = lappend_oid(auxIndexIds, indexOid);
+		}
+	}
+
 	/* Reindex all the indexes. */
 	i = 1;
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
-		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* Auxiliary indexes are going to be dropped during main index rebuild */
+		if (list_member_oid(auxIndexIds, indexOid))
+			continue;
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4305,18 +4352,6 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
-		if (indexAm == STIR_AM_OID)
-		{
-			ereport(WARNING,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
-							get_namespace_name(indexNamespaceId),
-							get_rel_name(indexOid))));
-			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
-				RemoveReindexPending(indexOid);
-			continue;
-		}
-
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 07c2d41c189..7e0e29bdb5b 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -20,6 +20,7 @@
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/pg_am_d.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_extension.h"
@@ -1108,6 +1109,63 @@ get_index_constraint(Oid indexId)
 	return constraintId;
 }
 
+/*
+ * get_auxiliary_index
+ *		Given the OID of an index, return the OID of its auxiliary
+ *		index, or InvalidOid if there is no auxiliary index.
+ */
+Oid
+get_auxiliary_index(Oid indexId)
+{
+	Oid			auxiliaryIndexOid = InvalidOid;
+	Relation	depRel;
+	ScanKeyData key[3];
+	SysScanDesc scan;
+	HeapTuple	tup;
+
+	/* Search the dependency table for the index */
+	depRel = table_open(DependRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+				Anum_pg_depend_refclassid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationRelationId));
+	ScanKeyInit(&key[1],
+				Anum_pg_depend_refobjid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indexId));
+	ScanKeyInit(&key[2],
+				Anum_pg_depend_refobjsubid,
+				BTEqualStrategyNumber, F_INT4EQ,
+				Int32GetDatum(0));
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+							  NULL, 3, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend deprec = (Form_pg_depend) GETSTRUCT(tup);
+
+		/*
+		 * We assume AUTO dependency on index with rel_kind
+		 * of RELKIND_INDEX and AM eq STIR is that we are looking for.
+		 */
+		if (deprec->classid == RelationRelationId &&
+			(deprec->deptype == DEPENDENCY_AUTO) &&
+			get_rel_relkind(deprec->objid) == RELKIND_INDEX &&
+			get_rel_relam(deprec->objid) == STIR_AM_OID)
+		{
+			auxiliaryIndexOid = deprec->objid;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(depRel, AccessShareLock);
+
+	return auxiliaryIndexOid;
+}
+
 /*
  * get_index_ref_constraints
  *		Given the OID of an index, return the OID of all foreign key
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 87e01e74ad7..c511563f3ff 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -308,6 +308,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
 	indexInfo->ii_Auxiliary = false;
+	indexInfo->ii_AuxiliaryForIndexId = InvalidOid;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f583239e091..599e3375833 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -246,7 +246,7 @@ CheckIndexCompatible(Oid oldId,
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
 							  false, false, amsummarizing,
-							  isWithoutOverlaps, isauxiliary);
+							  isWithoutOverlaps, isauxiliary, InvalidOid);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -946,7 +946,8 @@ DefineIndex(ParseState *pstate,
 							  concurrent,
 							  amissummarizing,
 							  stmt->iswithoutoverlaps,
-							  false);
+							  false,
+							  InvalidOid);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -3708,6 +3709,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		Oid			indexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -4057,6 +4059,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
@@ -4064,6 +4067,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		int			save_nestlevel;
 		Relation	newIndexRel;
 		Relation	auxIndexRel;
+		Relation	junkAuxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -4137,12 +4141,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 												   tablespaceid,
 												   auxConcurrentName);
 
+		/* Search for auxiliary index for reindexed index, to drop it */
+		junkAuxIndexId = get_auxiliary_index(idx->indexId);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
+		if (OidIsValid(junkAuxIndexId))
+			junkAuxIndexRel = index_open(junkAuxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4152,6 +4161,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
 		newidx->auxIndexId = auxIndexId;
+		newidx->junkAuxIndexId = junkAuxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4173,10 +4183,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		if (OidIsValid(junkAuxIndexId))
+		{
+			lockrelid = palloc_object(LockRelId);
+			*lockrelid = junkAuxIndexRel->rd_lockInfo.lockRelId;
+			relationLocks = lappend(relationLocks, lockrelid);
+		}
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		if (OidIsValid(junkAuxIndexId))
+			index_close(junkAuxIndexRel, NoLock);
 		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
@@ -4365,7 +4383,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 	/*
 	 * At this moment all target indexes are marked as "ready-to-insert". So,
-	 * we are free to start process of dropping auxiliary indexes.
+	 * we are free to start process of dropping auxiliary indexes - including
+	 * junk indexes detected earlier.
 	 */
 	foreach(lc, newIndexIds)
 	{
@@ -4388,6 +4407,9 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 */
 		PushActiveSnapshot(GetTransactionSnapshot());
 		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		/* Ensure the junk index is marked as non-ready */
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_set_state_flags(newidx->junkAuxIndexId, INDEX_DROP_CLEAR_READY);
 		PopActiveSnapshot();
 
 		CommitTransactionCommand();
@@ -4607,6 +4629,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_concurrently_set_dead(newidx->tableId, newidx->junkAuxIndexId);
 
 		PopActiveSnapshot();
 	}
@@ -4658,6 +4682,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			object.objectSubId = 0;
 
 			add_exact_object_address(&object, objects);
+
+			if (OidIsValid(idx->junkAuxIndexId))
+			{
+				object.objectId = idx->junkAuxIndexId;
+				object.objectSubId = 0;
+				add_exact_object_address(&object, objects);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 85242dcc245..1622dfb05ca 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1548,6 +1548,8 @@ RemoveRelations(DropStmt *drop)
 	ListCell   *cell;
 	int			flags = 0;
 	LOCKMODE	lockmode = AccessExclusiveLock;
+	MemoryContext private_context,
+				  oldcontext;
 
 	/* DROP CONCURRENTLY uses a weaker lock, and has some restrictions */
 	if (drop->concurrent)
@@ -1608,9 +1610,20 @@ RemoveRelations(DropStmt *drop)
 			relkind = 0;		/* keep compiler quiet */
 			break;
 	}
+	/*
+	 * Create a memory context that will survive forced transaction commits we
+	 * may need to do below (in case of concurrent index drop).
+	 * Since it is a child of PortalContext, it will go away eventually even if
+	 * we suffer an error; there's no need for special abort cleanup logic.
+	 */
+	private_context = AllocSetContextCreate(PortalContext,
+											"RemoveRelations",
+											ALLOCSET_SMALL_SIZES);
 
+	oldcontext = MemoryContextSwitchTo(private_context);
 	/* Lock and validate each relation; build a list of object addresses */
 	objects = new_object_addresses();
+	MemoryContextSwitchTo(oldcontext);
 
 	foreach(cell, drop->objects)
 	{
@@ -1662,6 +1675,38 @@ RemoveRelations(DropStmt *drop)
 			flags |= PERFORM_DELETION_CONCURRENTLY;
 		}
 
+		/*
+		 * Concurrent index drop requires it to be the first transaction. But in
+		 * case we have junk auxiliary index - we want to drop it too (and also
+		 * in a concurrent way). In this case perform silent internal deletion
+		 * of auxiliary index, and restore transaction state. It is fine to do it
+		 * in the loop because there is only single element in drop->objects.
+		 */
+		if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
+			state.actual_relkind == RELKIND_INDEX)
+		{
+			Oid junkAuxIndexOid = get_auxiliary_index(relOid);
+			if (OidIsValid(junkAuxIndexOid))
+			{
+				ObjectAddress object;
+				object.classId = RelationRelationId;
+				object.objectId = junkAuxIndexOid;
+				object.objectSubId = 0;
+				performDeletion(&object, DROP_RESTRICT,
+										 PERFORM_DELETION_CONCURRENTLY |
+										 PERFORM_DELETION_INTERNAL |
+										 PERFORM_DELETION_QUIETLY);
+				CommitTransactionCommand();
+				MemoryContextDelete(private_context);
+
+				/* And start again - now without auxiliary index. */
+				StartTransactionCommand();
+				PushActiveSnapshot(GetTransactionSnapshot());
+				RemoveRelations(drop);
+				return;
+			}
+		}
+
 		/*
 		 * Concurrent index drop cannot be used with partitioned indexes,
 		 * either.
@@ -1690,12 +1735,17 @@ RemoveRelations(DropStmt *drop)
 		obj.objectId = relOid;
 		obj.objectSubId = 0;
 
+		oldcontext = MemoryContextSwitchTo(private_context);
 		add_exact_object_address(&obj, objects);
+		MemoryContextSwitchTo(oldcontext);
 	}
 
+	/* Deletion may involve multiple commits, so, switch to memory context */
+	oldcontext = MemoryContextSwitchTo(private_context);
 	performMultipleDeletions(objects, drop->behavior, flags);
+	MemoryContextSwitchTo(oldcontext);
 
-	free_object_addresses(objects);
+	MemoryContextDelete(private_context);
 }
 
 /*
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 8cb2231a7a8..ccc1294e730 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps, bool auxiliary)
+			  bool withoutoverlaps, bool auxiliary, Oid auxiliary_for_index_id)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -851,6 +851,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
 	n->ii_Auxiliary = auxiliary;
+	n->ii_AuxiliaryForIndexId = auxiliary_for_index_id;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 2f3c1eae3c7..6ae210c584e 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -193,6 +193,7 @@ extern List *getOwnedSequences(Oid relid);
 extern Oid	getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok);
 
 extern Oid	get_index_constraint(Oid indexId);
+extern Oid	get_auxiliary_index(Oid indexId);
 
 extern List *get_index_ref_constraints(Oid indexId);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6ce9154b28d..e8236eede00 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -220,6 +220,8 @@ typedef struct IndexInfo
 	int			ii_ParallelWorkers;
 	/* is auxiliary for concurrent index build? */
 	bool		ii_Auxiliary;
+	/* if creating an auxiliary index, the OID of the main index; otherwise InvalidOid. */
+	Oid			ii_AuxiliaryForIndexId;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 6cf45a68cbe..92dff90c3de 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -100,7 +100,7 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
 								bool summarizing, bool withoutoverlaps,
-								bool auxiliary);
+								bool auxiliary, Oid auxiliary_for_index_id);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index d1723f47e89..2d6abb15a89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3279,20 +3279,109 @@ ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-ERROR:  relation "concur_reindex_tab4" does not exist
-LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-                    ^
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
-ERROR:  could not create unique index "aux_index_ind6"
-DETAIL:  Key (c1)=(1) is duplicated.
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
 WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
 HINT:  Use DROP INDEX or REINDEX INDEX.
 WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
 NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+-- Make sure it is still exists
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
+DROP INDEX aux_index_ind6;
+ERROR:  index "aux_index_ind6" does not exist
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index c2c1b031527..fd96d80abbc 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1344,11 +1344,62 @@ REINDEX INDEX aux_index_ind6_ccaux;
 -- Concurrently also
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
+-- Make sure it is still exists
+\d aux_index_tab5
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.43.0



  [application/x-patch] v30-0007-Refresh-snapshot-periodically-during-index-valid.patch (21.5K, 6-v30-0007-Refresh-snapshot-periodically-during-index-valid.patch)
  download | inline diff:
From 1f0083bc9b7a1b944dff98f425ef754262adaa25 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 21 Apr 2025 14:11:53 +0200
Subject: [PATCH v30 7/7] Refresh snapshot periodically during index validation

Enhances validation phase of concurrently built indexes by periodically refreshing snapshots rather than using a single reference snapshot. This addresses issues with xmin propagation during long-running validations.

The validation now takes a fresh snapshot every few pages, allowing the xmin horizon to advance. This restores feature of commit d9d076222f5b, which was reverted in commit e28bb8851969. New STIR-based approach does not depend on single reference snapshot anymore.
---
 src/backend/access/heap/README.HOT       |  4 +-
 src/backend/access/heap/heapam_handler.c | 65 +++++++++++++++++++++++-
 src/backend/access/spgist/spgvacuum.c    | 12 +++--
 src/backend/catalog/index.c              | 51 ++++++++++++++-----
 src/backend/commands/indexcmds.c         | 50 +++---------------
 src/include/access/tableam.h             | 25 ++++-----
 src/include/access/transam.h             | 15 ++++++
 src/include/catalog/index.h              |  2 +-
 8 files changed, 146 insertions(+), 78 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index b1c797517ee..382fe1723a5 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -401,12 +401,12 @@ live tuple.
 We mark the index open for inserts (but still not ready for reads) then
 we again wait for transactions which have the table open.  Then validate
 the index.  This searches for tuples missing from the index in auxiliary
-index, and inserts any missing ones if they are visible to reference snapshot.
+index, and inserts any missing ones if they are visible to a fresh snapshot.
 Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
-the second reference snapshot is finished.  This ensures that nobody is
+the latest used snapshot is finished.  This ensures that nobody is
 alive any longer who could need to see any tuples that might be missing
 from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 4ad8a2c0f81..0becbcc5b87 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2012,23 +2012,26 @@ heapam_index_validate_scan_read_stream_next(
 	return result;
 }
 
-static void
+static TransactionId
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
-						   Snapshot snapshot,
 						   ValidateIndexState *state,
 						   ValidateIndexState *auxState)
 {
+	TransactionId limitXmin;
+
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Snapshot		snapshot;
 	TupleTableSlot  *slot;
 	EState			*estate;
 	ExprContext		*econtext;
 	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	int				num_to_check;
+	BlockNumber		page_read_counter = 1; /* set to 1 to skip snapshot reset at start */
 	Tuplestorestate *tuples_for_check;
 	ValidateIndexScanState callback_private_data;
 
@@ -2039,6 +2042,8 @@ heapam_index_validate_scan(Relation heapRelation,
 	/* Use 10% of memory for tuple store. */
 	int		store_work_mem_part = maintenance_work_mem / 10;
 
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
@@ -2047,6 +2052,12 @@ heapam_index_validate_scan(Relation heapRelation,
 	 */
 	tuples_for_check =  tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
+	PopActiveSnapshot();
+	InvalidateCatalogSnapshot();
+
+	Assert(!HaveRegisteredOrActiveSnapshot());
+	Assert(!TransactionIdIsValid(MyProc->xmin));
+
 	/*
 	 * sanity checks
 	 */
@@ -2062,6 +2073,29 @@ heapam_index_validate_scan(Relation heapRelation,
 
 	state->tuplesort = auxState->tuplesort = NULL;
 
+	/*
+	 * Now take the first snapshot that will be used to filter candidate
+	 * tuples. We are going to replace it by newer snapshot every so often
+	 * to propagate horizon.
+	 *
+	 * Beware!  There might still be snapshots in use that treat some transaction
+	 * as in-progress that our temporary snapshot treats as committed.
+	 *
+	 * If such a recently-committed transaction deleted tuples in the table,
+	 * we will not include them in the index; yet those transactions which
+	 * see the deleting one as still-in-progress will expect such tuples to
+	 * be there once we mark the index as valid.
+	 *
+	 * We solve this by waiting for all endangered transactions to exit before
+	 * we mark the index as valid, for that reason limitXmin is supported.
+	 *
+	 * We also set ActiveSnapshot to this snap, since functions in indexes may
+	 * need a snapshot.
+	 */
+	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	PushActiveSnapshot(snapshot);
+	limitXmin = snapshot->xmin;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
@@ -2095,6 +2129,7 @@ heapam_index_validate_scan(Relation heapRelation,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		block_number = BufferGetBlockNumber(buf);
+		page_read_counter++;
 
 		i = 0;
 		while ((off = tuples[i]) != InvalidOffsetNumber)
@@ -2152,6 +2187,20 @@ heapam_index_validate_scan(Relation heapRelation,
 		}
 
 		ReleaseBuffer(buf);
+#define VALIDATE_INDEX_RESET_SNAPSHOT_EACH_N_PAGE 4096
+		if (page_read_counter % VALIDATE_INDEX_RESET_SNAPSHOT_EACH_N_PAGE == 0)
+		{
+			PopActiveSnapshot();
+			UnregisterSnapshot(snapshot);
+			/* to make sure we propagate xmin */
+			InvalidateCatalogSnapshot();
+			Assert(!TransactionIdIsValid(MyProc->xmin));
+
+			snapshot = RegisterSnapshot(GetLatestSnapshot());
+			PushActiveSnapshot(snapshot);
+			/* xmin should not go backwards, but just in case */
+			limitXmin = TransactionIdNewer(limitXmin, snapshot->xmin);
+		}
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -2161,11 +2210,23 @@ heapam_index_validate_scan(Relation heapRelation,
 	read_stream_end(read_stream);
 	tuplestore_end(tuples_for_check);
 
+	/*
+	 * Drop the latest snapshot.  We must do this before waiting out other
+	 * snapshot holders, else we will deadlock against other processes also
+	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
+	 * they must wait for.
+	 */
+	PopActiveSnapshot();
+	UnregisterSnapshot(snapshot);
+	InvalidateCatalogSnapshot();
+	Assert(MyProc->xmin == InvalidTransactionId);
 	FreeAccessStrategy(bstrategy);
 
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
+
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..7ea60c18e6f 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -191,14 +191,16 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 			 * Add target TID to pending list if the redirection could have
 			 * happened since VACUUM started.  (If xid is invalid, assume it
 			 * must have happened before VACUUM started, since REINDEX
-			 * CONCURRENTLY locks out VACUUM.)
+			 * CONCURRENTLY locks out VACUUM, if myXmin is invalid it is
+			 * validation scan.)
 			 *
 			 * Note: we could make a tighter test by seeing if the xid is
 			 * "running" according to the active snapshot; but snapmgr.c
 			 * doesn't currently export a suitable API, and it's not entirely
 			 * clear that a tighter test is worth the cycles anyway.
 			 */
-			if (TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
+			if (!TransactionIdIsValid(bds->myXmin) ||
+					TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
 				spgAddPendingTID(bds, &dt->pointer);
 		}
 		else
@@ -808,7 +810,6 @@ spgvacuumscan(spgBulkDeleteState *bds)
 	/* Finish setting up spgBulkDeleteState */
 	initSpGistState(&bds->spgstate, index);
 	bds->pendingList = NULL;
-	bds->myXmin = GetActiveSnapshot()->xmin;
 	bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
 	/*
@@ -959,6 +960,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	bds.stats = stats;
 	bds.callback = callback;
 	bds.callback_state = callback_state;
+	if (info->validate_index)
+		bds.myXmin = InvalidTransactionId;
+	else
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 	spgvacuumscan(&bds);
 
@@ -999,6 +1004,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		bds.stats = stats;
 		bds.callback = dummy_callback;
 		bds.callback_state = NULL;
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 		spgvacuumscan(&bds);
 	}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ed563da5a32..16958bc8e91 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -68,6 +68,7 @@
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -3511,8 +3512,9 @@ IndexCheckExclusion(Relation heapRelation,
  * insert their new tuples into it. At the same moment we clear "indisready" for
  * auxiliary index, since it is no more required to be updated.
  *
- * We then take a new reference snapshot, any tuples that are valid according
- * to this snap, but are not in the index, must be added to the index.
+ * We then take a new snapshot, any tuples that are valid according
+ * to this snap, but are not in the index, must be added to the index. In
+ * order to propagate xmin we reset that snapshot every so often.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
  * the snap need not be indexed, because we will wait out all transactions
@@ -3525,7 +3527,7 @@ IndexCheckExclusion(Relation heapRelation,
  * TIDs of both auxiliary and target indexes, and doing a "merge join" against
  * the TID lists to see which tuples from auxiliary index are missing from the
  * target index.  Thus we will ensure that all tuples valid according to the
- * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * latest snapshot are in the index. Notice we need to do bulkdelete in the
  * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
@@ -3546,13 +3548,14 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * Also, some actions to concurrent drop the auxiliary index are performed.
  */
-void
-validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
+TransactionId
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId)
 {
 	Relation	heapRelation,
 				indexRelation,
 				auxIndexRelation;
 	IndexInfo  *indexInfo;
+	TransactionId limitXmin;
 	IndexVacuumInfo ivinfo, auxivinfo;
 	ValidateIndexState state, auxState;
 	Oid			save_userid;
@@ -3602,8 +3605,12 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	 * Fetch info needed for index_insert.  (You might think this should be
 	 * passed in from DefineIndex, but its copy is long gone due to having
 	 * been built in a previous transaction.)
+	 *
+	 * We might need snapshot for index expressions or predicates.
 	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	indexInfo = BuildIndexInfo(indexRelation);
+	PopActiveSnapshot();
 
 	/* mark build is concurrent just for consistency */
 	indexInfo->ii_Concurrent = true;
@@ -3639,6 +3646,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 										   NULL, TUPLESORT_NONE);
 	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	(void) index_bulk_delete(&auxivinfo, NULL,
 							 validate_index_callback, &auxState);
 	/* If aux index is empty, merge may be skipped */
@@ -3658,7 +3668,13 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		index_close(indexRelation, NoLock);
 		table_close(heapRelation, NoLock);
 
-		return;
+		PushActiveSnapshot(GetTransactionSnapshot());
+		limitXmin = GetActiveSnapshot()->xmin;
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		Assert(!TransactionIdIsValid(MyProc->xmin));
+		return limitXmin;
 	}
 
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
@@ -3667,6 +3683,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	/* ambulkdelete updates progress metrics */
 	(void) index_bulk_delete(&ivinfo, NULL,
 							 validate_index_callback, &state);
@@ -3686,19 +3705,24 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	tuplesort_performsort(auxState.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+	Assert(!TransactionIdIsValid(MyProc->xmin));
 
 	/*
 	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
-	table_index_validate_scan(heapRelation,
-							  indexRelation,
-							  indexInfo,
-							  snapshot,
-							  &state,
-							  &auxState);
+	limitXmin = table_index_validate_scan(heapRelation,
+										  indexRelation,
+										  indexInfo,
+										  &state,
+										  &auxState);
 
 	/* Tuple sort closed by table_index_validate_scan */
 	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
@@ -3721,6 +3745,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
+
+	Assert(!TransactionIdIsValid(MyProc->xmin));
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 599e3375833..7f8adbdbda3 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -595,7 +595,6 @@ DefineIndex(ParseState *pstate,
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -1813,32 +1812,11 @@ DefineIndex(ParseState *pstate,
 	/* Tell concurrent index builds to ignore us, if index qualifies */
 	if (safe_index)
 		set_indexsafe_procflags();
-
-	/*
-	 * Now take the "reference snapshot" that will be used by validate_index()
-	 * to filter candidate tuples.  Beware!  There might still be snapshots in
-	 * use that treat some transaction as in-progress that our reference
-	 * snapshot treats as committed.  If such a recently-committed transaction
-	 * deleted tuples in the table, we will not include them in the index; yet
-	 * those transactions which see the deleting one as still-in-progress will
-	 * expect such tuples to be there once we mark the index as valid.
-	 *
-	 * We solve this by waiting for all endangered transactions to exit before
-	 * we mark the index as valid.
-	 *
-	 * We also set ActiveSnapshot to this snap, since functions in indexes may
-	 * need a snapshot.
-	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
-	PushActiveSnapshot(snapshot);
 	/*
 	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
-	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
-	limitXmin = snapshot->xmin;
+	limitXmin = validate_index(tableId, indexRelationId, auxIndexRelationId);
 
-	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1860,8 +1838,8 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * The index is now valid in the sense that it contains all currently
 	 * interesting tuples.  But since it might not contain tuples deleted just
-	 * before the reference snap was taken, we have to wait out any
-	 * transactions that might have older snapshots.
+	 * before the last snapshot during validating was taken, we have to wait
+	 * out any transactions that might have older snapshots.
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
@@ -4426,7 +4404,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
-		Snapshot	snapshot;
 
 		StartTransactionCommand();
 
@@ -4441,13 +4418,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		if (newidx->safe)
 			set_indexsafe_procflags();
 
-		/*
-		 * Take the "reference snapshot" that will be used by validate_index()
-		 * to filter candidate tuples.
-		 */
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
-		PushActiveSnapshot(snapshot);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
@@ -4459,16 +4429,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
-
-		/*
-		 * We can now do away with our active snapshot, we still need to save
-		 * the xmin limit to wait for older snapshots.
-		 */
-		limitXmin = snapshot->xmin;
-
-		PopActiveSnapshot();
-		UnregisterSnapshot(snapshot);
+		limitXmin = validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId);
+		Assert(!TransactionIdIsValid(MyProc->xmin));
 
 		/*
 		 * To ensure no deadlocks, we must commit and start yet another
@@ -4481,7 +4443,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		/*
 		 * The index is now valid in the sense that it contains all currently
 		 * interesting tuples.  But since it might not contain tuples deleted
-		 * just before the reference snap was taken, we have to wait out any
+		 * just before the latest snap was taken, we have to wait out any
 		 * transactions that might have older snapshots.
 		 *
 		 * Because we don't take a snapshot or Xid in this transaction,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1a997537800..2380a593d71 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -701,12 +701,11 @@ typedef struct TableAmRoutine
 										   TableScanDesc scan);
 
 	/* see table_index_validate_scan for reference about parameters */
-	void		(*index_validate_scan) (Relation table_rel,
-										Relation index_rel,
-										IndexInfo *index_info,
-										Snapshot snapshot,
-										ValidateIndexState *state,
-										ValidateIndexState *aux_state);
+	TransactionId		(*index_validate_scan) (Relation table_rel,
+												Relation index_rel,
+												IndexInfo *index_info,
+												ValidateIndexState *state,
+												ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1829,20 +1828,18 @@ table_index_build_range_scan(Relation table_rel,
  * Note: it is responsibility of that function to close sortstates in
  * both `state` and `auxstate`.
  */
-static inline void
+static inline TransactionId
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
-						  Snapshot snapshot,
 						  ValidateIndexState *state,
 						  ValidateIndexState *auxstate)
 {
-	table_rel->rd_tableam->index_validate_scan(table_rel,
-											   index_rel,
-											   index_info,
-											   snapshot,
-											   state,
-											   auxstate);
+	return table_rel->rd_tableam->index_validate_scan(table_rel,
+													  index_rel,
+													  index_info,
+													  state,
+													  auxstate);
 }
 
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 6fa91bfcdc0..b33084cb91a 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -417,6 +417,21 @@ NormalTransactionIdOlder(TransactionId a, TransactionId b)
 	return b;
 }
 
+/* return the newer of the two IDs */
+static inline TransactionId
+TransactionIdNewer(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdFollows(a, b))
+		return a;
+	return b;
+}
+
 /* return the newer of the two IDs */
 static inline FullTransactionId
 FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 37a390d33de..d1f6411dd78 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -152,7 +152,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
+extern TransactionId validate_index(Oid heapId, Oid indexId, Oid auxIndexId);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
-- 
2.43.0



  [application/x-patch] v30-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch (19.7K, 7-v30-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch)
  download | inline diff:
From c2296f78588a531978e5aa8e96cd6c4eae9a5f65 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 12 Jan 2026 00:57:56 +0300
Subject: [PATCH v30 3/7] Add Datum storage support to tuplestore Extend
 tuplestore to store individual Datum values: - fixed-length datatypes and
 variable-length datatypes: include a length header - by-value types: store
 inline with one extra byte (but without support of random access)

This support enables usages of tuplestore for non-tuple data (TIDs) in the next commit.
---
 src/backend/utils/sort/tuplestore.c | 330 +++++++++++++++++++++++-----
 src/include/utils/tuplestore.h      |  33 +--
 2 files changed, 293 insertions(+), 70 deletions(-)

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index afba82f28a2..692e325eafd 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1,16 +1,19 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.c
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
+ *
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
  * Also, it is possible to support multiple independent read pointers.
  *
  * A temporary file is used to handle the data if it exceeds the
@@ -61,6 +64,8 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "storage/buffile.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
 
@@ -115,16 +120,15 @@ struct Tuplestorestate
 	BufFile    *myfile;			/* underlying file, or NULL if none */
 	MemoryContext context;		/* memory context for holding tuples */
 	ResourceOwner resowner;		/* resowner for holding temp files */
+	Oid			datumType;		/* InvalidOid or oid of Datum's to be stored */
+	int16		datumTypeLen;	/* typelen of that Datum */
+	bool		datumTypeByVal; /* by-value of that Datum */
 
 	/*
 	 * These function pointers decouple the routines that must know what kind
 	 * of tuple we are handling from the routines that don't need to know it.
 	 * They are set up by the tuplestore_begin_xxx routines.
 	 *
-	 * (Although tuplestore.c currently only supports heap tuples, I've copied
-	 * this part of tuplesort.c so that extension to other kinds of objects
-	 * will be easy if it's ever needed.)
-	 *
 	 * Function to copy a supplied input tuple into palloc'd space. (NB: we
 	 * assume that a single pfree() is enough to release the tuple later, so
 	 * the representation must be "flat" in one palloc chunk.) state->availMem
@@ -149,6 +153,12 @@ struct Tuplestorestate
 	 */
 	void	   *(*readtup) (Tuplestorestate *state, unsigned int len);
 
+	/*
+	 * Function to get length of tuple from tape. Used to provide 'len' argument
+	 * for readtup (see above).
+	 */
+	unsigned int(*lentup) (Tuplestorestate *state, bool eofOK);
+
 	/*
 	 * This array holds pointers to tuples in memory if we are in state INMEM.
 	 * In states WRITEFILE and READFILE it's not used.
@@ -185,6 +195,7 @@ struct Tuplestorestate
 #define COPYTUP(state,tup)	((*(state)->copytup) (state, tup))
 #define WRITETUP(state,tup) ((*(state)->writetup) (state, tup))
 #define READTUP(state,len)	((*(state)->readtup) (state, len))
+#define LENTUP(state,eofOK)	((*(state)->lentup) (state, eofOK))
 #define LACKMEM(state)		((state)->availMem < 0)
 #define USEMEM(state,amt)	((state)->availMem -= (amt))
 #define FREEMEM(state,amt)	((state)->availMem += (amt))
@@ -193,9 +204,9 @@ struct Tuplestorestate
  *
  * NOTES about on-tape representation of tuples:
  *
- * We require the first "unsigned int" of a stored tuple to be the total size
- * on-tape of the tuple, including itself (so it is never zero).
- * The remainder of the stored tuple
+ * In case of tuples we use first "unsigned int" of a stored tuple
+ * to be the total size on-tape of the tuple, including itself
+ * (so it is never zero). The remainder of the stored tuple
  * may or may not match the in-memory representation of the tuple ---
  * any conversion needed is the job of the writetup and readtup routines.
  *
@@ -206,10 +217,13 @@ struct Tuplestorestate
  * state->backward is not set, the write/read routines may omit the extra
  * length word.
  *
- * writetup is expected to write both length words as well as the tuple
+ * In the case of Datum with constant length, both "unsigned int" are omitted.
+ *
+ * writetup is expected to write both length words and the tuple
  * data.  When readtup is called, the tape is positioned just after the
- * front length word; readtup must read the tuple data and advance past
- * the back length word (if present).
+ * front length word (if it is not omitted like in case of content-size Datum);
+ * readtup must read the tuple data and advance past the back length word
+ * (if present).
  *
  * The write/read routines can make use of the tuple description data
  * stored in the Tuplestorestate record, if needed. They are also expected
@@ -241,11 +255,16 @@ static Tuplestorestate *tuplestore_begin_common(int eflags,
 static void tuplestore_puttuple_common(Tuplestorestate *state, void *tuple);
 static void dumptuples(Tuplestorestate *state);
 static void tuplestore_updatemax(Tuplestorestate *state);
-static unsigned int getlen(Tuplestorestate *state, bool eofOK);
+
+static unsigned int lentup_heap(Tuplestorestate *state, bool eofOK);
 static void *copytup_heap(Tuplestorestate *state, void *tup);
 static void writetup_heap(Tuplestorestate *state, void *tup);
 static void *readtup_heap(Tuplestorestate *state, unsigned int len);
 
+static unsigned int lentup_datum(Tuplestorestate *state, bool eofOK);
+static void *copytup_datum(Tuplestorestate *state, void *datum);
+static void writetup_datum(Tuplestorestate *state, void *datum);
+static void *readtup_datum(Tuplestorestate *state, unsigned int len);
 
 /*
  *		tuplestore_begin_xxx
@@ -268,6 +287,12 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
 	state->allowedMem = maxKBytes * (int64) 1024;
 	state->availMem = state->allowedMem;
 	state->myfile = NULL;
+	/*
+	 * Set Datum related data to invalid by default.
+	 */
+	state->datumType = InvalidOid;
+	state->datumTypeLen = 0;
+	state->datumTypeByVal = false;
 
 	/*
 	 * The palloc/pfree pattern for tuple memory is in a FIFO pattern.  A
@@ -345,6 +370,37 @@ tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
 	state->copytup = copytup_heap;
 	state->writetup = writetup_heap;
 	state->readtup = readtup_heap;
+	state->lentup = lentup_heap;
+
+	return state;
+}
+
+/*
+ * The same as tuplestore_begin_heap but create store for Datum values.
+ */
+Tuplestorestate *
+tuplestore_begin_datum(Oid datumType, bool randomAccess, bool interXact, int maxKBytes)
+{
+	Tuplestorestate *state;
+	int			eflags;
+
+	/*
+	 * This interpretation of the meaning of randomAccess is compatible with
+	 * the pre-8.3 behavior of tuplestores.
+	 */
+	eflags = randomAccess ?
+		(EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :
+		(EXEC_FLAG_REWIND);
+
+	state = tuplestore_begin_common(eflags, interXact, maxKBytes);
+	state->datumType = datumType;
+	get_typlenbyval(state->datumType, &state->datumTypeLen, &state->datumTypeByVal);
+	Assert(!(state->datumTypeByVal && randomAccess));
+
+	state->copytup = copytup_datum;
+	state->writetup = writetup_datum;
+	state->readtup = readtup_datum;
+	state->lentup = lentup_datum;
 
 	return state;
 }
@@ -443,16 +499,19 @@ tuplestore_clear(Tuplestorestate *state)
 	{
 		int64		availMem = state->availMem;
 
-		/*
-		 * Below, we reset the memory context for storing tuples.  To save
-		 * from having to always call GetMemoryChunkSpace() on all stored
-		 * tuples, we adjust the availMem to forget all the tuples and just
-		 * recall USEMEM for the space used by the memtuples array.  Here we
-		 * just Assert that's correct and the memory tracking hasn't gone
-		 * wrong anywhere.
-		 */
-		for (i = state->memtupdeleted; i < state->memtupcount; i++)
-			availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			/*
+			 * Below, we reset the memory context for storing tuples.  To save
+			 * from having to always call GetMemoryChunkSpace() on all stored
+			 * tuples, we adjust the availMem to forget all the tuples and just
+			 * recall USEMEM for the space used by the memtuples array.  Here we
+			 * just Assert that's correct and the memory tracking hasn't gone
+			 * wrong anywhere.
+			 */
+			for (i = state->memtupdeleted; i < state->memtupcount; i++)
+				availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		}
 
 		availMem += GetMemoryChunkSpace(state->memtuples);
 
@@ -776,6 +835,25 @@ tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple)
 	MemoryContextSwitchTo(oldcxt);
 }
 
+/*
+ * Like tuplestore_puttupleslot but for single Datum.
+ */
+void
+tuplestore_putdatum(Tuplestorestate *state, Datum datum)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
+
+	/*
+	 * Copy the Datum.  (Must do this even in WRITEFILE case.  Note that
+	 * COPYTUP includes USEMEM, so we needn't do that here.)
+	 */
+	datum = PointerGetDatum(COPYTUP(state, DatumGetPointer(datum)));
+
+	tuplestore_puttuple_common(state, DatumGetPointer(datum));
+
+	MemoryContextSwitchTo(oldcxt);
+}
+
 /*
  * Similar to tuplestore_puttuple(), but work from values + nulls arrays.
  * This avoids an extra tuple-construction operation.
@@ -1027,10 +1105,10 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 			pg_fallthrough;
 
 		case TSS_READFILE:
-			*should_free = true;
+			*should_free = !state->datumTypeByVal;
 			if (forward)
 			{
-				if ((tuplen = getlen(state, true)) != 0)
+				if ((tuplen = LENTUP(state, true)) != 0)
 				{
 					tup = READTUP(state, tuplen);
 					return tup;
@@ -1042,6 +1120,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				}
 			}
 
+			Assert(!state->datumTypeByVal);
 			/*
 			 * Backward.
 			 *
@@ -1059,7 +1138,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				Assert(!state->truncated);
 				return NULL;
 			}
-			tuplen = getlen(state, false);
+			tuplen = LENTUP(state, false);
 
 			if (readptr->eof_reached)
 			{
@@ -1090,7 +1169,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 					Assert(!state->truncated);
 					return NULL;
 				}
-				tuplen = getlen(state, false);
+				tuplen = LENTUP(state, false);
 			}
 
 			/*
@@ -1152,6 +1231,41 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 	}
 }
 
+bool
+tuplestore_getdatum(Tuplestorestate *state, bool forward,
+					bool *should_free, Datum *result)
+{
+	Datum datum;
+	*should_free = false;
+
+	datum = (Datum) tuplestore_gettuple(state, forward, should_free);
+
+	/* For by-value datum we may receive zero as valid value. */
+	if (state->datumTypeByVal)
+	{
+		/* Return false only on EOF */
+		if (state->readptrs[state->activeptr].eof_reached)
+		{
+			*result = PointerGetDatum(NULL);
+			return false;
+		}
+
+		*result = datum;
+		return true;
+	}
+
+	if (datum)
+	{
+		*result = datum;
+		return true;
+	}
+	else
+	{
+		*result = PointerGetDatum(NULL);
+		return false;
+	}
+}
+
 /*
  * tuplestore_advance - exported function to adjust position without fetching
  *
@@ -1460,8 +1574,11 @@ tuplestore_trim(Tuplestorestate *state)
 	/* Release no-longer-needed tuples */
 	for (i = state->memtupdeleted; i < nremove; i++)
 	{
-		FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
-		pfree(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
+			pfree(state->memtuples[i]);
+		}
 		state->memtuples[i] = NULL;
 	}
 	state->memtupdeleted = nremove;
@@ -1556,25 +1673,6 @@ tuplestore_in_memory(Tuplestorestate *state)
 	return (state->status == TSS_INMEM);
 }
 
-
-/*
- * Tape interface routines
- */
-
-static unsigned int
-getlen(Tuplestorestate *state, bool eofOK)
-{
-	unsigned int len;
-	size_t		nbytes;
-
-	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
-	if (nbytes == 0)
-		return 0;
-	else
-		return len;
-}
-
-
 /*
  * Routines specialized for HeapTuple case
  *
@@ -1585,6 +1683,19 @@ getlen(Tuplestorestate *state, bool eofOK)
  * to write that separately.
  */
 
+static unsigned int
+lentup_heap(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	else
+		return len;
+}
+
 static void *
 copytup_heap(Tuplestorestate *state, void *tup)
 {
@@ -1631,3 +1742,112 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 		BufFileReadExact(state->myfile, &tuplen, sizeof(tuplen));
 	return tuple;
 }
+
+/*
+ * Routines specialized for Datum case.
+ *
+ * Handles both fixed and variable-length Datums efficiently:
+ * - Fixed-length and Variable-length includes length prefix (and suffix if backward scan)
+ * - By-value types handled inline without extra copying, storing single extra byte
+ *   XXX: consider refactoring to avoid it, currently need it for correct rewind logic
+ */
+
+static unsigned int
+lentup_datum(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	Assert(state->datumType != InvalidOid);
+
+	if (state->datumTypeByVal)
+	{
+		uint8_t	junk;
+		nbytes = BufFileReadMaybeEOF(state->myfile, &junk, sizeof(uint8_t), eofOK);
+		if (nbytes == 0)
+			return 0;
+		Assert(junk == (uint8_t) state->datumTypeLen);
+		return state->datumTypeLen;
+	}
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	return len;
+}
+
+static void *
+copytup_datum(Tuplestorestate *state, void *datum)
+{
+	Datum d;
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+		return DatumGetPointer(PointerGetDatum(datum));
+
+	d = datumCopy(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+	USEMEM(state, GetMemoryChunkSpace(DatumGetPointer(d)));
+	return DatumGetPointer(d);
+}
+
+static void
+writetup_datum(Tuplestorestate *state, void *datum)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		uint8_t junk = state->datumTypeLen; /* overflow is ok */
+		Datum v;
+		Assert(state->datumTypeLen > 0);
+
+		/* just marker byte used to track the end of data for rewind logic */
+		BufFileWrite(state->myfile, &junk, sizeof(junk));
+		store_att_byval(&v, PointerGetDatum(datum), state->datumTypeLen);
+		BufFileWrite(state->myfile, &v, state->datumTypeLen);
+		Assert(!state->backward);
+	}
+	else
+	{
+		unsigned int size = state->datumTypeLen;
+		if (state->datumTypeLen < 0)
+			size = datumGetSize(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+
+		BufFileWrite(state->myfile, &size, sizeof(size));
+
+		BufFileWrite(state->myfile, datum, size);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileWrite(state->myfile, &size, sizeof(size));
+
+		FREEMEM(state, GetMemoryChunkSpace(datum));
+		pfree(datum);
+	}
+}
+
+static void *
+readtup_datum(Tuplestorestate *state, unsigned int len)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		Datum datum;
+
+		Assert(state->datumTypeLen > 0);
+		Assert(len == state->datumTypeLen);
+		BufFileReadExact(state->myfile, &datum, state->datumTypeLen);
+
+		Assert(!state->backward);
+		return DatumGetPointer(fetch_att(&datum, true, state->datumTypeLen));
+	}
+	else
+	{
+		Datum *data = palloc(len);
+		BufFileReadExact(state->myfile, data, len);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileReadExact(state->myfile, &len, sizeof(len));
+
+		return data;
+	}
+}
diff --git a/src/include/utils/tuplestore.h b/src/include/utils/tuplestore.h
index 1c08e219e89..665d6d57635 100644
--- a/src/include/utils/tuplestore.h
+++ b/src/include/utils/tuplestore.h
@@ -1,17 +1,18 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.h
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
- * Also, it is possible to support multiple independent read pointers.
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
  * A temporary file is used to handle the data if it exceeds the
  * space limit specified by the caller.
@@ -39,14 +40,13 @@
  */
 typedef struct Tuplestorestate Tuplestorestate;
 
-/*
- * Currently we only need to store MinimalTuples, but it would be easy
- * to support the same behavior for IndexTuples and/or bare Datums.
- */
-
 extern Tuplestorestate *tuplestore_begin_heap(bool randomAccess,
 											  bool interXact,
 											  int maxKBytes);
+extern Tuplestorestate *tuplestore_begin_datum(Oid datumType,
+											   bool randomAccess,
+											   bool interXact,
+											   int maxKBytes);
 
 extern void tuplestore_set_eflags(Tuplestorestate *state, int eflags);
 
@@ -55,6 +55,7 @@ extern void tuplestore_puttupleslot(Tuplestorestate *state,
 extern void tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple);
 extern void tuplestore_putvalues(Tuplestorestate *state, TupleDesc tdesc,
 								 const Datum *values, const bool *isnull);
+extern void tuplestore_putdatum(Tuplestorestate *state, Datum datum);
 
 extern int	tuplestore_alloc_read_pointer(Tuplestorestate *state, int eflags);
 
@@ -72,6 +73,8 @@ extern bool tuplestore_in_memory(Tuplestorestate *state);
 
 extern bool tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 									bool copy, TupleTableSlot *slot);
+extern bool tuplestore_getdatum(Tuplestorestate *state, bool forward,
+								bool *should_free, Datum *result);
 
 extern bool tuplestore_advance(Tuplestorestate *state, bool forward);
 
-- 
2.43.0



  [application/x-patch] v30-0001-Add-stress-tests-for-concurrent-index-builds.patch (9.3K, 8-v30-0001-Add-stress-tests-for-concurrent-index-builds.patch)
  download | inline diff:
From 6f41e75a4d85a8b46d88e3ad87ef184486f8f9a8 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v30 1/7] Add stress tests for concurrent index builds

Introduce stress tests for concurrent index operations:
- test concurrent inserts/updates during CREATE/REINDEX INDEX CONCURRENTLY
- cover various index types (btree, gin, gist, brin, hash, spgist)
- test unique and non-unique indexes
- test with expressions and predicates
- test both parallel and non-parallel operations

These tests verify the behavior of the following commits.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 225 ++++++++++++++++++++++++++++++++
 2 files changed, 226 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 592cef74ecb..51a62dccb7b 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..f160f9d18d7
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,225 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf', 'maintenance_work_mem = 32MB'); # to avoid OOM
+$node->append_conf('postgresql.conf', 'shared_buffers = 32MB'); # to avoid OOM
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp,
+								ia int4[], p point)));
+# uncomment to force non-HOT -> $node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);));
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC in different options concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=15 --jobs=4 --exit-on-abort --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => q(
+			SET debug_parallel_query = off; -- this is because predicate_stable implementation
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=15 --jobs=4 --exit-on-abort --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for unique BTREE',
+	{
+		'concurrent_ops_unique_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE UNIQUE INDEX CONCURRENTLY new_idx ON tbl(i);
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIN with upserts
+$node->pgbench(
+	'--no-vacuum --client=15 --jobs=4 --exit-on-abort --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN/GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_gin_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIN (ia);
+					\sleep 10 ms
+					SELECT gin_index_check('new_idx');
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					SELECT gin_index_check('new_idx');
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIST/BRIN/HASH/SPGIST index concurrently with upserts
+$node->pgbench(
+	'--no-vacuum --client=15 --jobs=4 --exit-on-abort --transactions=1000',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN/GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_other_idx' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\set variant random(0, 3)
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIST (p);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING BRIN (updated_at);
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING HASH (updated_at);
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING SPGIST (p);
+					\endif
+					\sleep 10 ms
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\sleep 10 ms
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now())
+					ON CONFLICT(i) DO UPDATE SET updated_at = now();
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+		)
+	});
+
+$node->stop;
+done_testing();
\ No newline at end of file
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-03-23 22:08  Mihail Nikalayeu <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Mihail Nikalayeu @ 2026-03-23 22:08 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; +Cc: Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Hello!

Fixed compilation, updates stress test, fixed few potential issues
with tuplestore, some style fixes around.

Best regards,
Mikhail.


Attachments:

  [application/octet-stream] v31-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch (21.0K, 2-v31-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch)
  download | inline diff:
From bd1919d3a299ac927322d2c3d5eee1b273ba43a5 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 12 Jan 2026 00:57:56 +0300
Subject: [PATCH v31 3/7] Add Datum storage support to tuplestore Extend
 tuplestore to store individual Datum values: - fixed-length datatypes and
 variable-length datatypes: include a length header - by-value types: store
 inline with one extra byte (but without support of random access)

This support enables usages of tuplestore for non-tuple data (TIDs) in the next commit.
---
 src/backend/utils/sort/tuplestore.c | 361 +++++++++++++++++++++++-----
 src/include/utils/tuplestore.h      |  33 +--
 2 files changed, 321 insertions(+), 73 deletions(-)

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index 273a4c9b02f..3fc54deb0fd 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1,16 +1,19 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.c
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
+ *
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
  * Also, it is possible to support multiple independent read pointers.
  *
  * A temporary file is used to handle the data if it exceeds the
@@ -61,6 +64,8 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "storage/buffile.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
 #include "utils/tuplestore.h"
@@ -116,16 +121,15 @@ struct Tuplestorestate
 	BufFile    *myfile;			/* underlying file, or NULL if none */
 	MemoryContext context;		/* memory context for holding tuples */
 	ResourceOwner resowner;		/* resowner for holding temp files */
+	Oid			datumType;		/* InvalidOid or oid of Datum's to be stored */
+	int16		datumTypeLen;	/* typelen of that Datum */
+	bool		datumTypeByVal; /* by-value of that Datum */
 
 	/*
 	 * These function pointers decouple the routines that must know what kind
 	 * of tuple we are handling from the routines that don't need to know it.
 	 * They are set up by the tuplestore_begin_xxx routines.
 	 *
-	 * (Although tuplestore.c currently only supports heap tuples, I've copied
-	 * this part of tuplesort.c so that extension to other kinds of objects
-	 * will be easy if it's ever needed.)
-	 *
 	 * Function to copy a supplied input tuple into palloc'd space. (NB: we
 	 * assume that a single pfree() is enough to release the tuple later, so
 	 * the representation must be "flat" in one palloc chunk.) state->availMem
@@ -150,6 +154,12 @@ struct Tuplestorestate
 	 */
 	void	   *(*readtup) (Tuplestorestate *state, unsigned int len);
 
+	/*
+	 * Function to get length of tuple from tape. Used to provide 'len' argument
+	 * for readtup (see above).
+	 */
+	unsigned int(*lentup) (Tuplestorestate *state, bool eofOK);
+
 	/*
 	 * This array holds pointers to tuples in memory if we are in state INMEM.
 	 * In states WRITEFILE and READFILE it's not used.
@@ -186,6 +196,7 @@ struct Tuplestorestate
 #define COPYTUP(state,tup)	((*(state)->copytup) (state, tup))
 #define WRITETUP(state,tup) ((*(state)->writetup) (state, tup))
 #define READTUP(state,len)	((*(state)->readtup) (state, len))
+#define LENTUP(state,eofOK)	((*(state)->lentup) (state, eofOK))
 #define LACKMEM(state)		((state)->availMem < 0)
 #define USEMEM(state,amt)	((state)->availMem -= (amt))
 #define FREEMEM(state,amt)	((state)->availMem += (amt))
@@ -194,9 +205,9 @@ struct Tuplestorestate
  *
  * NOTES about on-tape representation of tuples:
  *
- * We require the first "unsigned int" of a stored tuple to be the total size
- * on-tape of the tuple, including itself (so it is never zero).
- * The remainder of the stored tuple
+ * In case of tuples we use first "unsigned int" of a stored tuple
+ * to be the total size on-tape of the tuple, including itself
+ * (so it is never zero). The remainder of the stored tuple
  * may or may not match the in-memory representation of the tuple ---
  * any conversion needed is the job of the writetup and readtup routines.
  *
@@ -207,10 +218,13 @@ struct Tuplestorestate
  * state->backward is not set, the write/read routines may omit the extra
  * length word.
  *
- * writetup is expected to write both length words as well as the tuple
+ * In the case of Datum with constant length, both "unsigned int" are omitted.
+ *
+ * writetup is expected to write both length words and the tuple
  * data.  When readtup is called, the tape is positioned just after the
- * front length word; readtup must read the tuple data and advance past
- * the back length word (if present).
+ * front length word (if it is not omitted like in case of content-size Datum);
+ * readtup must read the tuple data and advance past the back length word
+ * (if present).
  *
  * The write/read routines can make use of the tuple description data
  * stored in the Tuplestorestate record, if needed. They are also expected
@@ -242,11 +256,16 @@ static Tuplestorestate *tuplestore_begin_common(int eflags,
 static void tuplestore_puttuple_common(Tuplestorestate *state, void *tuple);
 static void dumptuples(Tuplestorestate *state);
 static void tuplestore_updatemax(Tuplestorestate *state);
-static unsigned int getlen(Tuplestorestate *state, bool eofOK);
+
+static unsigned int lentup_heap(Tuplestorestate *state, bool eofOK);
 static void *copytup_heap(Tuplestorestate *state, void *tup);
 static void writetup_heap(Tuplestorestate *state, void *tup);
 static void *readtup_heap(Tuplestorestate *state, unsigned int len);
 
+static unsigned int lentup_datum(Tuplestorestate *state, bool eofOK);
+static void *copytup_datum(Tuplestorestate *state, void *datum);
+static void writetup_datum(Tuplestorestate *state, void *datum);
+static void *readtup_datum(Tuplestorestate *state, unsigned int len);
 
 /*
  *		tuplestore_begin_xxx
@@ -269,6 +288,12 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
 	state->allowedMem = maxKBytes * (int64) 1024;
 	state->availMem = state->allowedMem;
 	state->myfile = NULL;
+	/*
+	 * Set Datum related data to invalid by default.
+	 */
+	state->datumType = InvalidOid;
+	state->datumTypeLen = 0;
+	state->datumTypeByVal = false;
 
 	/*
 	 * The palloc/pfree pattern for tuple memory is in a FIFO pattern.  A
@@ -346,6 +371,37 @@ tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
 	state->copytup = copytup_heap;
 	state->writetup = writetup_heap;
 	state->readtup = readtup_heap;
+	state->lentup = lentup_heap;
+
+	return state;
+}
+
+/*
+ * The same as tuplestore_begin_heap but create store for Datum values.
+ */
+Tuplestorestate *
+tuplestore_begin_datum(Oid datumType, bool randomAccess, bool interXact, int maxKBytes)
+{
+	Tuplestorestate *state;
+	int			eflags;
+
+	/*
+	 * This interpretation of the meaning of randomAccess is compatible with
+	 * the pre-8.3 behavior of tuplestores.
+	 */
+	eflags = randomAccess ?
+		(EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :
+		(EXEC_FLAG_REWIND);
+
+	state = tuplestore_begin_common(eflags, interXact, maxKBytes);
+	state->datumType = datumType;
+	get_typlenbyval(state->datumType, &state->datumTypeLen, &state->datumTypeByVal);
+	Assert(!(state->datumTypeByVal && randomAccess));
+
+	state->copytup = copytup_datum;
+	state->writetup = writetup_datum;
+	state->readtup = readtup_datum;
+	state->lentup = lentup_datum;
 
 	return state;
 }
@@ -444,16 +500,19 @@ tuplestore_clear(Tuplestorestate *state)
 	{
 		int64		availMem = state->availMem;
 
-		/*
-		 * Below, we reset the memory context for storing tuples.  To save
-		 * from having to always call GetMemoryChunkSpace() on all stored
-		 * tuples, we adjust the availMem to forget all the tuples and just
-		 * recall USEMEM for the space used by the memtuples array.  Here we
-		 * just Assert that's correct and the memory tracking hasn't gone
-		 * wrong anywhere.
-		 */
-		for (i = state->memtupdeleted; i < state->memtupcount; i++)
-			availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			/*
+			 * Below, we reset the memory context for storing tuples.  To save
+			 * from having to always call GetMemoryChunkSpace() on all stored
+			 * tuples, we adjust the availMem to forget all the tuples and just
+			 * recall USEMEM for the space used by the memtuples array.  Here we
+			 * just Assert that's correct and the memory tracking hasn't gone
+			 * wrong anywhere.
+			 */
+			for (i = state->memtupdeleted; i < state->memtupcount; i++)
+				availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		}
 
 		availMem += GetMemoryChunkSpace(state->memtuples);
 
@@ -777,6 +836,25 @@ tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple)
 	MemoryContextSwitchTo(oldcxt);
 }
 
+/*
+ * Like tuplestore_puttupleslot but for single Datum.
+ */
+void
+tuplestore_putdatum(Tuplestorestate *state, Datum datum)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
+
+	/*
+	 * Copy the Datum.  (Must do this even in WRITEFILE case.  Note that
+	 * COPYTUP includes USEMEM, so we needn't do that here.)
+	 */
+	datum = PointerGetDatum(COPYTUP(state, DatumGetPointer(datum)));
+
+	tuplestore_puttuple_common(state, DatumGetPointer(datum));
+
+	MemoryContextSwitchTo(oldcxt);
+}
+
 /*
  * Similar to tuplestore_puttuple(), but work from values + nulls arrays.
  * This avoids an extra tuple-construction operation.
@@ -1028,10 +1106,10 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 			pg_fallthrough;
 
 		case TSS_READFILE:
-			*should_free = true;
+			*should_free = !state->datumTypeByVal;
 			if (forward)
 			{
-				if ((tuplen = getlen(state, true)) != 0)
+				if ((tuplen = LENTUP(state, true)) != 0)
 				{
 					tup = READTUP(state, tuplen);
 					return tup;
@@ -1043,6 +1121,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				}
 			}
 
+			Assert(!state->datumTypeByVal);
 			/*
 			 * Backward.
 			 *
@@ -1060,7 +1139,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				Assert(!state->truncated);
 				return NULL;
 			}
-			tuplen = getlen(state, false);
+			tuplen = LENTUP(state, false);
 
 			if (readptr->eof_reached)
 			{
@@ -1091,7 +1170,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 					Assert(!state->truncated);
 					return NULL;
 				}
-				tuplen = getlen(state, false);
+				tuplen = LENTUP(state, false);
 			}
 
 			/*
@@ -1153,6 +1232,41 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 	}
 }
 
+bool
+tuplestore_getdatum(Tuplestorestate *state, bool forward,
+					bool *should_free, Datum *result)
+{
+	Datum datum;
+	*should_free = false;
+
+	datum = (Datum) tuplestore_gettuple(state, forward, should_free);
+
+	/* For by-value datum we may receive zero as valid value. */
+	if (state->datumTypeByVal)
+	{
+		/* Return false only on EOF */
+		if (state->readptrs[state->activeptr].eof_reached)
+		{
+			*result = PointerGetDatum(NULL);
+			return false;
+		}
+
+		*result = datum;
+		return true;
+	}
+
+	if (datum)
+	{
+		*result = datum;
+		return true;
+	}
+	else
+	{
+		*result = PointerGetDatum(NULL);
+		return false;
+	}
+}
+
 /*
  * tuplestore_advance - exported function to adjust position without fetching
  *
@@ -1173,10 +1287,20 @@ tuplestore_advance(Tuplestorestate *state, bool forward)
 			pfree(tuple);
 		return true;
 	}
-	else
+
+	/*
+	 * A NULL return normally means end-of-data, but for by-value datum
+	 * stores a valid zero-valued datum (e.g., false, 0) is indistinguishable
+	 * from NULL via pointer check.  Use eof_reached to distinguish.
+	 */
+	if (state->datumTypeByVal)
 	{
-		return false;
+		TSReadPointer *readptr = &state->readptrs[state->activeptr];
+
+		return !readptr->eof_reached;
 	}
+
+	return false;
 }
 
 /*
@@ -1239,7 +1363,12 @@ tuplestore_skiptuples(Tuplestorestate *state, int64 ntuples, bool forward)
 				tuple = tuplestore_gettuple(state, forward, &should_free);
 
 				if (tuple == NULL)
-					return false;
+				{
+					/* See tuplestore_advance for why pointer check is insufficient */
+					if (!state->datumTypeByVal ||
+						state->readptrs[state->activeptr].eof_reached)
+						return false;
+				}
 				if (should_free)
 					pfree(tuple);
 				CHECK_FOR_INTERRUPTS();
@@ -1461,8 +1590,11 @@ tuplestore_trim(Tuplestorestate *state)
 	/* Release no-longer-needed tuples */
 	for (i = state->memtupdeleted; i < nremove; i++)
 	{
-		FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
-		pfree(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
+			pfree(state->memtuples[i]);
+		}
 		state->memtuples[i] = NULL;
 	}
 	state->memtupdeleted = nremove;
@@ -1557,25 +1689,6 @@ tuplestore_in_memory(Tuplestorestate *state)
 	return (state->status == TSS_INMEM);
 }
 
-
-/*
- * Tape interface routines
- */
-
-static unsigned int
-getlen(Tuplestorestate *state, bool eofOK)
-{
-	unsigned int len;
-	size_t		nbytes;
-
-	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
-	if (nbytes == 0)
-		return 0;
-	else
-		return len;
-}
-
-
 /*
  * Routines specialized for HeapTuple case
  *
@@ -1586,6 +1699,19 @@ getlen(Tuplestorestate *state, bool eofOK)
  * to write that separately.
  */
 
+static unsigned int
+lentup_heap(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	else
+		return len;
+}
+
 static void *
 copytup_heap(Tuplestorestate *state, void *tup)
 {
@@ -1632,3 +1758,122 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 		BufFileReadExact(state->myfile, &tuplen, sizeof(tuplen));
 	return tuple;
 }
+
+/*
+ * Routines specialized for Datum case.
+ *
+ * Handles both fixed and variable-length Datums efficiently:
+ * - Fixed-length and Variable-length includes length prefix (and suffix if backward scan)
+ * - By-value types handled inline without extra copying, storing single extra byte
+ *   XXX: consider refactoring to avoid it, currently need it for correct rewind logic
+ */
+
+static unsigned int
+lentup_datum(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	Assert(state->datumType != InvalidOid);
+
+	if (state->datumTypeByVal)
+	{
+		uint8_t	junk;
+		nbytes = BufFileReadMaybeEOF(state->myfile, &junk, sizeof(uint8_t), eofOK);
+		if (nbytes == 0)
+			return 0;
+		Assert(junk == (uint8_t) state->datumTypeLen);
+		return state->datumTypeLen;
+	}
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	return len;
+}
+
+static void *
+copytup_datum(Tuplestorestate *state, void *datum)
+{
+	Datum d;
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+		return DatumGetPointer(PointerGetDatum(datum));
+
+	d = datumCopy(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+	USEMEM(state, GetMemoryChunkSpace(DatumGetPointer(d)));
+	return DatumGetPointer(d);
+}
+
+static void
+writetup_datum(Tuplestorestate *state, void *datum)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		uint8_t junk = state->datumTypeLen; /* overflow is ok */
+		Datum v;
+		Assert(state->datumTypeLen > 0);
+
+		/* just marker byte used to track the end of data for rewind logic */
+		BufFileWrite(state->myfile, &junk, sizeof(junk));
+		store_att_byval(&v, PointerGetDatum(datum), state->datumTypeLen);
+		BufFileWrite(state->myfile, &v, state->datumTypeLen);
+		Assert(!state->backward);
+	}
+	else
+	{
+		unsigned int size = state->datumTypeLen;
+		unsigned int tuplen;
+
+		if (state->datumTypeLen < 0)
+			size = datumGetSize(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+
+		/*
+		 * Include sizeof(unsigned int) in the stored length, matching the
+		 * convention used by writetup_heap.  The backward-scan seek
+		 * arithmetic in tuplestore_gettuple assumes this.
+		 */
+		tuplen = size + sizeof(unsigned int);
+		BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		BufFileWrite(state->myfile, datum, size);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		FREEMEM(state, GetMemoryChunkSpace(datum));
+		pfree(datum);
+	}
+}
+
+static void *
+readtup_datum(Tuplestorestate *state, unsigned int len)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		Datum datum;
+
+		Assert(state->datumTypeLen > 0);
+		Assert(len == state->datumTypeLen);
+		BufFileReadExact(state->myfile, &datum, state->datumTypeLen);
+
+		Assert(!state->backward);
+		return DatumGetPointer(fetch_att(&datum, true, state->datumTypeLen));
+	}
+	else
+	{
+		unsigned int datalen = len - sizeof(unsigned int);
+		Datum *data = palloc(datalen);
+
+		BufFileReadExact(state->myfile, data, datalen);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileReadExact(state->myfile, &len, sizeof(len));
+
+		return data;
+	}
+}
diff --git a/src/include/utils/tuplestore.h b/src/include/utils/tuplestore.h
index 1c08e219e89..665d6d57635 100644
--- a/src/include/utils/tuplestore.h
+++ b/src/include/utils/tuplestore.h
@@ -1,17 +1,18 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.h
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
- * Also, it is possible to support multiple independent read pointers.
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
  * A temporary file is used to handle the data if it exceeds the
  * space limit specified by the caller.
@@ -39,14 +40,13 @@
  */
 typedef struct Tuplestorestate Tuplestorestate;
 
-/*
- * Currently we only need to store MinimalTuples, but it would be easy
- * to support the same behavior for IndexTuples and/or bare Datums.
- */
-
 extern Tuplestorestate *tuplestore_begin_heap(bool randomAccess,
 											  bool interXact,
 											  int maxKBytes);
+extern Tuplestorestate *tuplestore_begin_datum(Oid datumType,
+											   bool randomAccess,
+											   bool interXact,
+											   int maxKBytes);
 
 extern void tuplestore_set_eflags(Tuplestorestate *state, int eflags);
 
@@ -55,6 +55,7 @@ extern void tuplestore_puttupleslot(Tuplestorestate *state,
 extern void tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple);
 extern void tuplestore_putvalues(Tuplestorestate *state, TupleDesc tdesc,
 								 const Datum *values, const bool *isnull);
+extern void tuplestore_putdatum(Tuplestorestate *state, Datum datum);
 
 extern int	tuplestore_alloc_read_pointer(Tuplestorestate *state, int eflags);
 
@@ -72,6 +73,8 @@ extern bool tuplestore_in_memory(Tuplestorestate *state);
 
 extern bool tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 									bool copy, TupleTableSlot *slot);
+extern bool tuplestore_getdatum(Tuplestorestate *state, bool forward,
+								bool *should_free, Datum *result);
 
 extern bool tuplestore_advance(Tuplestorestate *state, bool forward);
 
-- 
2.43.0



  [application/octet-stream] v31-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch (36.4K, 3-v31-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch)
  download | inline diff:
From 2b1a423175ca6893b56bda69e1827e595f22df5e Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sun, 11 Jan 2026 19:27:52 +0300
Subject: [PATCH v31 2/7] Add STIR access method and flags related to auxiliary
 indexes

This patch provides infrastructure for following enhancements to concurrent index builds by:
- ii_Auxiliary in IndexInfo: indicates that an index is an auxiliary index used during concurrent index build
- validate_index in IndexVacuumInfo: set if index_bulk_delete called during the validation phase of concurrent index build
- STIR (Short-Term Index Replacement) access method is introduced, intended solely for short-lived, auxiliary usage

STIR functions are designed as an ephemeral helper during concurrent index builds, temporarily storing TIDs without providing the full features of a typical access method. As such, it raises warnings or errors when accessed outside its specialized usage path.

Planned to be used in following commits.
---
 contrib/pgstattuple/pgstattuple.c        |   3 +
 src/backend/access/Makefile              |   1 +
 src/backend/access/heap/vacuumlazy.c     |   2 +
 src/backend/access/meson.build           |   1 +
 src/backend/access/stir/Makefile         |  18 +
 src/backend/access/stir/meson.build      |   5 +
 src/backend/access/stir/stir.c           | 565 +++++++++++++++++++++++
 src/backend/catalog/index.c              |   1 +
 src/backend/catalog/toasting.c           |   1 +
 src/backend/commands/analyze.c           |   1 +
 src/backend/commands/vacuumparallel.c    |   1 +
 src/backend/nodes/makefuncs.c            |   1 +
 src/include/access/genam.h               |   1 +
 src/include/access/reloptions.h          |   3 +-
 src/include/access/stir.h                | 113 +++++
 src/include/catalog/pg_am.dat            |   3 +
 src/include/catalog/pg_opclass.dat       |   4 +
 src/include/catalog/pg_opfamily.dat      |   2 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/nodes/execnodes.h            |   7 +-
 src/include/utils/index_selfuncs.h       |   8 +
 src/test/regress/expected/amutils.out    |   8 +-
 src/test/regress/expected/opr_sanity.out |   7 +-
 src/test/regress/expected/psql.out       |  24 +-
 24 files changed, 766 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/stir/Makefile
 create mode 100644 src/backend/access/stir/meson.build
 create mode 100644 src/backend/access/stir/stir.c
 create mode 100644 src/include/access/stir.h

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 6a7f8cb4a7c..5b5984e3aa2 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -285,6 +285,9 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 			case SPGIST_AM_OID:
 				err = "spgist index";
 				break;
+			case STIR_AM_OID:
+				err = "stir index";
+				break;
 			case BRIN_AM_OID:
 				err = "brin index";
 				break;
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index e88d72ea039..ebbcfa90715 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -19,6 +19,7 @@ SUBDIRS	    = \
 	nbtree \
 	rmgrdesc \
 	spgist \
+	stir \
 	sequence \
 	table \
 	tablesample \
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..f1785b9a456 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3138,6 +3138,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
@@ -3189,6 +3190,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
diff --git a/src/backend/access/meson.build b/src/backend/access/meson.build
index 5fd18de74f9..7219c65f365 100644
--- a/src/backend/access/meson.build
+++ b/src/backend/access/meson.build
@@ -11,6 +11,7 @@ subdir('nbtree')
 subdir('rmgrdesc')
 subdir('sequence')
 subdir('spgist')
+subdir('stir')
 subdir('table')
 subdir('tablesample')
 subdir('transam')
diff --git a/src/backend/access/stir/Makefile b/src/backend/access/stir/Makefile
new file mode 100644
index 00000000000..8785dab37bd
--- /dev/null
+++ b/src/backend/access/stir/Makefile
@@ -0,0 +1,18 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/stir
+#
+# IDENTIFICATION
+#    src/backend/access/stir/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/stir
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	stir.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/stir/meson.build b/src/backend/access/stir/meson.build
new file mode 100644
index 00000000000..4b7ad15346c
--- /dev/null
+++ b/src/backend/access/stir/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+backend_sources += files(
+	'stir.c',
+)
diff --git a/src/backend/access/stir/stir.c b/src/backend/access/stir/stir.c
new file mode 100644
index 00000000000..f21b229de42
--- /dev/null
+++ b/src/backend/access/stir/stir.c
@@ -0,0 +1,565 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.c
+ *	  Implementation of Short-Term Index Replacement.
+ *
+ * STIR is a specialized access method type designed for temporary storage
+ * of TID values during concurrent index build operations.
+ *
+ * The typical lifecycle of a STIR index is:
+ * 1. created as an auxiliary index for CIC/RIC
+ * 2. accepts inserts for a period
+ * 3. stirbulkdelete called during index validation phase
+ * 4. gets dropped
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/stir/stir.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amvalidate.h"
+#include "access/htup_details.h"
+#include "access/stir.h"
+#include "access/tableam.h"
+#include "catalog/index.h"
+#include "catalog/pg_amop.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_opfamily.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "utils/catcache.h"
+#include "utils/fmgrprotos.h"
+#include "utils/index_selfuncs.h"
+#include "utils/memutils.h"
+#include "utils/regproc.h"
+#include "utils/syscache.h"
+
+/*
+ * Stir handler function: return IndexAmRoutine with access method parameters
+ * and callbacks.
+ */
+Datum
+stirhandler(PG_FUNCTION_ARGS)
+{
+	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
+
+	/* Set STIR-specific strategy and procedure numbers */
+	amroutine->amstrategies = STIR_NSTRATEGIES;
+	amroutine->amsupport = STIR_NPROC;
+	amroutine->amoptsprocnum = STIR_OPTIONS_PROC;
+
+	/* STIR doesn't support most index operations */
+	amroutine->amcanorder = false;
+	amroutine->amcanorderbyop = false;
+	amroutine->amcanbackward = false;
+	amroutine->amcanunique = false;
+	amroutine->amcanmulticol = true;
+	amroutine->amoptionalkey = true;
+	amroutine->amsearcharray = false;
+	amroutine->amsearchnulls = false;
+	amroutine->amstorage = false;
+	amroutine->amclusterable = false;
+	amroutine->ampredlocks = false;
+	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
+	amroutine->amcaninclude = true;
+	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL;
+	amroutine->amkeytype = InvalidOid;
+
+	/* Set up function callbacks */
+	amroutine->ambuild = stirbuild;
+	amroutine->ambuildempty = stirbuildempty;
+	amroutine->aminsert = stirinsert;
+	amroutine->aminsertcleanup = NULL;
+	amroutine->ambulkdelete = stirbulkdelete;
+	amroutine->amvacuumcleanup = stirvacuumcleanup;
+	amroutine->amcanreturn = NULL;
+	amroutine->amcostestimate = stircostestimate;
+	amroutine->amoptions = stiroptions;
+	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
+	amroutine->amvalidate = stirvalidate;
+	amroutine->amadjustmembers = NULL;
+	amroutine->ambeginscan = stirbeginscan;
+	amroutine->amrescan = stirrescan;
+	amroutine->amgettuple = NULL;
+	amroutine->amgetbitmap = NULL;
+	amroutine->amendscan = stirendscan;
+	amroutine->ammarkpos = NULL;
+	amroutine->amrestrpos = NULL;
+	amroutine->amestimateparallelscan = NULL;
+	amroutine->aminitparallelscan = NULL;
+	amroutine->amparallelrescan = NULL;
+
+	PG_RETURN_POINTER(amroutine);
+}
+
+/*
+ * Validates operator class for STIR index.
+ *
+ * STIR is not a real index, so validate may be skipped.
+ * But we do it just for consistency.
+ */
+bool
+stirvalidate(Oid opclassoid)
+{
+	bool result = true;
+	HeapTuple classtup;
+	Form_pg_opclass classform;
+	Oid opfamilyoid;
+	HeapTuple familytup;
+	Form_pg_opfamily familyform;
+	char *opfamilyname;
+	CatCList *oprlist;
+	int i;
+
+	/* Fetch opclass information */
+	classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
+	if (!HeapTupleIsValid(classtup))
+		elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
+	classform = (Form_pg_opclass) GETSTRUCT(classtup);
+
+	opfamilyoid = classform->opcfamily;
+
+	/* Fetch opfamily information */
+	familytup = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfamilyoid));
+	if (!HeapTupleIsValid(familytup))
+		elog(ERROR, "cache lookup failed for operator family %u", opfamilyoid);
+	familyform = (Form_pg_opfamily) GETSTRUCT(familytup);
+
+	opfamilyname = NameStr(familyform->opfname);
+
+	/* Fetch all operators and support functions of the opfamily */
+	oprlist = SearchSysCacheList1(AMOPSTRATEGY, ObjectIdGetDatum(opfamilyoid));
+
+	/* Check individual operators */
+	for (i = 0; i < oprlist->n_members; i++)
+	{
+		HeapTuple oprtup = &oprlist->members[i]->tuple;
+		Form_pg_amop oprform = (Form_pg_amop) GETSTRUCT(oprtup);
+
+		/* Check it's allowed strategy for stir */
+		if (oprform->amopstrategy < 1 ||
+			oprform->amopstrategy > STIR_NSTRATEGIES)
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with invalid strategy number %d",
+					        opfamilyname,
+					        format_operator(oprform->amopopr),
+					        oprform->amopstrategy)));
+			result = false;
+		}
+
+		/* stir doesn't support ORDER BY operators */
+		if (oprform->amoppurpose != AMOP_SEARCH ||
+			OidIsValid(oprform->amopsortfamily))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains invalid ORDER BY specification for operator %s",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+
+		/* Check operator signature --- same for all stir strategies */
+		if (!check_amop_signature(oprform->amopopr, BOOLOID,
+		                          oprform->amoplefttype,
+		                          oprform->amoprighttype))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with wrong signature",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+	}
+
+	ReleaseCatCacheList(oprlist);
+	ReleaseSysCache(familytup);
+	ReleaseSysCache(classtup);
+
+	return result;
+}
+
+/*
+ * Initialize meta-page of a STIR index.
+ * The skipInserts flag determines if new inserts will be accepted or skipped.
+ */
+void
+StirFillMetapage(Relation index, Page metaPage, bool skipInserts)
+{
+	StirMetaPageData *metadata;
+
+	StirInitPage(metaPage, STIR_META);
+	metadata = StirPageGetMeta(metaPage);
+	memset(metadata, 0, sizeof(StirMetaPageData));
+	metadata->magicNumber = STIR_MAGIC_NUMBER;
+	metadata->skipInserts = skipInserts;
+	((PageHeader) metaPage)->pd_lower = ((char *) metadata + sizeof(StirMetaPageData)) - (char *) metaPage;
+}
+
+/*
+ * Create and initialize the metapage for a STIR index.
+ * This is called during index creation.
+ */
+void
+StirInitMetapage(Relation index, ForkNumber forknum)
+{
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	/*
+	 * Make a new page; since it is the first page it should be associated with
+	 * block number 0 (STIR_METAPAGE_BLKNO).  No need to hold the extension
+	 * lock because there cannot be concurrent inserters yet.
+	 */
+	metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	START_CRIT_SECTION();
+	Assert(BufferGetBlockNumber(metaBuffer) == STIR_METAPAGE_BLKNO);
+
+	metaPage = BufferGetPage(metaBuffer);
+	StirFillMetapage(index, metaPage, forknum == INIT_FORKNUM);
+
+	MarkBufferDirty(metaBuffer);
+	END_CRIT_SECTION();
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * Initialize any page of a stir index.
+ */
+void
+StirInitPage(Page page, uint16 flags)
+{
+	StirPageOpaque opaque;
+
+	PageInit(page, BLCKSZ, sizeof(StirPageOpaqueData));
+
+	opaque = StirPageGetOpaque(page);
+	opaque->flags = flags;
+	opaque->stir_page_id = STIR_PAGE_ID;
+}
+
+/*
+ * Add a tuple to a STIR page. Returns false if the tuple doesn't fit.
+ * The tuple is added to the end of the page.
+ */
+static bool
+StirPageAddItem(Page page, StirTuple *tuple)
+{
+	StirTuple *itup;
+	StirPageOpaque opaque;
+	char *ptr;
+
+	/* We shouldn't be pointed to an invalid page */
+	Assert(!PageIsNew(page));
+
+	/* Does the new tuple fit on the page? */
+	if (StirPageGetFreeSpace(page) < sizeof(StirTuple))
+		return false;
+
+	/* Copy a new tuple to the end of the page */
+	opaque = StirPageGetOpaque(page);
+	itup = StirPageGetTuple(page, opaque->maxoff + 1);
+	memcpy(itup, tuple, sizeof(StirTuple));
+
+	/* Adjust maxoff and pd_lower */
+	opaque->maxoff++;
+	ptr = (char *) StirPageGetTuple(page, opaque->maxoff + 1);
+	((PageHeader) page)->pd_lower = ptr - page;
+
+	/* Assert we didn't overrun available space */
+	Assert(((PageHeader) page)->pd_lower <= ((PageHeader) page)->pd_upper);
+	return true;
+}
+
+/*
+ * Insert a new tuple into a STIR index.
+ */
+bool
+stirinsert(Relation index, Datum *values, bool *isnull,
+		  ItemPointer ht_ctid, Relation heapRel,
+		  IndexUniqueCheck checkUnique,
+		  bool indexUnchanged,
+		  struct IndexInfo *indexInfo)
+{
+	StirTuple itup;
+	StirMetaPageData *metaData;
+	Buffer buffer,
+			metaBuffer;
+	Page page;
+	BlockNumber blkNo;
+
+	itup.heapPtr = *ht_ctid;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+
+	for (;;)
+	{
+		LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if inserts are allowed */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+		blkNo = metaData->lastBlkNo;
+		/* Don't hold metabuffer lock while doing insert */
+		LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+
+		if (blkNo > 0)
+		{
+			buffer = ReadBuffer(index, blkNo);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+			START_CRIT_SECTION();
+
+			page = BufferGetPage(buffer);
+
+			Assert(!PageIsNew(page));
+
+			/* Try to add tuple to the existing page */
+			if (StirPageAddItem(page, &itup))
+			{
+				/* Success!  Apply the change, clean up, and exit */
+				MarkBufferDirty(buffer);
+				END_CRIT_SECTION();
+
+				UnlockReleaseBuffer(buffer);
+				ReleaseBuffer(metaBuffer);
+				return false;
+			}
+
+			END_CRIT_SECTION();
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Need to add a new page - get exclusive lock on meta-page */
+		LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+
+		/* Re-check after acquiring exclusive lock */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+
+		/* Check if another backend already extended the index */
+		if (blkNo != metaData->lastBlkNo)
+		{
+			Assert(blkNo < metaData->lastBlkNo);
+			/* Someone else inserted the new page into the index, let's try again */
+			LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+			continue;
+		}
+		else
+		{
+			/* Must extend the file */
+			buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
+									   EB_LOCK_FIRST);
+			page = BufferGetPage(buffer);
+			START_CRIT_SECTION();
+
+			StirInitPage(page, 0);
+
+			if (!StirPageAddItem(page, &itup))
+			{
+				/* We shouldn't be here since we're inserting to an empty page */
+				elog(ERROR, "could not add new stir tuple to empty page");
+			}
+
+			/* Update meta-page with new last block number */
+			metaData->lastBlkNo = BufferGetBlockNumber(buffer);
+
+			MarkBufferDirty(metaBuffer);
+			MarkBufferDirty(buffer);
+
+			END_CRIT_SECTION();
+
+			UnlockReleaseBuffer(buffer);
+			UnlockReleaseBuffer(metaBuffer);
+
+			return false;
+		}
+	}
+}
+
+/*
+ * STIR doesn't support scans - these functions all error out
+ */
+IndexScanDesc
+stirbeginscan(Relation r, int nkeys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void
+stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+		  ScanKey orderbys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void stirendscan(IndexScanDesc scan)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+/*
+ * Build a STIR index - only allowed for auxiliary indexes.
+ * Just initializes the meta-page without any heap scans.
+ */
+IndexBuildResult *
+stirbuild(Relation heap, Relation index,
+						   struct IndexInfo *indexInfo)
+{
+	IndexBuildResult *result;
+
+	if (!indexInfo->ii_Auxiliary)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("Building STIR indexes is not supported")));
+
+	StirInitMetapage(index, MAIN_FORKNUM);
+
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	result->heap_tuples = 0;
+	result->index_tuples = 0;
+	return result;
+}
+
+void stirbuildempty(Relation index)
+{
+	StirInitMetapage(index, INIT_FORKNUM);
+}
+
+IndexBulkDeleteResult *
+stirbulkdelete(IndexVacuumInfo *info,
+									 IndexBulkDeleteResult *stats,
+									 IndexBulkDeleteCallback callback,
+									 void *callback_state)
+{
+	Relation index = info->index;
+	BlockNumber blkno, npages;
+	Buffer buffer;
+	Page page;
+
+	/* For normal VACUUM, mark to skip inserts and warn about an index drop needed */
+	if (!info->validate_index)
+	{
+		StirMarkAsSkipInserts(index);
+
+		ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+		return NULL;
+	}
+
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	/*
+	 * Iterate over the pages. We don't care about concurrently added pages,
+	 * because the index is marked as not-ready for that moment and the index is not
+	 * used for insert.
+	 */
+	npages = RelationGetNumberOfBlocks(index);
+	for (blkno = STIR_HEAD_BLKNO; blkno < npages; blkno++)
+	{
+		StirTuple *itup, *itupEnd;
+
+		vacuum_delay_point(false);
+
+		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
+									RBM_NORMAL, info->strategy);
+
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buffer);
+
+		if (PageIsNew(page))
+		{
+			UnlockReleaseBuffer(buffer);
+			continue;
+		}
+
+		itup = StirPageGetTuple(page, FirstOffsetNumber);
+		itupEnd = StirPageGetTuple(page, OffsetNumberNext(StirPageGetMaxOffset(page)));
+		while (itup < itupEnd)
+		{
+			/* Do we have to delete this tuple? */
+			if (callback(&itup->heapPtr, callback_state))
+			{
+				ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("we never delete in stir")));
+			}
+
+			itup = StirPageGetNextTuple(itup);
+		}
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	return stats;
+}
+
+/*
+ * Mark a STIR index to skip future inserts
+ */
+void
+StirMarkAsSkipInserts(Relation index)
+{
+	StirMetaPageData *metaData;
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	START_CRIT_SECTION();
+
+	metaPage = BufferGetPage(metaBuffer);
+	metaData = StirPageGetMeta(metaPage);
+
+	if (!metaData->skipInserts)
+	{
+		metaData->skipInserts = true;
+		MarkBufferDirty(metaBuffer);
+	}
+	END_CRIT_SECTION();
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+IndexBulkDeleteResult *
+stirvacuumcleanup(IndexVacuumInfo *info,
+				  IndexBulkDeleteResult *stats)
+{
+	StirMarkAsSkipInserts(info->index);
+	ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+	return NULL;
+}
+
+bytea *
+stiroptions(Datum reloptions, bool validate)
+{
+	return NULL;
+}
+
+void
+stircostestimate(PlannerInfo *root, IndexPath *path,
+					 double loop_count, Cost *indexStartupCost,
+					 Cost *indexTotalCost, Selectivity *indexSelectivity,
+					 double *indexCorrelation, double *indexPages)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8b3c60d91f9..f5484c59d18 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3412,6 +3412,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
+	ivinfo.validate_index = true;
 
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 078a1cf5127..c33e43df1ec 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -313,6 +313,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_ParallelWorkers = 0;
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
+	indexInfo->ii_Auxiliary = false;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index eeed91be266..1fbe70d187c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -726,6 +726,7 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 			ivinfo.message_level = elevel;
 			ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
 			ivinfo.strategy = vac_strategy;
+			ivinfo.validate_index = false;
 
 			stats = index_vacuum_cleanup(&ivinfo, NULL);
 
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 279108ca89f..dfdccfaf991 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -885,6 +885,7 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
 	ivinfo.estimated_count = pvs->shared->estimated_count;
 	ivinfo.num_heap_tuples = pvs->shared->reltuples;
 	ivinfo.strategy = pvs->bstrategy;
+	ivinfo.validate_index = false;
 
 	/* Update error traceback information */
 	pvs->indname = pstrdup(RelationGetRelationName(indrel));
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 3cd35c5c457..5359dab1176 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -875,6 +875,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
+	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..0356901ee10 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -58,6 +58,7 @@ typedef struct IndexVacuumInfo
 	bool		estimated_count;	/* num_heap_tuples is an estimate */
 	int			message_level;	/* ereport level for progress messages */
 	double		num_heap_tuples;	/* tuples remaining in heap */
+	bool		validate_index; /* validating concurrently built index? */
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index 0bd17b30ca7..e2966165e6f 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -52,8 +52,9 @@ typedef enum relopt_kind
 	RELOPT_KIND_VIEW = (1 << 9),
 	RELOPT_KIND_BRIN = (1 << 10),
 	RELOPT_KIND_PARTITIONED = (1 << 11),
+	RELOPT_KIND_STIR = (1 << 12),
 	/* if you add a new kind, make sure you update "last_default" too */
-	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_STIR,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/access/stir.h b/src/include/access/stir.h
new file mode 100644
index 00000000000..18ee36506fd
--- /dev/null
+++ b/src/include/access/stir.h
@@ -0,0 +1,113 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.h
+ *	  header file for postgres stir access method implementation.
+ *
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * src/include/access/stir.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef STIR_H
+#define STIR_H
+
+#include "access/amapi.h"
+#include "access/xlog.h"
+#include "access/generic_xlog.h"
+#include "access/itup.h"
+#include "nodes/pathnodes.h"
+
+/* Support procedures numbers */
+#define STIR_NPROC				0
+
+/* Scan strategies */
+#define STIR_NSTRATEGIES		1
+
+#define STIR_OPTIONS_PROC				0
+
+/* Macros for accessing stir page structures */
+#define StirPageGetOpaque(page) ((StirPageOpaque) PageGetSpecialPointer(page))
+#define StirPageGetMaxOffset(page) (StirPageGetOpaque(page)->maxoff)
+#define StirPageIsMeta(page) \
+	((StirPageGetOpaque(page)->flags & STIR_META) != 0)
+#define StirPageGetData(page)		((StirTuple *)PageGetContents(page))
+#define StirPageGetTuple(page, offset) \
+	((StirTuple *)(PageGetContents(page) \
+		+ sizeof(StirTuple) * ((offset) - 1)))
+#define StirPageGetNextTuple(tuple) \
+	((StirTuple *)((char *)(tuple) + sizeof(StirTuple)))
+
+
+
+/* Preserved page numbers */
+#define STIR_METAPAGE_BLKNO	(0)
+#define STIR_HEAD_BLKNO		(1) /* first data page */
+
+
+/* Opaque for stir pages */
+typedef struct StirPageOpaqueData
+{
+	OffsetNumber maxoff;		/* number of index tuples on the page */
+	uint16		flags;			/* see bit definitions below */
+	uint16		stir_page_id;	/* for identification of STIR indexes */
+} StirPageOpaqueData;
+
+/* Stir page flags */
+#define STIR_META		(1<<0)
+
+typedef StirPageOpaqueData *StirPageOpaque;
+
+#define STIR_PAGE_ID		0xFF84
+
+/* Metadata of stir index */
+typedef struct StirMetaPageData
+{
+	uint32		magicNumber;
+	BlockNumber	lastBlkNo;
+	bool		skipInserts;	/* should we just exit without any inserts? */
+} StirMetaPageData;
+
+/* Magic number to distinguish stir pages from others */
+#define STIR_MAGIC_NUMBER (0xDBAC0DEF)
+
+#define StirPageGetMeta(page)	((StirMetaPageData *) PageGetContents(page))
+
+typedef struct StirTuple
+{
+	ItemPointerData heapPtr;
+} StirTuple;
+
+#define StirPageGetFreeSpace(page) \
+	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+		- StirPageGetMaxOffset(page) * (sizeof(StirTuple)) \
+		- MAXALIGN(sizeof(StirPageOpaqueData)))
+
+extern void StirFillMetapage(Relation index, Page metaPage, bool skipInserts);
+extern void StirInitMetapage(Relation index, ForkNumber forknum);
+extern void StirInitPage(Page page, uint16 flags);
+extern void StirMarkAsSkipInserts(Relation index);
+
+/* index access method interface functions */
+extern bool stirvalidate(Oid opclassoid);
+extern bool stirinsert(Relation index, Datum *values, bool *isnull,
+					 ItemPointer ht_ctid, Relation heapRel,
+					 IndexUniqueCheck checkUnique,
+					 bool indexUnchanged,
+					 struct IndexInfo *indexInfo);
+extern IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys);
+extern void stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+					 ScanKey orderbys, int norderbys);
+extern void stirendscan(IndexScanDesc scan);
+extern IndexBuildResult *stirbuild(Relation heap, Relation index,
+								 struct IndexInfo *indexInfo);
+extern void stirbuildempty(Relation index);
+extern IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+										   IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+										   void *callback_state);
+extern IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+											  IndexBulkDeleteResult *stats);
+extern bytea *stiroptions(Datum reloptions, bool validate);
+
+#endif			/* STIR_H */
diff --git a/src/include/catalog/pg_am.dat b/src/include/catalog/pg_am.dat
index 46d361047fe..8bd2c2b46ba 100644
--- a/src/include/catalog/pg_am.dat
+++ b/src/include/catalog/pg_am.dat
@@ -33,5 +33,8 @@
 { oid => '3580', oid_symbol => 'BRIN_AM_OID',
   descr => 'block range index (BRIN) access method',
   amname => 'brin', amhandler => 'brinhandler', amtype => 'i' },
+{ oid => '5555', oid_symbol => 'STIR_AM_OID',
+  descr => 'short term index replacement access method',
+  amname => 'stir', amhandler => 'stirhandler', amtype => 'i' },
 
 ]
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index df170b80840..a3457e749db 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -492,4 +492,8 @@
 
 # no brin opclass for the geometric types except box
 
+# allow any types for STIR
+{ opcmethod => 'stir', oid_symbol => 'ANY_STIR_OPS_OID', opcname => 'stir_ops',
+  opcfamily => 'stir/any_ops', opcintype => 'any'},
+
 ]
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 7a027c4810e..6ffc20a061c 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -308,5 +308,7 @@
   opfmethod => 'hash', opfname => 'multirange_ops' },
 { oid => '6158',
   opfmethod => 'gist', opfname => 'multirange_ops' },
+{ oid => '5558',
+  opfmethod => 'stir', opfname => 'any_ops' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fc8d82665b8..bac9a148700 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -935,6 +935,10 @@
   proname => 'brinhandler', provolatile => 'v',
   prorettype => 'index_am_handler', proargtypes => 'internal',
   prosrc => 'brinhandler' },
+{ oid => '5556', descr => 'short term index replacement access method handler',
+  proname => 'stirhandler', provolatile => 'v',
+  prorettype => 'index_am_handler', proargtypes => 'internal',
+  prosrc => 'stirhandler' },
 { oid => '3952', descr => 'brin: standalone scan new table pages',
   proname => 'brin_summarize_new_values', provolatile => 'v',
   proparallel => 'u', prorettype => 'int4', proargtypes => 'regclass',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0716c5a9aed..0f834889912 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -166,8 +166,8 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
- * during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
+ * are used only during index build; they're conventionally zeroed otherwise.
  * ----------------
  */
 typedef struct IndexInfo
@@ -227,7 +227,8 @@ typedef struct IndexInfo
 	bool		ii_WithoutOverlaps;
 	/* # of workers requested (excludes leader) */
 	int			ii_ParallelWorkers;
-
+	/* is auxiliary for concurrent index build? */
+	bool		ii_Auxiliary;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/utils/index_selfuncs.h b/src/include/utils/index_selfuncs.h
index 74793a1a19d..bf0e30dabe9 100644
--- a/src/include/utils/index_selfuncs.h
+++ b/src/include/utils/index_selfuncs.h
@@ -62,6 +62,14 @@ extern void spgcostestimate(struct PlannerInfo *root,
 							Selectivity *indexSelectivity,
 							double *indexCorrelation,
 							double *indexPages);
+extern void stircostestimate(struct PlannerInfo *root,
+							struct IndexPath *path,
+							double loop_count,
+							Cost *indexStartupCost,
+							Cost *indexTotalCost,
+							Selectivity *indexSelectivity,
+							double *indexCorrelation,
+							double *indexPages);
 extern void gincostestimate(struct PlannerInfo *root,
 							struct IndexPath *path,
 							double loop_count,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index 7ab6113c619..92c033a2010 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -173,7 +173,13 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
  spgist | can_exclude   | t
  spgist | can_include   | t
  spgist | bogus         | 
-(36 rows)
+ stir   | can_order     | f
+ stir   | can_unique    | f
+ stir   | can_multi_col | t
+ stir   | can_exclude   | f
+ stir   | can_include   | t
+ stir   | bogus         | 
+(42 rows)
 
 --
 -- additional checks for pg_index_column_has_property
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 6ff4d7ee901..9259679eea2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2129,9 +2129,10 @@ FROM pg_opclass AS c1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS a1
                  WHERE a1.amopfamily = c1.opcfamily
                    AND binary_coercible(c1.opcintype, a1.amoplefttype));
- opcname | opcfamily 
----------+-----------
-(0 rows)
+ opcname  | opcfamily 
+----------+-----------
+ stir_ops |      5558
+(1 row)
 
 -- Check that each operator listed in pg_amop has an associated opclass,
 -- that is one whose opcintype matches oprleft (possibly by coercion).
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index c8f3932edf0..ecc2c2a6049 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5171,7 +5171,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA *
 List of access methods
@@ -5185,7 +5186,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA h*
 List of access methods
@@ -5210,9 +5212,9 @@ List of access methods
 
 \dA: extra argument "bar" ignored
 \dA+
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5221,12 +5223,13 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ *
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5235,7 +5238,8 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ h*
                      List of access methods
-- 
2.43.0



  [application/octet-stream] v31-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch (94.9K, 4-v31-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch)
  download | inline diff:
From a1b4fb5ced0e25ab86dfbb628b94ce0b69c23019 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 15:03:10 +0100
Subject: [PATCH v31 4/7] Use auxiliary indexes for concurrent index operations

Replace the second table full scan in concurrent index builds with an auxiliary index approach:
- create a STIR auxiliary index with the same predicate (if exists) as in main index
- use it to track tuples inserted during the first phase
- merge auxiliary index with main index during validation to catch up new index with any tuples missed during the first phase
- automatically drop auxiliary when main index is ready

To merge main and auxiliary indexes:
- index_bulk_delete called for both, TIDs put into tuplesort
- both tuplesort are being sorted
- both tuplesort scanned with two pointers looking for the TIDs present in auxiliary index, but absent in main one
- all such TIDs are put into tuplestore
- all TIDs in tuplestore are fetched using the stream, tuplestore used in heapam_index_validate_scan_read_stream_next to provide the next page to prefetch
- if fetched tuple is alive - it is inserted into the main index

This eliminates the need for a second full table scan during validation, improving performance, especially for large tables. Affects both CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY operations.
---
 doc/src/sgml/monitoring.sgml               |  26 +-
 doc/src/sgml/ref/create_index.sgml         |  34 +-
 doc/src/sgml/ref/reindex.sgml              |  40 +-
 src/backend/access/heap/README.HOT         |  13 +-
 src/backend/access/heap/heapam_handler.c   | 553 ++++++++++++++-------
 src/backend/catalog/index.c                | 308 ++++++++++--
 src/backend/catalog/system_views.sql       |  17 +-
 src/backend/commands/indexcmds.c           | 344 +++++++++++--
 src/backend/nodes/makefuncs.c              |   4 +-
 src/include/access/tableam.h               |  12 +-
 src/include/catalog/index.h                |   9 +-
 src/include/commands/progress.h            |  13 +-
 src/include/nodes/makefuncs.h              |   3 +-
 src/test/regress/expected/create_index.out |  42 ++
 src/test/regress/expected/indexing.out     |   3 +-
 src/test/regress/expected/rules.out        |  17 +-
 src/test/regress/sql/create_index.sql      |  21 +
 17 files changed, 1123 insertions(+), 336 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 462019a972c..b8031a3cb39 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6677,6 +6677,18 @@ FROM pg_stat_get_backend_idset() AS backendid;
        information for this phase.
       </entry>
      </row>
+     <row>
+      <entry><literal>waiting for writers to use auxiliary index</literal></entry>
+      <entry>
+       <command>CREATE INDEX CONCURRENTLY</command> or <command>REINDEX CONCURRENTLY</command> is waiting for transactions
+       with write locks that can potentially see the table to finish, to ensure use of auxiliary index for new tuples in
+       future transactions.
+       This phase is skipped when not in concurrent mode.
+       Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
+       and <structname>current_locker_pid</structname> contain the progress
+       information for this phase.
+      </entry>
+     </row>
      <row>
       <entry><literal>building index</literal></entry>
       <entry>
@@ -6717,13 +6729,12 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </entry>
      </row>
      <row>
-      <entry><literal>index validation: scanning table</literal></entry>
+      <entry><literal>index validation: merging indexes</literal></entry>
       <entry>
-       <command>CREATE INDEX CONCURRENTLY</command> is scanning the table
-       to validate the index tuples collected in the previous two phases.
+       <command>CREATE INDEX CONCURRENTLY</command> is merging content of auxiliary index with the target index.
        This phase is skipped when not in concurrent mode.
-       Columns <structname>blocks_total</structname> (set to the total size of the table)
-       and <structname>blocks_done</structname> contain the progress information for this phase.
+       Columns <structname>tuples_total</structname> (set to the number of tuples to be merged)
+       and <structname>tuples_done</structname> contain the progress information for this phase.
       </entry>
      </row>
      <row>
@@ -6740,8 +6751,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
      <row>
       <entry><literal>waiting for readers before marking dead</literal></entry>
       <entry>
-       <command>REINDEX CONCURRENTLY</command> is waiting for transactions
-       with read locks on the table to finish, before marking the old index dead.
+       <command>CREATE INDEX CONCURRENTLY</command> is waiting for transactions
+        with read locks on the table to finish, before marking the auxiliary index as dead.
+       <command>REINDEX CONCURRENTLY</command> is also waiting before marking the old index as dead.
        This phase is skipped when not in concurrent mode.
        Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
        and <structname>current_locker_pid</structname> contain the progress
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index bb7505d171b..12c88587a79 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -620,10 +620,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     out writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>CREATE INDEX</command>.
     When this option is used,
-    <productname>PostgreSQL</productname> must perform two scans of the table, and in
-    addition it must wait for all existing transactions that could potentially
-    modify or use the index to terminate.  Thus
-    this method requires more total work than a standard index build and takes
+    <productname>PostgreSQL</productname> must perform table scan followed by
+    validation phase, and in addition it must wait for all existing transactions
+    that could potentially modify or use the index to terminate.  Thus
+    this method requires more total work than a standard index build and may take
     significantly longer to complete.  However, since it allows normal
     operations to continue while the index is built, this method is useful for
     adding new indexes in a production environment.  Of course, the extra CPU
@@ -631,14 +631,14 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </para>
 
    <para>
-    In a concurrent index build, the index is actually entered as an
-    <quote>invalid</quote> index into
-    the system catalogs in one transaction, then two table scans occur in
-    two more transactions.  Before each table scan, the index build must
+    In a concurrent index build, the main and auxiliary indexes are actually
+    entered as an <quote>invalid</quote> index into
+    the system catalogs in one transaction, then two phases occur in
+    multiple transactions.  Before each phase, the index build must
     wait for existing transactions that have modified the table to terminate.
-    After the second scan, the index build must wait for any transactions
+    After the second phase, the index build must wait for any transactions
     that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-    scan to terminate, including transactions used by any phase of concurrent
+    phase to terminate, including transactions used by any phase of concurrent
     index builds on other tables, if the indexes involved are partial or have
     columns that are not simple column references.
     Then finally the index can be marked <quote>valid</quote> and ready for use,
@@ -651,10 +651,11 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    <para>
     If a problem arises while scanning the table, such as a deadlock or a
     uniqueness violation in a unique index, the <command>CREATE INDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> index. This index
-    will be ignored for querying purposes because it might be incomplete;
-    however it will still consume update overhead. The <application>psql</application>
-    <command>\d</command> command will report such an index as <literal>INVALID</literal>:
+    command will fail but leave behind an <quote>invalid</quote> index and its
+    associated auxiliary index. These indexes
+    will be ignored for querying purposes because they might be incomplete;
+    however they will still consume update overhead. The <application>psql</application>
+    <command>\d</command> command will report such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -664,11 +665,12 @@ postgres=# \d tab
  col    | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     The recommended recovery
-    method in such cases is to drop the index and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
+    method in such cases is to drop these indexes and try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
     to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
    </para>
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 185cd75ca30..9e0248261ae 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -368,9 +368,8 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <productname>PostgreSQL</productname> supports rebuilding indexes with minimum locking
     of writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>REINDEX</command>. When this option
-    is used, <productname>PostgreSQL</productname> must perform two scans of the table
-    for each index that needs to be rebuilt and wait for termination of
-    all existing transactions that could potentially use the index.
+    is used, <productname>PostgreSQL</productname> must perform several steps to ensure data
+    consistency while allowing normal operations to continue.
     This method requires more total work than a standard index
     rebuild and takes significantly longer to complete as it needs to wait
     for unfinished transactions that might modify the index. However, since
@@ -388,7 +387,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <orderedlist>
      <listitem>
       <para>
-       A new transient index definition is added to the catalog
+       A new transient index definition and an auxiliary index are added to the catalog
        <literal>pg_index</literal>.  This definition will be used to replace
        the old index.  A <literal>SHARE UPDATE EXCLUSIVE</literal> lock at
        session level is taken on the indexes being reindexed as well as their
@@ -398,7 +397,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       A first pass to build the index is done for each new index.  Once the
+       The auxiliary index is marked as "ready for inserts", making
+       it visible to other sessions. This index efficiently tracks all new
+       tuples during the reindex process.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The new main index is built by scanning the table.  Once the
        index is built, its flag <literal>pg_index.indisready</literal> is
        switched to <quote>true</quote> to make it ready for inserts, making it
        visible to other sessions once the transaction that performed the build
@@ -409,9 +416,9 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       Then a second pass is performed to add tuples that were added while the
-       first pass was running.  This step is also done in a separate
-       transaction for each index.
+       A validation phase merges any missing entries from the auxiliary index
+       into the main index, ensuring all concurrent changes are captured.
+       This step is also done in a separate transaction for each index.
       </para>
      </listitem>
 
@@ -428,7 +435,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes have <literal>pg_index.indisready</literal> switched to
+       The old and auxiliary indexes have <literal>pg_index.indisready</literal> switched to
        <quote>false</quote> to prevent any new tuple insertions, after waiting
        for running queries that might reference the old index to complete.
       </para>
@@ -436,7 +443,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes are dropped.  The <literal>SHARE UPDATE
+       The old and auxiliary indexes are dropped.  The <literal>SHARE UPDATE
        EXCLUSIVE</literal> session locks for the indexes and the table are
        released.
       </para>
@@ -447,11 +454,11 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
    <para>
     If a problem arises while rebuilding the indexes, such as a
     uniqueness violation in a unique index, the <command>REINDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> new index in addition to
-    the pre-existing one. This index will be ignored for querying purposes
-    because it might be incomplete; however it will still consume update
+    command will fail but leave behind an <quote>invalid</quote> new index and its auxiliary index in addition to
+    the pre-existing one. These indexes will be ignored for querying purposes
+    because they might be incomplete; however they will still consume update
     overhead. The <application>psql</application> <command>\d</command> command will report
-    such an index as <literal>INVALID</literal>:
+    such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -462,12 +469,13 @@ postgres=# \d tab
 Indexes:
     "idx" btree (col)
     "idx_ccnew" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     If the index marked <literal>INVALID</literal> is suffixed
-    <literal>_ccnew</literal>, then it corresponds to the transient
+    <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
-    recovery method is to drop it using <literal>DROP INDEX</literal>,
+    recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
     then attempt <command>REINDEX CONCURRENTLY</command> again.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 74e407f375a..b1c797517ee 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -375,6 +375,11 @@ constraint on which updates can be HOT.  Other transactions must include
 such an index when determining HOT-safety of updates, even though they
 must ignore it for both insertion and searching purposes.
 
+Also, special auxiliary index is created the same way. It is marked as
+"ready for inserts" without any actual table scan. Its purpose is to collect
+new tuples inserted into table while our target index is still "not ready
+for inserts".
+
 We must do this to avoid making incorrect index entries.  For example,
 suppose we are building an index on column X and we make an index entry for
 a non-HOT tuple with X=1.  Then some other backend, unaware that X is an
@@ -394,10 +399,10 @@ entry at the root of the HOT-update chain but we use the key value from the
 live tuple.
 
 We mark the index open for inserts (but still not ready for reads) then
-we again wait for transactions which have the table open.  Then we take
-a second reference snapshot and validate the index.  This searches for
-tuples missing from the index, and inserts any missing ones.  Again,
-the index entries have to have TIDs equal to HOT-chain root TIDs, but
+we again wait for transactions which have the table open.  Then validate
+the index.  This searches for tuples missing from the index in auxiliary
+index, and inserts any missing ones if they are visible to reference snapshot.
+Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..f90310a1ab8 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -41,15 +41,17 @@
 #include "storage/bufpage.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/tuplesort.h"
+#include "utils/tuplestore.h"
 
 static void reform_and_rewrite_tuple(HeapTuple tuple,
-									 Relation OldHeap, Relation NewHeap,
-									 Datum *values, bool *isnull, RewriteState rwstate);
+                                     Relation OldHeap, Relation NewHeap,
+                                     Datum *values, bool *isnull, RewriteState rwstate);
 
 static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
 								   HeapTuple tuple,
@@ -1768,242 +1770,409 @@ heapam_index_build_range_scan(Relation heapRelation,
 	return reltuples;
 }
 
+/*
+ * Calculate set difference (relative complement) of main and aux
+ * sets.
+ *
+ * All records which are present in auxiliary tuplesort but not in
+ * main are added to the store.
+ *
+ * In set theory notation store = aux - main or store = aux / main.
+ *
+ * returns number of items added to store
+ */
+static int64
+heapam_index_validate_tuplesort_difference(Tuplesortstate *main,
+										   Tuplesortstate *aux,
+										   Tuplestorestate *store)
+{
+	int64		num = 0;
+	/* state variables for the merge */
+	ItemPointer	indexcursor = NULL,
+					auxindexcursor = NULL;
+	ItemPointerData decoded,
+					auxdecoded;
+	bool			tuplesort_empty = false,
+					auxtuplesort_empty = false;
+
+	/* Initialize pointers. */
+	ItemPointerSetInvalid(&decoded);
+	ItemPointerSetInvalid(&auxdecoded);
+
+	/*
+	 * Main loop: we step through the auxiliary sort (auxState->tuplesort),
+	 * which holds TIDs that must compared to those from the "main" sort
+	 * (state->tuplesort).
+	 */
+	while (!auxtuplesort_empty)
+	{
+		Datum		ts_val;
+		bool		ts_isnull;
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Attempt to fetch the next TID from the auxiliary sort. If it's
+		 * empty, we set auxindexcursor to NULL.
+		 */
+		auxtuplesort_empty = !tuplesort_getdatum(aux, true,
+												 false, &ts_val, &ts_isnull,
+												 NULL);
+		Assert(auxtuplesort_empty || !ts_isnull);
+		if (!auxtuplesort_empty)
+		{
+			itemptr_decode(&auxdecoded, DatumGetInt64(ts_val));
+			auxindexcursor = &auxdecoded;
+		}
+		else
+		{
+			auxindexcursor = NULL;
+		}
+
+		/*
+		 * If the auxiliary sort is not yet empty, we now try to synchronize
+		 * the "main" sort cursor (indexcursor) with auxindexcursor. We advance
+		 * the main sort cursor until we've reached or passed the auxiliary TID.
+		 */
+		if (!auxtuplesort_empty)
+		{
+			/*
+			 * Move the main sort forward while:
+			 *   (1) It's not exhausted (tuplesort_empty == false), and
+			 *   (2) Either indexcursor is NULL (first iteration) or
+			 *       indexcursor < auxindexcursor in TID order.
+			 */
+			while (!tuplesort_empty && (indexcursor == NULL || /* null on first time here */
+						ItemPointerCompare(indexcursor, auxindexcursor) < 0))
+			{
+				/*
+				 * Get the next TID from the main sort. If it's empty,
+				 * we set indexcursor to NULL.
+				 */
+				tuplesort_empty = !tuplesort_getdatum(main, true,
+													  false, &ts_val, &ts_isnull,
+													  NULL);
+				Assert(tuplesort_empty || !ts_isnull);
+
+				if (!tuplesort_empty)
+				{
+					itemptr_decode(&decoded, DatumGetInt64(ts_val));
+					indexcursor = &decoded;
+				}
+				else
+				{
+					indexcursor = NULL;
+				}
+
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * Now, if either:
+			 *  - the main sort is empty, or
+			 *  - indexcursor > auxindexcursor,
+			 *
+			 * then auxindexcursor identifies a TID that doesn't appear in
+			 * the main sort. We likely need to insert it
+			 * into the target index if it’s visible in the heap.
+			 */
+			if (tuplesort_empty || ItemPointerCompare(indexcursor, auxindexcursor) > 0)
+			{
+				tuplestore_putdatum(store, Int64GetDatum(itemptr_encode(auxindexcursor)));
+				num++;
+			}
+		}
+	}
+
+	return num;
+}
+
+typedef struct ValidateIndexScanState
+{
+	Tuplestorestate		*store;
+	BlockNumber			prev_block_number;
+	OffsetNumber		prev_offset_number;
+} ValidateIndexScanState;
+
+/*
+ * This is ReadStreamBlockNumberCB implementation which works as follows:
+ *
+ * 1) It iterates over a sorted tuplestore, where each element is an encoded
+ *    ItemPointer
+ *
+ * 2) It returns the current BlockNumber and collects all OffsetNumbers
+ *    for that block in per_buffer_data.
+ *
+ * 3) Once the code encounters a new BlockNumber, it stops reading more
+ *    offsets and saves the OffsetNumber of the new block for the next call.
+ *
+ * 4) The list of offsets for a block is always terminated with InvalidOffsetNumber.
+ *
+ * This function is intended to be repeatedly called, each time returning
+ * the next block and its corresponding set of offsets.
+ */
+static BlockNumber
+heapam_index_validate_scan_read_stream_next(
+								  ReadStream *stream,
+								  void *void_callback_private_data,
+								  void *void_per_buffer_data
+								  )
+{
+	bool should_free;
+	Datum datum;
+	BlockNumber result = InvalidBlockNumber;
+	int i = 0;
+
+	/*
+	 * Retrieve the specialized callback state and the output buffer.
+	 * callback_private_data keeps track of the previous block and offset
+	 * from a prior invocation, if any.
+	 */
+	ValidateIndexScanState *callback_private_data = void_callback_private_data;
+	OffsetNumber *per_buffer_data = void_per_buffer_data;
+
+	/*
+	 * If there is a "leftover" offset number from the previous invocation,
+	 * it means we had switched to a new block in the middle of the last call.
+	 * We place that leftover offset number into the buffer first.
+	 */
+	if (callback_private_data->prev_offset_number != InvalidOffsetNumber)
+	{
+		Assert(callback_private_data->prev_block_number != InvalidBlockNumber);
+		/*
+		 * 'result' is the block number to return. We set it to the block
+		 * from the previous leftover offset.
+		 */
+		result = callback_private_data->prev_block_number;
+		/* Place leftover offset number in the output buffer. */
+		per_buffer_data[i++] = callback_private_data->prev_offset_number;
+		/*
+		 * Clear the leftover offset number so it won't be reused unless
+		 * we encounter another block change.
+		 */
+		callback_private_data->prev_offset_number = InvalidOffsetNumber;
+	}
+
+	/*
+	 * Read from the tuplestore until we either run out of tuples or we
+	 * encounter a block change. For each tuple:
+	 *
+	 *   1) Decode its block/offset from the Datum.
+	 *   2) If it's the first time in this call (prev_block_number == InvalidBlockNumber),
+	 *      initialize prev_block_number.
+	 *   3) If the block number matches the current block, collect the offset.
+	 *   4) If the block number differs, save that offset as leftover and break
+	 *      so that the next call can handle the new block.
+	 */
+	while (tuplestore_getdatum(callback_private_data->store, true, &should_free, &datum))
+	{
+		BlockNumber next_block_number;
+		ItemPointerData next_data;
+
+		/* Decode the datum into an ItemPointer (block + offset). */
+		itemptr_decode(&next_data, DatumGetInt64(datum));
+		next_block_number = ItemPointerGetBlockNumber(&next_data);
+
+		/*
+		 * If we haven't set a block number yet this round, initialize it
+		 * using the first tuple we read.
+		 */
+		if (callback_private_data->prev_block_number == InvalidBlockNumber)
+			callback_private_data->prev_block_number = next_block_number;
+
+		/*
+		 * Always set the result to be the "current" block number
+		 * we are filling offsets for.
+		 */
+		result = callback_private_data->prev_block_number;
+
+		/*
+		 * If this tuple is from the same block, just store its offset
+		 * in our per_buffer_data array.
+		 */
+		if (next_block_number == callback_private_data->prev_block_number)
+		{
+			per_buffer_data[i++] = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+		}
+		else
+		{
+			/*
+			 * If the block just changed, store the offset of the new block
+			 * as leftover for the next invocation and break out.
+			 */
+			callback_private_data->prev_block_number = next_block_number;
+			callback_private_data->prev_offset_number = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+
+			/* Break to let the next call handle the new block. */
+			break;
+		}
+	}
+
+	/*
+	 * Terminate the list of offsets for this block with an InvalidOffsetNumber.
+	 */
+	per_buffer_data[i] = InvalidOffsetNumber;
+	return result;
+}
+
 static void
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
 						   Snapshot snapshot,
-						   ValidateIndexState *state)
+						   ValidateIndexState *state,
+						   ValidateIndexState *auxState)
 {
-	TableScanDesc scan;
-	HeapScanDesc hscan;
-	HeapTuple	heapTuple;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-	BlockNumber previous_blkno = InvalidBlockNumber;
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
+
+	TupleTableSlot  *slot;
+	EState			*estate;
+	ExprContext		*econtext;
+	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	int64			num_to_check;
+	Tuplestorestate *tuples_for_check;
+	ValidateIndexScanState callback_private_data;
+
+	Buffer buf;
+	OffsetNumber* tuples;
+	ReadStream *read_stream;
+
+	/* Use 10% of memory for tuple store. */
+	int		store_work_mem_part = maintenance_work_mem / 10;
+
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
 	/*
 	 * sanity checks
 	 */
 	Assert(OidIsValid(indexRelation->rd_rel->relam));
 
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
+	num_to_check = heapam_index_validate_tuplesort_difference(state->tuplesort,
+														 auxState->tuplesort,
+														 tuples_for_check);
+
+	/* It is our responsibility to close tuple sort as fast as we can */
+	tuplesort_end(state->tuplesort);
+	tuplesort_end(auxState->tuplesort);
+
+	state->tuplesort = auxState->tuplesort = NULL;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+									&TTSOpsBufferHeapTuple);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
 
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+	callback_private_data.prev_block_number = InvalidBlockNumber;
+	callback_private_data.store = tuples_for_check;
+	callback_private_data.prev_offset_number = InvalidOffsetNumber;
 
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	scan = table_beginscan_strat(heapRelation,	/* relation */
-								 snapshot,	/* snapshot */
-								 0, /* number of keys */
-								 NULL,	/* scan key */
-								 true,	/* buffer access strategy OK */
-								 false);	/* syncscan not OK */
-	hscan = (HeapScanDesc) scan;
+	read_stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING,
+														 bstrategy,
+														 heapRelation, MAIN_FORKNUM,
+														 heapam_index_validate_scan_read_stream_next,
+														 &callback_private_data,
+														 (MaxHeapTuplesPerPage + 1) * sizeof(OffsetNumber));
 
-	pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_TOTAL,
-								 hscan->rs_nblocks);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_TOTAL, num_to_check);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE, 0);
 
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while ((buf = read_stream_next_buffer(read_stream, (void*) &tuples)) != InvalidBuffer)
 	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
+		HeapTupleData	heap_tuple_data[MaxHeapTuplesPerPage];
+		int i;
+		OffsetNumber off;
+		BlockNumber block_number;
 
 		CHECK_FOR_INTERRUPTS();
 
-		state->htups += 1;
+		LockBuffer(buf, BUFFER_LOCK_SHARE);
+		block_number = BufferGetBlockNumber(buf);
 
-		if ((previous_blkno == InvalidBlockNumber) ||
-			(hscan->rs_cblock != previous_blkno))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
-										 hscan->rs_cblock);
-			previous_blkno = hscan->rs_cblock;
+			ItemPointerData tid;
+			bool		all_dead, found;
+			ItemPointerSet(&tid, block_number, off);
+
+			found = heap_hot_search_buffer(&tid, heapRelation, buf, snapshot,
+										   &heap_tuple_data[i], &all_dead, true);
+			if (!found)
+				ItemPointerSetInvalid(&heap_tuple_data[i].t_self);
+			i++;
+			state->htups += 1;
 		}
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		/*
-		 * As commented in table_index_build_scan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
-		 */
-		if (hscan->rs_cblock != root_blkno)
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
-		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
-		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
-		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
+			if (ItemPointerIsValid(&heap_tuple_data[i].t_self))
 			{
+				ItemPointerData root_tid;
+				ItemPointerSet(&root_tid, block_number, off);
+
+				/* Reset the per-tuple memory context for the next fetch. */
+				MemoryContextReset(econtext->ecxt_per_tuple_memory);
+				ExecStoreBufferHeapTuple(&heap_tuple_data[i], slot, buf);
+
+				/* Compute the key values and null flags for this tuple. */
+				FormIndexDatum(indexInfo,
+							   slot,
+							   estate,
+							   values,
+							   isnull);
+
 				/*
-				 * Remember index items seen earlier on the current heap page
+				 * Insert the tuple into the target index.
 				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+				index_insert(indexRelation,
+							 values,
+							 isnull,
+							 &root_tid, /* insert root tuple */
+							 heapRelation,
+							 indexInfo->ii_Unique ?
+							 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+							 false,
+							 indexInfo);
+
+				state->tups_inserted += 1;
 			}
 
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  false, &ts_val, &ts_isnull,
-												  NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_TUPLES_DONE, 1);
+			i++;
 		}
 
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 false,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
+		ReleaseBuffer(buf);
 	}
 
-	table_endscan(scan);
-
 	ExecDropSingleTupleTableSlot(slot);
 
 	FreeExecutorState(estate);
 
+	read_stream_end(read_stream);
+	tuplestore_end(tuples_for_check);
+
+	FreeAccessStrategy(bstrategy);
+
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f5484c59d18..31f92b97580 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -715,11 +715,16 @@ UpdateIndexRelation(Oid indexoid,
  *			already exists.
  *		INDEX_CREATE_PARTITIONED:
  *			create a partitioned index (table must be partitioned)
+ *		INDEX_CREATE_AUXILIARY:
+ *			mark index as auxiliary index
  * constr_flags: flags passed to index_constraint_create
  *		(only if INDEX_CREATE_ADD_CONSTRAINT is set)
  * allow_system_table_mods: allow table to be a system catalog
  * is_internal: if true, post creation hook for new index
  * constraintId: if not NULL, receives OID of created constraint
+ * relpersistence: persistence level to use for index. In most of the
+ *		cases it should be equal to the persistence level of the table,
+ *		auxiliary indexes are only exception here.
  *
  * Returns the OID of the created index.
  */
@@ -760,6 +765,7 @@ index_create(Relation heapRelation,
 	bool		invalid = (flags & INDEX_CREATE_INVALID) != 0;
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
+	bool		auxiliary = (flags & INDEX_CREATE_AUXILIARY) != 0;
 	char		relkind;
 	TransactionId relfrozenxid;
 	MultiXactId relminmxid;
@@ -785,7 +791,10 @@ index_create(Relation heapRelation,
 	namespaceId = RelationGetNamespace(heapRelation);
 	shared_relation = heapRelation->rd_rel->relisshared;
 	mapped_relation = RelationIsMapped(heapRelation);
-	relpersistence = heapRelation->rd_rel->relpersistence;
+	if (auxiliary)
+		relpersistence = RELPERSISTENCE_UNLOGGED; /* aux indexes are always unlogged */
+	else
+		relpersistence = heapRelation->rd_rel->relpersistence;
 
 	/*
 	 * check parameters
@@ -793,6 +802,11 @@ index_create(Relation heapRelation,
 	if (indexInfo->ii_NumIndexAttrs < 1)
 		elog(ERROR, "must index at least one column");
 
+	if (indexInfo->ii_Am == STIR_AM_OID && !auxiliary)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("user-defined indexes with STIR access method are not supported")));
+
 	if (!allow_system_table_mods &&
 		IsSystemRelation(heapRelation) &&
 		IsNormalProcessingMode())
@@ -1398,7 +1412,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							false,	/* not ready for inserts */
 							true,
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							false);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1473,6 +1488,154 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	return newIndexId;
 }
 
+/*
+ * index_concurrently_create_aux
+ *
+ * Create concurrently an auxiliary index based on the definition of the one
+ * provided by caller.  The index is inserted into catalogs and needs to be
+ * built later on. This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ */
+Oid
+index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
+							   Oid tablespaceOid, const char *newName)
+{
+	Relation	indexRelation;
+	IndexInfo  *oldInfo,
+			*newInfo;
+	Oid			newIndexId = InvalidOid;
+	HeapTuple	indexTuple;
+
+	List	   *indexColNames = NIL;
+	List	   *indexExprs = NIL;
+	List	   *indexPreds = NIL;
+
+	Oid *auxOpclassIds;
+	int16 *auxColoptions;
+
+	indexRelation = index_open(mainIndexId, RowExclusiveLock);
+
+	/* The new index needs some information from the old index */
+	oldInfo = BuildIndexInfo(indexRelation);
+
+	/*
+	 * Build of an auxiliary index with exclusion constraints is not
+	 * supported.
+	 */
+	if (oldInfo->ii_ExclusionOps != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("auxiliary index creation for exclusion constraints is not supported")));
+
+	/* Get the array of class and column options IDs from index info */
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(mainIndexId));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", mainIndexId);
+
+
+	/*
+	 * Fetch the list of expressions and predicates directly from the
+	 * catalogs.  This cannot rely on the information from IndexInfo of the
+	 * old index as these have been flattened for the planner.
+	 */
+	if (oldInfo->ii_Expressions != NIL)
+	{
+		Datum		exprDatum;
+		char	   *exprString;
+
+		exprDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indexprs);
+		exprString = TextDatumGetCString(exprDatum);
+		indexExprs = (List *) stringToNode(exprString);
+		pfree(exprString);
+	}
+	if (oldInfo->ii_Predicate != NIL)
+	{
+		Datum		predDatum;
+		char	   *predString;
+
+		predDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indpred);
+		predString = TextDatumGetCString(predDatum);
+		indexPreds = (List *) stringToNode(predString);
+
+		/* Also convert to implicit-AND format */
+		indexPreds = make_ands_implicit((Expr *) indexPreds);
+		pfree(predString);
+	}
+
+	/*
+	 * Build the index information for the new index.  Note that rebuild of
+	 * indexes with exclusion constraints is not supported, hence there is no
+	 * need to fill all the ii_Exclusion* fields.
+	 */
+	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
+							oldInfo->ii_NumIndexKeyAttrs,
+							STIR_AM_OID, /* special AM for aux indexes */
+							indexExprs,
+							indexPreds,
+							false,	/* aux index are not unique */
+							oldInfo->ii_NullsNotDistinct,
+							false,	/* not ready for inserts */
+							true,
+							false,	/* aux are not summarizing */
+							false,	/* aux are not without overlaps */
+							true	/* auxiliary */);
+
+	/*
+	 * Extract the list of column names and the column numbers for the new
+	 * index information.  All this information will be used for the index
+	 * creation.
+	 */
+	for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+	{
+		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
+		Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+		indexColNames = lappend(indexColNames, NameStr(att->attname));
+		newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+	}
+
+	auxOpclassIds = palloc0(sizeof(Oid) * newInfo->ii_NumIndexAttrs);
+	auxColoptions = palloc0(sizeof(int16) * newInfo->ii_NumIndexAttrs);
+
+	/* Fill with "any ops" */
+	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
+	{
+		auxOpclassIds[i] = ANY_STIR_OPS_OID;
+		auxColoptions[i] = 0;
+	}
+
+	newIndexId = index_create(heapRelation,
+							  newName,
+							  InvalidOid,    /* indexRelationId */
+							  InvalidOid,    /* parentIndexRelid */
+							  InvalidOid,    /* parentConstraintId */
+							  InvalidRelFileNumber, /* relFileNumber */
+							  newInfo,
+							  indexColNames,
+							  STIR_AM_OID,
+							  tablespaceOid,
+							  indexRelation->rd_indcollation,
+							  auxOpclassIds,
+							  NULL,
+							  auxColoptions,
+							  NULL,
+							  (Datum) 0,
+							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT | INDEX_CREATE_AUXILIARY,
+							  0,
+							  true, /* allow table to be a system catalog? */
+							  false,    /* is_internal? */
+							  NULL);
+
+	/* Close the relations used and clean up */
+	index_close(indexRelation, NoLock);
+	ReleaseSysCache(indexTuple);
+
+	return newIndexId;
+}
+
 /*
  * index_concurrently_build
  *
@@ -2453,7 +2616,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2513,7 +2677,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3289,12 +3454,21 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * We do a concurrent index build by first inserting the catalog entry for the
  * index via index_create(), marking it not indisready and not indisvalid.
+ * Then we create special auxiliary index the same way. It based on STIR AM.
  * Then we commit our transaction and start a new one, then we wait for all
  * transactions that could have been modifying the table to terminate.  Now
- * we know that any subsequently-started transactions will see the index and
+ * we know that any subsequently-started transactions will see indexes and
  * honor its constraints on HOT updates; so while existing HOT-chains might
  * be broken with respect to the index, no currently live tuple will have an
- * incompatible HOT update done to it.  We now build the index normally via
+ * incompatible HOT update done to it.
+ *
+ * After that, we build the auxiliary index. It is fast operation without any actual
+ * table scan. As result, we have empty STIR index. We commit transaction and
+ * again wait for all transactions that could have been modifying the table
+ * to terminate. At that moment all new tuples are going to be inserted into
+ * auxiliary index.
+ *
+ * We now build the index normally via
  * index_build(), while holding a weak lock that allows concurrent
  * insert/update/delete.  Also, we index only tuples that are valid
  * as of the start of the scan (see table_index_build_scan), whereas a normal
@@ -3304,14 +3478,17 @@ IndexCheckExclusion(Relation heapRelation,
  * bogus unique-index failures due to concurrent UPDATEs (we might see
  * different versions of the same row as being valid when we pass over them,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
- * does not contain any tuples added to the table while we built the index.
+ * does not contain any tuples added to the table while we built the index
+ * (but these tuples contained in auxiliary index).
  *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
- * commit the second transaction and start a third.  Again we wait for all
+ * commit the third transaction and start a fourth.  Again we wait for all
  * transactions that could have been modifying the table to terminate.  Now
  * we know that any subsequently-started transactions will see the index and
- * insert their new tuples into it.  We then take a new reference snapshot
- * which is passed to validate_index().  Any tuples that are valid according
+ * insert their new tuples into it. At the same moment we clear "indisready" for
+ * auxiliary index, since it is no more required to be updated.
+ *
+ * We then take a new reference snapshot, any tuples that are valid according
  * to this snap, but are not in the index, must be added to the index.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
@@ -3319,12 +3496,14 @@ IndexCheckExclusion(Relation heapRelation,
  * that might care about them before we mark the index valid.)
  *
  * validate_index() works by first gathering all the TIDs currently in the
- * index, using a bulkdelete callback that just stores the TIDs and doesn't
+ * indexes, using a bulkdelete callback that just stores the TIDs and doesn't
  * ever say "delete it".  (This should be faster than a plain indexscan;
  * also, not all index AMs support full-index indexscan.)  Then we sort the
- * TIDs, and finally scan the table doing a "merge join" against the TID list
- * to see which tuples are missing from the index.  Thus we will ensure that
- * all tuples valid according to the reference snapshot are in the index.
+ * TIDs of both auxiliary and target indexes, and doing a "merge join" against
+ * the TID lists to see which tuples from auxiliary index are missing from the
+ * target index.  Thus we will ensure that all tuples valid according to the
+ * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
  * tuple that is already dead or is in process of being deleted, and we
@@ -3342,22 +3521,26 @@ IndexCheckExclusion(Relation heapRelation,
  * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
  * transactions will be able to use it for queries.
  *
- * Doing two full table scans is a brute-force strategy.  We could try to be
- * cleverer, eg storing new tuples in a special area of the table (perhaps
- * making the table append-only by setting use_fsm).  However that would
- * add yet more locking issues.
+ * Also, some actions to concurrent drop the auxiliary index are performed.
  */
 void
-validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 {
 	Relation	heapRelation,
-				indexRelation;
+				indexRelation,
+				auxIndexRelation;
 	IndexInfo  *indexInfo;
-	IndexVacuumInfo ivinfo;
-	ValidateIndexState state;
+	IndexVacuumInfo ivinfo, auxivinfo;
+	ValidateIndexState state, auxState;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	/* Use 80% of maintenance_work_mem to target index sorting and
+	 * 10% rest for auxiliary.
+	 *
+	 * Rest 10% will be used for tuplestore later. */
+	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
+	int			aux_work_mem_part = maintenance_work_mem / 10;
 
 	{
 		const int	progress_index[] = {
@@ -3390,6 +3573,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	RestrictSearchPath();
 
 	indexRelation = index_open(indexId, RowExclusiveLock);
+	auxIndexRelation = index_open(auxIndexId, RowExclusiveLock);
 
 	/*
 	 * Fetch info needed for index_insert.  (You might think this should be
@@ -3414,15 +3598,49 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.strategy = NULL;
 	ivinfo.validate_index = true;
 
+	/*
+	 * Copy all info to auxiliary info, changing only relation.
+	 */
+	auxivinfo = ivinfo;
+	auxivinfo.index = auxIndexRelation;
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
 	 * is a pass-by-reference type on all platforms, whereas int8 is
 	 * pass-by-value on most platforms.
 	 */
+	auxState.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
+										   InvalidOid, false,
+										   aux_work_mem_part,
+										   NULL, TUPLESORT_NONE);
+	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
+
+	(void) index_bulk_delete(&auxivinfo, NULL,
+							 validate_index_callback, &auxState);
+	/* If aux index is empty, merge may be skipped */
+	if (auxState.itups == 0)
+	{
+		tuplesort_end(auxState.tuplesort);
+		auxState.tuplesort = NULL;
+
+		/* Roll back any GUC changes executed by index functions */
+		AtEOXact_GUC(false, save_nestlevel);
+
+		/* Restore userid and security context */
+		SetUserIdAndSecContext(save_userid, save_sec_context);
+
+		/* Close rels, but keep locks */
+		index_close(auxIndexRelation, NoLock);
+		index_close(indexRelation, NoLock);
+		table_close(heapRelation, NoLock);
+
+		return;
+	}
+
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 											InvalidOid, false,
-											maintenance_work_mem,
+											(int) main_work_mem_part,
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
@@ -3445,27 +3663,30 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	tuplesort_performsort(auxState.tuplesort);
 
 	/*
-	 * Now scan the heap and "merge" it with the index
+	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN);
+								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
 	table_index_validate_scan(heapRelation,
 							  indexRelation,
 							  indexInfo,
 							  snapshot,
-							  &state);
+							  &state,
+							  &auxState);
 
-	/* Done with tuplesort object */
-	tuplesort_end(state.tuplesort);
+	/* Tuple sort closed by table_index_validate_scan */
+	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
 
 	/* Make sure to release resources cached in indexInfo (if needed). */
 	index_insert_cleanup(indexRelation, indexInfo);
 
 	elog(DEBUG2,
-		 "validate_index found %.0f heap tuples, %.0f index tuples; inserted %.0f missing tuples",
-		 state.htups, state.itups, state.tups_inserted);
+		 "validate_index fetched %.0f heap tuples, %.0f index tuples;"
+						" %.0f aux index tuples; inserted %.0f missing tuples",
+		 state.htups, state.itups, auxState.itups, state.tups_inserted);
 
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
@@ -3474,6 +3695,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 
 	/* Close rels, but keep locks */
+	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
 }
@@ -3534,6 +3756,12 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			Assert(!indexForm->indisvalid);
 			indexForm->indisvalid = true;
 			break;
+		case INDEX_DROP_CLEAR_READY:
+			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
+			Assert(indexForm->indisready);
+			Assert(!indexForm->indisvalid);
+			indexForm->indisready = false;
+			break;
 		case INDEX_DROP_CLEAR_VALID:
 
 			/*
@@ -3805,6 +4033,13 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
+	/* Auxiliary indexes are not allowed to be rebuilt */
+	if (indexInfo->ii_Auxiliary)
+		ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("reindex of auxiliary index \"%s\" not supported",
+					RelationGetRelationName(iRel))));
+
 	/* Suppress use of the target index while rebuilding it */
 	SetReindexProcessing(heapId, indexId);
 
@@ -4047,6 +4282,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
+		Oid			indexAm = get_rel_relam(indexOid);
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4072,6 +4308,18 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
+		if (indexAm == STIR_AM_OID)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+							get_namespace_name(indexNamespaceId),
+							get_rel_name(indexOid))));
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			continue;
+		}
+
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f1ed7b58f13..0dfa46a9b74 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1379,16 +1379,17 @@ CREATE VIEW pg_stat_progress_create_index AS
                       END AS command,
         CASE S.param10 WHEN 0 THEN 'initializing'
                        WHEN 1 THEN 'waiting for writers before build'
-                       WHEN 2 THEN 'building index' ||
+                       WHEN 2 THEN 'waiting for writers to use auxiliary index'
+                       WHEN 3 THEN 'building index' ||
                            COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)),
                                     '')
-                       WHEN 3 THEN 'waiting for writers before validation'
-                       WHEN 4 THEN 'index validation: scanning index'
-                       WHEN 5 THEN 'index validation: sorting tuples'
-                       WHEN 6 THEN 'index validation: scanning table'
-                       WHEN 7 THEN 'waiting for old snapshots'
-                       WHEN 8 THEN 'waiting for readers before marking dead'
-                       WHEN 9 THEN 'waiting for readers before dropping'
+                       WHEN 4 THEN 'waiting for writers before validation'
+                       WHEN 5 THEN 'index validation: scanning index'
+                       WHEN 6 THEN 'index validation: sorting tuples'
+                       WHEN 7 THEN 'index validation: merging indexes'
+                       WHEN 8 THEN 'waiting for old snapshots'
+                       WHEN 9 THEN 'waiting for readers before marking dead'
+                       WHEN 10 THEN 'waiting for readers before dropping'
                        END as phase,
         S.param4 AS lockers_total,
         S.param5 AS lockers_done,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index cbd76066f74..dc4af0409df 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -183,6 +183,7 @@ CheckIndexCompatible(Oid oldId,
 					 bool isWithoutOverlaps)
 {
 	bool		isconstraint;
+	bool		isauxiliary;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
 	Oid		   *opclassIds;
@@ -233,6 +234,7 @@ CheckIndexCompatible(Oid oldId,
 
 	amcanorder = amRoutine->amcanorder;
 	amsummarizing = amRoutine->amsummarizing;
+	isauxiliary = accessMethodId == STIR_AM_OID;
 
 	/*
 	 * Compute the operator classes, collations, and exclusion operators for
@@ -244,7 +246,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing,
+							  isWithoutOverlaps, isauxiliary);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -557,6 +560,7 @@ DefineIndex(ParseState *pstate,
 {
 	bool		concurrent;
 	char	   *indexRelationName;
+	char	   *auxIndexRelationName = NULL;
 	char	   *accessMethodName;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
@@ -566,6 +570,7 @@ DefineIndex(ParseState *pstate,
 	Oid			namespaceId;
 	Oid			tablespaceId;
 	Oid			createdConstraintId = InvalidOid;
+	Oid			auxIndexRelationId = InvalidOid;
 	List	   *indexColNames;
 	List	   *allIndexParams;
 	Relation	rel;
@@ -587,6 +592,7 @@ DefineIndex(ParseState *pstate,
 	int			numberOfKeyAttributes;
 	TransactionId limitXmin;
 	ObjectAddress address;
+	ObjectAddress auxAddress;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
@@ -837,6 +843,15 @@ DefineIndex(ParseState *pstate,
 											stmt->excludeOpNames,
 											stmt->primary,
 											stmt->isconstraint);
+	/*
+	 * Select name for auxiliary index
+	 */
+	if (concurrent)
+		auxIndexRelationName = ChooseRelationName(indexRelationName,
+												  NULL,
+												  "ccaux",
+												  namespaceId,
+												  false);
 
 	/*
 	 * look up the access method, verify it can handle the requested features
@@ -931,7 +946,8 @@ DefineIndex(ParseState *pstate,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  false);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -1601,6 +1617,16 @@ DefineIndex(ParseState *pstate,
 		return address;
 	}
 
+	/*
+	 * In case of concurrent build - create auxiliary index record.
+	 */
+	if (concurrent)
+	{
+		auxIndexRelationId = index_concurrently_create_aux(rel, indexRelationId,
+											tablespaceId, auxIndexRelationName);
+		ObjectAddressSet(auxAddress, RelationRelationId, auxIndexRelationId);
+	}
+
 	AtEOXact_GUC(false, root_save_nestlevel);
 	SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 
@@ -1629,11 +1655,11 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * For a concurrent build, it's important to make the catalog entries
 	 * visible to other transactions before we start to build the index. That
-	 * will prevent them from making incompatible HOT updates.  The new index
-	 * will be marked not indisready and not indisvalid, so that no one else
-	 * tries to either insert into it or use it for queries.
+	 * will prevent them from making incompatible HOT updates. New indexes
+	 * (main and auxiliary) will be marked not indisready and not indisvalid,
+	 * so that no one else tries to either insert into it or use it for queries.
 	 *
-	 * We must commit our current transaction so that the index becomes
+	 * We must commit our current transaction so that the indexes becomes
 	 * visible; then start another.  Note that all the data structures we just
 	 * built are lost in the commit.  The only data we keep past here are the
 	 * relation IDs.
@@ -1643,7 +1669,7 @@ DefineIndex(ParseState *pstate,
 	 * cannot block, even if someone else is waiting for access, because we
 	 * already have the same lock within our transaction.
 	 *
-	 * Note: we don't currently bother with a session lock on the index,
+	 * Note: we don't currently bother with a session lock on the indexes,
 	 * because there are no operations that could change its state while we
 	 * hold lock on the parent table.  This might need to change later.
 	 */
@@ -1682,7 +1708,7 @@ DefineIndex(ParseState *pstate,
 	 * with the old list of indexes.  Use ShareLock to consider running
 	 * transactions that hold locks that permit writing to the table.  Note we
 	 * do not need to worry about xacts that open the table for writing after
-	 * this point; they will see the new index when they open it.
+	 * this point; they will see the new indexes when they open it.
 	 *
 	 * Note: the reason we use actual lock acquisition here, rather than just
 	 * checking the ProcArray and sleeping, is that deadlock is possible if
@@ -1694,14 +1720,44 @@ DefineIndex(ParseState *pstate,
 
 	/*
 	 * At this moment we are sure that there are no transactions with the
-	 * table open for write that don't have this new index in their list of
+	 * table open for write that don't have this new indexes in their list of
 	 * indexes.  We have waited out all the existing transactions and any new
-	 * transaction will have the new index in its list, but the index is still
-	 * marked as "not-ready-for-inserts".  The index is consulted while
+	 * transaction will have both new indexes in its list, but indexes are still
+	 * marked as "not-ready-for-inserts". The indexes are consulted while
 	 * deciding HOT-safety though.  This arrangement ensures that no new HOT
 	 * chains can be created where the new tuple and the old tuple in the
 	 * chain have different index keys.
 	 *
+	 * Now call build on auxiliary index. Index will be created empty without
+	 * any actual heap scan, but marked as "ready-for-inserts". The goal of
+	 * that index is accumulate new tuples while main index is actually built.
+	 */
+
+	/* Set ActiveSnapshot since functions in the indexes may need it */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	index_concurrently_build(tableId, auxIndexRelationId);
+	/* we can do away with our snapshot */
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Now we need to ensure are no transactions with the with auxiliary index
+	 * marked as "not-ready-for-inserts".
+	 */
+	WaitForLockers(heaplocktag, ShareLock, true);
+
+	/*
+	 * At this moment we are sure that all new tuples in table are inserted into
+	 * the auxiliary index. Now it is time to build the target index itself.
+	 *
 	 * We now take a new snapshot, and build the index using all tuples that
 	 * are visible in this snapshot.  We can be sure that any HOT updates to
 	 * these tuples will be compatible with the index, since any updates made
@@ -1736,9 +1792,28 @@ DefineIndex(ParseState *pstate,
 	 * the index marked as read-only for updates.
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForLockers(heaplocktag, ShareLock, true);
 
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/*
+	 * Now target index is marked as "ready" for all transactions. So, auxiliary
+	 * index is no longer needed. So, start removing process by reverting "ready"
+	 * flag.
+	 */
+	index_set_state_flags(auxIndexRelationId, INDEX_DROP_CLEAR_READY);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * Now take the "reference snapshot" that will be used by validate_index()
 	 * to filter candidate tuples.  Beware!  There might still be snapshots in
@@ -1756,24 +1831,14 @@ DefineIndex(ParseState *pstate,
 	 */
 	snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	PushActiveSnapshot(snapshot);
-
 	/*
-	 * Scan the index and the heap, insert any missing index entries.
-	 */
-	validate_index(tableId, indexRelationId, snapshot);
-
-	/*
-	 * Drop the reference snapshot.  We must do this before waiting out other
-	 * snapshot holders, else we will deadlock against other processes also
-	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
-	 * they must wait for.  But first, save the snapshot's xmin to use as
-	 * limitXmin for GetCurrentVirtualXIDs().
+	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
+	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
 	limitXmin = snapshot->xmin;
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
-
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1800,7 +1865,7 @@ DefineIndex(ParseState *pstate,
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForOlderSnapshots(limitXmin, true);
 
 	/*
@@ -1825,6 +1890,53 @@ DefineIndex(ParseState *pstate,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+	/* Now wait for all transaction to see auxiliary as "non-ready for inserts" */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/* Now it is time to mark auxiliary index as dead */
+	index_concurrently_set_dead(tableId, auxIndexRelationId);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_6);
+	/* Now wait for all transaction to ignore auxiliary because it is dead */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Drop auxiliary index.
+	 *
+	 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
+	 * right lock level.
+	 */
+	performDeletion(&auxAddress, DROP_RESTRICT,
+							 PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
 
 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
@@ -3596,6 +3708,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	typedef struct ReindexIndexInfo
 	{
 		Oid			indexId;
+		Oid			auxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -3701,8 +3814,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
 														   ShareUpdateExclusiveLock);
+					IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-					if (!indexRelation->rd_index->indisvalid)
+
+					if (indexInfo->ii_Auxiliary)
+						ereport(WARNING,(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+									get_namespace_name(get_rel_namespace(cellOid)),
+									get_rel_name(cellOid))));
+					else if (!indexRelation->rd_index->indisvalid)
 						ereport(WARNING,
 								(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 								 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3754,8 +3874,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
 															   ShareUpdateExclusiveLock);
+						IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-						if (!indexRelation->rd_index->indisvalid)
+						if (indexInfo->ii_Auxiliary)
+							ereport(WARNING,
+									(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+									 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+											get_namespace_name(get_rel_namespace(cellOid)),
+											get_rel_name(cellOid))));
+						else if (!indexRelation->rd_index->indisvalid)
 							ereport(WARNING,
 									(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 									 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3816,6 +3943,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot reindex invalid index on TOAST table")));
 
+				/* Auxiliary indexes are not allowed to be rebuilt */
+				if (get_rel_relam(relationOid) == STIR_AM_OID)
+					ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("reindex of auxiliary index \"%s\" not supported",
+								get_rel_name(relationOid))));
+
 				/*
 				 * Check if parent relation can be locked and if it exists,
 				 * this needs to be done at this stage as the list of indexes
@@ -3919,15 +4053,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
+		char	   *auxConcurrentName;
 		ReindexIndexInfo *idx = lfirst(lc);
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
+		Oid			auxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
 		int			save_sec_context;
 		int			save_nestlevel;
 		Relation	newIndexRel;
+		Relation	auxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -3978,6 +4115,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 											"ccnew",
 											get_rel_namespace(indexRel->rd_index->indrelid),
 											false);
+		auxConcurrentName = ChooseRelationName(get_rel_name(idx->indexId),
+											NULL,
+											"ccaux",
+											get_rel_namespace(indexRel->rd_index->indrelid),
+											false);
 
 		/* Choose the new tablespace, indexes of toast tables are not moved */
 		if (OidIsValid(params->tablespaceOid) &&
@@ -3991,12 +4133,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 													idx->indexId,
 													tablespaceid,
 													concurrentName);
+		auxIndexId = index_concurrently_create_aux(heapRel,
+												   newIndexId,
+												   tablespaceid,
+												   auxConcurrentName);
 
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
+		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4005,6 +4152,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
+		newidx->auxIndexId = auxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4023,10 +4171,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		lockrelid = palloc_object(LockRelId);
+		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
 		/* Roll back any GUC changes executed by index functions */
@@ -4107,13 +4259,60 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * doing that, wait until no running transactions could have the table of
 	 * the index open with the old list of indexes.  See "phase 2" in
 	 * DefineIndex() for more details.
+	*/
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * Now build all auxiliary indexes and mark them as "ready-for-inserts".
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		StartTransactionCommand();
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/* Set ActiveSnapshot since functions in the indexes may need it */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Build auxiliary index, it is fast - without any actual heap scan, just an empty index. */
+		index_concurrently_build(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+
+	StartTransactionCommand();
+
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Wait until all auxiliary indexes are taken into account by all
+	 * transactions.
+	 */
 	WaitForLockersMultiple(lockTags, ShareLock, true);
 	CommitTransactionCommand();
 
+	/* Now it is time to perform target index build. */
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4160,6 +4359,41 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * need to set the PROC_IN_SAFE_IC flag here.
 	 */
 
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * At this moment all target indexes are marked as "ready-to-insert". So,
+	 * we are free to start process of dropping auxiliary indexes.
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+		StartTransactionCommand();
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		PopActiveSnapshot();
+
+		CommitTransactionCommand();
+	}
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -4167,12 +4401,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * were created during the previous phase.  See "phase 3" in DefineIndex()
 	 * for more details.
 	 */
-
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
-	WaitForLockersMultiple(lockTags, ShareLock, true);
-	CommitTransactionCommand();
-
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4210,7 +4438,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -4239,7 +4467,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * there's no need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-									 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+									 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 		WaitForOlderSnapshots(limitXmin, true);
 
 		CommitTransactionCommand();
@@ -4330,14 +4558,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
-	 * Mark the old indexes as dead.  First we must wait until no running
-	 * transaction could be using the index for a query.  See also
+	 * Mark the old and auxiliary indexes as dead. First we must wait until no
+	 * running transaction could be using the index for a query.  See also
 	 * index_drop() for more details.
 	 */
 
 	INJECTION_POINT("reindex-relation-concurrently-before-set-dead", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	foreach(lc, indexIds)
@@ -4362,6 +4590,28 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PopActiveSnapshot();
 	}
 
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+	}
+
 	/* Commit this transaction to make the updates visible. */
 	CommitTransactionCommand();
 	StartTransactionCommand();
@@ -4375,11 +4625,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
-	 * Drop the old indexes.
+	 * Drop the old and auxiliary indexes.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_6);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -4399,6 +4649,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			add_exact_object_address(&object, objects);
 		}
 
+		foreach(lc, newIndexIds)
+		{
+			ReindexIndexInfo *idx = lfirst(lc);
+			ObjectAddress object;
+
+			object.classId = RelationRelationId;
+			object.objectId = idx->auxIndexId;
+			object.objectSubId = 0;
+
+			add_exact_object_address(&object, objects);
+		}
+
 		/*
 		 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
 		 * right lock level.
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 5359dab1176..84f7cf9824e 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, bool auxiliary)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -850,6 +850,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Concurrent = concurrent;
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
+	n->ii_Auxiliary = auxiliary;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
@@ -875,7 +876,6 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
-	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..1a997537800 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -705,7 +705,8 @@ typedef struct TableAmRoutine
 										Relation index_rel,
 										IndexInfo *index_info,
 										Snapshot snapshot,
-										ValidateIndexState *state);
+										ValidateIndexState *state,
+										ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1824,19 +1825,24 @@ table_index_build_range_scan(Relation table_rel,
  * table_index_validate_scan - second table scan for concurrent index build
  *
  * See validate_index() for an explanation.
+ *
+ * Note: it is responsibility of that function to close sortstates in
+ * both `state` and `auxstate`.
  */
 static inline void
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
 						  Snapshot snapshot,
-						  ValidateIndexState *state)
+						  ValidateIndexState *state,
+						  ValidateIndexState *auxstate)
 {
 	table_rel->rd_tableam->index_validate_scan(table_rel,
 											   index_rel,
 											   index_info,
 											   snapshot,
-											   state);
+											   state,
+											   auxstate);
 }
 
 
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 36b70689254..727993d1a5a 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -31,6 +31,7 @@ typedef enum
 {
 	INDEX_CREATE_SET_READY,
 	INDEX_CREATE_SET_VALID,
+	INDEX_DROP_CLEAR_READY,
 	INDEX_DROP_CLEAR_VALID,
 	INDEX_DROP_SET_DEAD,
 } IndexStateFlagsAction;
@@ -71,6 +72,7 @@ extern void index_check_primary_key(Relation heapRel,
 #define	INDEX_CREATE_IF_NOT_EXISTS			(1 << 4)
 #define	INDEX_CREATE_PARTITIONED			(1 << 5)
 #define INDEX_CREATE_INVALID				(1 << 6)
+#define INDEX_CREATE_AUXILIARY				(1 << 7)
 
 extern Oid	index_create(Relation heapRelation,
 						 const char *indexRelationName,
@@ -106,6 +108,11 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid tablespaceOid,
 										   const char *newName);
 
+extern Oid	index_concurrently_create_aux(Relation heapRelation,
+										  Oid mainIndexId,
+										  Oid tablespaceOid,
+										  const char *newName);
+
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
 
@@ -151,7 +158,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
+extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 9c40772706c..8e5f98c6fad 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -117,14 +117,15 @@
 
 /* Phases of CREATE INDEX (as advertised via PROGRESS_CREATEIDX_PHASE) */
 #define PROGRESS_CREATEIDX_PHASE_WAIT_1			1
-#define PROGRESS_CREATEIDX_PHASE_BUILD			2
-#define PROGRESS_CREATEIDX_PHASE_WAIT_2			3
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	4
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		5
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN	6
-#define PROGRESS_CREATEIDX_PHASE_WAIT_3			7
+#define PROGRESS_CREATEIDX_PHASE_WAIT_2			2
+#define PROGRESS_CREATEIDX_PHASE_BUILD			3
+#define PROGRESS_CREATEIDX_PHASE_WAIT_3			4
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	5
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		6
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE	7
 #define PROGRESS_CREATEIDX_PHASE_WAIT_4			8
 #define PROGRESS_CREATEIDX_PHASE_WAIT_5			9
+#define PROGRESS_CREATEIDX_PHASE_WAIT_6			10
 
 /*
  * Subphases of CREATE INDEX, for index_build.
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index bf54d39feb0..cd7f1eb0592 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,8 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								bool auxiliary);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 55538c4c41e..d1723f47e89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1437,6 +1437,7 @@ DETAIL:  Key (f1)=(b) already exists.
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
 ERROR:  could not create unique index "concur_index3"
 DETAIL:  Key (f2)=(b) is duplicated.
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -3211,6 +3212,7 @@ INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
 ERROR:  could not create unique index "concur_reindex_ind5"
 DETAIL:  Key (c1)=(1) is duplicated.
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
@@ -3223,8 +3225,10 @@ DETAIL:  Key (c1)=(1) is duplicated.
  c1     | integer |           |          | 
 Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1) INVALID
+    "concur_reindex_ind5_ccaux" stir (c1) INVALID
     "concur_reindex_ind5_ccnew" UNIQUE, btree (c1) INVALID
 
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -3252,6 +3256,44 @@ Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1)
 
 DROP TABLE concur_reindex_tab4;
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+ERROR:  relation "concur_reindex_tab4" does not exist
+LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+                    ^
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
+HINT:  Use DROP INDEX or REINDEX INDEX.
+WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index dc629928c8f..9b06ddc87a2 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1585,10 +1585,11 @@ select indexrelid::regclass, indisvalid,
 --------------------------------+------------+-----------------------+-------------------------------
  parted_isvalid_idx             | f          | parted_isvalid_tab    | 
  parted_isvalid_idx_11          | f          | parted_isvalid_tab_11 | parted_isvalid_tab_1_expr_idx
+ parted_isvalid_idx_11_ccaux    | f          | parted_isvalid_tab_11 | 
  parted_isvalid_tab_12_expr_idx | t          | parted_isvalid_tab_12 | parted_isvalid_tab_1_expr_idx
  parted_isvalid_tab_1_expr_idx  | f          | parted_isvalid_tab_1  | parted_isvalid_idx
  parted_isvalid_tab_2_expr_idx  | t          | parted_isvalid_tab_2  | parted_isvalid_idx
-(5 rows)
+(6 rows)
 
 drop table parted_isvalid_tab;
 -- Check state of replica indexes when attaching a partition.
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 32bea58db2c..b80d5c2ed65 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2058,14 +2058,15 @@ pg_stat_progress_create_index| SELECT s.pid,
         CASE s.param10
             WHEN 0 THEN 'initializing'::text
             WHEN 1 THEN 'waiting for writers before build'::text
-            WHEN 2 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
-            WHEN 3 THEN 'waiting for writers before validation'::text
-            WHEN 4 THEN 'index validation: scanning index'::text
-            WHEN 5 THEN 'index validation: sorting tuples'::text
-            WHEN 6 THEN 'index validation: scanning table'::text
-            WHEN 7 THEN 'waiting for old snapshots'::text
-            WHEN 8 THEN 'waiting for readers before marking dead'::text
-            WHEN 9 THEN 'waiting for readers before dropping'::text
+            WHEN 2 THEN 'waiting for writers to use auxiliary index'::text
+            WHEN 3 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
+            WHEN 4 THEN 'waiting for writers before validation'::text
+            WHEN 5 THEN 'index validation: scanning index'::text
+            WHEN 6 THEN 'index validation: sorting tuples'::text
+            WHEN 7 THEN 'index validation: merging indexes'::text
+            WHEN 8 THEN 'waiting for old snapshots'::text
+            WHEN 9 THEN 'waiting for readers before marking dead'::text
+            WHEN 10 THEN 'waiting for readers before dropping'::text
             ELSE NULL::text
         END AS phase,
     s.param4 AS lockers_total,
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 82e4062a215..c2c1b031527 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -503,6 +503,7 @@ CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS concur_index2 ON concur_heap(f1);
 INSERT INTO concur_heap VALUES ('b','x');
 -- check if constraint is enforced properly at build time
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -1315,10 +1316,12 @@ CREATE TABLE concur_reindex_tab4 (c1 int);
 INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 -- This trick creates an invalid index.
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -1330,6 +1333,24 @@ REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
 DROP TABLE concur_reindex_tab4;
 
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+\d aux_index_tab5
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+DROP TABLE aux_index_tab5;
+
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
-- 
2.43.0



  [application/octet-stream] v31-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch (30.9K, 5-v31-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch)
  download | inline diff:
From 40c6d37815c25c63d3ec1e0b4e119e193795fa02 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 14:36:31 +0100
Subject: [PATCH v31 5/7] Track and drop auxiliary indexes in DROP/REINDEX

During concurrent index operations, auxiliary indexes may be left as orphaned objects when errors occur (junk auxiliary indexes).

This patch improves the handling of such auxiliary indexes:
- add auxiliaryForIndexId parameter to index_create() to track dependencies between main and auxiliary indexes
- automatically drop auxiliary indexes when the main index is dropped
- delete junk auxiliary indexes properly during REINDEX operations
---
 doc/src/sgml/ref/create_index.sgml         |  14 ++-
 doc/src/sgml/ref/reindex.sgml              |   8 +-
 src/backend/catalog/dependency.c           |   2 +-
 src/backend/catalog/index.c                |  71 ++++++++++----
 src/backend/catalog/pg_depend.c            |  58 ++++++++++++
 src/backend/catalog/toasting.c             |   1 +
 src/backend/commands/indexcmds.c           |  37 +++++++-
 src/backend/commands/tablecmds.c           |  52 +++++++++-
 src/backend/nodes/makefuncs.c              |   3 +-
 src/include/catalog/dependency.h           |   1 +
 src/include/nodes/execnodes.h              |   2 +
 src/include/nodes/makefuncs.h              |   2 +-
 src/test/regress/expected/create_index.out | 105 +++++++++++++++++++--
 src/test/regress/sql/create_index.sql      |  57 ++++++++++-
 14 files changed, 371 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 12c88587a79..7f751453317 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -668,10 +668,16 @@ Indexes:
     "idx_ccaux" stir (col) INVALID
 </programlisting>
 
-    The recommended recovery
-    method in such cases is to drop these indexes and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
-    to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
+    The recommended recovery method in such cases is to drop the index with
+    <command>DROP INDEX</command>. The auxiliary index (suffixed with
+    <literal>_ccaux</literal>) will be automatically dropped when the main
+    index is dropped. After dropping the indexes, you can try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to
+    rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>,
+    which will also handle cleanup of any invalid auxiliary indexes.)
+    If the only invalid index is one suffixed <literal>_ccaux</literal>,
+    recommended recovery method is just <literal>DROP INDEX</literal>
+    for that index.
    </para>
 
    <para>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 9e0248261ae..54f7b36efa2 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -476,11 +476,15 @@ Indexes:
     <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
     recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
-    then attempt <command>REINDEX CONCURRENTLY</command> again.
+    then attempt <command>REINDEX CONCURRENTLY</command> again. The auxiliary index
+    (suffixed with <literal>_ccaux</literal>) will be automatically dropped
+    along with its main index.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
     the recommended recovery method is to just drop said index, since the
-    rebuild proper has been successful.
+    rebuild proper has been successful. If the only
+    invalid index is one suffixed <literal>_ccaux</literal>, recommended
+    recovery method is just <literal>DROP INDEX</literal> for that index.
     A nonzero number may be appended to the suffix of the invalid index
     names to keep them unique, like <literal>_ccnew1</literal>,
     <literal>_ccold2</literal>, etc.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fdb8e67e1f5..c6941fb19d1 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -292,7 +292,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.  (Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object, 0);
+	AcquireDeletionLock(object, flags);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31f92b97580..4b6a0f76c81 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -776,6 +776,8 @@ index_create(Relation heapRelation,
 		   ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
 	/* partitioned indexes must never be "built" by themselves */
 	Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
+	/* ii_AuxiliaryForIndexId and INDEX_CREATE_AUXILIARY are required both or neither */
+	Assert(OidIsValid(indexInfo->ii_AuxiliaryForIndexId) == auxiliary);
 
 	relkind = partitioned ? RELKIND_PARTITIONED_INDEX : RELKIND_INDEX;
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -1181,6 +1183,15 @@ index_create(Relation heapRelation,
 			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_SEC);
 		}
 
+		/*
+		 * Record dependency on the main index in case of auxiliary index.
+		 */
+		if (OidIsValid(indexInfo->ii_AuxiliaryForIndexId))
+		{
+			ObjectAddressSet(referenced, RelationRelationId, indexInfo->ii_AuxiliaryForIndexId);
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+		}
+
 		/* placeholder for normal dependencies */
 		addrs = new_object_addresses();
 
@@ -1413,7 +1424,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							true,
 							indexRelation->rd_indam->amsummarizing,
 							oldInfo->ii_WithoutOverlaps,
-							false);
+							false,
+							InvalidOid);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1581,7 +1593,8 @@ index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
 							true,
 							false,	/* aux are not summarizing */
 							false,	/* aux are not without overlaps */
-							true	/* auxiliary */);
+							true	/* auxiliary */,
+							mainIndexId /* auxiliaryForIndexId */);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -2617,7 +2630,8 @@ BuildIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid /* auxiliary_for_index_id is set only during build */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2678,7 +2692,8 @@ BuildDummyIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3843,6 +3858,7 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 				heapRelation;
 	Oid			heapId;
 	Oid			save_userid;
+	Oid			junkAuxIndexId;
 	int			save_sec_context;
 	int			save_nestlevel;
 	IndexInfo  *indexInfo;
@@ -3899,6 +3915,19 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
 	}
 
+	/* Check for the auxiliary index for that index, it needs to be dropped */
+	junkAuxIndexId = get_auxiliary_index(indexId);
+	if (OidIsValid(junkAuxIndexId))
+	{
+		ObjectAddress object;
+		object.classId = RelationRelationId;
+		object.objectId = junkAuxIndexId;
+		object.objectSubId = 0;
+		performDeletion(&object, DROP_RESTRICT,
+								 PERFORM_DELETION_INTERNAL |
+								 PERFORM_DELETION_QUIETLY);
+	}
+
 	/*
 	 * Open the target index relation and get an exclusive lock on it, to
 	 * ensure that no one else is touching this particular index.
@@ -4187,7 +4216,8 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 {
 	Relation	rel;
 	Oid			toast_relid;
-	List	   *indexIds;
+	List	   *indexIds,
+			   *auxIndexIds = NIL;
 	char		persistence;
 	bool		result = false;
 	ListCell   *indexId;
@@ -4276,13 +4306,30 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	else
 		persistence = rel->rd_rel->relpersistence;
 
+	foreach(indexId, indexIds)
+	{
+		Oid			indexOid = lfirst_oid(indexId);
+		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* All STIR indexes are auxiliary indexes */
+		if (indexAm == STIR_AM_OID)
+		{
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			auxIndexIds = lappend_oid(auxIndexIds, indexOid);
+		}
+	}
+
 	/* Reindex all the indexes. */
 	i = 1;
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
-		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* Auxiliary indexes are going to be dropped during main index rebuild */
+		if (list_member_oid(auxIndexIds, indexOid))
+			continue;
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4308,18 +4355,6 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
-		if (indexAm == STIR_AM_OID)
-		{
-			ereport(WARNING,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
-							get_namespace_name(indexNamespaceId),
-							get_rel_name(indexOid))));
-			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
-				RemoveReindexPending(indexOid);
-			continue;
-		}
-
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 07c2d41c189..7e0e29bdb5b 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -20,6 +20,7 @@
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/pg_am_d.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_extension.h"
@@ -1108,6 +1109,63 @@ get_index_constraint(Oid indexId)
 	return constraintId;
 }
 
+/*
+ * get_auxiliary_index
+ *		Given the OID of an index, return the OID of its auxiliary
+ *		index, or InvalidOid if there is no auxiliary index.
+ */
+Oid
+get_auxiliary_index(Oid indexId)
+{
+	Oid			auxiliaryIndexOid = InvalidOid;
+	Relation	depRel;
+	ScanKeyData key[3];
+	SysScanDesc scan;
+	HeapTuple	tup;
+
+	/* Search the dependency table for the index */
+	depRel = table_open(DependRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+				Anum_pg_depend_refclassid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationRelationId));
+	ScanKeyInit(&key[1],
+				Anum_pg_depend_refobjid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indexId));
+	ScanKeyInit(&key[2],
+				Anum_pg_depend_refobjsubid,
+				BTEqualStrategyNumber, F_INT4EQ,
+				Int32GetDatum(0));
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+							  NULL, 3, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend deprec = (Form_pg_depend) GETSTRUCT(tup);
+
+		/*
+		 * We assume AUTO dependency on index with rel_kind
+		 * of RELKIND_INDEX and AM eq STIR is that we are looking for.
+		 */
+		if (deprec->classid == RelationRelationId &&
+			(deprec->deptype == DEPENDENCY_AUTO) &&
+			get_rel_relkind(deprec->objid) == RELKIND_INDEX &&
+			get_rel_relam(deprec->objid) == STIR_AM_OID)
+		{
+			auxiliaryIndexOid = deprec->objid;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(depRel, AccessShareLock);
+
+	return auxiliaryIndexOid;
+}
+
 /*
  * get_index_ref_constraints
  *		Given the OID of an index, return the OID of all foreign key
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index c33e43df1ec..b16eac0357f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -314,6 +314,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
 	indexInfo->ii_Auxiliary = false;
+	indexInfo->ii_AuxiliaryForIndexId = InvalidOid;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index dc4af0409df..b430d4a5b34 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -247,7 +247,7 @@ CheckIndexCompatible(Oid oldId,
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
 							  false, false, amsummarizing,
-							  isWithoutOverlaps, isauxiliary);
+							  isWithoutOverlaps, isauxiliary, InvalidOid);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -947,7 +947,8 @@ DefineIndex(ParseState *pstate,
 							  concurrent,
 							  amissummarizing,
 							  stmt->iswithoutoverlaps,
-							  false);
+							  false,
+							  InvalidOid);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -3709,6 +3710,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		Oid			indexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -4058,6 +4060,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
@@ -4065,6 +4068,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		int			save_nestlevel;
 		Relation	newIndexRel;
 		Relation	auxIndexRel;
+		Relation	junkAuxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -4138,12 +4142,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 												   tablespaceid,
 												   auxConcurrentName);
 
+		/* Search for auxiliary index for reindexed index, to drop it */
+		junkAuxIndexId = get_auxiliary_index(idx->indexId);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
+		if (OidIsValid(junkAuxIndexId))
+			junkAuxIndexRel = index_open(junkAuxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4153,6 +4162,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
 		newidx->auxIndexId = auxIndexId;
+		newidx->junkAuxIndexId = junkAuxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4174,10 +4184,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		if (OidIsValid(junkAuxIndexId))
+		{
+			lockrelid = palloc_object(LockRelId);
+			*lockrelid = junkAuxIndexRel->rd_lockInfo.lockRelId;
+			relationLocks = lappend(relationLocks, lockrelid);
+		}
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		if (OidIsValid(junkAuxIndexId))
+			index_close(junkAuxIndexRel, NoLock);
 		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
@@ -4366,7 +4384,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 	/*
 	 * At this moment all target indexes are marked as "ready-to-insert". So,
-	 * we are free to start process of dropping auxiliary indexes.
+	 * we are free to start process of dropping auxiliary indexes - including
+	 * junk indexes detected earlier.
 	 */
 	foreach(lc, newIndexIds)
 	{
@@ -4389,6 +4408,9 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 */
 		PushActiveSnapshot(GetTransactionSnapshot());
 		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		/* Ensure the junk index is marked as non-ready */
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_set_state_flags(newidx->junkAuxIndexId, INDEX_DROP_CLEAR_READY);
 		PopActiveSnapshot();
 
 		CommitTransactionCommand();
@@ -4608,6 +4630,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_concurrently_set_dead(newidx->tableId, newidx->junkAuxIndexId);
 
 		PopActiveSnapshot();
 	}
@@ -4659,6 +4683,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			object.objectSubId = 0;
 
 			add_exact_object_address(&object, objects);
+
+			if (OidIsValid(idx->junkAuxIndexId))
+			{
+				object.objectId = idx->junkAuxIndexId;
+				object.objectSubId = 0;
+				add_exact_object_address(&object, objects);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 67e42e5df29..87aba245b85 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1567,6 +1567,8 @@ RemoveRelations(DropStmt *drop)
 	ListCell   *cell;
 	int			flags = 0;
 	LOCKMODE	lockmode = AccessExclusiveLock;
+	MemoryContext private_context,
+				  oldcontext;
 
 	/* DROP CONCURRENTLY uses a weaker lock, and has some restrictions */
 	if (drop->concurrent)
@@ -1631,9 +1633,20 @@ RemoveRelations(DropStmt *drop)
 			relkind = 0;		/* keep compiler quiet */
 			break;
 	}
+	/*
+	 * Create a memory context that will survive forced transaction commits we
+	 * may need to do below (in case of concurrent index drop).
+	 * Since it is a child of PortalContext, it will go away eventually even if
+	 * we suffer an error; there's no need for special abort cleanup logic.
+	 */
+	private_context = AllocSetContextCreate(PortalContext,
+											"RemoveRelations",
+											ALLOCSET_SMALL_SIZES);
 
+	oldcontext = MemoryContextSwitchTo(private_context);
 	/* Lock and validate each relation; build a list of object addresses */
 	objects = new_object_addresses();
+	MemoryContextSwitchTo(oldcontext);
 
 	foreach(cell, drop->objects)
 	{
@@ -1685,6 +1698,38 @@ RemoveRelations(DropStmt *drop)
 			flags |= PERFORM_DELETION_CONCURRENTLY;
 		}
 
+		/*
+		 * Concurrent index drop requires it to be the first transaction. But in
+		 * case we have junk auxiliary index - we want to drop it too (and also
+		 * in a concurrent way). In this case perform silent internal deletion
+		 * of auxiliary index, and restore transaction state. It is fine to do it
+		 * in the loop because there is only single element in drop->objects.
+		 */
+		if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
+			state.actual_relkind == RELKIND_INDEX)
+		{
+			Oid junkAuxIndexOid = get_auxiliary_index(relOid);
+			if (OidIsValid(junkAuxIndexOid))
+			{
+				ObjectAddress object;
+				object.classId = RelationRelationId;
+				object.objectId = junkAuxIndexOid;
+				object.objectSubId = 0;
+				performDeletion(&object, DROP_RESTRICT,
+										 PERFORM_DELETION_CONCURRENTLY |
+										 PERFORM_DELETION_INTERNAL |
+										 PERFORM_DELETION_QUIETLY);
+				CommitTransactionCommand();
+				MemoryContextDelete(private_context);
+
+				/* And start again - now without auxiliary index. */
+				StartTransactionCommand();
+				PushActiveSnapshot(GetTransactionSnapshot());
+				RemoveRelations(drop);
+				return;
+			}
+		}
+
 		/*
 		 * Concurrent index drop cannot be used with partitioned indexes,
 		 * either.
@@ -1713,12 +1758,17 @@ RemoveRelations(DropStmt *drop)
 		obj.objectId = relOid;
 		obj.objectSubId = 0;
 
+		oldcontext = MemoryContextSwitchTo(private_context);
 		add_exact_object_address(&obj, objects);
+		MemoryContextSwitchTo(oldcontext);
 	}
 
+	/* Deletion may involve multiple commits, so, switch to memory context */
+	oldcontext = MemoryContextSwitchTo(private_context);
 	performMultipleDeletions(objects, drop->behavior, flags);
+	MemoryContextSwitchTo(oldcontext);
 
-	free_object_addresses(objects);
+	MemoryContextDelete(private_context);
 }
 
 /*
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 84f7cf9824e..c54748ff644 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps, bool auxiliary)
+			  bool withoutoverlaps, bool auxiliary, Oid auxiliary_for_index_id)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -851,6 +851,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
 	n->ii_Auxiliary = auxiliary;
+	n->ii_AuxiliaryForIndexId = auxiliary_for_index_id;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 2f3c1eae3c7..6ae210c584e 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -193,6 +193,7 @@ extern List *getOwnedSequences(Oid relid);
 extern Oid	getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok);
 
 extern Oid	get_index_constraint(Oid indexId);
+extern Oid	get_auxiliary_index(Oid indexId);
 
 extern List *get_index_ref_constraints(Oid indexId);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0f834889912..f97fcb7872c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -229,6 +229,8 @@ typedef struct IndexInfo
 	int			ii_ParallelWorkers;
 	/* is auxiliary for concurrent index build? */
 	bool		ii_Auxiliary;
+	/* if creating an auxiliary index, the OID of the main index; otherwise InvalidOid. */
+	Oid			ii_AuxiliaryForIndexId;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index cd7f1eb0592..3a704781c8b 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -100,7 +100,7 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
 								bool summarizing, bool withoutoverlaps,
-								bool auxiliary);
+								bool auxiliary, Oid auxiliary_for_index_id);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index d1723f47e89..2d6abb15a89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3279,20 +3279,109 @@ ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-ERROR:  relation "concur_reindex_tab4" does not exist
-LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-                    ^
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
-ERROR:  could not create unique index "aux_index_ind6"
-DETAIL:  Key (c1)=(1) is duplicated.
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
 WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
 HINT:  Use DROP INDEX or REINDEX INDEX.
 WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
 NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+-- Make sure it is still exists
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
+DROP INDEX aux_index_ind6;
+ERROR:  index "aux_index_ind6" does not exist
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index c2c1b031527..fd96d80abbc 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1344,11 +1344,62 @@ REINDEX INDEX aux_index_ind6_ccaux;
 -- Concurrently also
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
+-- Make sure it is still exists
+\d aux_index_tab5
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.43.0



  [application/octet-stream] v31-0001-Add-stress-tests-for-concurrent-index-builds.patch (11.9K, 6-v31-0001-Add-stress-tests-for-concurrent-index-builds.patch)
  download | inline diff:
From b6a36e045192906f72cb805f33f4cccafd780f89 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v31 1/7] Add stress tests for concurrent index builds

Introduce stress tests for concurrent index operations:
- test concurrent inserts/updates during CREATE/REINDEX INDEX CONCURRENTLY
- cover various index types (btree, gin, gist, brin, hash, spgist)
- test unique and non-unique indexes
- test with expressions and predicates
- test both parallel and non-parallel operations

These tests verify the behavior of the following commits.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 273 ++++++++++++++++++++++++++++++++
 2 files changed, 274 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 592cef74ecb..51a62dccb7b 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..0495ac10263
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,273 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+use constant STRESS_PGBENCH_CLIENTS => 30;
+use constant STRESS_PGBENCH_JOBS => 8;
+use constant STRESS_PGBENCH_TRANSACTIONS => 10000;
+use constant STRESS_MAX_SLEEP_MS => 10;
+
+use constant DEFAULT_PGBENCH_CLIENTS => 15;
+use constant DEFAULT_PGBENCH_JOBS => 4;
+use constant DEFAULT_PGBENCH_TRANSACTIONS => 500;
+use constant DEFAULT_MAX_SLEEP_MS => 1;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my ($node, $result);
+my $pg_test_extra = $ENV{PG_TEST_EXTRA} // '';
+my $is_stress = $pg_test_extra =~ /\bstress\b/ ? 1 : 0;
+my $pgbench_clients =
+  $is_stress ? STRESS_PGBENCH_CLIENTS : DEFAULT_PGBENCH_CLIENTS;
+my $pgbench_jobs = $is_stress ? STRESS_PGBENCH_JOBS : DEFAULT_PGBENCH_JOBS;
+my $pgbench_transactions =
+  $is_stress ? STRESS_PGBENCH_TRANSACTIONS : DEFAULT_PGBENCH_TRANSACTIONS;
+my $max_sleep_ms = $is_stress ? STRESS_MAX_SLEEP_MS : DEFAULT_MAX_SLEEP_MS;
+my $pgbench_options = sprintf(
+	'--no-vacuum --client=%d --jobs=%d --exit-on-abort --transactions=%d',
+	$pgbench_clients,
+	$pgbench_jobs,
+	$pgbench_transactions);
+my $no_hot = $is_stress ? int(rand(2)) : 0;
+
+print(
+		sprintf(
+		'settings: PG_TEST_EXTRA=%s stress=%d clients=%d jobs=%d transactions=%d max_sleep_ms=%d no_hot=%d',
+		defined($ENV{PG_TEST_EXTRA})
+		? ($pg_test_extra eq '' ? '(empty)' : $pg_test_extra)
+		: '(undef)',
+		$is_stress,
+		$pgbench_clients,
+		$pgbench_jobs,
+		$pgbench_transactions,
+		$max_sleep_ms,
+		$no_hot));
+print "\n";
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf', 'maintenance_work_mem = 32MB'); # to avoid OOM
+$node->append_conf('postgresql.conf', 'shared_buffers = 32MB'); # to avoid OOM
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE UNLOGGED TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp,
+								ia int4[], p point)));
+
+if ($no_hot) { $node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);)); }
+
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC in different options concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => sprintf(q(
+			SET debug_parallel_query = off; -- this is because predicate_stable implementation
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for unique BTREE',
+	{
+		'concurrent_ops_unique_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE UNIQUE INDEX CONCURRENTLY new_idx ON tbl(i);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIN with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN/GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_gin_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIN (ia);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIST/BRIN/HASH/SPGIST index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_other_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\set variant random(0, 3)
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIST (p);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING BRIN (updated_at);
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING HASH (updated_at);
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING SPGIST (p);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+		});
+
+$node->stop;
+done_testing();
-- 
2.43.0



  [application/octet-stream] v31-0006-Optimize-auxiliary-index-handling.patch (2.1K, 7-v31-0006-Optimize-auxiliary-index-handling.patch)
  download | inline diff:
From bb56f91df5a44c7865e6f599738cdec476497021 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 30 Dec 2024 16:37:12 +0100
Subject: [PATCH v31 6/7] Optimize auxiliary index handling

Skip unnecessary computations for auxiliary indices by:
- in the index-insert path, detect auxiliary indexes and bypass Datum value computation
- set indexUnchanged=false for auxiliary indices to avoid redundant checks

These optimizations reduce overhead during concurrent index operations.
---
 src/backend/catalog/index.c         | 11 +++++++++++
 src/backend/executor/execIndexing.c |  5 ++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4b6a0f76c81..2d7d25f1986 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2917,6 +2917,17 @@ FormIndexDatum(IndexInfo *indexInfo,
 	ListCell   *indexpr_item;
 	int			i;
 
+	/* Auxiliary index does not need any values to be computed */
+	if (unlikely(indexInfo->ii_Auxiliary))
+	{
+		for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+		{
+			values[i] = PointerGetDatum(NULL);
+			isnull[i] = true;
+		}
+		return;
+	}
+
 	if (indexInfo->ii_Expressions != NIL &&
 		indexInfo->ii_ExpressionsState == NIL)
 	{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..ce76a213556 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -438,8 +438,11 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		 * There's definitely going to be an index_insert() call for this
 		 * index.  If we're being called as part of an UPDATE statement,
 		 * consider if the 'indexUnchanged' = true hint should be passed.
+		 *
+		 * For auxiliary indexes, always pass false to skip value comparison checks,
+		 * since auxiliary indexes only store TIDs and don't track value changes.
 		 */
-		indexUnchanged = ((flags & EIIT_IS_UPDATE) &&
+		indexUnchanged = ((flags & EIIT_IS_UPDATE) && likely(!indexInfo->ii_Auxiliary) &&
 						  index_unchanged_by_update(resultRelInfo,
 													estate,
 													indexInfo,
-- 
2.43.0



  [application/octet-stream] v31-0007-Refresh-snapshot-periodically-during-index-valid.patch (22.5K, 8-v31-0007-Refresh-snapshot-periodically-during-index-valid.patch)
  download | inline diff:
From 76c15aa5624a9dd861862bd42956cebf042459bc Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 21 Apr 2025 14:11:53 +0200
Subject: [PATCH v31 7/7] Refresh snapshot periodically during index validation

Enhances validation phase of concurrently built indexes by periodically refreshing snapshots rather than using a single reference snapshot. This addresses issues with xmin propagation during long-running validations.

The validation now takes a fresh snapshot every few pages, allowing the xmin horizon to advance. This restores feature of commit d9d076222f5b, which was reverted in commit e28bb8851969. New STIR-based approach does not depend on single reference snapshot anymore.
---
 src/backend/access/heap/README.HOT       |  4 +-
 src/backend/access/heap/heapam_handler.c | 65 +++++++++++++++++++++++-
 src/backend/access/spgist/spgvacuum.c    | 12 +++--
 src/backend/catalog/index.c              | 63 ++++++++++++++++-------
 src/backend/commands/indexcmds.c         | 50 +++---------------
 src/include/access/tableam.h             | 25 ++++-----
 src/include/access/transam.h             | 15 ++++++
 src/include/catalog/index.h              |  2 +-
 8 files changed, 153 insertions(+), 83 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index b1c797517ee..382fe1723a5 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -401,12 +401,12 @@ live tuple.
 We mark the index open for inserts (but still not ready for reads) then
 we again wait for transactions which have the table open.  Then validate
 the index.  This searches for tuples missing from the index in auxiliary
-index, and inserts any missing ones if they are visible to reference snapshot.
+index, and inserts any missing ones if they are visible to a fresh snapshot.
 Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
-the second reference snapshot is finished.  This ensures that nobody is
+the latest used snapshot is finished.  This ensures that nobody is
 alive any longer who could need to see any tuples that might be missing
 from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f90310a1ab8..78bc1bff70e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2022,23 +2022,26 @@ heapam_index_validate_scan_read_stream_next(
 	return result;
 }
 
-static void
+static TransactionId
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
-						   Snapshot snapshot,
 						   ValidateIndexState *state,
 						   ValidateIndexState *auxState)
 {
+	TransactionId limitXmin;
+
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Snapshot		snapshot;
 	TupleTableSlot  *slot;
 	EState			*estate;
 	ExprContext		*econtext;
 	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	int64			num_to_check;
+	BlockNumber		page_read_counter = 1; /* set to 1 to skip snapshot reset at start */
 	Tuplestorestate *tuples_for_check;
 	ValidateIndexScanState callback_private_data;
 
@@ -2049,6 +2052,8 @@ heapam_index_validate_scan(Relation heapRelation,
 	/* Use 10% of memory for tuple store. */
 	int		store_work_mem_part = maintenance_work_mem / 10;
 
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
@@ -2057,6 +2062,12 @@ heapam_index_validate_scan(Relation heapRelation,
 	 */
 	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
+	PopActiveSnapshot();
+	InvalidateCatalogSnapshot();
+
+	Assert(!HaveRegisteredOrActiveSnapshot());
+	Assert(!TransactionIdIsValid(MyProc->xmin));
+
 	/*
 	 * sanity checks
 	 */
@@ -2072,6 +2083,29 @@ heapam_index_validate_scan(Relation heapRelation,
 
 	state->tuplesort = auxState->tuplesort = NULL;
 
+	/*
+	 * Now take the first snapshot that will be used to filter candidate
+	 * tuples. We are going to replace it by newer snapshot every so often
+	 * to propagate horizon.
+	 *
+	 * Beware!  There might still be snapshots in use that treat some transaction
+	 * as in-progress that our temporary snapshot treats as committed.
+	 *
+	 * If such a recently-committed transaction deleted tuples in the table,
+	 * we will not include them in the index; yet those transactions which
+	 * see the deleting one as still-in-progress will expect such tuples to
+	 * be there once we mark the index as valid.
+	 *
+	 * We solve this by waiting for all endangered transactions to exit before
+	 * we mark the index as valid, for that reason limitXmin is supported.
+	 *
+	 * We also set ActiveSnapshot to this snap, since functions in indexes may
+	 * need a snapshot.
+	 */
+	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	PushActiveSnapshot(snapshot);
+	limitXmin = snapshot->xmin;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
@@ -2105,6 +2139,7 @@ heapam_index_validate_scan(Relation heapRelation,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		block_number = BufferGetBlockNumber(buf);
+		page_read_counter++;
 
 		i = 0;
 		while ((off = tuples[i]) != InvalidOffsetNumber)
@@ -2162,6 +2197,20 @@ heapam_index_validate_scan(Relation heapRelation,
 		}
 
 		ReleaseBuffer(buf);
+#define VALIDATE_INDEX_RESET_SNAPSHOT_EACH_N_PAGE 4096
+		if (page_read_counter % VALIDATE_INDEX_RESET_SNAPSHOT_EACH_N_PAGE == 0)
+		{
+			PopActiveSnapshot();
+			UnregisterSnapshot(snapshot);
+			/* to make sure we propagate xmin */
+			InvalidateCatalogSnapshot();
+			Assert(!TransactionIdIsValid(MyProc->xmin));
+
+			snapshot = RegisterSnapshot(GetLatestSnapshot());
+			PushActiveSnapshot(snapshot);
+			/* xmin should not go backwards, but just in case */
+			limitXmin = TransactionIdNewer(limitXmin, snapshot->xmin);
+		}
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -2171,11 +2220,23 @@ heapam_index_validate_scan(Relation heapRelation,
 	read_stream_end(read_stream);
 	tuplestore_end(tuples_for_check);
 
+	/*
+	 * Drop the latest snapshot.  We must do this before waiting out other
+	 * snapshot holders, else we will deadlock against other processes also
+	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
+	 * they must wait for.
+	 */
+	PopActiveSnapshot();
+	UnregisterSnapshot(snapshot);
+	InvalidateCatalogSnapshot();
+	Assert(MyProc->xmin == InvalidTransactionId);
 	FreeAccessStrategy(bstrategy);
 
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
+
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..7ea60c18e6f 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -191,14 +191,16 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 			 * Add target TID to pending list if the redirection could have
 			 * happened since VACUUM started.  (If xid is invalid, assume it
 			 * must have happened before VACUUM started, since REINDEX
-			 * CONCURRENTLY locks out VACUUM.)
+			 * CONCURRENTLY locks out VACUUM, if myXmin is invalid it is
+			 * validation scan.)
 			 *
 			 * Note: we could make a tighter test by seeing if the xid is
 			 * "running" according to the active snapshot; but snapmgr.c
 			 * doesn't currently export a suitable API, and it's not entirely
 			 * clear that a tighter test is worth the cycles anyway.
 			 */
-			if (TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
+			if (!TransactionIdIsValid(bds->myXmin) ||
+					TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
 				spgAddPendingTID(bds, &dt->pointer);
 		}
 		else
@@ -808,7 +810,6 @@ spgvacuumscan(spgBulkDeleteState *bds)
 	/* Finish setting up spgBulkDeleteState */
 	initSpGistState(&bds->spgstate, index);
 	bds->pendingList = NULL;
-	bds->myXmin = GetActiveSnapshot()->xmin;
 	bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
 	/*
@@ -959,6 +960,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	bds.stats = stats;
 	bds.callback = callback;
 	bds.callback_state = callback_state;
+	if (info->validate_index)
+		bds.myXmin = InvalidTransactionId;
+	else
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 	spgvacuumscan(&bds);
 
@@ -999,6 +1004,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		bds.stats = stats;
 		bds.callback = dummy_callback;
 		bds.callback_state = NULL;
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 		spgvacuumscan(&bds);
 	}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2d7d25f1986..c37a786dafd 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -69,6 +69,7 @@
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -3514,8 +3515,9 @@ IndexCheckExclusion(Relation heapRelation,
  * insert their new tuples into it. At the same moment we clear "indisready" for
  * auxiliary index, since it is no more required to be updated.
  *
- * We then take a new reference snapshot, any tuples that are valid according
- * to this snap, but are not in the index, must be added to the index.
+ * We then take a new snapshot, any tuples that are valid according
+ * to this snap, but are not in the index, must be added to the index. In
+ * order to propagate xmin we reset that snapshot every so often.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
  * the snap need not be indexed, because we will wait out all transactions
@@ -3528,7 +3530,7 @@ IndexCheckExclusion(Relation heapRelation,
  * TIDs of both auxiliary and target indexes, and doing a "merge join" against
  * the TID lists to see which tuples from auxiliary index are missing from the
  * target index.  Thus we will ensure that all tuples valid according to the
- * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * latest snapshot are in the index. Notice we need to do bulkdelete in the
  * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
@@ -3541,21 +3543,24 @@ IndexCheckExclusion(Relation heapRelation,
  * before it declares a uniqueness error.
  *
  * After completing validate_index(), we wait until all transactions that
- * were alive at the time of the reference snapshot are gone; this is
- * necessary to be sure there are none left with a transaction snapshot
- * older than the reference (and hence possibly able to see tuples we did
- * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
- * transactions will be able to use it for queries.
+ * were alive at the time of the latest snapshot used during validation are
+ * gone; this is necessary to be sure there are none left with a transaction
+ * snapshot older than that (and hence possibly able to see tuples we did
+ * not index).  The snapshot is periodically refreshed during the heap scan
+ * to propagate the xmin horizon, so limitXmin tracks the most recent one.
+ * Then we mark the index "indisvalid" and commit.  Subsequent transactions
+ * will be able to use it for queries.
  *
  * Also, some actions to concurrent drop the auxiliary index are performed.
  */
-void
-validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
+TransactionId
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId)
 {
 	Relation	heapRelation,
 				indexRelation,
 				auxIndexRelation;
 	IndexInfo  *indexInfo;
+	TransactionId limitXmin;
 	IndexVacuumInfo ivinfo, auxivinfo;
 	ValidateIndexState state, auxState;
 	Oid			save_userid;
@@ -3605,8 +3610,12 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	 * Fetch info needed for index_insert.  (You might think this should be
 	 * passed in from DefineIndex, but its copy is long gone due to having
 	 * been built in a previous transaction.)
+	 *
+	 * We might need snapshot for index expressions or predicates.
 	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	indexInfo = BuildIndexInfo(indexRelation);
+	PopActiveSnapshot();
 
 	/* mark build is concurrent just for consistency */
 	indexInfo->ii_Concurrent = true;
@@ -3642,6 +3651,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 										   NULL, TUPLESORT_NONE);
 	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	(void) index_bulk_delete(&auxivinfo, NULL,
 							 validate_index_callback, &auxState);
 	/* If aux index is empty, merge may be skipped */
@@ -3661,7 +3673,13 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		index_close(indexRelation, NoLock);
 		table_close(heapRelation, NoLock);
 
-		return;
+		PushActiveSnapshot(GetTransactionSnapshot());
+		limitXmin = GetActiveSnapshot()->xmin;
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		Assert(!TransactionIdIsValid(MyProc->xmin));
+		return limitXmin;
 	}
 
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
@@ -3670,6 +3688,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	/* ambulkdelete updates progress metrics */
 	(void) index_bulk_delete(&ivinfo, NULL,
 							 validate_index_callback, &state);
@@ -3689,19 +3710,24 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	tuplesort_performsort(auxState.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+	Assert(!TransactionIdIsValid(MyProc->xmin));
 
 	/*
 	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
-	table_index_validate_scan(heapRelation,
-							  indexRelation,
-							  indexInfo,
-							  snapshot,
-							  &state,
-							  &auxState);
+	limitXmin = table_index_validate_scan(heapRelation,
+										  indexRelation,
+										  indexInfo,
+										  &state,
+										  &auxState);
 
 	/* Tuple sort closed by table_index_validate_scan */
 	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
@@ -3724,6 +3750,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
+
+	Assert(!TransactionIdIsValid(MyProc->xmin));
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b430d4a5b34..0e7b961b170 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -596,7 +596,6 @@ DefineIndex(ParseState *pstate,
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -1814,32 +1813,11 @@ DefineIndex(ParseState *pstate,
 	/* Tell concurrent index builds to ignore us, if index qualifies */
 	if (safe_index)
 		set_indexsafe_procflags();
-
-	/*
-	 * Now take the "reference snapshot" that will be used by validate_index()
-	 * to filter candidate tuples.  Beware!  There might still be snapshots in
-	 * use that treat some transaction as in-progress that our reference
-	 * snapshot treats as committed.  If such a recently-committed transaction
-	 * deleted tuples in the table, we will not include them in the index; yet
-	 * those transactions which see the deleting one as still-in-progress will
-	 * expect such tuples to be there once we mark the index as valid.
-	 *
-	 * We solve this by waiting for all endangered transactions to exit before
-	 * we mark the index as valid.
-	 *
-	 * We also set ActiveSnapshot to this snap, since functions in indexes may
-	 * need a snapshot.
-	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
-	PushActiveSnapshot(snapshot);
 	/*
 	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
-	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
-	limitXmin = snapshot->xmin;
+	limitXmin = validate_index(tableId, indexRelationId, auxIndexRelationId);
 
-	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1861,8 +1839,8 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * The index is now valid in the sense that it contains all currently
 	 * interesting tuples.  But since it might not contain tuples deleted just
-	 * before the reference snap was taken, we have to wait out any
-	 * transactions that might have older snapshots.
+	 * before the last snapshot during validating was taken, we have to wait
+	 * out any transactions that might have older snapshots.
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
@@ -4427,7 +4405,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
-		Snapshot	snapshot;
 
 		StartTransactionCommand();
 
@@ -4442,13 +4419,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		if (newidx->safe)
 			set_indexsafe_procflags();
 
-		/*
-		 * Take the "reference snapshot" that will be used by validate_index()
-		 * to filter candidate tuples.
-		 */
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
-		PushActiveSnapshot(snapshot);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
@@ -4460,16 +4430,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
-
-		/*
-		 * We can now do away with our active snapshot, we still need to save
-		 * the xmin limit to wait for older snapshots.
-		 */
-		limitXmin = snapshot->xmin;
-
-		PopActiveSnapshot();
-		UnregisterSnapshot(snapshot);
+		limitXmin = validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId);
+		Assert(!TransactionIdIsValid(MyProc->xmin));
 
 		/*
 		 * To ensure no deadlocks, we must commit and start yet another
@@ -4482,7 +4444,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		/*
 		 * The index is now valid in the sense that it contains all currently
 		 * interesting tuples.  But since it might not contain tuples deleted
-		 * just before the reference snap was taken, we have to wait out any
+		 * just before the latest snap was taken, we have to wait out any
 		 * transactions that might have older snapshots.
 		 *
 		 * Because we don't take a snapshot or Xid in this transaction,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1a997537800..2380a593d71 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -701,12 +701,11 @@ typedef struct TableAmRoutine
 										   TableScanDesc scan);
 
 	/* see table_index_validate_scan for reference about parameters */
-	void		(*index_validate_scan) (Relation table_rel,
-										Relation index_rel,
-										IndexInfo *index_info,
-										Snapshot snapshot,
-										ValidateIndexState *state,
-										ValidateIndexState *aux_state);
+	TransactionId		(*index_validate_scan) (Relation table_rel,
+												Relation index_rel,
+												IndexInfo *index_info,
+												ValidateIndexState *state,
+												ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1829,20 +1828,18 @@ table_index_build_range_scan(Relation table_rel,
  * Note: it is responsibility of that function to close sortstates in
  * both `state` and `auxstate`.
  */
-static inline void
+static inline TransactionId
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
-						  Snapshot snapshot,
 						  ValidateIndexState *state,
 						  ValidateIndexState *auxstate)
 {
-	table_rel->rd_tableam->index_validate_scan(table_rel,
-											   index_rel,
-											   index_info,
-											   snapshot,
-											   state,
-											   auxstate);
+	return table_rel->rd_tableam->index_validate_scan(table_rel,
+													  index_rel,
+													  index_info,
+													  state,
+													  auxstate);
 }
 
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 6fa91bfcdc0..b33084cb91a 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -417,6 +417,21 @@ NormalTransactionIdOlder(TransactionId a, TransactionId b)
 	return b;
 }
 
+/* return the newer of the two IDs */
+static inline TransactionId
+TransactionIdNewer(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdFollows(a, b))
+		return a;
+	return b;
+}
+
 /* return the newer of the two IDs */
 static inline FullTransactionId
 FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 727993d1a5a..91666663834 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -158,7 +158,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
+extern TransactionId validate_index(Oid heapId, Oid indexId, Oid auxIndexId);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-03-28 19:17  Mihail Nikalayeu <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Mihail Nikalayeu @ 2026-03-28 19:17 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; +Cc: Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Hello!

Small fixes, comments, support for high isolation level, etc.


Attachments:

  [application/octet-stream] v32-0001-Add-stress-tests-for-concurrent-index-builds.patch (11.9K, 3-v32-0001-Add-stress-tests-for-concurrent-index-builds.patch)
  download | inline diff:
From 44293162407526557b77eb5a783d916a3648c474 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v32 1/7] Add stress tests for concurrent index builds

Introduce stress tests for concurrent index operations:
- test concurrent inserts/updates during CREATE/REINDEX INDEX CONCURRENTLY
- cover various index types (btree, gin, gist, brin, hash, spgist)
- test unique and non-unique indexes
- test with expressions and predicates
- test both parallel and non-parallel operations

These tests verify the behavior of the following commits.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 273 ++++++++++++++++++++++++++++++++
 2 files changed, 274 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 592cef74ecb..51a62dccb7b 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..47fc65b9dab
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,273 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+use constant STRESS_PGBENCH_CLIENTS => 30;
+use constant STRESS_PGBENCH_JOBS => 8;
+use constant STRESS_PGBENCH_TRANSACTIONS => 10000;
+use constant STRESS_MAX_SLEEP_MS => 10;
+
+use constant DEFAULT_PGBENCH_CLIENTS => 15;
+use constant DEFAULT_PGBENCH_JOBS => 4;
+use constant DEFAULT_PGBENCH_TRANSACTIONS => 500;
+use constant DEFAULT_MAX_SLEEP_MS => 1;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my $node;
+my $pg_test_extra = $ENV{PG_TEST_EXTRA} // '';
+my $is_stress = $pg_test_extra =~ /\bstress\b/ ? 1 : 0;
+my $pgbench_clients =
+  $is_stress ? STRESS_PGBENCH_CLIENTS : DEFAULT_PGBENCH_CLIENTS;
+my $pgbench_jobs = $is_stress ? STRESS_PGBENCH_JOBS : DEFAULT_PGBENCH_JOBS;
+my $pgbench_transactions =
+  $is_stress ? STRESS_PGBENCH_TRANSACTIONS : DEFAULT_PGBENCH_TRANSACTIONS;
+my $max_sleep_ms = $is_stress ? STRESS_MAX_SLEEP_MS : DEFAULT_MAX_SLEEP_MS;
+my $pgbench_options = sprintf(
+	'--no-vacuum --client=%d --jobs=%d --exit-on-abort --transactions=%d',
+	$pgbench_clients,
+	$pgbench_jobs,
+	$pgbench_transactions);
+my $no_hot = $is_stress ? int(rand(2)) : 0;
+
+print(
+		sprintf(
+		'settings: PG_TEST_EXTRA=%s stress=%d clients=%d jobs=%d transactions=%d max_sleep_ms=%d no_hot=%d',
+		defined($ENV{PG_TEST_EXTRA})
+		? ($pg_test_extra eq '' ? '(empty)' : $pg_test_extra)
+		: '(undef)',
+		$is_stress,
+		$pgbench_clients,
+		$pgbench_jobs,
+		$pgbench_transactions,
+		$max_sleep_ms,
+		$no_hot));
+print "\n";
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf', 'maintenance_work_mem = 32MB'); # to avoid OOM
+$node->append_conf('postgresql.conf', 'shared_buffers = 32MB'); # to avoid OOM
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE UNLOGGED TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp,
+								ia int4[], p point)));
+
+if ($no_hot) { $node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);)); }
+
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC in different options concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => sprintf(q(
+			SET debug_parallel_query = off; -- this is because predicate_stable implementation
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for unique BTREE',
+	{
+		'concurrent_ops_unique_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE UNIQUE INDEX CONCURRENTLY new_idx ON tbl(i);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIN with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN',
+	{
+		'concurrent_ops_gin_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIN (ia);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIST/BRIN/HASH/SPGIST index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_other_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\set variant random(0, 3)
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIST (p);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING BRIN (updated_at);
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING HASH (updated_at);
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING SPGIST (p);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+		});
+
+$node->stop;
+done_testing();
-- 
2.43.0



  [application/octet-stream] v32-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch (31.7K, 4-v32-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch)
  download | inline diff:
From 73d3dee9747f968b31c877c2e62ec6d4609671de Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 14:36:31 +0100
Subject: [PATCH v32 5/7] Track and drop auxiliary indexes in DROP/REINDEX

During concurrent index operations, auxiliary indexes may be left as orphaned objects when errors occur (junk auxiliary indexes).

This patch improves the handling of such auxiliary indexes:
- add auxiliaryForIndexId parameter to index_create() to track dependencies between main and auxiliary indexes
- automatically drop auxiliary indexes when the main index is dropped
- delete junk auxiliary indexes properly during REINDEX operations
---
 doc/src/sgml/ref/create_index.sgml         |  14 ++-
 doc/src/sgml/ref/reindex.sgml              |   8 +-
 src/backend/catalog/dependency.c           |   2 +-
 src/backend/catalog/index.c                |  78 +++++++++++----
 src/backend/catalog/pg_depend.c            |  62 ++++++++++++
 src/backend/catalog/toasting.c             |   1 +
 src/backend/commands/indexcmds.c           |  37 +++++++-
 src/backend/commands/tablecmds.c           |  52 +++++++++-
 src/backend/nodes/makefuncs.c              |   3 +-
 src/include/catalog/dependency.h           |   1 +
 src/include/nodes/execnodes.h              |   2 +
 src/include/nodes/makefuncs.h              |   2 +-
 src/test/regress/expected/create_index.out | 105 +++++++++++++++++++--
 src/test/regress/sql/create_index.sql      |  57 ++++++++++-
 14 files changed, 380 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 12c88587a79..406c02e866e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -668,10 +668,16 @@ Indexes:
     "idx_ccaux" stir (col) INVALID
 </programlisting>
 
-    The recommended recovery
-    method in such cases is to drop these indexes and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
-    to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
+    The recommended recovery method in such cases is to drop the index with
+    <command>DROP INDEX</command>. The auxiliary index (suffixed with
+    <literal>_ccaux</literal>) will be automatically dropped when the main
+    index is dropped. After dropping the indexes, you can try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to
+    rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>,
+    which will also handle cleanup of any invalid auxiliary indexes.)
+    If the only invalid index is one suffixed <literal>_ccaux</literal>,
+    the recommended recovery method is just <literal>DROP INDEX</literal>
+    for that index.
    </para>
 
    <para>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 9e0248261ae..ac9cfec5c55 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -476,11 +476,15 @@ Indexes:
     <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
     recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
-    then attempt <command>REINDEX CONCURRENTLY</command> again.
+    then attempt <command>REINDEX CONCURRENTLY</command> again. The auxiliary index
+    (suffixed with <literal>_ccaux</literal>) will be automatically dropped
+    along with its main index.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
     the recommended recovery method is to just drop said index, since the
-    rebuild proper has been successful.
+    rebuild proper has been successful. If the only
+    invalid index is one suffixed <literal>_ccaux</literal>, the recommended
+    recovery method is just <literal>DROP INDEX</literal> for that index.
     A nonzero number may be appended to the suffix of the invalid index
     names to keep them unique, like <literal>_ccnew1</literal>,
     <literal>_ccold2</literal>, etc.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fdb8e67e1f5..c6941fb19d1 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -292,7 +292,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.  (Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object, 0);
+	AcquireDeletionLock(object, flags);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7f29dfa0b28..5bf7fe131c0 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -776,6 +776,8 @@ index_create(Relation heapRelation,
 		   ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
 	/* partitioned indexes must never be "built" by themselves */
 	Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
+	/* ii_AuxiliaryForIndexId and INDEX_CREATE_AUXILIARY are required both or neither */
+	Assert(OidIsValid(indexInfo->ii_AuxiliaryForIndexId) == auxiliary);
 
 	relkind = partitioned ? RELKIND_PARTITIONED_INDEX : RELKIND_INDEX;
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -1181,6 +1183,15 @@ index_create(Relation heapRelation,
 			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_SEC);
 		}
 
+		/*
+		 * Record dependency on the main index in case of auxiliary index.
+		 */
+		if (OidIsValid(indexInfo->ii_AuxiliaryForIndexId))
+		{
+			ObjectAddressSet(referenced, RelationRelationId, indexInfo->ii_AuxiliaryForIndexId);
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+		}
+
 		/* placeholder for normal dependencies */
 		addrs = new_object_addresses();
 
@@ -1413,7 +1424,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							true,
 							indexRelation->rd_indam->amsummarizing,
 							oldInfo->ii_WithoutOverlaps,
-							false);
+							false,
+							InvalidOid);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1584,7 +1596,8 @@ index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
 							true,
 							false,	/* aux are not summarizing */
 							false,	/* aux are not without overlaps */
-							true	/* auxiliary */);
+							true	/* auxiliary */,
+							mainIndexId /* auxiliaryForIndexId */);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -2623,7 +2636,8 @@ BuildIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid /* auxiliary_for_index_id is set only during build */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2684,7 +2698,8 @@ BuildDummyIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3763,8 +3778,11 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			indexForm->indisvalid = true;
 			break;
 		case INDEX_DROP_CLEAR_READY:
-			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
-			Assert(indexForm->indisready);
+			/*
+			 * Clear indisready during a CREATE INDEX CONCURRENTLY sequence.
+			 * indisready may already be false if the CIC failed before
+			 * index_concurrently_build had a chance to set it.
+			 */
 			Assert(!indexForm->indisvalid);
 			indexForm->indisready = false;
 			break;
@@ -3849,6 +3867,7 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 				heapRelation;
 	Oid			heapId;
 	Oid			save_userid;
+	Oid			junkAuxIndexId;
 	int			save_sec_context;
 	int			save_nestlevel;
 	IndexInfo  *indexInfo;
@@ -3905,6 +3924,19 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
 	}
 
+	/* Check for the auxiliary index for that index, it needs to be dropped */
+	junkAuxIndexId = get_auxiliary_index(indexId);
+	if (OidIsValid(junkAuxIndexId))
+	{
+		ObjectAddress object;
+		object.classId = RelationRelationId;
+		object.objectId = junkAuxIndexId;
+		object.objectSubId = 0;
+		performDeletion(&object, DROP_RESTRICT,
+								 PERFORM_DELETION_INTERNAL |
+								 PERFORM_DELETION_QUIETLY);
+	}
+
 	/*
 	 * Open the target index relation and get an exclusive lock on it, to
 	 * ensure that no one else is touching this particular index.
@@ -4193,7 +4225,8 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 {
 	Relation	rel;
 	Oid			toast_relid;
-	List	   *indexIds;
+	List	   *indexIds,
+			   *auxIndexIds = NIL;
 	char		persistence;
 	bool		result = false;
 	ListCell   *indexId;
@@ -4282,13 +4315,30 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	else
 		persistence = rel->rd_rel->relpersistence;
 
+	foreach(indexId, indexIds)
+	{
+		Oid			indexOid = lfirst_oid(indexId);
+		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* All STIR indexes are auxiliary indexes */
+		if (indexAm == STIR_AM_OID)
+		{
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			auxIndexIds = lappend_oid(auxIndexIds, indexOid);
+		}
+	}
+
 	/* Reindex all the indexes. */
 	i = 1;
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
-		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* Auxiliary indexes are going to be dropped during main index rebuild */
+		if (list_member_oid(auxIndexIds, indexOid))
+			continue;
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4314,18 +4364,6 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
-		if (indexAm == STIR_AM_OID)
-		{
-			ereport(WARNING,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
-							get_namespace_name(indexNamespaceId),
-							get_rel_name(indexOid))));
-			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
-				RemoveReindexPending(indexOid);
-			continue;
-		}
-
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 07c2d41c189..deacd2f7c95 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -20,6 +20,7 @@
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/pg_am_d.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_extension.h"
@@ -1108,6 +1109,67 @@ get_index_constraint(Oid indexId)
 	return constraintId;
 }
 
+/*
+ * get_auxiliary_index
+ *		Given the OID of an index, return the OID of its auxiliary
+ *		index, or InvalidOid if there is no auxiliary index.
+ */
+Oid
+get_auxiliary_index(Oid indexId)
+{
+	Oid			auxiliaryIndexOid = InvalidOid;
+	Relation	depRel;
+	ScanKeyData key[3];
+	SysScanDesc scan;
+	HeapTuple	tup;
+
+	/* Search the dependency table for the index */
+	depRel = table_open(DependRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+				Anum_pg_depend_refclassid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationRelationId));
+	ScanKeyInit(&key[1],
+				Anum_pg_depend_refobjid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indexId));
+	ScanKeyInit(&key[2],
+				Anum_pg_depend_refobjsubid,
+				BTEqualStrategyNumber, F_INT4EQ,
+				Int32GetDatum(0));
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+							  NULL, 3, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend deprec = (Form_pg_depend) GETSTRUCT(tup);
+
+		/*
+		 * Look for an AUTO dependency on a STIR index.  There can be at most
+		 * one STIR auxiliary per index, so we stop at the first match.
+		 * Transitive auxiliaries (e.g. ccnew_ccaux from a failed REINDEX
+		 * CONCURRENTLY) are found by calling this with the ccnew OID, and
+		 * are also cleaned up automatically via cascading AUTO dependency
+		 * when the intermediate index is dropped.
+		 */
+		if (deprec->classid == RelationRelationId &&
+			(deprec->deptype == DEPENDENCY_AUTO) &&
+			get_rel_relkind(deprec->objid) == RELKIND_INDEX &&
+			get_rel_relam(deprec->objid) == STIR_AM_OID)
+		{
+			auxiliaryIndexOid = deprec->objid;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(depRel, AccessShareLock);
+
+	return auxiliaryIndexOid;
+}
+
 /*
  * get_index_ref_constraints
  *		Given the OID of an index, return the OID of all foreign key
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index c33e43df1ec..b16eac0357f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -314,6 +314,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
 	indexInfo->ii_Auxiliary = false;
+	indexInfo->ii_AuxiliaryForIndexId = InvalidOid;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index cb07e1ae389..de603d3ff83 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -247,7 +247,7 @@ CheckIndexCompatible(Oid oldId,
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
 							  false, false, amsummarizing,
-							  isWithoutOverlaps, isauxiliary);
+							  isWithoutOverlaps, isauxiliary, InvalidOid);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -947,7 +947,8 @@ DefineIndex(ParseState *pstate,
 							  concurrent,
 							  amissummarizing,
 							  stmt->iswithoutoverlaps,
-							  false);
+							  false,
+							  InvalidOid);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -3711,6 +3712,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		Oid			indexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -4060,6 +4062,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
@@ -4067,6 +4070,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		int			save_nestlevel;
 		Relation	newIndexRel;
 		Relation	auxIndexRel;
+		Relation	junkAuxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -4140,12 +4144,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 												   tablespaceid,
 												   auxConcurrentName);
 
+		/* Search for auxiliary index for reindexed index, to drop it */
+		junkAuxIndexId = get_auxiliary_index(idx->indexId);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
+		if (OidIsValid(junkAuxIndexId))
+			junkAuxIndexRel = index_open(junkAuxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4155,6 +4164,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
 		newidx->auxIndexId = auxIndexId;
+		newidx->junkAuxIndexId = junkAuxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4176,10 +4186,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		if (OidIsValid(junkAuxIndexId))
+		{
+			lockrelid = palloc_object(LockRelId);
+			*lockrelid = junkAuxIndexRel->rd_lockInfo.lockRelId;
+			relationLocks = lappend(relationLocks, lockrelid);
+		}
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		if (OidIsValid(junkAuxIndexId))
+			index_close(junkAuxIndexRel, NoLock);
 		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
@@ -4368,7 +4386,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 	/*
 	 * At this moment all target indexes are marked as "ready-to-insert". So,
-	 * we are free to start process of dropping auxiliary indexes.
+	 * we are free to start process of dropping auxiliary indexes - including
+	 * junk indexes detected earlier.
 	 */
 	foreach(lc, newIndexIds)
 	{
@@ -4391,6 +4410,9 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 */
 		PushActiveSnapshot(GetTransactionSnapshot());
 		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		/* Ensure the junk index is marked as non-ready */
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_set_state_flags(newidx->junkAuxIndexId, INDEX_DROP_CLEAR_READY);
 		PopActiveSnapshot();
 
 		CommitTransactionCommand();
@@ -4610,6 +4632,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_concurrently_set_dead(newidx->tableId, newidx->junkAuxIndexId);
 
 		PopActiveSnapshot();
 	}
@@ -4661,6 +4685,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			object.objectSubId = 0;
 
 			add_exact_object_address(&object, objects);
+
+			if (OidIsValid(idx->junkAuxIndexId))
+			{
+				object.objectId = idx->junkAuxIndexId;
+				object.objectSubId = 0;
+				add_exact_object_address(&object, objects);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..df29a7021b7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1567,6 +1567,8 @@ RemoveRelations(DropStmt *drop)
 	ListCell   *cell;
 	int			flags = 0;
 	LOCKMODE	lockmode = AccessExclusiveLock;
+	MemoryContext private_context,
+				  oldcontext;
 
 	/* DROP CONCURRENTLY uses a weaker lock, and has some restrictions */
 	if (drop->concurrent)
@@ -1631,9 +1633,20 @@ RemoveRelations(DropStmt *drop)
 			relkind = 0;		/* keep compiler quiet */
 			break;
 	}
+	/*
+	 * Create a memory context that will survive forced transaction commits we
+	 * may need to do below (in case of concurrent index drop).
+	 * Since it is a child of PortalContext, it will go away eventually even if
+	 * we suffer an error; there's no need for special abort cleanup logic.
+	 */
+	private_context = AllocSetContextCreate(PortalContext,
+											"RemoveRelations",
+											ALLOCSET_SMALL_SIZES);
 
+	oldcontext = MemoryContextSwitchTo(private_context);
 	/* Lock and validate each relation; build a list of object addresses */
 	objects = new_object_addresses();
+	MemoryContextSwitchTo(oldcontext);
 
 	foreach(cell, drop->objects)
 	{
@@ -1685,6 +1698,38 @@ RemoveRelations(DropStmt *drop)
 			flags |= PERFORM_DELETION_CONCURRENTLY;
 		}
 
+		/*
+		 * Concurrent index drop requires it to be the first transaction. But in
+		 * case we have junk auxiliary index - we want to drop it too (and also
+		 * in a concurrent way). In this case perform silent internal deletion
+		 * of auxiliary index, and restore transaction state. It is fine to do it
+		 * in the loop because there is only single element in drop->objects.
+		 */
+		if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
+			state.actual_relkind == RELKIND_INDEX)
+		{
+			Oid junkAuxIndexOid = get_auxiliary_index(relOid);
+			if (OidIsValid(junkAuxIndexOid))
+			{
+				ObjectAddress object;
+				object.classId = RelationRelationId;
+				object.objectId = junkAuxIndexOid;
+				object.objectSubId = 0;
+				performDeletion(&object, DROP_RESTRICT,
+										 PERFORM_DELETION_CONCURRENTLY |
+										 PERFORM_DELETION_INTERNAL |
+										 PERFORM_DELETION_QUIETLY);
+				CommitTransactionCommand();
+				MemoryContextDelete(private_context);
+
+				/* And start again - now without auxiliary index. */
+				StartTransactionCommand();
+				PushActiveSnapshot(GetTransactionSnapshot());
+				RemoveRelations(drop);
+				return;
+			}
+		}
+
 		/*
 		 * Concurrent index drop cannot be used with partitioned indexes,
 		 * either.
@@ -1713,12 +1758,17 @@ RemoveRelations(DropStmt *drop)
 		obj.objectId = relOid;
 		obj.objectSubId = 0;
 
+		oldcontext = MemoryContextSwitchTo(private_context);
 		add_exact_object_address(&obj, objects);
+		MemoryContextSwitchTo(oldcontext);
 	}
 
+	/* Deletion may involve multiple commits, so, switch to memory context */
+	oldcontext = MemoryContextSwitchTo(private_context);
 	performMultipleDeletions(objects, drop->behavior, flags);
+	MemoryContextSwitchTo(oldcontext);
 
-	free_object_addresses(objects);
+	MemoryContextDelete(private_context);
 }
 
 /*
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 84f7cf9824e..c54748ff644 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps, bool auxiliary)
+			  bool withoutoverlaps, bool auxiliary, Oid auxiliary_for_index_id)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -851,6 +851,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
 	n->ii_Auxiliary = auxiliary;
+	n->ii_AuxiliaryForIndexId = auxiliary_for_index_id;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 2f3c1eae3c7..6ae210c584e 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -193,6 +193,7 @@ extern List *getOwnedSequences(Oid relid);
 extern Oid	getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok);
 
 extern Oid	get_index_constraint(Oid indexId);
+extern Oid	get_auxiliary_index(Oid indexId);
 
 extern List *get_index_ref_constraints(Oid indexId);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 74efa237212..136dddbbf11 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -229,6 +229,8 @@ typedef struct IndexInfo
 	int			ii_ParallelWorkers;
 	/* is auxiliary for concurrent index build? */
 	bool		ii_Auxiliary;
+	/* if creating an auxiliary index, the OID of the main index; otherwise InvalidOid. */
+	Oid			ii_AuxiliaryForIndexId;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index cd7f1eb0592..3a704781c8b 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -100,7 +100,7 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
 								bool summarizing, bool withoutoverlaps,
-								bool auxiliary);
+								bool auxiliary, Oid auxiliary_for_index_id);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index d1723f47e89..2d6abb15a89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3279,20 +3279,109 @@ ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-ERROR:  relation "concur_reindex_tab4" does not exist
-LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-                    ^
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
-ERROR:  could not create unique index "aux_index_ind6"
-DETAIL:  Key (c1)=(1) is duplicated.
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
 WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
 HINT:  Use DROP INDEX or REINDEX INDEX.
 WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
 NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+-- Make sure it is still exists
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
+DROP INDEX aux_index_ind6;
+ERROR:  index "aux_index_ind6" does not exist
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index c2c1b031527..fd96d80abbc 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1344,11 +1344,62 @@ REINDEX INDEX aux_index_ind6_ccaux;
 -- Concurrently also
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
+-- Make sure it is still exists
+\d aux_index_tab5
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.43.0



  [application/octet-stream] v32-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch (98.0K, 5-v32-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch)
  download | inline diff:
From e0ea26d0562821d2ab8090c26573da675b19b2f5 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 15:03:10 +0100
Subject: [PATCH v32 4/7] Use auxiliary indexes for concurrent index operations

Replace the second table full scan in concurrent index builds with an auxiliary index approach:
- create a STIR auxiliary index with the same predicate (if exists) as in main index
- use it to track tuples inserted during the first phase
- merge auxiliary index with main index during validation to catch up new index with any tuples missed during the first phase
- automatically drop auxiliary when main index is ready

To merge main and auxiliary indexes:
- index_bulk_delete called for both, TIDs put into tuplesort
- both tuplesort are being sorted
- both tuplesort scanned with two pointers looking for the TIDs present in auxiliary index, but absent in main one
- all such TIDs are put into tuplestore
- all TIDs in tuplestore are fetched using the stream, tuplestore used in heapam_index_validate_scan_read_stream_next to provide the next page to prefetch
- if fetched tuple is alive - it is inserted into the main index

This eliminates the need for a second full table scan during validation, improving performance, especially for large tables. Affects both CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY operations.
---
 doc/src/sgml/monitoring.sgml               |  26 +-
 doc/src/sgml/ref/create_index.sgml         |  34 +-
 doc/src/sgml/ref/reindex.sgml              |  40 +-
 src/backend/access/heap/README.HOT         |  13 +-
 src/backend/access/heap/heapam_handler.c   | 561 ++++++++++++++-------
 src/backend/catalog/index.c                | 322 ++++++++++--
 src/backend/catalog/system_views.sql       |  17 +-
 src/backend/commands/indexcmds.c           | 344 +++++++++++--
 src/backend/nodes/makefuncs.c              |   4 +-
 src/backend/utils/misc/guc_parameters.dat  |   9 +
 src/include/access/tableam.h               |  12 +-
 src/include/catalog/index.h                |   9 +-
 src/include/commands/progress.h            |  13 +-
 src/include/miscadmin.h                    |   1 +
 src/include/nodes/makefuncs.h              |   3 +-
 src/test/regress/expected/create_index.out |  42 ++
 src/test/regress/expected/indexing.out     |   3 +-
 src/test/regress/expected/rules.out        |  17 +-
 src/test/regress/sql/create_index.sql      |  21 +
 19 files changed, 1155 insertions(+), 336 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bb75ed1069b..835b4aeed77 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6787,6 +6787,18 @@ FROM pg_stat_get_backend_idset() AS backendid;
        information for this phase.
       </entry>
      </row>
+     <row>
+      <entry><literal>waiting for writers to use auxiliary index</literal></entry>
+      <entry>
+       <command>CREATE INDEX CONCURRENTLY</command> or <command>REINDEX CONCURRENTLY</command> is waiting for transactions
+       with write locks that can potentially see the table to finish, to ensure use of auxiliary index for new tuples in
+       future transactions.
+       This phase is skipped when not in concurrent mode.
+       Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
+       and <structname>current_locker_pid</structname> contain the progress
+       information for this phase.
+      </entry>
+     </row>
      <row>
       <entry><literal>building index</literal></entry>
       <entry>
@@ -6827,13 +6839,12 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </entry>
      </row>
      <row>
-      <entry><literal>index validation: scanning table</literal></entry>
+      <entry><literal>index validation: merging indexes</literal></entry>
       <entry>
-       <command>CREATE INDEX CONCURRENTLY</command> is scanning the table
-       to validate the index tuples collected in the previous two phases.
+       <command>CREATE INDEX CONCURRENTLY</command> is merging content of auxiliary index with the target index.
        This phase is skipped when not in concurrent mode.
-       Columns <structname>blocks_total</structname> (set to the total size of the table)
-       and <structname>blocks_done</structname> contain the progress information for this phase.
+       Columns <structname>tuples_total</structname> (set to the number of tuples to be merged)
+       and <structname>tuples_done</structname> contain the progress information for this phase.
       </entry>
      </row>
      <row>
@@ -6850,8 +6861,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
      <row>
       <entry><literal>waiting for readers before marking dead</literal></entry>
       <entry>
-       <command>REINDEX CONCURRENTLY</command> is waiting for transactions
-       with read locks on the table to finish, before marking the old index dead.
+       <command>CREATE INDEX CONCURRENTLY</command> is waiting for transactions
+        with read locks on the table to finish, before marking the auxiliary index as dead.
+       <command>REINDEX CONCURRENTLY</command> is also waiting before marking the old index as dead.
        This phase is skipped when not in concurrent mode.
        Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
        and <structname>current_locker_pid</structname> contain the progress
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index bb7505d171b..12c88587a79 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -620,10 +620,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     out writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>CREATE INDEX</command>.
     When this option is used,
-    <productname>PostgreSQL</productname> must perform two scans of the table, and in
-    addition it must wait for all existing transactions that could potentially
-    modify or use the index to terminate.  Thus
-    this method requires more total work than a standard index build and takes
+    <productname>PostgreSQL</productname> must perform table scan followed by
+    validation phase, and in addition it must wait for all existing transactions
+    that could potentially modify or use the index to terminate.  Thus
+    this method requires more total work than a standard index build and may take
     significantly longer to complete.  However, since it allows normal
     operations to continue while the index is built, this method is useful for
     adding new indexes in a production environment.  Of course, the extra CPU
@@ -631,14 +631,14 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </para>
 
    <para>
-    In a concurrent index build, the index is actually entered as an
-    <quote>invalid</quote> index into
-    the system catalogs in one transaction, then two table scans occur in
-    two more transactions.  Before each table scan, the index build must
+    In a concurrent index build, the main and auxiliary indexes are actually
+    entered as an <quote>invalid</quote> index into
+    the system catalogs in one transaction, then two phases occur in
+    multiple transactions.  Before each phase, the index build must
     wait for existing transactions that have modified the table to terminate.
-    After the second scan, the index build must wait for any transactions
+    After the second phase, the index build must wait for any transactions
     that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-    scan to terminate, including transactions used by any phase of concurrent
+    phase to terminate, including transactions used by any phase of concurrent
     index builds on other tables, if the indexes involved are partial or have
     columns that are not simple column references.
     Then finally the index can be marked <quote>valid</quote> and ready for use,
@@ -651,10 +651,11 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    <para>
     If a problem arises while scanning the table, such as a deadlock or a
     uniqueness violation in a unique index, the <command>CREATE INDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> index. This index
-    will be ignored for querying purposes because it might be incomplete;
-    however it will still consume update overhead. The <application>psql</application>
-    <command>\d</command> command will report such an index as <literal>INVALID</literal>:
+    command will fail but leave behind an <quote>invalid</quote> index and its
+    associated auxiliary index. These indexes
+    will be ignored for querying purposes because they might be incomplete;
+    however they will still consume update overhead. The <application>psql</application>
+    <command>\d</command> command will report such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -664,11 +665,12 @@ postgres=# \d tab
  col    | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     The recommended recovery
-    method in such cases is to drop the index and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
+    method in such cases is to drop these indexes and try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
     to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
    </para>
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 185cd75ca30..9e0248261ae 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -368,9 +368,8 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <productname>PostgreSQL</productname> supports rebuilding indexes with minimum locking
     of writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>REINDEX</command>. When this option
-    is used, <productname>PostgreSQL</productname> must perform two scans of the table
-    for each index that needs to be rebuilt and wait for termination of
-    all existing transactions that could potentially use the index.
+    is used, <productname>PostgreSQL</productname> must perform several steps to ensure data
+    consistency while allowing normal operations to continue.
     This method requires more total work than a standard index
     rebuild and takes significantly longer to complete as it needs to wait
     for unfinished transactions that might modify the index. However, since
@@ -388,7 +387,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <orderedlist>
      <listitem>
       <para>
-       A new transient index definition is added to the catalog
+       A new transient index definition and an auxiliary index are added to the catalog
        <literal>pg_index</literal>.  This definition will be used to replace
        the old index.  A <literal>SHARE UPDATE EXCLUSIVE</literal> lock at
        session level is taken on the indexes being reindexed as well as their
@@ -398,7 +397,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       A first pass to build the index is done for each new index.  Once the
+       The auxiliary index is marked as "ready for inserts", making
+       it visible to other sessions. This index efficiently tracks all new
+       tuples during the reindex process.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The new main index is built by scanning the table.  Once the
        index is built, its flag <literal>pg_index.indisready</literal> is
        switched to <quote>true</quote> to make it ready for inserts, making it
        visible to other sessions once the transaction that performed the build
@@ -409,9 +416,9 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       Then a second pass is performed to add tuples that were added while the
-       first pass was running.  This step is also done in a separate
-       transaction for each index.
+       A validation phase merges any missing entries from the auxiliary index
+       into the main index, ensuring all concurrent changes are captured.
+       This step is also done in a separate transaction for each index.
       </para>
      </listitem>
 
@@ -428,7 +435,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes have <literal>pg_index.indisready</literal> switched to
+       The old and auxiliary indexes have <literal>pg_index.indisready</literal> switched to
        <quote>false</quote> to prevent any new tuple insertions, after waiting
        for running queries that might reference the old index to complete.
       </para>
@@ -436,7 +443,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes are dropped.  The <literal>SHARE UPDATE
+       The old and auxiliary indexes are dropped.  The <literal>SHARE UPDATE
        EXCLUSIVE</literal> session locks for the indexes and the table are
        released.
       </para>
@@ -447,11 +454,11 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
    <para>
     If a problem arises while rebuilding the indexes, such as a
     uniqueness violation in a unique index, the <command>REINDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> new index in addition to
-    the pre-existing one. This index will be ignored for querying purposes
-    because it might be incomplete; however it will still consume update
+    command will fail but leave behind an <quote>invalid</quote> new index and its auxiliary index in addition to
+    the pre-existing one. These indexes will be ignored for querying purposes
+    because they might be incomplete; however they will still consume update
     overhead. The <application>psql</application> <command>\d</command> command will report
-    such an index as <literal>INVALID</literal>:
+    such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -462,12 +469,13 @@ postgres=# \d tab
 Indexes:
     "idx" btree (col)
     "idx_ccnew" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     If the index marked <literal>INVALID</literal> is suffixed
-    <literal>_ccnew</literal>, then it corresponds to the transient
+    <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
-    recovery method is to drop it using <literal>DROP INDEX</literal>,
+    recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
     then attempt <command>REINDEX CONCURRENTLY</command> again.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 74e407f375a..b1c797517ee 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -375,6 +375,11 @@ constraint on which updates can be HOT.  Other transactions must include
 such an index when determining HOT-safety of updates, even though they
 must ignore it for both insertion and searching purposes.
 
+Also, special auxiliary index is created the same way. It is marked as
+"ready for inserts" without any actual table scan. Its purpose is to collect
+new tuples inserted into table while our target index is still "not ready
+for inserts".
+
 We must do this to avoid making incorrect index entries.  For example,
 suppose we are building an index on column X and we make an index entry for
 a non-HOT tuple with X=1.  Then some other backend, unaware that X is an
@@ -394,10 +399,10 @@ entry at the root of the HOT-update chain but we use the key value from the
 live tuple.
 
 We mark the index open for inserts (but still not ready for reads) then
-we again wait for transactions which have the table open.  Then we take
-a second reference snapshot and validate the index.  This searches for
-tuples missing from the index, and inserts any missing ones.  Again,
-the index entries have to have TIDs equal to HOT-chain root TIDs, but
+we again wait for transactions which have the table open.  Then validate
+the index.  This searches for tuples missing from the index in auxiliary
+index, and inserts any missing ones if they are visible to reference snapshot.
+Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..194ac75caa5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -42,15 +42,20 @@
 #include "storage/lmgr.h"
 #include "storage/lock.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/tuplesort.h"
+#include "utils/tuplestore.h"
+
+/* GUC: percentage of maintenance_work_mem for CIC validation tuplestore */
+int			debug_cic_validate_store_mem_pct = 10;
 
 static void reform_and_rewrite_tuple(HeapTuple tuple,
-									 Relation OldHeap, Relation NewHeap,
-									 Datum *values, bool *isnull, RewriteState rwstate);
+                                     Relation OldHeap, Relation NewHeap,
+                                     Datum *values, bool *isnull, RewriteState rwstate);
 
 static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
 								   HeapTuple tuple,
@@ -1769,242 +1774,422 @@ heapam_index_build_range_scan(Relation heapRelation,
 	return reltuples;
 }
 
+/*
+ * Calculate set difference (relative complement) of main and aux
+ * sets.
+ *
+ * All records which are present in auxiliary tuplesort but not in
+ * main are added to the store.
+ *
+ * In set theory notation store = aux - main or store = aux / main.
+ *
+ * returns number of items added to store
+ */
+static int64
+heapam_index_validate_tuplesort_difference(Tuplesortstate *main,
+										   Tuplesortstate *aux,
+										   Tuplestorestate *store)
+{
+	int64		num = 0;
+	/* state variables for the merge */
+	ItemPointer	indexcursor = NULL,
+					auxindexcursor = NULL;
+	ItemPointerData decoded,
+					auxdecoded;
+	bool			tuplesort_empty = false,
+					auxtuplesort_empty = false;
+
+	/* Initialize pointers. */
+	ItemPointerSetInvalid(&decoded);
+	ItemPointerSetInvalid(&auxdecoded);
+
+	/*
+	 * Main loop: we step through the auxiliary sort (auxState->tuplesort),
+	 * which holds TIDs that must compared to those from the "main" sort
+	 * (state->tuplesort).
+	 */
+	while (!auxtuplesort_empty)
+	{
+		Datum		ts_val;
+		bool		ts_isnull;
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Attempt to fetch the next TID from the auxiliary sort. If it's
+		 * empty, we set auxindexcursor to NULL.
+		 */
+		auxtuplesort_empty = !tuplesort_getdatum(aux, true,
+												 false, &ts_val, &ts_isnull,
+												 NULL);
+		Assert(auxtuplesort_empty || !ts_isnull);
+		if (!auxtuplesort_empty)
+		{
+			itemptr_decode(&auxdecoded, DatumGetInt64(ts_val));
+			auxindexcursor = &auxdecoded;
+		}
+		else
+		{
+			auxindexcursor = NULL;
+		}
+
+		/*
+		 * If the auxiliary sort is not yet empty, we now try to synchronize
+		 * the "main" sort cursor (indexcursor) with auxindexcursor. We advance
+		 * the main sort cursor until we've reached or passed the auxiliary TID.
+		 */
+		if (!auxtuplesort_empty)
+		{
+			/*
+			 * Move the main sort forward while:
+			 *   (1) It's not exhausted (tuplesort_empty == false), and
+			 *   (2) Either indexcursor is NULL (first iteration) or
+			 *       indexcursor < auxindexcursor in TID order.
+			 */
+			while (!tuplesort_empty && (indexcursor == NULL || /* null on first time here */
+						ItemPointerCompare(indexcursor, auxindexcursor) < 0))
+			{
+				/*
+				 * Get the next TID from the main sort. If it's empty,
+				 * we set indexcursor to NULL.
+				 */
+				tuplesort_empty = !tuplesort_getdatum(main, true,
+													  false, &ts_val, &ts_isnull,
+													  NULL);
+				Assert(tuplesort_empty || !ts_isnull);
+
+				if (!tuplesort_empty)
+				{
+					itemptr_decode(&decoded, DatumGetInt64(ts_val));
+					indexcursor = &decoded;
+				}
+				else
+				{
+					indexcursor = NULL;
+				}
+
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * Now, if either:
+			 *  - the main sort is empty, or
+			 *  - indexcursor > auxindexcursor,
+			 *
+			 * then auxindexcursor identifies a TID that doesn't appear in
+			 * the main sort. We likely need to insert it
+			 * into the target index if it’s visible in the heap.
+			 */
+			if (tuplesort_empty || ItemPointerCompare(indexcursor, auxindexcursor) > 0)
+			{
+				tuplestore_putdatum(store, Int64GetDatum(itemptr_encode(auxindexcursor)));
+				num++;
+			}
+		}
+	}
+
+	return num;
+}
+
+typedef struct ValidateIndexScanState
+{
+	Tuplestorestate		*store;
+	BlockNumber			prev_block_number;
+	OffsetNumber		prev_offset_number;
+} ValidateIndexScanState;
+
+/*
+ * This is ReadStreamBlockNumberCB implementation which works as follows:
+ *
+ * 1) It iterates over a sorted tuplestore, where each element is an encoded
+ *    ItemPointer
+ *
+ * 2) It returns the current BlockNumber and collects all OffsetNumbers
+ *    for that block in per_buffer_data.
+ *
+ * 3) Once the code encounters a new BlockNumber, it stops reading more
+ *    offsets and saves the OffsetNumber of the new block for the next call.
+ *
+ * 4) The list of offsets for a block is always terminated with InvalidOffsetNumber.
+ *
+ * This function is intended to be repeatedly called, each time returning
+ * the next block and its corresponding set of offsets.
+ */
+static BlockNumber
+heapam_index_validate_scan_read_stream_next(
+								  ReadStream *stream,
+								  void *void_callback_private_data,
+								  void *void_per_buffer_data
+								  )
+{
+	bool should_free;
+	Datum datum;
+	BlockNumber result = InvalidBlockNumber;
+	int i = 0;
+
+	/*
+	 * Retrieve the specialized callback state and the output buffer.
+	 * callback_private_data keeps track of the previous block and offset
+	 * from a prior invocation, if any.
+	 */
+	ValidateIndexScanState *callback_private_data = void_callback_private_data;
+	OffsetNumber *per_buffer_data = void_per_buffer_data;
+
+	/*
+	 * If there is a "leftover" offset number from the previous invocation,
+	 * it means we had switched to a new block in the middle of the last call.
+	 * We place that leftover offset number into the buffer first.
+	 */
+	if (callback_private_data->prev_offset_number != InvalidOffsetNumber)
+	{
+		Assert(callback_private_data->prev_block_number != InvalidBlockNumber);
+		/*
+		 * 'result' is the block number to return. We set it to the block
+		 * from the previous leftover offset.
+		 */
+		result = callback_private_data->prev_block_number;
+		/* Place leftover offset number in the output buffer. */
+		per_buffer_data[i++] = callback_private_data->prev_offset_number;
+		/*
+		 * Clear the leftover offset number so it won't be reused unless
+		 * we encounter another block change.
+		 */
+		callback_private_data->prev_offset_number = InvalidOffsetNumber;
+	}
+
+	/*
+	 * Read from the tuplestore until we either run out of tuples or we
+	 * encounter a block change. For each tuple:
+	 *
+	 *   1) Decode its block/offset from the Datum.
+	 *   2) If it's the first time in this call (prev_block_number == InvalidBlockNumber),
+	 *      initialize prev_block_number.
+	 *   3) If the block number matches the current block, collect the offset.
+	 *   4) If the block number differs, save that offset as leftover and break
+	 *      so that the next call can handle the new block.
+	 */
+	while (tuplestore_getdatum(callback_private_data->store, true, &should_free, &datum))
+	{
+		BlockNumber next_block_number;
+		ItemPointerData next_data;
+
+		/* Decode the datum into an ItemPointer (block + offset). */
+		itemptr_decode(&next_data, DatumGetInt64(datum));
+		next_block_number = ItemPointerGetBlockNumber(&next_data);
+
+		/*
+		 * If we haven't set a block number yet this round, initialize it
+		 * using the first tuple we read.
+		 */
+		if (callback_private_data->prev_block_number == InvalidBlockNumber)
+			callback_private_data->prev_block_number = next_block_number;
+
+		/*
+		 * Always set the result to be the "current" block number
+		 * we are filling offsets for.
+		 */
+		result = callback_private_data->prev_block_number;
+
+		/*
+		 * If this tuple is from the same block, just store its offset
+		 * in our per_buffer_data array.
+		 */
+		if (next_block_number == callback_private_data->prev_block_number)
+		{
+			per_buffer_data[i++] = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+		}
+		else
+		{
+			/*
+			 * If the block just changed, store the offset of the new block
+			 * as leftover for the next invocation and break out.
+			 */
+			callback_private_data->prev_block_number = next_block_number;
+			callback_private_data->prev_offset_number = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+
+			/* Break to let the next call handle the new block. */
+			break;
+		}
+	}
+
+	/*
+	 * Terminate the list of offsets for this block with an InvalidOffsetNumber.
+	 */
+	per_buffer_data[i] = InvalidOffsetNumber;
+	return result;
+}
+
 static void
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
 						   Snapshot snapshot,
-						   ValidateIndexState *state)
+						   ValidateIndexState *state,
+						   ValidateIndexState *auxState)
 {
-	TableScanDesc scan;
-	HeapScanDesc hscan;
-	HeapTuple	heapTuple;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-	BlockNumber previous_blkno = InvalidBlockNumber;
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
+
+	TupleTableSlot  *slot;
+	EState			*estate;
+	ExprContext		*econtext;
+	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	int64			num_to_check;
+	Tuplestorestate *tuples_for_check;
+	ValidateIndexScanState callback_private_data;
+
+	Buffer buf;
+	OffsetNumber *tuples;
+	ReadStream *read_stream;
+
+	/* Use a percentage of maintenance_work_mem for tuple store. */
+	int		store_work_mem_part = maintenance_work_mem * debug_cic_validate_store_mem_pct / 100;
+
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
 	/*
 	 * sanity checks
 	 */
 	Assert(OidIsValid(indexRelation->rd_rel->relam));
 
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
+	num_to_check = heapam_index_validate_tuplesort_difference(state->tuplesort,
+														 auxState->tuplesort,
+														 tuples_for_check);
+
+	/* It is our responsibility to close tuple sort as fast as we can */
+	tuplesort_end(state->tuplesort);
+	tuplesort_end(auxState->tuplesort);
+
+	state->tuplesort = auxState->tuplesort = NULL;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+									&TTSOpsBufferHeapTuple);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
 
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+	callback_private_data.prev_block_number = InvalidBlockNumber;
+	callback_private_data.store = tuples_for_check;
+	callback_private_data.prev_offset_number = InvalidOffsetNumber;
 
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	scan = table_beginscan_strat(heapRelation,	/* relation */
-								 snapshot,	/* snapshot */
-								 0, /* number of keys */
-								 NULL,	/* scan key */
-								 true,	/* buffer access strategy OK */
-								 false);	/* syncscan not OK */
-	hscan = (HeapScanDesc) scan;
+	read_stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING,
+														 bstrategy,
+														 heapRelation, MAIN_FORKNUM,
+														 heapam_index_validate_scan_read_stream_next,
+														 &callback_private_data,
+														 (MaxHeapTuplesPerPage + 1) * sizeof(OffsetNumber));
 
-	pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_TOTAL,
-								 hscan->rs_nblocks);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_TOTAL, num_to_check);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE, 0);
 
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while ((buf = read_stream_next_buffer(read_stream, (void **) &tuples)) != InvalidBuffer)
 	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
+		HeapTupleData	heap_tuple_data[MaxHeapTuplesPerPage];
+		int i;
+		OffsetNumber off;
+		BlockNumber block_number;
 
 		CHECK_FOR_INTERRUPTS();
 
-		state->htups += 1;
+		LockBuffer(buf, BUFFER_LOCK_SHARE);
+		block_number = BufferGetBlockNumber(buf);
 
-		if ((previous_blkno == InvalidBlockNumber) ||
-			(hscan->rs_cblock != previous_blkno))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
-										 hscan->rs_cblock);
-			previous_blkno = hscan->rs_cblock;
+			ItemPointerData tid;
+			bool		all_dead, found;
+			ItemPointerSet(&tid, block_number, off);
+
+			found = heap_hot_search_buffer(&tid, heapRelation, buf, snapshot,
+										   &heap_tuple_data[i], &all_dead, true);
+			if (!found)
+				ItemPointerSetInvalid(&heap_tuple_data[i].t_self);
+			i++;
+			state->htups += 1;
 		}
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		/*
-		 * As commented in table_index_build_scan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
+		 * It is safe to access tuple data after releasing the buffer lock
+		 * because the buffer pin is still held, and the only operation that
+		 * could physically move tuple data on the page is
+		 * PageRepairFragmentation via heap_page_prune.  VACUUM conflicts with
+		 * CIC (both take ShareUpdateExclusiveLock), and opportunistic pruning
+		 * from concurrent DML cannot affect root tuples we are referencing.
 		 */
-		if (hscan->rs_cblock != root_blkno)
-		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
 		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
+		 * No predicate evaluation is needed here: the auxiliary STIR index
+		 * only contains TIDs for tuples that already satisfied the partial
+		 * index predicate at DML time (checked in ExecInsertIndexTuples).
 		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
+			if (ItemPointerIsValid(&heap_tuple_data[i].t_self))
 			{
+				ItemPointerData root_tid;
+				ItemPointerSet(&root_tid, block_number, off);
+
+				/* Reset the per-tuple memory context for the next fetch. */
+				MemoryContextReset(econtext->ecxt_per_tuple_memory);
+				ExecStoreBufferHeapTuple(&heap_tuple_data[i], slot, buf);
+
+				/* Compute the key values and null flags for this tuple. */
+				FormIndexDatum(indexInfo,
+							   slot,
+							   estate,
+							   values,
+							   isnull);
+
 				/*
-				 * Remember index items seen earlier on the current heap page
+				 * Insert the tuple into the target index.
 				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+				index_insert(indexRelation,
+							 values,
+							 isnull,
+							 &root_tid, /* insert root tuple */
+							 heapRelation,
+							 indexInfo->ii_Unique ?
+							 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+							 false,
+							 indexInfo);
+
+				state->tups_inserted += 1;
 			}
 
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  false, &ts_val, &ts_isnull,
-												  NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_TUPLES_DONE, 1);
+			i++;
 		}
 
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 false,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
+		ReleaseBuffer(buf);
 	}
 
-	table_endscan(scan);
-
 	ExecDropSingleTupleTableSlot(slot);
 
 	FreeExecutorState(estate);
 
+	read_stream_end(read_stream);
+	tuplestore_end(tuples_for_check);
+
+	FreeAccessStrategy(bstrategy);
+
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2fc86ca9c5b..7f29dfa0b28 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -715,11 +715,16 @@ UpdateIndexRelation(Oid indexoid,
  *			already exists.
  *		INDEX_CREATE_PARTITIONED:
  *			create a partitioned index (table must be partitioned)
+ *		INDEX_CREATE_AUXILIARY:
+ *			mark index as auxiliary index
  * constr_flags: flags passed to index_constraint_create
  *		(only if INDEX_CREATE_ADD_CONSTRAINT is set)
  * allow_system_table_mods: allow table to be a system catalog
  * is_internal: if true, post creation hook for new index
  * constraintId: if not NULL, receives OID of created constraint
+ * relpersistence: persistence level to use for index. In most of the
+ *		cases it should be equal to the persistence level of the table,
+ *		auxiliary indexes are only exception here.
  *
  * Returns the OID of the created index.
  */
@@ -760,6 +765,7 @@ index_create(Relation heapRelation,
 	bool		invalid = (flags & INDEX_CREATE_INVALID) != 0;
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
+	bool		auxiliary = (flags & INDEX_CREATE_AUXILIARY) != 0;
 	char		relkind;
 	TransactionId relfrozenxid;
 	MultiXactId relminmxid;
@@ -785,7 +791,10 @@ index_create(Relation heapRelation,
 	namespaceId = RelationGetNamespace(heapRelation);
 	shared_relation = heapRelation->rd_rel->relisshared;
 	mapped_relation = RelationIsMapped(heapRelation);
-	relpersistence = heapRelation->rd_rel->relpersistence;
+	if (auxiliary)
+		relpersistence = RELPERSISTENCE_UNLOGGED; /* aux indexes are always unlogged */
+	else
+		relpersistence = heapRelation->rd_rel->relpersistence;
 
 	/*
 	 * check parameters
@@ -793,6 +802,11 @@ index_create(Relation heapRelation,
 	if (indexInfo->ii_NumIndexAttrs < 1)
 		elog(ERROR, "must index at least one column");
 
+	if (indexInfo->ii_Am == STIR_AM_OID && !auxiliary)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("user-defined indexes with STIR access method are not supported")));
+
 	if (!allow_system_table_mods &&
 		IsSystemRelation(heapRelation) &&
 		IsNormalProcessingMode())
@@ -1398,20 +1412,24 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							false,	/* not ready for inserts */
 							true,
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							false);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
 	 * index information.  All this information will be used for the index
 	 * creation.
 	 */
-	for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
 	{
 		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
-		Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
 
-		indexColNames = lappend(indexColNames, NameStr(att->attname));
-		newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+		{
+			Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+			indexColNames = lappend(indexColNames, NameStr(att->attname));
+			newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		}
 	}
 
 	/* Extract opclass options for each attribute */
@@ -1473,6 +1491,157 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	return newIndexId;
 }
 
+/*
+ * index_concurrently_create_aux
+ *
+ * Create concurrently an auxiliary index based on the definition of the one
+ * provided by caller.  The index is inserted into catalogs and needs to be
+ * built later on. This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ */
+Oid
+index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
+							   Oid tablespaceOid, const char *newName)
+{
+	Relation	indexRelation;
+	IndexInfo  *oldInfo,
+			*newInfo;
+	Oid			newIndexId = InvalidOid;
+	HeapTuple	indexTuple;
+
+	List	   *indexColNames = NIL;
+	List	   *indexExprs = NIL;
+	List	   *indexPreds = NIL;
+
+	Oid *auxOpclassIds;
+	int16 *auxColoptions;
+
+	indexRelation = index_open(mainIndexId, RowExclusiveLock);
+
+	/* The new index needs some information from the old index */
+	oldInfo = BuildIndexInfo(indexRelation);
+
+	/*
+	 * Build of an auxiliary index with exclusion constraints is not
+	 * supported.
+	 */
+	if (oldInfo->ii_ExclusionOps != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("auxiliary index creation for exclusion constraints is not supported")));
+
+	/* Get the array of class and column options IDs from index info */
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(mainIndexId));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", mainIndexId);
+
+
+	/*
+	 * Fetch the list of expressions and predicates directly from the
+	 * catalogs.  This cannot rely on the information from IndexInfo of the
+	 * old index as these have been flattened for the planner.
+	 */
+	if (oldInfo->ii_Expressions != NIL)
+	{
+		Datum		exprDatum;
+		char	   *exprString;
+
+		exprDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indexprs);
+		exprString = TextDatumGetCString(exprDatum);
+		indexExprs = (List *) stringToNode(exprString);
+		pfree(exprString);
+	}
+	if (oldInfo->ii_Predicate != NIL)
+	{
+		Datum		predDatum;
+		char	   *predString;
+
+		predDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indpred);
+		predString = TextDatumGetCString(predDatum);
+		indexPreds = (List *) stringToNode(predString);
+
+		/* Also convert to implicit-AND format */
+		indexPreds = make_ands_implicit((Expr *) indexPreds);
+		pfree(predString);
+	}
+
+	/*
+	 * Build the index information for the new index.  Note that rebuild of
+	 * indexes with exclusion constraints is not supported, hence there is no
+	 * need to fill all the ii_Exclusion* fields.
+	 */
+	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
+							oldInfo->ii_NumIndexKeyAttrs,
+							STIR_AM_OID, /* special AM for aux indexes */
+							indexExprs,
+							indexPreds,
+							false,	/* aux index are not unique */
+							oldInfo->ii_NullsNotDistinct,
+							false,	/* not ready for inserts */
+							true,
+							false,	/* aux are not summarizing */
+							false,	/* aux are not without overlaps */
+							true	/* auxiliary */);
+
+	/*
+	 * Extract the list of column names and the column numbers for the new
+	 * index information.  All this information will be used for the index
+	 * creation.
+	 */
+	{
+		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
+
+		for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+		{
+			Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+			indexColNames = lappend(indexColNames, NameStr(att->attname));
+			newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		}
+	}
+
+	auxOpclassIds = palloc0(sizeof(Oid) * newInfo->ii_NumIndexAttrs);
+	auxColoptions = palloc0(sizeof(int16) * newInfo->ii_NumIndexAttrs);
+
+	/* Fill with "any ops" */
+	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
+	{
+		auxOpclassIds[i] = ANY_STIR_OPS_OID;
+		auxColoptions[i] = 0;
+	}
+
+	newIndexId = index_create(heapRelation,
+							  newName,
+							  InvalidOid,    /* indexRelationId */
+							  InvalidOid,    /* parentIndexRelid */
+							  InvalidOid,    /* parentConstraintId */
+							  InvalidRelFileNumber, /* relFileNumber */
+							  newInfo,
+							  indexColNames,
+							  STIR_AM_OID,
+							  tablespaceOid,
+							  indexRelation->rd_indcollation,
+							  auxOpclassIds,
+							  NULL,
+							  auxColoptions,
+							  NULL,
+							  (Datum) 0,
+							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT | INDEX_CREATE_AUXILIARY,
+							  0,
+							  true, /* allow table to be a system catalog? */
+							  false,    /* is_internal? */
+							  NULL);
+
+	/* Close the relations used and clean up */
+	index_close(indexRelation, NoLock);
+	ReleaseSysCache(indexTuple);
+
+	return newIndexId;
+}
+
 /*
  * index_concurrently_build
  *
@@ -2453,7 +2622,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2513,7 +2683,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3289,12 +3460,21 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * We do a concurrent index build by first inserting the catalog entry for the
  * index via index_create(), marking it not indisready and not indisvalid.
+ * Then we create special auxiliary index the same way. It based on STIR AM.
  * Then we commit our transaction and start a new one, then we wait for all
  * transactions that could have been modifying the table to terminate.  Now
- * we know that any subsequently-started transactions will see the index and
+ * we know that any subsequently-started transactions will see indexes and
  * honor its constraints on HOT updates; so while existing HOT-chains might
  * be broken with respect to the index, no currently live tuple will have an
- * incompatible HOT update done to it.  We now build the index normally via
+ * incompatible HOT update done to it.
+ *
+ * After that, we build the auxiliary index. It is fast operation without any actual
+ * table scan. As result, we have empty STIR index. We commit transaction and
+ * again wait for all transactions that could have been modifying the table
+ * to terminate. At that moment all new tuples are going to be inserted into
+ * auxiliary index.
+ *
+ * We now build the index normally via
  * index_build(), while holding a weak lock that allows concurrent
  * insert/update/delete.  Also, we index only tuples that are valid
  * as of the start of the scan (see table_index_build_scan), whereas a normal
@@ -3304,14 +3484,17 @@ IndexCheckExclusion(Relation heapRelation,
  * bogus unique-index failures due to concurrent UPDATEs (we might see
  * different versions of the same row as being valid when we pass over them,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
- * does not contain any tuples added to the table while we built the index.
+ * does not contain any tuples added to the table while we built the index
+ * (but these tuples contained in auxiliary index).
  *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
- * commit the second transaction and start a third.  Again we wait for all
+ * commit the third transaction and start a fourth.  Again we wait for all
  * transactions that could have been modifying the table to terminate.  Now
  * we know that any subsequently-started transactions will see the index and
- * insert their new tuples into it.  We then take a new reference snapshot
- * which is passed to validate_index().  Any tuples that are valid according
+ * insert their new tuples into it. At the same moment we clear "indisready" for
+ * auxiliary index, since it is no more required to be updated.
+ *
+ * We then take a new reference snapshot, any tuples that are valid according
  * to this snap, but are not in the index, must be added to the index.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
@@ -3319,12 +3502,14 @@ IndexCheckExclusion(Relation heapRelation,
  * that might care about them before we mark the index valid.)
  *
  * validate_index() works by first gathering all the TIDs currently in the
- * index, using a bulkdelete callback that just stores the TIDs and doesn't
+ * indexes, using a bulkdelete callback that just stores the TIDs and doesn't
  * ever say "delete it".  (This should be faster than a plain indexscan;
  * also, not all index AMs support full-index indexscan.)  Then we sort the
- * TIDs, and finally scan the table doing a "merge join" against the TID list
- * to see which tuples are missing from the index.  Thus we will ensure that
- * all tuples valid according to the reference snapshot are in the index.
+ * TIDs of both auxiliary and target indexes, and doing a "merge join" against
+ * the TID lists to see which tuples from auxiliary index are missing from the
+ * target index.  Thus we will ensure that all tuples valid according to the
+ * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
  * tuple that is already dead or is in process of being deleted, and we
@@ -3342,22 +3527,26 @@ IndexCheckExclusion(Relation heapRelation,
  * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
  * transactions will be able to use it for queries.
  *
- * Doing two full table scans is a brute-force strategy.  We could try to be
- * cleverer, eg storing new tuples in a special area of the table (perhaps
- * making the table append-only by setting use_fsm).  However that would
- * add yet more locking issues.
+ * Also, some actions to concurrent drop the auxiliary index are performed.
  */
 void
-validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 {
 	Relation	heapRelation,
-				indexRelation;
+				indexRelation,
+				auxIndexRelation;
 	IndexInfo  *indexInfo;
-	IndexVacuumInfo ivinfo;
-	ValidateIndexState state;
+	IndexVacuumInfo ivinfo, auxivinfo;
+	ValidateIndexState state, auxState;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	/* Use 80% of maintenance_work_mem to target index sorting and
+	 * 10% rest for auxiliary.
+	 *
+	 * Rest 10% will be used for tuplestore later. */
+	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
+	int			aux_work_mem_part = maintenance_work_mem / 10;
 
 	{
 		const int	progress_index[] = {
@@ -3390,6 +3579,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	RestrictSearchPath();
 
 	indexRelation = index_open(indexId, RowExclusiveLock);
+	auxIndexRelation = index_open(auxIndexId, RowExclusiveLock);
 
 	/*
 	 * Fetch info needed for index_insert.  (You might think this should be
@@ -3414,15 +3604,49 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.strategy = NULL;
 	ivinfo.validate_index = true;
 
+	/*
+	 * Copy all info to auxiliary info, changing only relation.
+	 */
+	auxivinfo = ivinfo;
+	auxivinfo.index = auxIndexRelation;
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
 	 * is a pass-by-reference type on all platforms, whereas int8 is
 	 * pass-by-value on most platforms.
 	 */
+	auxState.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
+										   InvalidOid, false,
+										   aux_work_mem_part,
+										   NULL, TUPLESORT_NONE);
+	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
+
+	(void) index_bulk_delete(&auxivinfo, NULL,
+							 validate_index_callback, &auxState);
+	/* If aux index is empty, merge may be skipped */
+	if (auxState.itups == 0)
+	{
+		tuplesort_end(auxState.tuplesort);
+		auxState.tuplesort = NULL;
+
+		/* Roll back any GUC changes executed by index functions */
+		AtEOXact_GUC(false, save_nestlevel);
+
+		/* Restore userid and security context */
+		SetUserIdAndSecContext(save_userid, save_sec_context);
+
+		/* Close rels, but keep locks */
+		index_close(auxIndexRelation, NoLock);
+		index_close(indexRelation, NoLock);
+		table_close(heapRelation, NoLock);
+
+		return;
+	}
+
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 											InvalidOid, false,
-											maintenance_work_mem,
+											(int) main_work_mem_part,
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
@@ -3445,27 +3669,30 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	tuplesort_performsort(auxState.tuplesort);
 
 	/*
-	 * Now scan the heap and "merge" it with the index
+	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN);
+								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
 	table_index_validate_scan(heapRelation,
 							  indexRelation,
 							  indexInfo,
 							  snapshot,
-							  &state);
+							  &state,
+							  &auxState);
 
-	/* Done with tuplesort object */
-	tuplesort_end(state.tuplesort);
+	/* Tuple sort closed by table_index_validate_scan */
+	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
 
 	/* Make sure to release resources cached in indexInfo (if needed). */
 	index_insert_cleanup(indexRelation, indexInfo);
 
 	elog(DEBUG2,
-		 "validate_index found %.0f heap tuples, %.0f index tuples; inserted %.0f missing tuples",
-		 state.htups, state.itups, state.tups_inserted);
+		 "validate_index fetched %.0f heap tuples, %.0f index tuples;"
+						" %.0f aux index tuples; inserted %.0f missing tuples",
+		 state.htups, state.itups, auxState.itups, state.tups_inserted);
 
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
@@ -3474,6 +3701,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 
 	/* Close rels, but keep locks */
+	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
 }
@@ -3534,6 +3762,12 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			Assert(!indexForm->indisvalid);
 			indexForm->indisvalid = true;
 			break;
+		case INDEX_DROP_CLEAR_READY:
+			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
+			Assert(indexForm->indisready);
+			Assert(!indexForm->indisvalid);
+			indexForm->indisready = false;
+			break;
 		case INDEX_DROP_CLEAR_VALID:
 
 			/*
@@ -3805,6 +4039,13 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
+	/* Auxiliary indexes are not allowed to be rebuilt */
+	if (indexInfo->ii_Auxiliary)
+		ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("reindex of auxiliary index \"%s\" not supported",
+					RelationGetRelationName(iRel))));
+
 	/* Suppress use of the target index while rebuilding it */
 	SetReindexProcessing(heapId, indexId);
 
@@ -4047,6 +4288,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
+		Oid			indexAm = get_rel_relam(indexOid);
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4072,6 +4314,18 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
+		if (indexAm == STIR_AM_OID)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+							get_namespace_name(indexNamespaceId),
+							get_rel_name(indexOid))));
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			continue;
+		}
+
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e54018004db..08634c43ea6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1388,16 +1388,17 @@ CREATE VIEW pg_stat_progress_create_index AS
                       END AS command,
         CASE S.param10 WHEN 0 THEN 'initializing'
                        WHEN 1 THEN 'waiting for writers before build'
-                       WHEN 2 THEN 'building index' ||
+                       WHEN 2 THEN 'waiting for writers to use auxiliary index'
+                       WHEN 3 THEN 'building index' ||
                            COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)),
                                     '')
-                       WHEN 3 THEN 'waiting for writers before validation'
-                       WHEN 4 THEN 'index validation: scanning index'
-                       WHEN 5 THEN 'index validation: sorting tuples'
-                       WHEN 6 THEN 'index validation: scanning table'
-                       WHEN 7 THEN 'waiting for old snapshots'
-                       WHEN 8 THEN 'waiting for readers before marking dead'
-                       WHEN 9 THEN 'waiting for readers before dropping'
+                       WHEN 4 THEN 'waiting for writers before validation'
+                       WHEN 5 THEN 'index validation: scanning index'
+                       WHEN 6 THEN 'index validation: sorting tuples'
+                       WHEN 7 THEN 'index validation: merging indexes'
+                       WHEN 8 THEN 'waiting for old snapshots'
+                       WHEN 9 THEN 'waiting for readers before marking dead'
+                       WHEN 10 THEN 'waiting for readers before dropping'
                        END as phase,
         S.param4 AS lockers_total,
         S.param5 AS lockers_done,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index dd593ccbc1c..cb07e1ae389 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -183,6 +183,7 @@ CheckIndexCompatible(Oid oldId,
 					 bool isWithoutOverlaps)
 {
 	bool		isconstraint;
+	bool		isauxiliary;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
 	Oid		   *opclassIds;
@@ -233,6 +234,7 @@ CheckIndexCompatible(Oid oldId,
 
 	amcanorder = amRoutine->amcanorder;
 	amsummarizing = amRoutine->amsummarizing;
+	isauxiliary = accessMethodId == STIR_AM_OID;
 
 	/*
 	 * Compute the operator classes, collations, and exclusion operators for
@@ -244,7 +246,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing,
+							  isWithoutOverlaps, isauxiliary);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -557,6 +560,7 @@ DefineIndex(ParseState *pstate,
 {
 	bool		concurrent;
 	char	   *indexRelationName;
+	char	   *auxIndexRelationName = NULL;
 	char	   *accessMethodName;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
@@ -566,6 +570,7 @@ DefineIndex(ParseState *pstate,
 	Oid			namespaceId;
 	Oid			tablespaceId;
 	Oid			createdConstraintId = InvalidOid;
+	Oid			auxIndexRelationId = InvalidOid;
 	List	   *indexColNames;
 	List	   *allIndexParams;
 	Relation	rel;
@@ -587,6 +592,7 @@ DefineIndex(ParseState *pstate,
 	int			numberOfKeyAttributes;
 	TransactionId limitXmin;
 	ObjectAddress address;
+	ObjectAddress auxAddress;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
@@ -837,6 +843,15 @@ DefineIndex(ParseState *pstate,
 											stmt->excludeOpNames,
 											stmt->primary,
 											stmt->isconstraint);
+	/*
+	 * Select name for auxiliary index
+	 */
+	if (concurrent)
+		auxIndexRelationName = ChooseRelationName(indexRelationName,
+												  NULL,
+												  "ccaux",
+												  namespaceId,
+												  false);
 
 	/*
 	 * look up the access method, verify it can handle the requested features
@@ -931,7 +946,8 @@ DefineIndex(ParseState *pstate,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  false);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -1603,6 +1619,16 @@ DefineIndex(ParseState *pstate,
 		return address;
 	}
 
+	/*
+	 * In case of concurrent build - create auxiliary index record.
+	 */
+	if (concurrent)
+	{
+		auxIndexRelationId = index_concurrently_create_aux(rel, indexRelationId,
+											tablespaceId, auxIndexRelationName);
+		ObjectAddressSet(auxAddress, RelationRelationId, auxIndexRelationId);
+	}
+
 	AtEOXact_GUC(false, root_save_nestlevel);
 	SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 
@@ -1631,11 +1657,11 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * For a concurrent build, it's important to make the catalog entries
 	 * visible to other transactions before we start to build the index. That
-	 * will prevent them from making incompatible HOT updates.  The new index
-	 * will be marked not indisready and not indisvalid, so that no one else
-	 * tries to either insert into it or use it for queries.
+	 * will prevent them from making incompatible HOT updates. New indexes
+	 * (main and auxiliary) will be marked not indisready and not indisvalid,
+	 * so that no one else tries to either insert into it or use it for queries.
 	 *
-	 * We must commit our current transaction so that the index becomes
+	 * We must commit our current transaction so that the indexes becomes
 	 * visible; then start another.  Note that all the data structures we just
 	 * built are lost in the commit.  The only data we keep past here are the
 	 * relation IDs.
@@ -1645,7 +1671,7 @@ DefineIndex(ParseState *pstate,
 	 * cannot block, even if someone else is waiting for access, because we
 	 * already have the same lock within our transaction.
 	 *
-	 * Note: we don't currently bother with a session lock on the index,
+	 * Note: we don't currently bother with a session lock on the indexes,
 	 * because there are no operations that could change its state while we
 	 * hold lock on the parent table.  This might need to change later.
 	 */
@@ -1684,7 +1710,7 @@ DefineIndex(ParseState *pstate,
 	 * with the old list of indexes.  Use ShareLock to consider running
 	 * transactions that hold locks that permit writing to the table.  Note we
 	 * do not need to worry about xacts that open the table for writing after
-	 * this point; they will see the new index when they open it.
+	 * this point; they will see the new indexes when they open it.
 	 *
 	 * Note: the reason we use actual lock acquisition here, rather than just
 	 * checking the ProcArray and sleeping, is that deadlock is possible if
@@ -1696,14 +1722,44 @@ DefineIndex(ParseState *pstate,
 
 	/*
 	 * At this moment we are sure that there are no transactions with the
-	 * table open for write that don't have this new index in their list of
+	 * table open for write that don't have this new indexes in their list of
 	 * indexes.  We have waited out all the existing transactions and any new
-	 * transaction will have the new index in its list, but the index is still
-	 * marked as "not-ready-for-inserts".  The index is consulted while
+	 * transaction will have both new indexes in its list, but indexes are still
+	 * marked as "not-ready-for-inserts". The indexes are consulted while
 	 * deciding HOT-safety though.  This arrangement ensures that no new HOT
 	 * chains can be created where the new tuple and the old tuple in the
 	 * chain have different index keys.
 	 *
+	 * Now call build on auxiliary index. Index will be created empty without
+	 * any actual heap scan, but marked as "ready-for-inserts". The goal of
+	 * that index is accumulate new tuples while main index is actually built.
+	 */
+
+	/* Set ActiveSnapshot since functions in the indexes may need it */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	index_concurrently_build(tableId, auxIndexRelationId);
+	/* we can do away with our snapshot */
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Now we need to ensure there are no transactions with the auxiliary index
+	 * marked as "not-ready-for-inserts".
+	 */
+	WaitForLockers(heaplocktag, ShareLock, true);
+
+	/*
+	 * At this moment we are sure that all new tuples in table are inserted into
+	 * the auxiliary index. Now it is time to build the target index itself.
+	 *
 	 * We now take a new snapshot, and build the index using all tuples that
 	 * are visible in this snapshot.  We can be sure that any HOT updates to
 	 * these tuples will be compatible with the index, since any updates made
@@ -1738,9 +1794,28 @@ DefineIndex(ParseState *pstate,
 	 * the index marked as read-only for updates.
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForLockers(heaplocktag, ShareLock, true);
 
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/*
+	 * Now target index is marked as "ready" for all transactions. So, auxiliary
+	 * index is no longer needed. So, start removing process by reverting "ready"
+	 * flag.
+	 */
+	index_set_state_flags(auxIndexRelationId, INDEX_DROP_CLEAR_READY);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * Now take the "reference snapshot" that will be used by validate_index()
 	 * to filter candidate tuples.  Beware!  There might still be snapshots in
@@ -1758,24 +1833,14 @@ DefineIndex(ParseState *pstate,
 	 */
 	snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	PushActiveSnapshot(snapshot);
-
 	/*
-	 * Scan the index and the heap, insert any missing index entries.
-	 */
-	validate_index(tableId, indexRelationId, snapshot);
-
-	/*
-	 * Drop the reference snapshot.  We must do this before waiting out other
-	 * snapshot holders, else we will deadlock against other processes also
-	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
-	 * they must wait for.  But first, save the snapshot's xmin to use as
-	 * limitXmin for GetCurrentVirtualXIDs().
+	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
+	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
 	limitXmin = snapshot->xmin;
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
-
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1802,7 +1867,7 @@ DefineIndex(ParseState *pstate,
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForOlderSnapshots(limitXmin, true);
 
 	/*
@@ -1827,6 +1892,53 @@ DefineIndex(ParseState *pstate,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+	/* Now wait for all transaction to see auxiliary as "non-ready for inserts" */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/* Now it is time to mark auxiliary index as dead */
+	index_concurrently_set_dead(tableId, auxIndexRelationId);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_6);
+	/* Now wait for all transaction to ignore auxiliary because it is dead */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Drop auxiliary index.
+	 *
+	 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
+	 * right lock level.
+	 */
+	performDeletion(&auxAddress, DROP_RESTRICT,
+							 PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
 
 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
@@ -3598,6 +3710,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	typedef struct ReindexIndexInfo
 	{
 		Oid			indexId;
+		Oid			auxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -3703,8 +3816,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
 														   ShareUpdateExclusiveLock);
+					IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-					if (!indexRelation->rd_index->indisvalid)
+
+					if (indexInfo->ii_Auxiliary)
+						ereport(WARNING,(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+									get_namespace_name(get_rel_namespace(cellOid)),
+									get_rel_name(cellOid))));
+					else if (!indexRelation->rd_index->indisvalid)
 						ereport(WARNING,
 								(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 								 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3756,8 +3876,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
 															   ShareUpdateExclusiveLock);
+						IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-						if (!indexRelation->rd_index->indisvalid)
+						if (indexInfo->ii_Auxiliary)
+							ereport(WARNING,
+									(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+									 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+											get_namespace_name(get_rel_namespace(cellOid)),
+											get_rel_name(cellOid))));
+						else if (!indexRelation->rd_index->indisvalid)
 							ereport(WARNING,
 									(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 									 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3818,6 +3945,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot reindex invalid index on TOAST table")));
 
+				/* Auxiliary indexes are not allowed to be rebuilt */
+				if (get_rel_relam(relationOid) == STIR_AM_OID)
+					ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("reindex of auxiliary index \"%s\" not supported",
+								get_rel_name(relationOid))));
+
 				/*
 				 * Check if parent relation can be locked and if it exists,
 				 * this needs to be done at this stage as the list of indexes
@@ -3921,15 +4055,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
+		char	   *auxConcurrentName;
 		ReindexIndexInfo *idx = lfirst(lc);
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
+		Oid			auxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
 		int			save_sec_context;
 		int			save_nestlevel;
 		Relation	newIndexRel;
+		Relation	auxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -3980,6 +4117,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 											"ccnew",
 											get_rel_namespace(indexRel->rd_index->indrelid),
 											false);
+		auxConcurrentName = ChooseRelationName(get_rel_name(idx->indexId),
+											NULL,
+											"ccaux",
+											get_rel_namespace(indexRel->rd_index->indrelid),
+											false);
 
 		/* Choose the new tablespace, indexes of toast tables are not moved */
 		if (OidIsValid(params->tablespaceOid) &&
@@ -3993,12 +4135,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 													idx->indexId,
 													tablespaceid,
 													concurrentName);
+		auxIndexId = index_concurrently_create_aux(heapRel,
+												   newIndexId,
+												   tablespaceid,
+												   auxConcurrentName);
 
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
+		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4007,6 +4154,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
+		newidx->auxIndexId = auxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4025,10 +4173,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		lockrelid = palloc_object(LockRelId);
+		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
 		/* Roll back any GUC changes executed by index functions */
@@ -4109,13 +4261,60 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * doing that, wait until no running transactions could have the table of
 	 * the index open with the old list of indexes.  See "phase 2" in
 	 * DefineIndex() for more details.
+	*/
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * Now build all auxiliary indexes and mark them as "ready-for-inserts".
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		StartTransactionCommand();
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/* Set ActiveSnapshot since functions in the indexes may need it */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Build auxiliary index, it is fast - without any actual heap scan, just an empty index. */
+		index_concurrently_build(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+
+	StartTransactionCommand();
+
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Wait until all auxiliary indexes are taken into account by all
+	 * transactions.
+	 */
 	WaitForLockersMultiple(lockTags, ShareLock, true);
 	CommitTransactionCommand();
 
+	/* Now it is time to perform target index build. */
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4162,6 +4361,41 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * need to set the PROC_IN_SAFE_IC flag here.
 	 */
 
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * At this moment all target indexes are marked as "ready-to-insert". So,
+	 * we are free to start process of dropping auxiliary indexes.
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+		StartTransactionCommand();
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		PopActiveSnapshot();
+
+		CommitTransactionCommand();
+	}
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -4169,12 +4403,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * were created during the previous phase.  See "phase 3" in DefineIndex()
 	 * for more details.
 	 */
-
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
-	WaitForLockersMultiple(lockTags, ShareLock, true);
-	CommitTransactionCommand();
-
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4212,7 +4440,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -4241,7 +4469,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * there's no need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-									 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+									 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 		WaitForOlderSnapshots(limitXmin, true);
 
 		CommitTransactionCommand();
@@ -4332,14 +4560,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
-	 * Mark the old indexes as dead.  First we must wait until no running
-	 * transaction could be using the index for a query.  See also
+	 * Mark the old and auxiliary indexes as dead. First we must wait until no
+	 * running transaction could be using the index for a query.  See also
 	 * index_drop() for more details.
 	 */
 
 	INJECTION_POINT("reindex-relation-concurrently-before-set-dead", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	foreach(lc, indexIds)
@@ -4364,6 +4592,28 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PopActiveSnapshot();
 	}
 
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+	}
+
 	/* Commit this transaction to make the updates visible. */
 	CommitTransactionCommand();
 	StartTransactionCommand();
@@ -4377,11 +4627,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
-	 * Drop the old indexes.
+	 * Drop the old and auxiliary indexes.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_6);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -4401,6 +4651,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			add_exact_object_address(&object, objects);
 		}
 
+		foreach(lc, newIndexIds)
+		{
+			ReindexIndexInfo *idx = lfirst(lc);
+			ObjectAddress object;
+
+			object.classId = RelationRelationId;
+			object.objectId = idx->auxIndexId;
+			object.objectSubId = 0;
+
+			add_exact_object_address(&object, objects);
+		}
+
 		/*
 		 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
 		 * right lock level.
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 5359dab1176..84f7cf9824e 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, bool auxiliary)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -850,6 +850,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Concurrent = concurrent;
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
+	n->ii_Auxiliary = auxiliary;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
@@ -875,7 +876,6 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
-	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 0a862693fcd..a80ee4fb03f 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -631,6 +631,15 @@
   boot_val => 'DEFAULT_ASSERT_ENABLED',
 },
 
+{ name => 'debug_cic_validate_store_mem_pct', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Percentage of maintenance_work_mem used for CIC validation tuplestore.',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'debug_cic_validate_store_mem_pct',
+  boot_val => '10',
+  min => '1',
+  max => '90',
+},
+
 { name => 'debug_copy_parse_plan_trees', type => 'bool', context => 'PGC_SUSET', group => 'DEVELOPER_OPTIONS',
   short_desc => 'Set this to force all parse and plan trees to be passed through copyObject(), to facilitate catching errors and omissions in copyObject().',
   flags => 'GUC_NOT_IN_SAMPLE',
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..1a997537800 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -705,7 +705,8 @@ typedef struct TableAmRoutine
 										Relation index_rel,
 										IndexInfo *index_info,
 										Snapshot snapshot,
-										ValidateIndexState *state);
+										ValidateIndexState *state,
+										ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1824,19 +1825,24 @@ table_index_build_range_scan(Relation table_rel,
  * table_index_validate_scan - second table scan for concurrent index build
  *
  * See validate_index() for an explanation.
+ *
+ * Note: it is responsibility of that function to close sortstates in
+ * both `state` and `auxstate`.
  */
 static inline void
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
 						  Snapshot snapshot,
-						  ValidateIndexState *state)
+						  ValidateIndexState *state,
+						  ValidateIndexState *auxstate)
 {
 	table_rel->rd_tableam->index_validate_scan(table_rel,
 											   index_rel,
 											   index_info,
 											   snapshot,
-											   state);
+											   state,
+											   auxstate);
 }
 
 
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 36b70689254..727993d1a5a 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -31,6 +31,7 @@ typedef enum
 {
 	INDEX_CREATE_SET_READY,
 	INDEX_CREATE_SET_VALID,
+	INDEX_DROP_CLEAR_READY,
 	INDEX_DROP_CLEAR_VALID,
 	INDEX_DROP_SET_DEAD,
 } IndexStateFlagsAction;
@@ -71,6 +72,7 @@ extern void index_check_primary_key(Relation heapRel,
 #define	INDEX_CREATE_IF_NOT_EXISTS			(1 << 4)
 #define	INDEX_CREATE_PARTITIONED			(1 << 5)
 #define INDEX_CREATE_INVALID				(1 << 6)
+#define INDEX_CREATE_AUXILIARY				(1 << 7)
 
 extern Oid	index_create(Relation heapRelation,
 						 const char *indexRelationName,
@@ -106,6 +108,11 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid tablespaceOid,
 										   const char *newName);
 
+extern Oid	index_concurrently_create_aux(Relation heapRelation,
+										  Oid mainIndexId,
+										  Oid tablespaceOid,
+										  const char *newName);
+
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
 
@@ -151,7 +158,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
+extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 9c40772706c..8e5f98c6fad 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -117,14 +117,15 @@
 
 /* Phases of CREATE INDEX (as advertised via PROGRESS_CREATEIDX_PHASE) */
 #define PROGRESS_CREATEIDX_PHASE_WAIT_1			1
-#define PROGRESS_CREATEIDX_PHASE_BUILD			2
-#define PROGRESS_CREATEIDX_PHASE_WAIT_2			3
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	4
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		5
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN	6
-#define PROGRESS_CREATEIDX_PHASE_WAIT_3			7
+#define PROGRESS_CREATEIDX_PHASE_WAIT_2			2
+#define PROGRESS_CREATEIDX_PHASE_BUILD			3
+#define PROGRESS_CREATEIDX_PHASE_WAIT_3			4
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	5
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		6
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE	7
 #define PROGRESS_CREATEIDX_PHASE_WAIT_4			8
 #define PROGRESS_CREATEIDX_PHASE_WAIT_5			9
+#define PROGRESS_CREATEIDX_PHASE_WAIT_6			10
 
 /*
  * Subphases of CREATE INDEX, for index_build.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f16f35659b9..f4f4aa19963 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -268,6 +268,7 @@ extern PGDLLIMPORT bool allowSystemTableMods;
 extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
+extern PGDLLIMPORT int debug_cic_validate_store_mem_pct;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
 
 /*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index bf54d39feb0..cd7f1eb0592 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,8 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								bool auxiliary);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 55538c4c41e..d1723f47e89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1437,6 +1437,7 @@ DETAIL:  Key (f1)=(b) already exists.
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
 ERROR:  could not create unique index "concur_index3"
 DETAIL:  Key (f2)=(b) is duplicated.
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -3211,6 +3212,7 @@ INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
 ERROR:  could not create unique index "concur_reindex_ind5"
 DETAIL:  Key (c1)=(1) is duplicated.
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
@@ -3223,8 +3225,10 @@ DETAIL:  Key (c1)=(1) is duplicated.
  c1     | integer |           |          | 
 Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1) INVALID
+    "concur_reindex_ind5_ccaux" stir (c1) INVALID
     "concur_reindex_ind5_ccnew" UNIQUE, btree (c1) INVALID
 
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -3252,6 +3256,44 @@ Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1)
 
 DROP TABLE concur_reindex_tab4;
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+ERROR:  relation "concur_reindex_tab4" does not exist
+LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+                    ^
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
+HINT:  Use DROP INDEX or REINDEX INDEX.
+WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index f50868ca6a6..b34009f868c 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1585,10 +1585,11 @@ select indexrelid::regclass, indisvalid,
 --------------------------------+------------+-----------------------+-------------------------------
  parted_isvalid_idx             | f          | parted_isvalid_tab    | 
  parted_isvalid_idx_11          | f          | parted_isvalid_tab_11 | parted_isvalid_tab_1_expr_idx
+ parted_isvalid_idx_11_ccaux    | f          | parted_isvalid_tab_11 | 
  parted_isvalid_tab_12_expr_idx | t          | parted_isvalid_tab_12 | parted_isvalid_tab_1_expr_idx
  parted_isvalid_tab_1_expr_idx  | f          | parted_isvalid_tab_1  | parted_isvalid_idx
  parted_isvalid_tab_2_expr_idx  | t          | parted_isvalid_tab_2  | parted_isvalid_idx
-(5 rows)
+(6 rows)
 
 drop table parted_isvalid_tab;
 -- Check state of replica indexes when attaching a partition.
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2b3cf6d8569..b01fa1e61e3 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2064,14 +2064,15 @@ pg_stat_progress_create_index| SELECT s.pid,
         CASE s.param10
             WHEN 0 THEN 'initializing'::text
             WHEN 1 THEN 'waiting for writers before build'::text
-            WHEN 2 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
-            WHEN 3 THEN 'waiting for writers before validation'::text
-            WHEN 4 THEN 'index validation: scanning index'::text
-            WHEN 5 THEN 'index validation: sorting tuples'::text
-            WHEN 6 THEN 'index validation: scanning table'::text
-            WHEN 7 THEN 'waiting for old snapshots'::text
-            WHEN 8 THEN 'waiting for readers before marking dead'::text
-            WHEN 9 THEN 'waiting for readers before dropping'::text
+            WHEN 2 THEN 'waiting for writers to use auxiliary index'::text
+            WHEN 3 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
+            WHEN 4 THEN 'waiting for writers before validation'::text
+            WHEN 5 THEN 'index validation: scanning index'::text
+            WHEN 6 THEN 'index validation: sorting tuples'::text
+            WHEN 7 THEN 'index validation: merging indexes'::text
+            WHEN 8 THEN 'waiting for old snapshots'::text
+            WHEN 9 THEN 'waiting for readers before marking dead'::text
+            WHEN 10 THEN 'waiting for readers before dropping'::text
             ELSE NULL::text
         END AS phase,
     s.param4 AS lockers_total,
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 82e4062a215..c2c1b031527 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -503,6 +503,7 @@ CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS concur_index2 ON concur_heap(f1);
 INSERT INTO concur_heap VALUES ('b','x');
 -- check if constraint is enforced properly at build time
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -1315,10 +1316,12 @@ CREATE TABLE concur_reindex_tab4 (c1 int);
 INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 -- This trick creates an invalid index.
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -1330,6 +1333,24 @@ REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
 DROP TABLE concur_reindex_tab4;
 
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+\d aux_index_tab5
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+DROP TABLE aux_index_tab5;
+
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
-- 
2.43.0



  [application/octet-stream] v32-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch (21.0K, 6-v32-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch)
  download | inline diff:
From 69264c56ed0ecdac67080215896033dd4767df25 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 12 Jan 2026 00:57:56 +0300
Subject: [PATCH v32 3/7] Add Datum storage support to tuplestore Extend
 tuplestore to store individual Datum values

This support enables usages of tuplestore for non-tuple data (TIDs) in the next commit.
---
 src/backend/utils/sort/tuplestore.c | 367 +++++++++++++++++++++++-----
 src/include/utils/tuplestore.h      |  33 +--
 2 files changed, 327 insertions(+), 73 deletions(-)

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index caad7cad0b4..132ecf22088 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1,16 +1,19 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.c
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
+ *
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
  * Also, it is possible to support multiple independent read pointers.
  *
  * A temporary file is used to handle the data if it exceeds the
@@ -61,6 +64,8 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "storage/buffile.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
 #include "utils/tuplestore.h"
@@ -116,16 +121,15 @@ struct Tuplestorestate
 	BufFile    *myfile;			/* underlying file, or NULL if none */
 	MemoryContext context;		/* memory context for holding tuples */
 	ResourceOwner resowner;		/* resowner for holding temp files */
+	Oid			datumType;		/* InvalidOid or oid of Datum's to be stored */
+	int16		datumTypeLen;	/* typelen of that Datum */
+	bool		datumTypeByVal; /* by-value of that Datum */
 
 	/*
 	 * These function pointers decouple the routines that must know what kind
 	 * of tuple we are handling from the routines that don't need to know it.
 	 * They are set up by the tuplestore_begin_xxx routines.
 	 *
-	 * (Although tuplestore.c currently only supports heap tuples, I've copied
-	 * this part of tuplesort.c so that extension to other kinds of objects
-	 * will be easy if it's ever needed.)
-	 *
 	 * Function to copy a supplied input tuple into palloc'd space. (NB: we
 	 * assume that a single pfree() is enough to release the tuple later, so
 	 * the representation must be "flat" in one palloc chunk.) state->availMem
@@ -150,6 +154,12 @@ struct Tuplestorestate
 	 */
 	void	   *(*readtup) (Tuplestorestate *state, unsigned int len);
 
+	/*
+	 * Function to get length of tuple from tape. Used to provide 'len' argument
+	 * for readtup (see above).
+	 */
+	unsigned int(*lentup) (Tuplestorestate *state, bool eofOK);
+
 	/*
 	 * This array holds pointers to tuples in memory if we are in state INMEM.
 	 * In states WRITEFILE and READFILE it's not used.
@@ -186,6 +196,7 @@ struct Tuplestorestate
 #define COPYTUP(state,tup)	((*(state)->copytup) (state, tup))
 #define WRITETUP(state,tup) ((*(state)->writetup) (state, tup))
 #define READTUP(state,len)	((*(state)->readtup) (state, len))
+#define LENTUP(state,eofOK)	((*(state)->lentup) (state, eofOK))
 #define LACKMEM(state)		((state)->availMem < 0)
 #define USEMEM(state,amt)	((state)->availMem -= (amt))
 #define FREEMEM(state,amt)	((state)->availMem += (amt))
@@ -194,9 +205,9 @@ struct Tuplestorestate
  *
  * NOTES about on-tape representation of tuples:
  *
- * We require the first "unsigned int" of a stored tuple to be the total size
- * on-tape of the tuple, including itself (so it is never zero).
- * The remainder of the stored tuple
+ * In case of tuples we use first "unsigned int" of a stored tuple
+ * to be the total size on-tape of the tuple, including itself
+ * (so it is never zero). The remainder of the stored tuple
  * may or may not match the in-memory representation of the tuple ---
  * any conversion needed is the job of the writetup and readtup routines.
  *
@@ -207,10 +218,13 @@ struct Tuplestorestate
  * state->backward is not set, the write/read routines may omit the extra
  * length word.
  *
- * writetup is expected to write both length words as well as the tuple
+ * In the case of Datum with constant length, both "unsigned int" are omitted.
+ *
+ * writetup is expected to write both length words and the tuple
  * data.  When readtup is called, the tape is positioned just after the
- * front length word; readtup must read the tuple data and advance past
- * the back length word (if present).
+ * front length word (if it is not omitted like in case of content-size Datum);
+ * readtup must read the tuple data and advance past the back length word
+ * (if present).
  *
  * The write/read routines can make use of the tuple description data
  * stored in the Tuplestorestate record, if needed. They are also expected
@@ -242,11 +256,16 @@ static Tuplestorestate *tuplestore_begin_common(int eflags,
 static void tuplestore_puttuple_common(Tuplestorestate *state, void *tuple);
 static void dumptuples(Tuplestorestate *state);
 static void tuplestore_updatemax(Tuplestorestate *state);
-static unsigned int getlen(Tuplestorestate *state, bool eofOK);
+
+static unsigned int lentup_heap(Tuplestorestate *state, bool eofOK);
 static void *copytup_heap(Tuplestorestate *state, void *tup);
 static void writetup_heap(Tuplestorestate *state, void *tup);
 static void *readtup_heap(Tuplestorestate *state, unsigned int len);
 
+static unsigned int lentup_datum(Tuplestorestate *state, bool eofOK);
+static void *copytup_datum(Tuplestorestate *state, void *datum);
+static void writetup_datum(Tuplestorestate *state, void *datum);
+static void *readtup_datum(Tuplestorestate *state, unsigned int len);
 
 /*
  *		tuplestore_begin_xxx
@@ -269,6 +288,12 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
 	state->allowedMem = maxKBytes * (int64) 1024;
 	state->availMem = state->allowedMem;
 	state->myfile = NULL;
+	/*
+	 * Set Datum related data to invalid by default.
+	 */
+	state->datumType = InvalidOid;
+	state->datumTypeLen = 0;
+	state->datumTypeByVal = false;
 
 	/*
 	 * The palloc/pfree pattern for tuple memory is in a FIFO pattern.  A
@@ -346,6 +371,37 @@ tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
 	state->copytup = copytup_heap;
 	state->writetup = writetup_heap;
 	state->readtup = readtup_heap;
+	state->lentup = lentup_heap;
+
+	return state;
+}
+
+/*
+ * The same as tuplestore_begin_heap but create store for Datum values.
+ */
+Tuplestorestate *
+tuplestore_begin_datum(Oid datumType, bool randomAccess, bool interXact, int maxKBytes)
+{
+	Tuplestorestate *state;
+	int			eflags;
+
+	/*
+	 * This interpretation of the meaning of randomAccess is compatible with
+	 * the pre-8.3 behavior of tuplestores.
+	 */
+	eflags = randomAccess ?
+		(EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :
+		(EXEC_FLAG_REWIND);
+
+	state = tuplestore_begin_common(eflags, interXact, maxKBytes);
+	state->datumType = datumType;
+	get_typlenbyval(state->datumType, &state->datumTypeLen, &state->datumTypeByVal);
+	Assert(!(state->datumTypeByVal && randomAccess));
+
+	state->copytup = copytup_datum;
+	state->writetup = writetup_datum;
+	state->readtup = readtup_datum;
+	state->lentup = lentup_datum;
 
 	return state;
 }
@@ -444,16 +500,19 @@ tuplestore_clear(Tuplestorestate *state)
 	{
 		int64		availMem = state->availMem;
 
-		/*
-		 * Below, we reset the memory context for storing tuples.  To save
-		 * from having to always call GetMemoryChunkSpace() on all stored
-		 * tuples, we adjust the availMem to forget all the tuples and just
-		 * recall USEMEM for the space used by the memtuples array.  Here we
-		 * just Assert that's correct and the memory tracking hasn't gone
-		 * wrong anywhere.
-		 */
-		for (i = state->memtupdeleted; i < state->memtupcount; i++)
-			availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			/*
+			 * Below, we reset the memory context for storing tuples.  To save
+			 * from having to always call GetMemoryChunkSpace() on all stored
+			 * tuples, we adjust the availMem to forget all the tuples and just
+			 * recall USEMEM for the space used by the memtuples array.  Here we
+			 * just Assert that's correct and the memory tracking hasn't gone
+			 * wrong anywhere.
+			 */
+			for (i = state->memtupdeleted; i < state->memtupcount; i++)
+				availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		}
 
 		availMem += GetMemoryChunkSpace(state->memtuples);
 
@@ -777,6 +836,25 @@ tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple)
 	MemoryContextSwitchTo(oldcxt);
 }
 
+/*
+ * Like tuplestore_puttupleslot but for single Datum.
+ */
+void
+tuplestore_putdatum(Tuplestorestate *state, Datum datum)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
+
+	/*
+	 * Copy the Datum.  (Must do this even in WRITEFILE case.  Note that
+	 * COPYTUP includes USEMEM, so we needn't do that here.)
+	 */
+	datum = PointerGetDatum(COPYTUP(state, DatumGetPointer(datum)));
+
+	tuplestore_puttuple_common(state, DatumGetPointer(datum));
+
+	MemoryContextSwitchTo(oldcxt);
+}
+
 /*
  * Similar to tuplestore_puttuple(), but work from values + nulls arrays.
  * This avoids an extra tuple-construction operation.
@@ -1028,10 +1106,10 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 			pg_fallthrough;
 
 		case TSS_READFILE:
-			*should_free = true;
+			*should_free = !state->datumTypeByVal;
 			if (forward)
 			{
-				if ((tuplen = getlen(state, true)) != 0)
+				if ((tuplen = LENTUP(state, true)) != 0)
 				{
 					tup = READTUP(state, tuplen);
 					return tup;
@@ -1043,6 +1121,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				}
 			}
 
+			Assert(!state->datumTypeByVal);
 			/*
 			 * Backward.
 			 *
@@ -1060,7 +1139,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				Assert(!state->truncated);
 				return NULL;
 			}
-			tuplen = getlen(state, false);
+			tuplen = LENTUP(state, false);
 
 			if (readptr->eof_reached)
 			{
@@ -1091,7 +1170,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 					Assert(!state->truncated);
 					return NULL;
 				}
-				tuplen = getlen(state, false);
+				tuplen = LENTUP(state, false);
 			}
 
 			/*
@@ -1153,6 +1232,41 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 	}
 }
 
+bool
+tuplestore_getdatum(Tuplestorestate *state, bool forward,
+					bool *should_free, Datum *result)
+{
+	Datum datum;
+	*should_free = false;
+
+	datum = (Datum) tuplestore_gettuple(state, forward, should_free);
+
+	/* For by-value datum we may receive zero as valid value. */
+	if (state->datumTypeByVal)
+	{
+		/* Return false only on EOF */
+		if (state->readptrs[state->activeptr].eof_reached)
+		{
+			*result = PointerGetDatum(NULL);
+			return false;
+		}
+
+		*result = datum;
+		return true;
+	}
+
+	if (datum)
+	{
+		*result = datum;
+		return true;
+	}
+	else
+	{
+		*result = PointerGetDatum(NULL);
+		return false;
+	}
+}
+
 /*
  * tuplestore_gettupleslot_force - exported function to fetch a tuple
  *
@@ -1205,10 +1319,20 @@ tuplestore_advance(Tuplestorestate *state, bool forward)
 			pfree(tuple);
 		return true;
 	}
-	else
+
+	/*
+	 * A NULL return normally means end-of-data, but for by-value datum
+	 * stores a valid zero-valued datum (e.g., false, 0) is indistinguishable
+	 * from NULL via pointer check.  Use eof_reached to distinguish.
+	 */
+	if (state->datumTypeByVal)
 	{
-		return false;
+		TSReadPointer *readptr = &state->readptrs[state->activeptr];
+
+		return !readptr->eof_reached;
 	}
+
+	return false;
 }
 
 /*
@@ -1271,7 +1395,13 @@ tuplestore_skiptuples(Tuplestorestate *state, int64 ntuples, bool forward)
 				tuple = tuplestore_gettuple(state, forward, &should_free);
 
 				if (tuple == NULL)
-					return false;
+				{
+					/* See tuplestore_advance for why pointer check is insufficient */
+					if (!state->datumTypeByVal ||
+						state->readptrs[state->activeptr].eof_reached)
+						return false;
+					continue;
+				}
 				if (should_free)
 					pfree(tuple);
 				CHECK_FOR_INTERRUPTS();
@@ -1493,8 +1623,11 @@ tuplestore_trim(Tuplestorestate *state)
 	/* Release no-longer-needed tuples */
 	for (i = state->memtupdeleted; i < nremove; i++)
 	{
-		FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
-		pfree(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
+			pfree(state->memtuples[i]);
+		}
 		state->memtuples[i] = NULL;
 	}
 	state->memtupdeleted = nremove;
@@ -1589,25 +1722,6 @@ tuplestore_in_memory(Tuplestorestate *state)
 	return (state->status == TSS_INMEM);
 }
 
-
-/*
- * Tape interface routines
- */
-
-static unsigned int
-getlen(Tuplestorestate *state, bool eofOK)
-{
-	unsigned int len;
-	size_t		nbytes;
-
-	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
-	if (nbytes == 0)
-		return 0;
-	else
-		return len;
-}
-
-
 /*
  * Routines specialized for HeapTuple case
  *
@@ -1618,6 +1732,19 @@ getlen(Tuplestorestate *state, bool eofOK)
  * to write that separately.
  */
 
+static unsigned int
+lentup_heap(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	else
+		return len;
+}
+
 static void *
 copytup_heap(Tuplestorestate *state, void *tup)
 {
@@ -1664,3 +1791,127 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 		BufFileReadExact(state->myfile, &tuplen, sizeof(tuplen));
 	return tuple;
 }
+
+/*
+ * Routines specialized for Datum case.
+ *
+ * Handles both fixed and variable-length Datums efficiently:
+ * - Fixed-length and Variable-length includes length prefix (and suffix if backward scan)
+ * - By-value types handled inline without extra copying, storing single extra byte
+ *   XXX: consider refactoring to avoid it, currently need it for correct rewind logic
+ */
+
+static unsigned int
+lentup_datum(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	Assert(state->datumType != InvalidOid);
+
+	if (state->datumTypeByVal)
+	{
+		uint8	junk;
+		nbytes = BufFileReadMaybeEOF(state->myfile, &junk, sizeof(uint8), eofOK);
+		if (nbytes == 0)
+			return 0;
+		Assert(junk == (uint8) state->datumTypeLen);
+		return state->datumTypeLen;
+	}
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	return len;
+}
+
+static void *
+copytup_datum(Tuplestorestate *state, void *datum)
+{
+	Datum d;
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+		return DatumGetPointer(PointerGetDatum(datum));
+
+	if (datum == NULL)
+		return NULL;
+
+	d = datumCopy(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+	USEMEM(state, GetMemoryChunkSpace(DatumGetPointer(d)));
+	return DatumGetPointer(d);
+}
+
+static void
+writetup_datum(Tuplestorestate *state, void *datum)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		uint8 junk = state->datumTypeLen; /* overflow is ok */
+		Datum v;
+		Assert(state->datumTypeLen > 0);
+
+		/* just marker byte used to track the end of data for rewind logic */
+		BufFileWrite(state->myfile, &junk, sizeof(junk));
+		store_att_byval(&v, PointerGetDatum(datum), state->datumTypeLen);
+		BufFileWrite(state->myfile, &v, state->datumTypeLen);
+		Assert(!state->backward);
+	}
+	else
+	{
+		unsigned int size;
+		unsigned int tuplen;
+
+		if (state->datumTypeLen < 0)
+			size = datumGetSize(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+		else
+			size = state->datumTypeLen;
+
+		/*
+		 * Include sizeof(unsigned int) in the stored length, matching the
+		 * convention used by writetup_heap.  The backward-scan seek
+		 * arithmetic in tuplestore_gettuple assumes this.
+		 */
+		tuplen = size + sizeof(unsigned int);
+		BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		BufFileWrite(state->myfile, datum, size);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		FREEMEM(state, GetMemoryChunkSpace(datum));
+		pfree(datum);
+	}
+}
+
+static void *
+readtup_datum(Tuplestorestate *state, unsigned int len)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		Datum datum = 0;
+
+		Assert(state->datumTypeLen > 0);
+		Assert(len == state->datumTypeLen);
+		BufFileReadExact(state->myfile, &datum, state->datumTypeLen);
+
+		Assert(!state->backward);
+		return DatumGetPointer(fetch_att(&datum, true, state->datumTypeLen));
+	}
+	else
+	{
+		unsigned int datalen = len - sizeof(unsigned int);
+		void *data = palloc(datalen);
+
+		BufFileReadExact(state->myfile, data, datalen);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileReadExact(state->myfile, &len, sizeof(len));
+
+		return data;
+	}
+}
diff --git a/src/include/utils/tuplestore.h b/src/include/utils/tuplestore.h
index f638b96e156..e16d9a3d352 100644
--- a/src/include/utils/tuplestore.h
+++ b/src/include/utils/tuplestore.h
@@ -1,17 +1,18 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.h
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
- * Also, it is possible to support multiple independent read pointers.
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
  * A temporary file is used to handle the data if it exceeds the
  * space limit specified by the caller.
@@ -39,14 +40,13 @@
  */
 typedef struct Tuplestorestate Tuplestorestate;
 
-/*
- * Currently we only need to store MinimalTuples, but it would be easy
- * to support the same behavior for IndexTuples and/or bare Datums.
- */
-
 extern Tuplestorestate *tuplestore_begin_heap(bool randomAccess,
 											  bool interXact,
 											  int maxKBytes);
+extern Tuplestorestate *tuplestore_begin_datum(Oid datumType,
+											   bool randomAccess,
+											   bool interXact,
+											   int maxKBytes);
 
 extern void tuplestore_set_eflags(Tuplestorestate *state, int eflags);
 
@@ -55,6 +55,7 @@ extern void tuplestore_puttupleslot(Tuplestorestate *state,
 extern void tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple);
 extern void tuplestore_putvalues(Tuplestorestate *state, TupleDesc tdesc,
 								 const Datum *values, const bool *isnull);
+extern void tuplestore_putdatum(Tuplestorestate *state, Datum datum);
 
 extern int	tuplestore_alloc_read_pointer(Tuplestorestate *state, int eflags);
 
@@ -72,6 +73,8 @@ extern bool tuplestore_in_memory(Tuplestorestate *state);
 
 extern bool tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 									bool copy, TupleTableSlot *slot);
+extern bool tuplestore_getdatum(Tuplestorestate *state, bool forward,
+								bool *should_free, Datum *result);
 
 extern bool tuplestore_gettupleslot_force(Tuplestorestate *state, bool forward,
 										  bool copy, TupleTableSlot *slot);
-- 
2.43.0



  [application/octet-stream] v32-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch (36.6K, 7-v32-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch)
  download | inline diff:
From 85db69b74e79ca165ae7db874a47ddd054ee77e6 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sun, 11 Jan 2026 19:27:52 +0300
Subject: [PATCH v32 2/7] Add STIR access method and flags related to auxiliary
 indexes

This patch provides infrastructure for following enhancements to concurrent index builds by:
- ii_Auxiliary in IndexInfo: indicates that an index is an auxiliary index used during concurrent index build
- validate_index in IndexVacuumInfo: set if index_bulk_delete called during the validation phase of concurrent index build
- STIR (Short-Term Index Replacement) access method is introduced, intended solely for short-lived, auxiliary usage

STIR functions are designed as an ephemeral helper during concurrent index builds, temporarily storing TIDs without providing the full features of a typical access method. As such, it raises warnings or errors when accessed outside its specialized usage path.

Planned to be used in following commits.
---
 contrib/pgstattuple/pgstattuple.c        |   3 +
 src/backend/access/Makefile              |   1 +
 src/backend/access/heap/vacuumlazy.c     |   2 +
 src/backend/access/meson.build           |   1 +
 src/backend/access/stir/Makefile         |  18 +
 src/backend/access/stir/meson.build      |   5 +
 src/backend/access/stir/stir.c           | 567 +++++++++++++++++++++++
 src/backend/catalog/index.c              |   1 +
 src/backend/catalog/toasting.c           |   1 +
 src/backend/commands/analyze.c           |   1 +
 src/backend/commands/vacuumparallel.c    |   1 +
 src/backend/nodes/makefuncs.c            |   1 +
 src/include/access/genam.h               |   1 +
 src/include/access/reloptions.h          |   3 +-
 src/include/access/stir.h                | 110 +++++
 src/include/catalog/pg_am.dat            |   3 +
 src/include/catalog/pg_opclass.dat       |   4 +
 src/include/catalog/pg_opfamily.dat      |   2 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/nodes/execnodes.h            |   7 +-
 src/include/utils/index_selfuncs.h       |   8 +
 src/test/regress/expected/amutils.out    |   8 +-
 src/test/regress/expected/opr_sanity.out |   7 +-
 src/test/regress/expected/psql.out       |  24 +-
 24 files changed, 765 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/stir/Makefile
 create mode 100644 src/backend/access/stir/meson.build
 create mode 100644 src/backend/access/stir/stir.c
 create mode 100644 src/include/access/stir.h

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 6a7f8cb4a7c..5b5984e3aa2 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -285,6 +285,9 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 			case SPGIST_AM_OID:
 				err = "spgist index";
 				break;
+			case STIR_AM_OID:
+				err = "stir index";
+				break;
 			case BRIN_AM_OID:
 				err = "brin index";
 				break;
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index e88d72ea039..ebbcfa90715 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -19,6 +19,7 @@ SUBDIRS	    = \
 	nbtree \
 	rmgrdesc \
 	spgist \
+	stir \
 	sequence \
 	table \
 	tablesample \
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..339dfb21df7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3012,6 +3012,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
@@ -3063,6 +3064,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
diff --git a/src/backend/access/meson.build b/src/backend/access/meson.build
index 5fd18de74f9..7219c65f365 100644
--- a/src/backend/access/meson.build
+++ b/src/backend/access/meson.build
@@ -11,6 +11,7 @@ subdir('nbtree')
 subdir('rmgrdesc')
 subdir('sequence')
 subdir('spgist')
+subdir('stir')
 subdir('table')
 subdir('tablesample')
 subdir('transam')
diff --git a/src/backend/access/stir/Makefile b/src/backend/access/stir/Makefile
new file mode 100644
index 00000000000..8785dab37bd
--- /dev/null
+++ b/src/backend/access/stir/Makefile
@@ -0,0 +1,18 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/stir
+#
+# IDENTIFICATION
+#    src/backend/access/stir/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/stir
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	stir.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/stir/meson.build b/src/backend/access/stir/meson.build
new file mode 100644
index 00000000000..4b7ad15346c
--- /dev/null
+++ b/src/backend/access/stir/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+backend_sources += files(
+	'stir.c',
+)
diff --git a/src/backend/access/stir/stir.c b/src/backend/access/stir/stir.c
new file mode 100644
index 00000000000..932590d9ccb
--- /dev/null
+++ b/src/backend/access/stir/stir.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.c
+ *	  Implementation of Short-Term Index Replacement.
+ *
+ * STIR is a specialized access method type designed for temporary storage
+ * of TID values during concurrent index build operations.
+ *
+ * The typical lifecycle of a STIR index is:
+ * 1. created as an auxiliary index for CIC/RIC
+ * 2. accepts inserts for a period
+ * 3. stirbulkdelete called during index validation phase
+ * 4. gets dropped
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/stir/stir.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amvalidate.h"
+#include "access/htup_details.h"
+#include "access/stir.h"
+#include "access/tableam.h"
+#include "catalog/index.h"
+#include "catalog/pg_amop.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_opfamily.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "utils/catcache.h"
+#include "utils/fmgrprotos.h"
+#include "utils/index_selfuncs.h"
+#include "utils/memutils.h"
+#include "utils/regproc.h"
+#include "utils/syscache.h"
+
+/*
+ * Stir handler function: return IndexAmRoutine with access method parameters
+ * and callbacks.
+ */
+Datum
+stirhandler(PG_FUNCTION_ARGS)
+{
+	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
+
+	/* Set STIR-specific strategy and procedure numbers */
+	amroutine->amstrategies = STIR_NSTRATEGIES;
+	amroutine->amsupport = STIR_NPROC;
+	amroutine->amoptsprocnum = STIR_OPTIONS_PROC;
+
+	/* STIR doesn't support most index operations */
+	amroutine->amcanorder = false;
+	amroutine->amcanorderbyop = false;
+	amroutine->amcanbackward = false;
+	amroutine->amcanunique = false;
+	amroutine->amcanmulticol = true;
+	amroutine->amoptionalkey = true;
+	amroutine->amsearcharray = false;
+	amroutine->amsearchnulls = false;
+	amroutine->amstorage = false;
+	amroutine->amclusterable = false;
+	amroutine->ampredlocks = false;
+	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
+	amroutine->amcaninclude = true;
+	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL;
+	amroutine->amkeytype = InvalidOid;
+
+	/* Set up function callbacks */
+	amroutine->ambuild = stirbuild;
+	amroutine->ambuildempty = stirbuildempty;
+	amroutine->aminsert = stirinsert;
+	amroutine->aminsertcleanup = NULL;
+	amroutine->ambulkdelete = stirbulkdelete;
+	amroutine->amvacuumcleanup = stirvacuumcleanup;
+	amroutine->amcanreturn = NULL;
+	amroutine->amcostestimate = stircostestimate;
+	amroutine->amoptions = stiroptions;
+	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
+	amroutine->amvalidate = stirvalidate;
+	amroutine->amadjustmembers = NULL;
+	amroutine->ambeginscan = stirbeginscan;
+	amroutine->amrescan = stirrescan;
+	amroutine->amgettuple = NULL;
+	amroutine->amgetbitmap = NULL;
+	amroutine->amendscan = stirendscan;
+	amroutine->ammarkpos = NULL;
+	amroutine->amrestrpos = NULL;
+	amroutine->amestimateparallelscan = NULL;
+	amroutine->aminitparallelscan = NULL;
+	amroutine->amparallelrescan = NULL;
+
+	PG_RETURN_POINTER(amroutine);
+}
+
+/*
+ * Validates operator class for STIR index.
+ *
+ * STIR is not a real index, so validate may be skipped.
+ * But we do it just for consistency.
+ */
+bool
+stirvalidate(Oid opclassoid)
+{
+	bool result = true;
+	HeapTuple classtup;
+	Form_pg_opclass classform;
+	Oid opfamilyoid;
+	HeapTuple familytup;
+	Form_pg_opfamily familyform;
+	char *opfamilyname;
+	CatCList *oprlist;
+	int i;
+
+	/* Fetch opclass information */
+	classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
+	if (!HeapTupleIsValid(classtup))
+		elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
+	classform = (Form_pg_opclass) GETSTRUCT(classtup);
+
+	opfamilyoid = classform->opcfamily;
+
+	/* Fetch opfamily information */
+	familytup = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfamilyoid));
+	if (!HeapTupleIsValid(familytup))
+		elog(ERROR, "cache lookup failed for operator family %u", opfamilyoid);
+	familyform = (Form_pg_opfamily) GETSTRUCT(familytup);
+
+	opfamilyname = NameStr(familyform->opfname);
+
+	/* Fetch all operators and support functions of the opfamily */
+	oprlist = SearchSysCacheList1(AMOPSTRATEGY, ObjectIdGetDatum(opfamilyoid));
+
+	/* Check individual operators */
+	for (i = 0; i < oprlist->n_members; i++)
+	{
+		HeapTuple oprtup = &oprlist->members[i]->tuple;
+		Form_pg_amop oprform = (Form_pg_amop) GETSTRUCT(oprtup);
+
+		/* Check it's allowed strategy for stir */
+		if (oprform->amopstrategy < 1 ||
+			oprform->amopstrategy > STIR_NSTRATEGIES)
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with invalid strategy number %d",
+					        opfamilyname,
+					        format_operator(oprform->amopopr),
+					        oprform->amopstrategy)));
+			result = false;
+		}
+
+		/* stir doesn't support ORDER BY operators */
+		if (oprform->amoppurpose != AMOP_SEARCH ||
+			OidIsValid(oprform->amopsortfamily))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains invalid ORDER BY specification for operator %s",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+
+		/* Check operator signature --- same for all stir strategies */
+		if (!check_amop_signature(oprform->amopopr, BOOLOID,
+		                          oprform->amoplefttype,
+		                          oprform->amoprighttype))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with wrong signature",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+	}
+
+	ReleaseCatCacheList(oprlist);
+	ReleaseSysCache(familytup);
+	ReleaseSysCache(classtup);
+
+	return result;
+}
+
+/*
+ * Initialize meta-page of a STIR index.
+ * The skipInserts flag determines if new inserts will be accepted or skipped.
+ */
+void
+StirFillMetapage(Relation index, Page metaPage, bool skipInserts)
+{
+	StirMetaPageData *metadata;
+
+	StirInitPage(metaPage, STIR_META);
+	metadata = StirPageGetMeta(metaPage);
+	memset(metadata, 0, sizeof(StirMetaPageData));
+	metadata->magicNumber = STIR_MAGIC_NUMBER;
+	metadata->skipInserts = skipInserts;
+	((PageHeader) metaPage)->pd_lower = ((char *) metadata + sizeof(StirMetaPageData)) - (char *) metaPage;
+}
+
+/*
+ * Create and initialize the metapage for a STIR index.
+ * This is called during index creation.
+ */
+void
+StirInitMetapage(Relation index, ForkNumber forknum)
+{
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	/*
+	 * Make a new page; since it is the first page it should be associated with
+	 * block number 0 (STIR_METAPAGE_BLKNO).  No need to hold the extension
+	 * lock because there cannot be concurrent inserters yet.
+	 */
+	metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	Assert(BufferGetBlockNumber(metaBuffer) == STIR_METAPAGE_BLKNO);
+
+	metaPage = BufferGetPage(metaBuffer);
+	StirFillMetapage(index, metaPage, forknum == INIT_FORKNUM);
+
+	MarkBufferDirty(metaBuffer);
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * Initialize any page of a stir index.
+ */
+void
+StirInitPage(Page page, uint16 flags)
+{
+	StirPageOpaque opaque;
+
+	PageInit(page, BLCKSZ, sizeof(StirPageOpaqueData));
+
+	opaque = StirPageGetOpaque(page);
+	opaque->flags = flags;
+	opaque->stir_page_id = STIR_PAGE_ID;
+}
+
+/*
+ * Add a tuple to a STIR page. Returns false if the tuple doesn't fit.
+ * The tuple is added to the end of the page.
+ */
+static bool
+StirPageAddItem(Page page, StirTuple *tuple)
+{
+	StirTuple *itup;
+	StirPageOpaque opaque;
+	char *ptr;
+
+	/* We shouldn't be pointed to an invalid page */
+	Assert(!PageIsNew(page));
+
+	/* Does the new tuple fit on the page? */
+	if (StirPageGetFreeSpace(page) < sizeof(StirTuple))
+		return false;
+
+	/* Copy a new tuple to the end of the page */
+	opaque = StirPageGetOpaque(page);
+	itup = StirPageGetTuple(page, opaque->maxoff + 1);
+	memcpy(itup, tuple, sizeof(StirTuple));
+
+	/* Adjust maxoff and pd_lower */
+	opaque->maxoff++;
+	ptr = (char *) StirPageGetTuple(page, opaque->maxoff + 1);
+	((PageHeader) page)->pd_lower = ptr - page;
+
+	/* Assert we didn't overrun available space */
+	Assert(((PageHeader) page)->pd_lower <= ((PageHeader) page)->pd_upper);
+	return true;
+}
+
+/*
+ * Insert a new tuple into a STIR index.
+ */
+bool
+stirinsert(Relation index, Datum *values, bool *isnull,
+		  ItemPointer ht_ctid, Relation heapRel,
+		  IndexUniqueCheck checkUnique,
+		  bool indexUnchanged,
+		  struct IndexInfo *indexInfo)
+{
+	StirTuple itup;
+	StirMetaPageData *metaData;
+	Buffer buffer,
+			metaBuffer;
+	Page page;
+	BlockNumber blkNo;
+
+	itup.heapPtr = *ht_ctid;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+
+	for (;;)
+	{
+		LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if inserts are allowed */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+		blkNo = metaData->lastBlkNo;
+		/* Don't hold metabuffer lock while doing insert */
+		LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+
+		if (blkNo > 0)
+		{
+			buffer = ReadBuffer(index, blkNo);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+			page = BufferGetPage(buffer);
+
+			Assert(!PageIsNew(page));
+
+			/* Try to add tuple to the existing page */
+			if (StirPageAddItem(page, &itup))
+			{
+				/* Success!  Apply the change, clean up, and exit */
+				MarkBufferDirty(buffer);
+
+				UnlockReleaseBuffer(buffer);
+				ReleaseBuffer(metaBuffer);
+				return false;
+			}
+
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Need to add a new page - get exclusive lock on meta-page */
+		LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+
+		/* Re-check after acquiring exclusive lock */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+
+		/* Check if another backend already extended the index */
+		if (blkNo != metaData->lastBlkNo)
+		{
+			Assert(blkNo < metaData->lastBlkNo);
+			/* Someone else inserted the new page into the index, let's try again */
+			LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+			continue;
+		}
+		else
+		{
+			/* Must extend the file */
+			buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
+									   EB_LOCK_FIRST);
+			page = BufferGetPage(buffer);
+
+			StirInitPage(page, 0);
+
+			if (!StirPageAddItem(page, &itup))
+			{
+				/* We shouldn't be here since we're inserting to an empty page */
+				elog(ERROR, "could not add new stir tuple to empty page");
+			}
+
+			/* Update meta-page with new last block number */
+			metaData->lastBlkNo = BufferGetBlockNumber(buffer);
+
+			MarkBufferDirty(metaBuffer);
+			MarkBufferDirty(buffer);
+
+			UnlockReleaseBuffer(buffer);
+			UnlockReleaseBuffer(metaBuffer);
+
+			return false;
+		}
+	}
+}
+
+/*
+ * STIR doesn't support scans - these functions all error out
+ */
+IndexScanDesc
+stirbeginscan(Relation r, int nkeys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void
+stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+		  ScanKey orderbys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void stirendscan(IndexScanDesc scan)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+/*
+ * Build a STIR index - only allowed for auxiliary indexes.
+ * Just initializes the meta-page without any heap scans.
+ */
+IndexBuildResult *
+stirbuild(Relation heap, Relation index,
+						   struct IndexInfo *indexInfo)
+{
+	IndexBuildResult *result;
+
+	if (!indexInfo->ii_Auxiliary)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("Building STIR indexes is not supported")));
+
+	StirInitMetapage(index, MAIN_FORKNUM);
+
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	result->heap_tuples = 0;
+	result->index_tuples = 0;
+	return result;
+}
+
+void stirbuildempty(Relation index)
+{
+	StirInitMetapage(index, INIT_FORKNUM);
+}
+
+IndexBulkDeleteResult *
+stirbulkdelete(IndexVacuumInfo *info,
+									 IndexBulkDeleteResult *stats,
+									 IndexBulkDeleteCallback callback,
+									 void *callback_state)
+{
+	Relation index = info->index;
+	BlockNumber blkno, npages;
+	Buffer buffer;
+	Page page;
+
+	/*
+	 * For normal VACUUM, mark to skip inserts and warn about an index drop
+	 * needed.  In practice this path is not reachable during CREATE INDEX
+	 * CONCURRENTLY because the table-level locks held by CIC prevent concurrent
+	 * VACUUM from opening the auxiliary index.  It can only be reached if a
+	 * leftover STIR index somehow survives after a failed CIC and a later
+	 * VACUUM encounters it.
+	 */
+	if (!info->validate_index)
+	{
+		StirMarkAsSkipInserts(index);
+
+		ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+		return NULL;
+	}
+
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	/*
+	 * Iterate over the pages. We don't care about concurrently added pages,
+	 * because the index is marked as not-ready for that moment and the index is not
+	 * used for insert.
+	 */
+	npages = RelationGetNumberOfBlocks(index);
+	for (blkno = STIR_HEAD_BLKNO; blkno < npages; blkno++)
+	{
+		StirTuple *itup, *itupEnd;
+
+		vacuum_delay_point(false);
+
+		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
+									RBM_NORMAL, info->strategy);
+
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buffer);
+
+		if (PageIsNew(page))
+		{
+			UnlockReleaseBuffer(buffer);
+			continue;
+		}
+
+		itup = StirPageGetTuple(page, FirstOffsetNumber);
+		itupEnd = StirPageGetTuple(page, OffsetNumberNext(StirPageGetMaxOffset(page)));
+		while (itup < itupEnd)
+		{
+			/* Do we have to delete this tuple? */
+			if (callback(&itup->heapPtr, callback_state))
+			{
+				ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("we never delete in stir")));
+			}
+
+			itup = StirPageGetNextTuple(itup);
+		}
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	return stats;
+}
+
+/*
+ * Mark a STIR index to skip future inserts
+ */
+void
+StirMarkAsSkipInserts(Relation index)
+{
+	StirMetaPageData *metaData;
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	metaPage = BufferGetPage(metaBuffer);
+	metaData = StirPageGetMeta(metaPage);
+
+	if (!metaData->skipInserts)
+	{
+		metaData->skipInserts = true;
+		MarkBufferDirty(metaBuffer);
+	}
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * As with stirbulkdelete, this is not reachable during a normal CIC due to
+ * table-level locking.  It serves as a safety net for leftover STIR indexes
+ * from failed concurrent index builds.
+ */
+IndexBulkDeleteResult *
+stirvacuumcleanup(IndexVacuumInfo *info,
+				  IndexBulkDeleteResult *stats)
+{
+	StirMarkAsSkipInserts(info->index);
+	ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+	return NULL;
+}
+
+bytea *
+stiroptions(Datum reloptions, bool validate)
+{
+	return NULL;
+}
+
+void
+stircostestimate(PlannerInfo *root, IndexPath *path,
+					 double loop_count, Cost *indexStartupCost,
+					 Cost *indexTotalCost, Selectivity *indexSelectivity,
+					 double *indexCorrelation, double *indexPages)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index d8219b18c48..2fc86ca9c5b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3412,6 +3412,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
+	ivinfo.validate_index = true;
 
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 078a1cf5127..c33e43df1ec 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -313,6 +313,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_ParallelWorkers = 0;
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
+	indexInfo->ii_Auxiliary = false;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index eeed91be266..1fbe70d187c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -726,6 +726,7 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 			ivinfo.message_level = elevel;
 			ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
 			ivinfo.strategy = vac_strategy;
+			ivinfo.validate_index = false;
 
 			stats = index_vacuum_cleanup(&ivinfo, NULL);
 
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 77834b96a21..1671c3c2196 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -896,6 +896,7 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
 	ivinfo.estimated_count = pvs->shared->estimated_count;
 	ivinfo.num_heap_tuples = pvs->shared->reltuples;
 	ivinfo.strategy = pvs->bstrategy;
+	ivinfo.validate_index = false;
 
 	/* Update error traceback information */
 	pvs->indname = pstrdup(RelationGetRelationName(indrel));
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 3cd35c5c457..5359dab1176 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -875,6 +875,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
+	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..0356901ee10 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -58,6 +58,7 @@ typedef struct IndexVacuumInfo
 	bool		estimated_count;	/* num_heap_tuples is an estimate */
 	int			message_level;	/* ereport level for progress messages */
 	double		num_heap_tuples;	/* tuples remaining in heap */
+	bool		validate_index; /* validating concurrently built index? */
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index c228147420a..1c7a4d17557 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -51,8 +51,9 @@ typedef enum relopt_kind
 	RELOPT_KIND_VIEW = (1 << 9),
 	RELOPT_KIND_BRIN = (1 << 10),
 	RELOPT_KIND_PARTITIONED = (1 << 11),
+	RELOPT_KIND_STIR = (1 << 12),
 	/* if you add a new kind, make sure you update "last_default" too */
-	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_STIR,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/access/stir.h b/src/include/access/stir.h
new file mode 100644
index 00000000000..b08cf4d4ef0
--- /dev/null
+++ b/src/include/access/stir.h
@@ -0,0 +1,110 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.h
+ *	  header file for postgres stir access method implementation.
+ *
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * src/include/access/stir.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef STIR_H
+#define STIR_H
+
+#include "access/amapi.h"
+#include "nodes/pathnodes.h"
+#include "storage/bufpage.h"
+
+/* Support procedures numbers */
+#define STIR_NPROC				0
+
+/* Scan strategies */
+#define STIR_NSTRATEGIES		1
+
+#define STIR_OPTIONS_PROC				0
+
+/* Macros for accessing stir page structures */
+#define StirPageGetOpaque(page) ((StirPageOpaque) PageGetSpecialPointer(page))
+#define StirPageGetMaxOffset(page) (StirPageGetOpaque(page)->maxoff)
+#define StirPageIsMeta(page) \
+	((StirPageGetOpaque(page)->flags & STIR_META) != 0)
+#define StirPageGetTuple(page, offset) \
+	((StirTuple *)(PageGetContents(page) \
+		+ sizeof(StirTuple) * ((offset) - 1)))
+#define StirPageGetNextTuple(tuple) \
+	((StirTuple *)((char *)(tuple) + sizeof(StirTuple)))
+
+
+
+/* Preserved page numbers */
+#define STIR_METAPAGE_BLKNO	(0)
+#define STIR_HEAD_BLKNO		(1) /* first data page */
+
+
+/* Opaque for stir pages */
+typedef struct StirPageOpaqueData
+{
+	OffsetNumber maxoff;		/* number of index tuples on the page */
+	uint16		flags;			/* see bit definitions below */
+	uint16		stir_page_id;	/* for identification of STIR indexes */
+} StirPageOpaqueData;
+
+/* Stir page flags */
+#define STIR_META		(1<<0)
+
+typedef StirPageOpaqueData *StirPageOpaque;
+
+#define STIR_PAGE_ID		0xFF84
+
+/* Metadata of stir index */
+typedef struct StirMetaPageData
+{
+	uint32		magicNumber;
+	BlockNumber	lastBlkNo;
+	bool		skipInserts;	/* should we just exit without any inserts? */
+} StirMetaPageData;
+
+/* Magic number to distinguish stir pages from others */
+#define STIR_MAGIC_NUMBER (0xDBAC0DEF)
+
+#define StirPageGetMeta(page)	((StirMetaPageData *) PageGetContents(page))
+
+typedef struct StirTuple
+{
+	ItemPointerData heapPtr;
+} StirTuple;
+
+#define StirPageGetFreeSpace(page) \
+	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+		- StirPageGetMaxOffset(page) * (sizeof(StirTuple)) \
+		- MAXALIGN(sizeof(StirPageOpaqueData)))
+
+extern void StirFillMetapage(Relation index, Page metaPage, bool skipInserts);
+extern void StirInitMetapage(Relation index, ForkNumber forknum);
+extern void StirInitPage(Page page, uint16 flags);
+extern void StirMarkAsSkipInserts(Relation index);
+
+/* index access method interface functions */
+extern bool stirvalidate(Oid opclassoid);
+extern bool stirinsert(Relation index, Datum *values, bool *isnull,
+					 ItemPointer ht_ctid, Relation heapRel,
+					 IndexUniqueCheck checkUnique,
+					 bool indexUnchanged,
+					 struct IndexInfo *indexInfo);
+extern IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys);
+extern void stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+					 ScanKey orderbys, int norderbys);
+extern void stirendscan(IndexScanDesc scan);
+extern IndexBuildResult *stirbuild(Relation heap, Relation index,
+								 struct IndexInfo *indexInfo);
+extern void stirbuildempty(Relation index);
+extern IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+										   IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+										   void *callback_state);
+extern IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+											  IndexBulkDeleteResult *stats);
+extern bytea *stiroptions(Datum reloptions, bool validate);
+
+#endif			/* STIR_H */
diff --git a/src/include/catalog/pg_am.dat b/src/include/catalog/pg_am.dat
index 46d361047fe..8bd2c2b46ba 100644
--- a/src/include/catalog/pg_am.dat
+++ b/src/include/catalog/pg_am.dat
@@ -33,5 +33,8 @@
 { oid => '3580', oid_symbol => 'BRIN_AM_OID',
   descr => 'block range index (BRIN) access method',
   amname => 'brin', amhandler => 'brinhandler', amtype => 'i' },
+{ oid => '5555', oid_symbol => 'STIR_AM_OID',
+  descr => 'short term index replacement access method',
+  amname => 'stir', amhandler => 'stirhandler', amtype => 'i' },
 
 ]
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index df170b80840..a3457e749db 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -492,4 +492,8 @@
 
 # no brin opclass for the geometric types except box
 
+# allow any types for STIR
+{ opcmethod => 'stir', oid_symbol => 'ANY_STIR_OPS_OID', opcname => 'stir_ops',
+  opcfamily => 'stir/any_ops', opcintype => 'any'},
+
 ]
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 7a027c4810e..6ffc20a061c 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -308,5 +308,7 @@
   opfmethod => 'hash', opfname => 'multirange_ops' },
 { oid => '6158',
   opfmethod => 'gist', opfname => 'multirange_ops' },
+{ oid => '5558',
+  opfmethod => 'stir', opfname => 'any_ops' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0118e970dda..9649995f812 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -935,6 +935,10 @@
   proname => 'brinhandler', provolatile => 'v',
   prorettype => 'index_am_handler', proargtypes => 'internal',
   prosrc => 'brinhandler' },
+{ oid => '5556', descr => 'short term index replacement access method handler',
+  proname => 'stirhandler', provolatile => 'v',
+  prorettype => 'index_am_handler', proargtypes => 'internal',
+  prosrc => 'stirhandler' },
 { oid => '3952', descr => 'brin: standalone scan new table pages',
   proname => 'brin_summarize_new_values', provolatile => 'v',
   proparallel => 'u', prorettype => 'int4', proargtypes => 'regclass',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 684e398f824..74efa237212 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -166,8 +166,8 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
- * during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
+ * are used only during index build; they're conventionally zeroed otherwise.
  * ----------------
  */
 typedef struct IndexInfo
@@ -227,7 +227,8 @@ typedef struct IndexInfo
 	bool		ii_WithoutOverlaps;
 	/* # of workers requested (excludes leader) */
 	int			ii_ParallelWorkers;
-
+	/* is auxiliary for concurrent index build? */
+	bool		ii_Auxiliary;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/utils/index_selfuncs.h b/src/include/utils/index_selfuncs.h
index 74793a1a19d..bf0e30dabe9 100644
--- a/src/include/utils/index_selfuncs.h
+++ b/src/include/utils/index_selfuncs.h
@@ -62,6 +62,14 @@ extern void spgcostestimate(struct PlannerInfo *root,
 							Selectivity *indexSelectivity,
 							double *indexCorrelation,
 							double *indexPages);
+extern void stircostestimate(struct PlannerInfo *root,
+							struct IndexPath *path,
+							double loop_count,
+							Cost *indexStartupCost,
+							Cost *indexTotalCost,
+							Selectivity *indexSelectivity,
+							double *indexCorrelation,
+							double *indexPages);
 extern void gincostestimate(struct PlannerInfo *root,
 							struct IndexPath *path,
 							double loop_count,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index 7ab6113c619..92c033a2010 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -173,7 +173,13 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
  spgist | can_exclude   | t
  spgist | can_include   | t
  spgist | bogus         | 
-(36 rows)
+ stir   | can_order     | f
+ stir   | can_unique    | f
+ stir   | can_multi_col | t
+ stir   | can_exclude   | f
+ stir   | can_include   | t
+ stir   | bogus         | 
+(42 rows)
 
 --
 -- additional checks for pg_index_column_has_property
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 6ff4d7ee901..9259679eea2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2129,9 +2129,10 @@ FROM pg_opclass AS c1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS a1
                  WHERE a1.amopfamily = c1.opcfamily
                    AND binary_coercible(c1.opcintype, a1.amoplefttype));
- opcname | opcfamily 
----------+-----------
-(0 rows)
+ opcname  | opcfamily 
+----------+-----------
+ stir_ops |      5558
+(1 row)
 
 -- Check that each operator listed in pg_amop has an associated opclass,
 -- that is one whose opcintype matches oprleft (possibly by coercion).
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index c8f3932edf0..ecc2c2a6049 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5171,7 +5171,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA *
 List of access methods
@@ -5185,7 +5186,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA h*
 List of access methods
@@ -5210,9 +5212,9 @@ List of access methods
 
 \dA: extra argument "bar" ignored
 \dA+
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5221,12 +5223,13 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ *
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5235,7 +5238,8 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ h*
                      List of access methods
-- 
2.43.0



  [application/octet-stream] v32-0007-Refresh-snapshot-periodically-during-index-valid.patch (27.0K, 8-v32-0007-Refresh-snapshot-periodically-during-index-valid.patch)
  download | inline diff:
From 9cd374760b80ff1699c3dd16882ffb2263ac81a5 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 21 Apr 2025 14:11:53 +0200
Subject: [PATCH v32 7/7] Refresh snapshot periodically during index validation

Enhances validation phase of concurrently built indexes by periodically refreshing snapshots rather than using a single reference snapshot. This addresses issues with xmin propagation during long-running validations.

The validation now takes a fresh snapshot every few pages, allowing the xmin horizon to advance. This restores feature of commit d9d076222f5b, which was reverted in commit e28bb8851969. New STIR-based approach does not depend on single reference snapshot anymore.
---
 src/backend/access/heap/README.HOT         |  4 +-
 src/backend/access/heap/heapam_handler.c   | 77 +++++++++++++++++++++-
 src/backend/access/spgist/spgvacuum.c      | 12 +++-
 src/backend/catalog/index.c                | 73 +++++++++++++++-----
 src/backend/commands/indexcmds.c           | 50 ++------------
 src/backend/utils/misc/guc_parameters.dat  |  9 +++
 src/include/access/tableam.h               | 25 ++++---
 src/include/access/transam.h               | 15 +++++
 src/include/catalog/index.h                |  2 +-
 src/include/miscadmin.h                    |  1 +
 src/test/regress/expected/create_index.out |  3 +
 src/test/regress/sql/create_index.sql      |  4 ++
 12 files changed, 192 insertions(+), 83 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index b1c797517ee..382fe1723a5 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -401,12 +401,12 @@ live tuple.
 We mark the index open for inserts (but still not ready for reads) then
 we again wait for transactions which have the table open.  Then validate
 the index.  This searches for tuples missing from the index in auxiliary
-index, and inserts any missing ones if they are visible to reference snapshot.
+index, and inserts any missing ones if they are visible to a fresh snapshot.
 Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
-the second reference snapshot is finished.  This ensures that nobody is
+the latest used snapshot is finished.  This ensures that nobody is
 alive any longer who could need to see any tuples that might be missing
 from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 194ac75caa5..5f5431ba389 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -53,6 +53,9 @@
 /* GUC: percentage of maintenance_work_mem for CIC validation tuplestore */
 int			debug_cic_validate_store_mem_pct = 10;
 
+/* GUC: refresh snapshot every N pages during CIC validation (0 = disable) */
+int			debug_cic_validate_snapshot_pages = 4096;
+
 static void reform_and_rewrite_tuple(HeapTuple tuple,
                                      Relation OldHeap, Relation NewHeap,
                                      Datum *values, bool *isnull, RewriteState rwstate);
@@ -2026,24 +2029,35 @@ heapam_index_validate_scan_read_stream_next(
 	return result;
 }
 
-static void
+static TransactionId
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
-						   Snapshot snapshot,
 						   ValidateIndexState *state,
 						   ValidateIndexState *auxState)
 {
+	TransactionId limitXmin;
+
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Snapshot		snapshot;
 	TupleTableSlot  *slot;
 	EState			*estate;
 	ExprContext		*econtext;
 	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	int64			num_to_check;
+	int64			page_read_counter = 1; /* set to 1 to skip snapshot reset at start */
 	Tuplestorestate *tuples_for_check;
+
+	/*
+	 * Under REPEATABLE READ or SERIALIZABLE (possible via
+	 * default_transaction_isolation), GetLatestSnapshot() returns the
+	 * transaction-level snapshot and xmin stays pinned.  Periodic snapshot
+	 * refresh is pointless in that case, so skip it.
+	 */
+	bool		reset_snapshot = XactIsoLevel <= XACT_READ_COMMITTED;
 	ValidateIndexScanState callback_private_data;
 
 	Buffer buf;
@@ -2053,6 +2067,8 @@ heapam_index_validate_scan(Relation heapRelation,
 	/* Use a percentage of maintenance_work_mem for tuple store. */
 	int		store_work_mem_part = maintenance_work_mem * debug_cic_validate_store_mem_pct / 100;
 
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
@@ -2061,6 +2077,12 @@ heapam_index_validate_scan(Relation heapRelation,
 	 */
 	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
+	PopActiveSnapshot();
+	InvalidateCatalogSnapshot();
+
+	Assert(!reset_snapshot || !HaveRegisteredOrActiveSnapshot());
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+
 	/*
 	 * sanity checks
 	 */
@@ -2076,6 +2098,29 @@ heapam_index_validate_scan(Relation heapRelation,
 
 	state->tuplesort = auxState->tuplesort = NULL;
 
+	/*
+	 * Now take the first snapshot that will be used to filter candidate
+	 * tuples. We are going to replace it by newer snapshot every so often
+	 * to propagate horizon.
+	 *
+	 * Beware!  There might still be snapshots in use that treat some transaction
+	 * as in-progress that our temporary snapshot treats as committed.
+	 *
+	 * If such a recently-committed transaction deleted tuples in the table,
+	 * we will not include them in the index; yet those transactions which
+	 * see the deleting one as still-in-progress will expect such tuples to
+	 * be there once we mark the index as valid.
+	 *
+	 * We solve this by waiting for all endangered transactions to exit before
+	 * we mark the index as valid, for that reason limitXmin is supported.
+	 *
+	 * We also set ActiveSnapshot to this snap, since functions in indexes may
+	 * need a snapshot.
+	 */
+	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	PushActiveSnapshot(snapshot);
+	limitXmin = snapshot->xmin;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
@@ -2109,6 +2154,7 @@ heapam_index_validate_scan(Relation heapRelation,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		block_number = BufferGetBlockNumber(buf);
+		page_read_counter++;
 
 		i = 0;
 		while ((off = tuples[i]) != InvalidOffsetNumber)
@@ -2179,6 +2225,21 @@ heapam_index_validate_scan(Relation heapRelation,
 		}
 
 		ReleaseBuffer(buf);
+		if (reset_snapshot &&
+			debug_cic_validate_snapshot_pages > 0 &&
+			page_read_counter % debug_cic_validate_snapshot_pages == 0)
+		{
+			PopActiveSnapshot();
+			UnregisterSnapshot(snapshot);
+			/* to make sure we propagate xmin */
+			InvalidateCatalogSnapshot();
+			Assert(!TransactionIdIsValid(MyProc->xmin));
+
+			snapshot = RegisterSnapshot(GetLatestSnapshot());
+			PushActiveSnapshot(snapshot);
+			/* Advance limitXmin so we wait for all snapshots seen so far */
+			limitXmin = TransactionIdNewer(limitXmin, snapshot->xmin);
+		}
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -2188,11 +2249,23 @@ heapam_index_validate_scan(Relation heapRelation,
 	read_stream_end(read_stream);
 	tuplestore_end(tuples_for_check);
 
+	/*
+	 * Drop the latest snapshot.  We must do this before waiting out other
+	 * snapshot holders, else we will deadlock against other processes also
+	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
+	 * they must wait for.
+	 */
+	PopActiveSnapshot();
+	UnregisterSnapshot(snapshot);
+	InvalidateCatalogSnapshot();
+	Assert(!reset_snapshot || MyProc->xmin == InvalidTransactionId);
 	FreeAccessStrategy(bstrategy);
 
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
+
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index c461f8dc02d..ef192fb99c2 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -191,14 +191,16 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 			 * Add target TID to pending list if the redirection could have
 			 * happened since VACUUM started.  (If xid is invalid, assume it
 			 * must have happened before VACUUM started, since REINDEX
-			 * CONCURRENTLY locks out VACUUM.)
+			 * CONCURRENTLY locks out VACUUM, if myXmin is invalid it is
+			 * validation scan.)
 			 *
 			 * Note: we could make a tighter test by seeing if the xid is
 			 * "running" according to the active snapshot; but snapmgr.c
 			 * doesn't currently export a suitable API, and it's not entirely
 			 * clear that a tighter test is worth the cycles anyway.
 			 */
-			if (TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
+			if (!TransactionIdIsValid(bds->myXmin) ||
+					TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
 				spgAddPendingTID(bds, &dt->pointer);
 		}
 		else
@@ -808,7 +810,6 @@ spgvacuumscan(spgBulkDeleteState *bds)
 	/* Finish setting up spgBulkDeleteState */
 	initSpGistState(&bds->spgstate, index);
 	bds->pendingList = NULL;
-	bds->myXmin = GetActiveSnapshot()->xmin;
 	bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
 	/*
@@ -959,6 +960,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	bds.stats = stats;
 	bds.callback = callback;
 	bds.callback_state = callback_state;
+	if (info->validate_index)
+		bds.myXmin = InvalidTransactionId;
+	else
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 	spgvacuumscan(&bds);
 
@@ -999,6 +1004,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		bds.stats = stats;
 		bds.callback = dummy_callback;
 		bds.callback_state = NULL;
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 		spgvacuumscan(&bds);
 	}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b8e4dfe88aa..5f8779426c7 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -69,6 +69,7 @@
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -3518,8 +3519,9 @@ IndexCheckExclusion(Relation heapRelation,
  * insert their new tuples into it. At the same moment we clear "indisready" for
  * auxiliary index, since it is no more required to be updated.
  *
- * We then take a new reference snapshot, any tuples that are valid according
- * to this snap, but are not in the index, must be added to the index.
+ * We then take a new snapshot, any tuples that are valid according
+ * to this snap, but are not in the index, must be added to the index. In
+ * order to propagate xmin we reset that snapshot every so often.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
  * the snap need not be indexed, because we will wait out all transactions
@@ -3532,7 +3534,7 @@ IndexCheckExclusion(Relation heapRelation,
  * TIDs of both auxiliary and target indexes, and doing a "merge join" against
  * the TID lists to see which tuples from auxiliary index are missing from the
  * target index.  Thus we will ensure that all tuples valid according to the
- * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * latest snapshot are in the index. Notice we need to do bulkdelete in the
  * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
@@ -3545,21 +3547,24 @@ IndexCheckExclusion(Relation heapRelation,
  * before it declares a uniqueness error.
  *
  * After completing validate_index(), we wait until all transactions that
- * were alive at the time of the reference snapshot are gone; this is
- * necessary to be sure there are none left with a transaction snapshot
- * older than the reference (and hence possibly able to see tuples we did
- * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
- * transactions will be able to use it for queries.
+ * were alive at the time of the latest snapshot used during validation are
+ * gone; this is necessary to be sure there are none left with a transaction
+ * snapshot older than that (and hence possibly able to see tuples we did
+ * not index).  The snapshot is periodically refreshed during the heap scan
+ * to propagate the xmin horizon, so limitXmin tracks the most recent one.
+ * Then we mark the index "indisvalid" and commit.  Subsequent transactions
+ * will be able to use it for queries.
  *
  * Also, some actions to concurrent drop the auxiliary index are performed.
  */
-void
-validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
+TransactionId
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId)
 {
 	Relation	heapRelation,
 				indexRelation,
 				auxIndexRelation;
 	IndexInfo  *indexInfo;
+	TransactionId limitXmin;
 	IndexVacuumInfo ivinfo, auxivinfo;
 	ValidateIndexState state, auxState;
 	Oid			save_userid;
@@ -3572,6 +3577,16 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
 	int			aux_work_mem_part = maintenance_work_mem / 10;
 
+	/*
+	 * Under REPEATABLE READ or SERIALIZABLE (possible via
+	 * default_transaction_isolation), GetLatestSnapshot() returns the
+	 * transaction-level snapshot and xmin stays pinned.  Periodic snapshot
+	 * refresh is pointless in that case, so skip it.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	bool		reset_snapshot = XactIsoLevel <= XACT_READ_COMMITTED;
+#endif
+
 	{
 		const int	progress_index[] = {
 			PROGRESS_CREATEIDX_PHASE,
@@ -3609,8 +3624,12 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	 * Fetch info needed for index_insert.  (You might think this should be
 	 * passed in from DefineIndex, but its copy is long gone due to having
 	 * been built in a previous transaction.)
+	 *
+	 * We might need snapshot for index expressions or predicates.
 	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	indexInfo = BuildIndexInfo(indexRelation);
+	PopActiveSnapshot();
 
 	/* mark build is concurrent just for consistency */
 	indexInfo->ii_Concurrent = true;
@@ -3646,6 +3665,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 										   NULL, TUPLESORT_NONE);
 	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	(void) index_bulk_delete(&auxivinfo, NULL,
 							 validate_index_callback, &auxState);
 	/* If aux index is empty, merge may be skipped */
@@ -3665,7 +3687,13 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		index_close(indexRelation, NoLock);
 		table_close(heapRelation, NoLock);
 
-		return;
+		PushActiveSnapshot(GetTransactionSnapshot());
+		limitXmin = GetActiveSnapshot()->xmin;
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+		return limitXmin;
 	}
 
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
@@ -3674,6 +3702,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	/* ambulkdelete updates progress metrics */
 	(void) index_bulk_delete(&ivinfo, NULL,
 							 validate_index_callback, &state);
@@ -3693,19 +3724,24 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	tuplesort_performsort(auxState.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
 
 	/*
 	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
-	table_index_validate_scan(heapRelation,
-							  indexRelation,
-							  indexInfo,
-							  snapshot,
-							  &state,
-							  &auxState);
+	limitXmin = table_index_validate_scan(heapRelation,
+										  indexRelation,
+										  indexInfo,
+										  &state,
+										  &auxState);
 
 	/* Tuple sort closed by table_index_validate_scan */
 	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
@@ -3728,6 +3764,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
+
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index de603d3ff83..bf8a1dbc35d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -596,7 +596,6 @@ DefineIndex(ParseState *pstate,
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -1816,32 +1815,11 @@ DefineIndex(ParseState *pstate,
 	/* Tell concurrent index builds to ignore us, if index qualifies */
 	if (safe_index)
 		set_indexsafe_procflags();
-
-	/*
-	 * Now take the "reference snapshot" that will be used by validate_index()
-	 * to filter candidate tuples.  Beware!  There might still be snapshots in
-	 * use that treat some transaction as in-progress that our reference
-	 * snapshot treats as committed.  If such a recently-committed transaction
-	 * deleted tuples in the table, we will not include them in the index; yet
-	 * those transactions which see the deleting one as still-in-progress will
-	 * expect such tuples to be there once we mark the index as valid.
-	 *
-	 * We solve this by waiting for all endangered transactions to exit before
-	 * we mark the index as valid.
-	 *
-	 * We also set ActiveSnapshot to this snap, since functions in indexes may
-	 * need a snapshot.
-	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
-	PushActiveSnapshot(snapshot);
 	/*
 	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
-	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
-	limitXmin = snapshot->xmin;
+	limitXmin = validate_index(tableId, indexRelationId, auxIndexRelationId);
 
-	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1863,8 +1841,8 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * The index is now valid in the sense that it contains all currently
 	 * interesting tuples.  But since it might not contain tuples deleted just
-	 * before the reference snap was taken, we have to wait out any
-	 * transactions that might have older snapshots.
+	 * before the last snapshot during validating was taken, we have to wait
+	 * out any transactions that might have older snapshots.
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
@@ -4429,7 +4407,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
-		Snapshot	snapshot;
 
 		StartTransactionCommand();
 
@@ -4444,13 +4421,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		if (newidx->safe)
 			set_indexsafe_procflags();
 
-		/*
-		 * Take the "reference snapshot" that will be used by validate_index()
-		 * to filter candidate tuples.
-		 */
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
-		PushActiveSnapshot(snapshot);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
@@ -4462,16 +4432,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
-
-		/*
-		 * We can now do away with our active snapshot, we still need to save
-		 * the xmin limit to wait for older snapshots.
-		 */
-		limitXmin = snapshot->xmin;
-
-		PopActiveSnapshot();
-		UnregisterSnapshot(snapshot);
+		limitXmin = validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId);
+		Assert(!TransactionIdIsValid(MyProc->xmin));
 
 		/*
 		 * To ensure no deadlocks, we must commit and start yet another
@@ -4484,7 +4446,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		/*
 		 * The index is now valid in the sense that it contains all currently
 		 * interesting tuples.  But since it might not contain tuples deleted
-		 * just before the reference snap was taken, we have to wait out any
+		 * just before the latest snap was taken, we have to wait out any
 		 * transactions that might have older snapshots.
 		 *
 		 * Because we don't take a snapshot or Xid in this transaction,
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index a80ee4fb03f..be29cf3ba5a 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -631,6 +631,15 @@
   boot_val => 'DEFAULT_ASSERT_ENABLED',
 },
 
+{ name => 'debug_cic_validate_snapshot_pages', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Refresh snapshot every N pages during CIC validation (0 to disable).',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'debug_cic_validate_snapshot_pages',
+  boot_val => '4096',
+  min => '0',
+  max => '1000000',
+},
+
 { name => 'debug_cic_validate_store_mem_pct', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
   short_desc => 'Percentage of maintenance_work_mem used for CIC validation tuplestore.',
   flags => 'GUC_NOT_IN_SAMPLE',
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1a997537800..2380a593d71 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -701,12 +701,11 @@ typedef struct TableAmRoutine
 										   TableScanDesc scan);
 
 	/* see table_index_validate_scan for reference about parameters */
-	void		(*index_validate_scan) (Relation table_rel,
-										Relation index_rel,
-										IndexInfo *index_info,
-										Snapshot snapshot,
-										ValidateIndexState *state,
-										ValidateIndexState *aux_state);
+	TransactionId		(*index_validate_scan) (Relation table_rel,
+												Relation index_rel,
+												IndexInfo *index_info,
+												ValidateIndexState *state,
+												ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1829,20 +1828,18 @@ table_index_build_range_scan(Relation table_rel,
  * Note: it is responsibility of that function to close sortstates in
  * both `state` and `auxstate`.
  */
-static inline void
+static inline TransactionId
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
-						  Snapshot snapshot,
 						  ValidateIndexState *state,
 						  ValidateIndexState *auxstate)
 {
-	table_rel->rd_tableam->index_validate_scan(table_rel,
-											   index_rel,
-											   index_info,
-											   snapshot,
-											   state,
-											   auxstate);
+	return table_rel->rd_tableam->index_validate_scan(table_rel,
+													  index_rel,
+													  index_info,
+													  state,
+													  auxstate);
 }
 
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 6fa91bfcdc0..b33084cb91a 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -417,6 +417,21 @@ NormalTransactionIdOlder(TransactionId a, TransactionId b)
 	return b;
 }
 
+/* return the newer of the two IDs */
+static inline TransactionId
+TransactionIdNewer(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdFollows(a, b))
+		return a;
+	return b;
+}
+
 /* return the newer of the two IDs */
 static inline FullTransactionId
 FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 727993d1a5a..91666663834 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -158,7 +158,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
+extern TransactionId validate_index(Oid heapId, Oid indexId, Oid auxIndexId);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f4f4aa19963..2af08b66d43 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -269,6 +269,7 @@ extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
 extern PGDLLIMPORT int debug_cic_validate_store_mem_pct;
+extern PGDLLIMPORT int debug_cic_validate_snapshot_pages;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2d6abb15a89..758c5884ff5 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3382,6 +3382,9 @@ DROP INDEX aux_index_ind6;
 --------+---------+-----------+----------+---------
  c1     | integer |           |          | 
 
+SET default_transaction_isolation = 'repeatable read';
+CREATE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+SET default_transaction_isolation = 'read committed';
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index fd96d80abbc..65dd58b947d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1400,6 +1400,10 @@ DROP INDEX aux_index_ind6;
 -- Make sure auxiliary index dropped too
 \d aux_index_tab5
 
+SET default_transaction_isolation = 'repeatable read';
+CREATE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+SET default_transaction_isolation = 'read committed';
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.43.0



  [application/octet-stream] v32-0006-Optimize-auxiliary-index-handling.patch (3.0K, 9-v32-0006-Optimize-auxiliary-index-handling.patch)
  download | inline diff:
From 1ceedab1465ccb7c981667c210d35a2e14615c77 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 30 Dec 2024 16:37:12 +0100
Subject: [PATCH v32 6/7] Optimize auxiliary index handling

Skip unnecessary computations for auxiliary indices by:
- in the index-insert path, detect auxiliary indexes and bypass Datum value computation
- set indexUnchanged=false for auxiliary indices to avoid redundant checks

These optimizations reduce overhead during concurrent index operations.
---
 src/backend/catalog/index.c         | 9 +++++++++
 src/backend/executor/execIndexing.c | 5 ++++-
 src/include/nodes/execnodes.h       | 6 ++++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5bf7fe131c0..b8e4dfe88aa 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2923,6 +2923,15 @@ FormIndexDatum(IndexInfo *indexInfo,
 	ListCell   *indexpr_item;
 	int			i;
 
+	/* Auxiliary index does not need any values to be computed */
+	if (unlikely(indexInfo->ii_Auxiliary))
+	{
+		Assert(indexInfo->ii_Am == STIR_AM_OID);
+		memset(values, 0, sizeof(Datum) * indexInfo->ii_NumIndexAttrs);
+		memset(isnull, true, sizeof(bool) * indexInfo->ii_NumIndexAttrs);
+		return;
+	}
+
 	if (indexInfo->ii_Expressions != NIL &&
 		indexInfo->ii_ExpressionsState == NIL)
 	{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..b0e606460be 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -438,8 +438,11 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		 * There's definitely going to be an index_insert() call for this
 		 * index.  If we're being called as part of an UPDATE statement,
 		 * consider if the 'indexUnchanged' = true hint should be passed.
+		 *
+		 * For auxiliary indexes, always pass false to skip value comparison checks,
+		 * since auxiliary indexes only store TIDs and don't track value changes.
 		 */
-		indexUnchanged = ((flags & EIIT_IS_UPDATE) &&
+		indexUnchanged = ((flags & EIIT_IS_UPDATE) && !indexInfo->ii_Auxiliary &&
 						  index_unchanged_by_update(resultRelInfo,
 													estate,
 													indexInfo,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 136dddbbf11..69441685ddb 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -166,8 +166,10 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
- * are used only during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
+ * during index build; they're conventionally zeroed otherwise.  ii_Auxiliary
+ * is also used during retail inserts to skip datum formation for auxiliary
+ * indexes.
  * ----------------
  */
 typedef struct IndexInfo
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-03-31 22:11  Mihail Nikalayeu <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Mihail Nikalayeu @ 2026-03-31 22:11 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; +Cc: Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Hello!

Just rebased.


Attachments:

  [application/x-patch] v33-0001-Add-stress-tests-for-concurrent-index-builds.patch (11.9K, 2-v33-0001-Add-stress-tests-for-concurrent-index-builds.patch)
  download | inline diff:
From 1377297603529fadec0727ee3fa7dc51853b440d Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v33 1/7] Add stress tests for concurrent index builds

Introduce stress tests for concurrent index operations:
- test concurrent inserts/updates during CREATE/REINDEX INDEX CONCURRENTLY
- cover various index types (btree, gin, gist, brin, hash, spgist)
- test unique and non-unique indexes
- test with expressions and predicates
- test both parallel and non-parallel operations

These tests verify the behavior of the following commits.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 273 ++++++++++++++++++++++++++++++++
 2 files changed, 274 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 592cef74ecb..51a62dccb7b 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..47fc65b9dab
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,273 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+use constant STRESS_PGBENCH_CLIENTS => 30;
+use constant STRESS_PGBENCH_JOBS => 8;
+use constant STRESS_PGBENCH_TRANSACTIONS => 10000;
+use constant STRESS_MAX_SLEEP_MS => 10;
+
+use constant DEFAULT_PGBENCH_CLIENTS => 15;
+use constant DEFAULT_PGBENCH_JOBS => 4;
+use constant DEFAULT_PGBENCH_TRANSACTIONS => 500;
+use constant DEFAULT_MAX_SLEEP_MS => 1;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my $node;
+my $pg_test_extra = $ENV{PG_TEST_EXTRA} // '';
+my $is_stress = $pg_test_extra =~ /\bstress\b/ ? 1 : 0;
+my $pgbench_clients =
+  $is_stress ? STRESS_PGBENCH_CLIENTS : DEFAULT_PGBENCH_CLIENTS;
+my $pgbench_jobs = $is_stress ? STRESS_PGBENCH_JOBS : DEFAULT_PGBENCH_JOBS;
+my $pgbench_transactions =
+  $is_stress ? STRESS_PGBENCH_TRANSACTIONS : DEFAULT_PGBENCH_TRANSACTIONS;
+my $max_sleep_ms = $is_stress ? STRESS_MAX_SLEEP_MS : DEFAULT_MAX_SLEEP_MS;
+my $pgbench_options = sprintf(
+	'--no-vacuum --client=%d --jobs=%d --exit-on-abort --transactions=%d',
+	$pgbench_clients,
+	$pgbench_jobs,
+	$pgbench_transactions);
+my $no_hot = $is_stress ? int(rand(2)) : 0;
+
+print(
+		sprintf(
+		'settings: PG_TEST_EXTRA=%s stress=%d clients=%d jobs=%d transactions=%d max_sleep_ms=%d no_hot=%d',
+		defined($ENV{PG_TEST_EXTRA})
+		? ($pg_test_extra eq '' ? '(empty)' : $pg_test_extra)
+		: '(undef)',
+		$is_stress,
+		$pgbench_clients,
+		$pgbench_jobs,
+		$pgbench_transactions,
+		$max_sleep_ms,
+		$no_hot));
+print "\n";
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf', 'maintenance_work_mem = 32MB'); # to avoid OOM
+$node->append_conf('postgresql.conf', 'shared_buffers = 32MB'); # to avoid OOM
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE UNLOGGED TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp,
+								ia int4[], p point)));
+
+if ($no_hot) { $node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);)); }
+
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC in different options concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => sprintf(q(
+			SET debug_parallel_query = off; -- this is because predicate_stable implementation
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for unique BTREE',
+	{
+		'concurrent_ops_unique_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE UNIQUE INDEX CONCURRENTLY new_idx ON tbl(i);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIN with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN',
+	{
+		'concurrent_ops_gin_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIN (ia);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIST/BRIN/HASH/SPGIST index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_other_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\set variant random(0, 3)
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIST (p);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING BRIN (updated_at);
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING HASH (updated_at);
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING SPGIST (p);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					DROP INDEX CONCURRENTLY new_idx;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+		});
+
+$node->stop;
+done_testing();
-- 
2.53.0



  [application/x-patch] v33-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch (21.0K, 3-v33-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch)
  download | inline diff:
From 2d640837af25d4cdd0ab54d61c641ee2dd5a7c8d Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 12 Jan 2026 00:57:56 +0300
Subject: [PATCH v33 3/7] Add Datum storage support to tuplestore Extend
 tuplestore to store individual Datum values

This support enables usages of tuplestore for non-tuple data (TIDs) in the next commit.
---
 src/backend/utils/sort/tuplestore.c | 367 +++++++++++++++++++++++-----
 src/include/utils/tuplestore.h      |  33 +--
 2 files changed, 327 insertions(+), 73 deletions(-)

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index f9e2d95186a..2a9b25bd238 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1,16 +1,19 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.c
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
+ *
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
  * Also, it is possible to support multiple independent read pointers.
  *
  * A temporary file is used to handle the data if it exceeds the
@@ -61,6 +64,8 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "storage/buffile.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
 #include "utils/tuplestore.h"
@@ -116,16 +121,15 @@ struct Tuplestorestate
 	BufFile    *myfile;			/* underlying file, or NULL if none */
 	MemoryContext context;		/* memory context for holding tuples */
 	ResourceOwner resowner;		/* resowner for holding temp files */
+	Oid			datumType;		/* InvalidOid or oid of Datum's to be stored */
+	int16		datumTypeLen;	/* typelen of that Datum */
+	bool		datumTypeByVal; /* by-value of that Datum */
 
 	/*
 	 * These function pointers decouple the routines that must know what kind
 	 * of tuple we are handling from the routines that don't need to know it.
 	 * They are set up by the tuplestore_begin_xxx routines.
 	 *
-	 * (Although tuplestore.c currently only supports heap tuples, I've copied
-	 * this part of tuplesort.c so that extension to other kinds of objects
-	 * will be easy if it's ever needed.)
-	 *
 	 * Function to copy a supplied input tuple into palloc'd space. (NB: we
 	 * assume that a single pfree() is enough to release the tuple later, so
 	 * the representation must be "flat" in one palloc chunk.) state->availMem
@@ -150,6 +154,12 @@ struct Tuplestorestate
 	 */
 	void	   *(*readtup) (Tuplestorestate *state, unsigned int len);
 
+	/*
+	 * Function to get length of tuple from tape. Used to provide 'len' argument
+	 * for readtup (see above).
+	 */
+	unsigned int(*lentup) (Tuplestorestate *state, bool eofOK);
+
 	/*
 	 * This array holds pointers to tuples in memory if we are in state INMEM.
 	 * In states WRITEFILE and READFILE it's not used.
@@ -186,6 +196,7 @@ struct Tuplestorestate
 #define COPYTUP(state,tup)	((*(state)->copytup) (state, tup))
 #define WRITETUP(state,tup) ((*(state)->writetup) (state, tup))
 #define READTUP(state,len)	((*(state)->readtup) (state, len))
+#define LENTUP(state,eofOK)	((*(state)->lentup) (state, eofOK))
 #define LACKMEM(state)		((state)->availMem < 0)
 #define USEMEM(state,amt)	((state)->availMem -= (amt))
 #define FREEMEM(state,amt)	((state)->availMem += (amt))
@@ -194,9 +205,9 @@ struct Tuplestorestate
  *
  * NOTES about on-tape representation of tuples:
  *
- * We require the first "unsigned int" of a stored tuple to be the total size
- * on-tape of the tuple, including itself (so it is never zero).
- * The remainder of the stored tuple
+ * In case of tuples we use first "unsigned int" of a stored tuple
+ * to be the total size on-tape of the tuple, including itself
+ * (so it is never zero). The remainder of the stored tuple
  * may or may not match the in-memory representation of the tuple ---
  * any conversion needed is the job of the writetup and readtup routines.
  *
@@ -207,10 +218,13 @@ struct Tuplestorestate
  * state->backward is not set, the write/read routines may omit the extra
  * length word.
  *
- * writetup is expected to write both length words as well as the tuple
+ * In the case of Datum with constant length, both "unsigned int" are omitted.
+ *
+ * writetup is expected to write both length words and the tuple
  * data.  When readtup is called, the tape is positioned just after the
- * front length word; readtup must read the tuple data and advance past
- * the back length word (if present).
+ * front length word (if it is not omitted like in case of content-size Datum);
+ * readtup must read the tuple data and advance past the back length word
+ * (if present).
  *
  * The write/read routines can make use of the tuple description data
  * stored in the Tuplestorestate record, if needed. They are also expected
@@ -242,11 +256,16 @@ static Tuplestorestate *tuplestore_begin_common(int eflags,
 static void tuplestore_puttuple_common(Tuplestorestate *state, void *tuple);
 static void dumptuples(Tuplestorestate *state);
 static void tuplestore_updatemax(Tuplestorestate *state);
-static unsigned int getlen(Tuplestorestate *state, bool eofOK);
+
+static unsigned int lentup_heap(Tuplestorestate *state, bool eofOK);
 static void *copytup_heap(Tuplestorestate *state, void *tup);
 static void writetup_heap(Tuplestorestate *state, void *tup);
 static void *readtup_heap(Tuplestorestate *state, unsigned int len);
 
+static unsigned int lentup_datum(Tuplestorestate *state, bool eofOK);
+static void *copytup_datum(Tuplestorestate *state, void *datum);
+static void writetup_datum(Tuplestorestate *state, void *datum);
+static void *readtup_datum(Tuplestorestate *state, unsigned int len);
 
 /*
  *		tuplestore_begin_xxx
@@ -269,6 +288,12 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
 	state->allowedMem = maxKBytes * (int64) 1024;
 	state->availMem = state->allowedMem;
 	state->myfile = NULL;
+	/*
+	 * Set Datum related data to invalid by default.
+	 */
+	state->datumType = InvalidOid;
+	state->datumTypeLen = 0;
+	state->datumTypeByVal = false;
 
 	/*
 	 * The palloc/pfree pattern for tuple memory is in a FIFO pattern.  A
@@ -346,6 +371,37 @@ tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
 	state->copytup = copytup_heap;
 	state->writetup = writetup_heap;
 	state->readtup = readtup_heap;
+	state->lentup = lentup_heap;
+
+	return state;
+}
+
+/*
+ * The same as tuplestore_begin_heap but create store for Datum values.
+ */
+Tuplestorestate *
+tuplestore_begin_datum(Oid datumType, bool randomAccess, bool interXact, int maxKBytes)
+{
+	Tuplestorestate *state;
+	int			eflags;
+
+	/*
+	 * This interpretation of the meaning of randomAccess is compatible with
+	 * the pre-8.3 behavior of tuplestores.
+	 */
+	eflags = randomAccess ?
+		(EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :
+		(EXEC_FLAG_REWIND);
+
+	state = tuplestore_begin_common(eflags, interXact, maxKBytes);
+	state->datumType = datumType;
+	get_typlenbyval(state->datumType, &state->datumTypeLen, &state->datumTypeByVal);
+	Assert(!(state->datumTypeByVal && randomAccess));
+
+	state->copytup = copytup_datum;
+	state->writetup = writetup_datum;
+	state->readtup = readtup_datum;
+	state->lentup = lentup_datum;
 
 	return state;
 }
@@ -444,16 +500,19 @@ tuplestore_clear(Tuplestorestate *state)
 	{
 		int64		availMem = state->availMem;
 
-		/*
-		 * Below, we reset the memory context for storing tuples.  To save
-		 * from having to always call GetMemoryChunkSpace() on all stored
-		 * tuples, we adjust the availMem to forget all the tuples and just
-		 * recall USEMEM for the space used by the memtuples array.  Here we
-		 * just Assert that's correct and the memory tracking hasn't gone
-		 * wrong anywhere.
-		 */
-		for (i = state->memtupdeleted; i < state->memtupcount; i++)
-			availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			/*
+			 * Below, we reset the memory context for storing tuples.  To save
+			 * from having to always call GetMemoryChunkSpace() on all stored
+			 * tuples, we adjust the availMem to forget all the tuples and just
+			 * recall USEMEM for the space used by the memtuples array.  Here we
+			 * just Assert that's correct and the memory tracking hasn't gone
+			 * wrong anywhere.
+			 */
+			for (i = state->memtupdeleted; i < state->memtupcount; i++)
+				availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		}
 
 		availMem += GetMemoryChunkSpace(state->memtuples);
 
@@ -777,6 +836,25 @@ tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple)
 	MemoryContextSwitchTo(oldcxt);
 }
 
+/*
+ * Like tuplestore_puttupleslot but for single Datum.
+ */
+void
+tuplestore_putdatum(Tuplestorestate *state, Datum datum)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
+
+	/*
+	 * Copy the Datum.  (Must do this even in WRITEFILE case.  Note that
+	 * COPYTUP includes USEMEM, so we needn't do that here.)
+	 */
+	datum = PointerGetDatum(COPYTUP(state, DatumGetPointer(datum)));
+
+	tuplestore_puttuple_common(state, DatumGetPointer(datum));
+
+	MemoryContextSwitchTo(oldcxt);
+}
+
 /*
  * Similar to tuplestore_puttuple(), but work from values + nulls arrays.
  * This avoids an extra tuple-construction operation.
@@ -1028,10 +1106,10 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 			pg_fallthrough;
 
 		case TSS_READFILE:
-			*should_free = true;
+			*should_free = !state->datumTypeByVal;
 			if (forward)
 			{
-				if ((tuplen = getlen(state, true)) != 0)
+				if ((tuplen = LENTUP(state, true)) != 0)
 				{
 					tup = READTUP(state, tuplen);
 					return tup;
@@ -1043,6 +1121,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				}
 			}
 
+			Assert(!state->datumTypeByVal);
 			/*
 			 * Backward.
 			 *
@@ -1060,7 +1139,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				Assert(!state->truncated);
 				return NULL;
 			}
-			tuplen = getlen(state, false);
+			tuplen = LENTUP(state, false);
 
 			if (readptr->eof_reached)
 			{
@@ -1091,7 +1170,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 					Assert(!state->truncated);
 					return NULL;
 				}
-				tuplen = getlen(state, false);
+				tuplen = LENTUP(state, false);
 			}
 
 			/*
@@ -1153,6 +1232,41 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 	}
 }
 
+bool
+tuplestore_getdatum(Tuplestorestate *state, bool forward,
+					bool *should_free, Datum *result)
+{
+	Datum datum;
+	*should_free = false;
+
+	datum = (Datum) tuplestore_gettuple(state, forward, should_free);
+
+	/* For by-value datum we may receive zero as valid value. */
+	if (state->datumTypeByVal)
+	{
+		/* Return false only on EOF */
+		if (state->readptrs[state->activeptr].eof_reached)
+		{
+			*result = PointerGetDatum(NULL);
+			return false;
+		}
+
+		*result = datum;
+		return true;
+	}
+
+	if (datum)
+	{
+		*result = datum;
+		return true;
+	}
+	else
+	{
+		*result = PointerGetDatum(NULL);
+		return false;
+	}
+}
+
 /*
  * tuplestore_gettupleslot_force - exported function to fetch a tuple
  *
@@ -1205,10 +1319,20 @@ tuplestore_advance(Tuplestorestate *state, bool forward)
 			pfree(tuple);
 		return true;
 	}
-	else
+
+	/*
+	 * A NULL return normally means end-of-data, but for by-value datum
+	 * stores a valid zero-valued datum (e.g., false, 0) is indistinguishable
+	 * from NULL via pointer check.  Use eof_reached to distinguish.
+	 */
+	if (state->datumTypeByVal)
 	{
-		return false;
+		TSReadPointer *readptr = &state->readptrs[state->activeptr];
+
+		return !readptr->eof_reached;
 	}
+
+	return false;
 }
 
 /*
@@ -1271,7 +1395,13 @@ tuplestore_skiptuples(Tuplestorestate *state, int64 ntuples, bool forward)
 				tuple = tuplestore_gettuple(state, forward, &should_free);
 
 				if (tuple == NULL)
-					return false;
+				{
+					/* See tuplestore_advance for why pointer check is insufficient */
+					if (!state->datumTypeByVal ||
+						state->readptrs[state->activeptr].eof_reached)
+						return false;
+					continue;
+				}
 				if (should_free)
 					pfree(tuple);
 				CHECK_FOR_INTERRUPTS();
@@ -1505,8 +1635,11 @@ tuplestore_trim(Tuplestorestate *state)
 	/* Release no-longer-needed tuples */
 	for (i = state->memtupdeleted; i < nremove; i++)
 	{
-		FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
-		pfree(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
+			pfree(state->memtuples[i]);
+		}
 		state->memtuples[i] = NULL;
 		/* As in dumptuples(), increment memtupdeleted synchronously */
 		state->memtupdeleted++;
@@ -1603,25 +1736,6 @@ tuplestore_in_memory(Tuplestorestate *state)
 	return (state->status == TSS_INMEM);
 }
 
-
-/*
- * Tape interface routines
- */
-
-static unsigned int
-getlen(Tuplestorestate *state, bool eofOK)
-{
-	unsigned int len;
-	size_t		nbytes;
-
-	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
-	if (nbytes == 0)
-		return 0;
-	else
-		return len;
-}
-
-
 /*
  * Routines specialized for HeapTuple case
  *
@@ -1632,6 +1746,19 @@ getlen(Tuplestorestate *state, bool eofOK)
  * to write that separately.
  */
 
+static unsigned int
+lentup_heap(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	else
+		return len;
+}
+
 static void *
 copytup_heap(Tuplestorestate *state, void *tup)
 {
@@ -1678,3 +1805,127 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 		BufFileReadExact(state->myfile, &tuplen, sizeof(tuplen));
 	return tuple;
 }
+
+/*
+ * Routines specialized for Datum case.
+ *
+ * Handles both fixed and variable-length Datums efficiently:
+ * - Fixed-length and Variable-length includes length prefix (and suffix if backward scan)
+ * - By-value types handled inline without extra copying, storing single extra byte
+ *   XXX: consider refactoring to avoid it, currently need it for correct rewind logic
+ */
+
+static unsigned int
+lentup_datum(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	Assert(state->datumType != InvalidOid);
+
+	if (state->datumTypeByVal)
+	{
+		uint8	junk;
+		nbytes = BufFileReadMaybeEOF(state->myfile, &junk, sizeof(uint8), eofOK);
+		if (nbytes == 0)
+			return 0;
+		Assert(junk == (uint8) state->datumTypeLen);
+		return state->datumTypeLen;
+	}
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	return len;
+}
+
+static void *
+copytup_datum(Tuplestorestate *state, void *datum)
+{
+	Datum d;
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+		return DatumGetPointer(PointerGetDatum(datum));
+
+	if (datum == NULL)
+		return NULL;
+
+	d = datumCopy(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+	USEMEM(state, GetMemoryChunkSpace(DatumGetPointer(d)));
+	return DatumGetPointer(d);
+}
+
+static void
+writetup_datum(Tuplestorestate *state, void *datum)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		uint8 junk = state->datumTypeLen; /* overflow is ok */
+		Datum v;
+		Assert(state->datumTypeLen > 0);
+
+		/* just marker byte used to track the end of data for rewind logic */
+		BufFileWrite(state->myfile, &junk, sizeof(junk));
+		store_att_byval(&v, PointerGetDatum(datum), state->datumTypeLen);
+		BufFileWrite(state->myfile, &v, state->datumTypeLen);
+		Assert(!state->backward);
+	}
+	else
+	{
+		unsigned int size;
+		unsigned int tuplen;
+
+		if (state->datumTypeLen < 0)
+			size = datumGetSize(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+		else
+			size = state->datumTypeLen;
+
+		/*
+		 * Include sizeof(unsigned int) in the stored length, matching the
+		 * convention used by writetup_heap.  The backward-scan seek
+		 * arithmetic in tuplestore_gettuple assumes this.
+		 */
+		tuplen = size + sizeof(unsigned int);
+		BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		BufFileWrite(state->myfile, datum, size);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		FREEMEM(state, GetMemoryChunkSpace(datum));
+		pfree(datum);
+	}
+}
+
+static void *
+readtup_datum(Tuplestorestate *state, unsigned int len)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		Datum datum = 0;
+
+		Assert(state->datumTypeLen > 0);
+		Assert(len == state->datumTypeLen);
+		BufFileReadExact(state->myfile, &datum, state->datumTypeLen);
+
+		Assert(!state->backward);
+		return DatumGetPointer(fetch_att(&datum, true, state->datumTypeLen));
+	}
+	else
+	{
+		unsigned int datalen = len - sizeof(unsigned int);
+		void *data = palloc(datalen);
+
+		BufFileReadExact(state->myfile, data, datalen);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileReadExact(state->myfile, &len, sizeof(len));
+
+		return data;
+	}
+}
diff --git a/src/include/utils/tuplestore.h b/src/include/utils/tuplestore.h
index f638b96e156..e16d9a3d352 100644
--- a/src/include/utils/tuplestore.h
+++ b/src/include/utils/tuplestore.h
@@ -1,17 +1,18 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.h
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
- * Also, it is possible to support multiple independent read pointers.
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
  * A temporary file is used to handle the data if it exceeds the
  * space limit specified by the caller.
@@ -39,14 +40,13 @@
  */
 typedef struct Tuplestorestate Tuplestorestate;
 
-/*
- * Currently we only need to store MinimalTuples, but it would be easy
- * to support the same behavior for IndexTuples and/or bare Datums.
- */
-
 extern Tuplestorestate *tuplestore_begin_heap(bool randomAccess,
 											  bool interXact,
 											  int maxKBytes);
+extern Tuplestorestate *tuplestore_begin_datum(Oid datumType,
+											   bool randomAccess,
+											   bool interXact,
+											   int maxKBytes);
 
 extern void tuplestore_set_eflags(Tuplestorestate *state, int eflags);
 
@@ -55,6 +55,7 @@ extern void tuplestore_puttupleslot(Tuplestorestate *state,
 extern void tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple);
 extern void tuplestore_putvalues(Tuplestorestate *state, TupleDesc tdesc,
 								 const Datum *values, const bool *isnull);
+extern void tuplestore_putdatum(Tuplestorestate *state, Datum datum);
 
 extern int	tuplestore_alloc_read_pointer(Tuplestorestate *state, int eflags);
 
@@ -72,6 +73,8 @@ extern bool tuplestore_in_memory(Tuplestorestate *state);
 
 extern bool tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 									bool copy, TupleTableSlot *slot);
+extern bool tuplestore_getdatum(Tuplestorestate *state, bool forward,
+								bool *should_free, Datum *result);
 
 extern bool tuplestore_gettupleslot_force(Tuplestorestate *state, bool forward,
 										  bool copy, TupleTableSlot *slot);
-- 
2.53.0



  [application/x-patch] v33-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch (98.0K, 4-v33-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch)
  download | inline diff:
From 8ae3087b8d1d04b62b623145036dfb4a83197a80 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 15:03:10 +0100
Subject: [PATCH v33 4/7] Use auxiliary indexes for concurrent index operations

Replace the second table full scan in concurrent index builds with an auxiliary index approach:
- create a STIR auxiliary index with the same predicate (if exists) as in main index
- use it to track tuples inserted during the first phase
- merge auxiliary index with main index during validation to catch up new index with any tuples missed during the first phase
- automatically drop auxiliary when main index is ready

To merge main and auxiliary indexes:
- index_bulk_delete called for both, TIDs put into tuplesort
- both tuplesort are being sorted
- both tuplesort scanned with two pointers looking for the TIDs present in auxiliary index, but absent in main one
- all such TIDs are put into tuplestore
- all TIDs in tuplestore are fetched using the stream, tuplestore used in heapam_index_validate_scan_read_stream_next to provide the next page to prefetch
- if fetched tuple is alive - it is inserted into the main index

This eliminates the need for a second full table scan during validation, improving performance, especially for large tables. Affects both CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY operations.
---
 doc/src/sgml/monitoring.sgml               |  26 +-
 doc/src/sgml/ref/create_index.sgml         |  34 +-
 doc/src/sgml/ref/reindex.sgml              |  40 +-
 src/backend/access/heap/README.HOT         |  13 +-
 src/backend/access/heap/heapam_handler.c   | 561 ++++++++++++++-------
 src/backend/catalog/index.c                | 322 ++++++++++--
 src/backend/catalog/system_views.sql       |  17 +-
 src/backend/commands/indexcmds.c           | 344 +++++++++++--
 src/backend/nodes/makefuncs.c              |   4 +-
 src/backend/utils/misc/guc_parameters.dat  |   9 +
 src/include/access/tableam.h               |  12 +-
 src/include/catalog/index.h                |   9 +-
 src/include/commands/progress.h            |  13 +-
 src/include/miscadmin.h                    |   1 +
 src/include/nodes/makefuncs.h              |   3 +-
 src/test/regress/expected/create_index.out |  42 ++
 src/test/regress/expected/indexing.out     |   3 +-
 src/test/regress/expected/rules.out        |  17 +-
 src/test/regress/sql/create_index.sql      |  21 +
 19 files changed, 1155 insertions(+), 336 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bb75ed1069b..835b4aeed77 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6787,6 +6787,18 @@ FROM pg_stat_get_backend_idset() AS backendid;
        information for this phase.
       </entry>
      </row>
+     <row>
+      <entry><literal>waiting for writers to use auxiliary index</literal></entry>
+      <entry>
+       <command>CREATE INDEX CONCURRENTLY</command> or <command>REINDEX CONCURRENTLY</command> is waiting for transactions
+       with write locks that can potentially see the table to finish, to ensure use of auxiliary index for new tuples in
+       future transactions.
+       This phase is skipped when not in concurrent mode.
+       Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
+       and <structname>current_locker_pid</structname> contain the progress
+       information for this phase.
+      </entry>
+     </row>
      <row>
       <entry><literal>building index</literal></entry>
       <entry>
@@ -6827,13 +6839,12 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </entry>
      </row>
      <row>
-      <entry><literal>index validation: scanning table</literal></entry>
+      <entry><literal>index validation: merging indexes</literal></entry>
       <entry>
-       <command>CREATE INDEX CONCURRENTLY</command> is scanning the table
-       to validate the index tuples collected in the previous two phases.
+       <command>CREATE INDEX CONCURRENTLY</command> is merging content of auxiliary index with the target index.
        This phase is skipped when not in concurrent mode.
-       Columns <structname>blocks_total</structname> (set to the total size of the table)
-       and <structname>blocks_done</structname> contain the progress information for this phase.
+       Columns <structname>tuples_total</structname> (set to the number of tuples to be merged)
+       and <structname>tuples_done</structname> contain the progress information for this phase.
       </entry>
      </row>
      <row>
@@ -6850,8 +6861,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
      <row>
       <entry><literal>waiting for readers before marking dead</literal></entry>
       <entry>
-       <command>REINDEX CONCURRENTLY</command> is waiting for transactions
-       with read locks on the table to finish, before marking the old index dead.
+       <command>CREATE INDEX CONCURRENTLY</command> is waiting for transactions
+        with read locks on the table to finish, before marking the auxiliary index as dead.
+       <command>REINDEX CONCURRENTLY</command> is also waiting before marking the old index as dead.
        This phase is skipped when not in concurrent mode.
        Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
        and <structname>current_locker_pid</structname> contain the progress
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index bb7505d171b..12c88587a79 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -620,10 +620,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     out writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>CREATE INDEX</command>.
     When this option is used,
-    <productname>PostgreSQL</productname> must perform two scans of the table, and in
-    addition it must wait for all existing transactions that could potentially
-    modify or use the index to terminate.  Thus
-    this method requires more total work than a standard index build and takes
+    <productname>PostgreSQL</productname> must perform table scan followed by
+    validation phase, and in addition it must wait for all existing transactions
+    that could potentially modify or use the index to terminate.  Thus
+    this method requires more total work than a standard index build and may take
     significantly longer to complete.  However, since it allows normal
     operations to continue while the index is built, this method is useful for
     adding new indexes in a production environment.  Of course, the extra CPU
@@ -631,14 +631,14 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </para>
 
    <para>
-    In a concurrent index build, the index is actually entered as an
-    <quote>invalid</quote> index into
-    the system catalogs in one transaction, then two table scans occur in
-    two more transactions.  Before each table scan, the index build must
+    In a concurrent index build, the main and auxiliary indexes are actually
+    entered as an <quote>invalid</quote> index into
+    the system catalogs in one transaction, then two phases occur in
+    multiple transactions.  Before each phase, the index build must
     wait for existing transactions that have modified the table to terminate.
-    After the second scan, the index build must wait for any transactions
+    After the second phase, the index build must wait for any transactions
     that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-    scan to terminate, including transactions used by any phase of concurrent
+    phase to terminate, including transactions used by any phase of concurrent
     index builds on other tables, if the indexes involved are partial or have
     columns that are not simple column references.
     Then finally the index can be marked <quote>valid</quote> and ready for use,
@@ -651,10 +651,11 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    <para>
     If a problem arises while scanning the table, such as a deadlock or a
     uniqueness violation in a unique index, the <command>CREATE INDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> index. This index
-    will be ignored for querying purposes because it might be incomplete;
-    however it will still consume update overhead. The <application>psql</application>
-    <command>\d</command> command will report such an index as <literal>INVALID</literal>:
+    command will fail but leave behind an <quote>invalid</quote> index and its
+    associated auxiliary index. These indexes
+    will be ignored for querying purposes because they might be incomplete;
+    however they will still consume update overhead. The <application>psql</application>
+    <command>\d</command> command will report such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -664,11 +665,12 @@ postgres=# \d tab
  col    | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     The recommended recovery
-    method in such cases is to drop the index and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
+    method in such cases is to drop these indexes and try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
     to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
    </para>
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 185cd75ca30..9e0248261ae 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -368,9 +368,8 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <productname>PostgreSQL</productname> supports rebuilding indexes with minimum locking
     of writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>REINDEX</command>. When this option
-    is used, <productname>PostgreSQL</productname> must perform two scans of the table
-    for each index that needs to be rebuilt and wait for termination of
-    all existing transactions that could potentially use the index.
+    is used, <productname>PostgreSQL</productname> must perform several steps to ensure data
+    consistency while allowing normal operations to continue.
     This method requires more total work than a standard index
     rebuild and takes significantly longer to complete as it needs to wait
     for unfinished transactions that might modify the index. However, since
@@ -388,7 +387,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <orderedlist>
      <listitem>
       <para>
-       A new transient index definition is added to the catalog
+       A new transient index definition and an auxiliary index are added to the catalog
        <literal>pg_index</literal>.  This definition will be used to replace
        the old index.  A <literal>SHARE UPDATE EXCLUSIVE</literal> lock at
        session level is taken on the indexes being reindexed as well as their
@@ -398,7 +397,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       A first pass to build the index is done for each new index.  Once the
+       The auxiliary index is marked as "ready for inserts", making
+       it visible to other sessions. This index efficiently tracks all new
+       tuples during the reindex process.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The new main index is built by scanning the table.  Once the
        index is built, its flag <literal>pg_index.indisready</literal> is
        switched to <quote>true</quote> to make it ready for inserts, making it
        visible to other sessions once the transaction that performed the build
@@ -409,9 +416,9 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       Then a second pass is performed to add tuples that were added while the
-       first pass was running.  This step is also done in a separate
-       transaction for each index.
+       A validation phase merges any missing entries from the auxiliary index
+       into the main index, ensuring all concurrent changes are captured.
+       This step is also done in a separate transaction for each index.
       </para>
      </listitem>
 
@@ -428,7 +435,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes have <literal>pg_index.indisready</literal> switched to
+       The old and auxiliary indexes have <literal>pg_index.indisready</literal> switched to
        <quote>false</quote> to prevent any new tuple insertions, after waiting
        for running queries that might reference the old index to complete.
       </para>
@@ -436,7 +443,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes are dropped.  The <literal>SHARE UPDATE
+       The old and auxiliary indexes are dropped.  The <literal>SHARE UPDATE
        EXCLUSIVE</literal> session locks for the indexes and the table are
        released.
       </para>
@@ -447,11 +454,11 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
    <para>
     If a problem arises while rebuilding the indexes, such as a
     uniqueness violation in a unique index, the <command>REINDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> new index in addition to
-    the pre-existing one. This index will be ignored for querying purposes
-    because it might be incomplete; however it will still consume update
+    command will fail but leave behind an <quote>invalid</quote> new index and its auxiliary index in addition to
+    the pre-existing one. These indexes will be ignored for querying purposes
+    because they might be incomplete; however they will still consume update
     overhead. The <application>psql</application> <command>\d</command> command will report
-    such an index as <literal>INVALID</literal>:
+    such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -462,12 +469,13 @@ postgres=# \d tab
 Indexes:
     "idx" btree (col)
     "idx_ccnew" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     If the index marked <literal>INVALID</literal> is suffixed
-    <literal>_ccnew</literal>, then it corresponds to the transient
+    <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
-    recovery method is to drop it using <literal>DROP INDEX</literal>,
+    recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
     then attempt <command>REINDEX CONCURRENTLY</command> again.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 74e407f375a..b1c797517ee 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -375,6 +375,11 @@ constraint on which updates can be HOT.  Other transactions must include
 such an index when determining HOT-safety of updates, even though they
 must ignore it for both insertion and searching purposes.
 
+Also, special auxiliary index is created the same way. It is marked as
+"ready for inserts" without any actual table scan. Its purpose is to collect
+new tuples inserted into table while our target index is still "not ready
+for inserts".
+
 We must do this to avoid making incorrect index entries.  For example,
 suppose we are building an index on column X and we make an index entry for
 a non-HOT tuple with X=1.  Then some other backend, unaware that X is an
@@ -394,10 +399,10 @@ entry at the root of the HOT-update chain but we use the key value from the
 live tuple.
 
 We mark the index open for inserts (but still not ready for reads) then
-we again wait for transactions which have the table open.  Then we take
-a second reference snapshot and validate the index.  This searches for
-tuples missing from the index, and inserts any missing ones.  Again,
-the index entries have to have TIDs equal to HOT-chain root TIDs, but
+we again wait for transactions which have the table open.  Then validate
+the index.  This searches for tuples missing from the index in auxiliary
+index, and inserts any missing ones if they are visible to reference snapshot.
+Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cdd153c6b6d..3a04453ff5d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -42,15 +42,20 @@
 #include "storage/lmgr.h"
 #include "storage/lock.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/tuplesort.h"
+#include "utils/tuplestore.h"
+
+/* GUC: percentage of maintenance_work_mem for CIC validation tuplestore */
+int			debug_cic_validate_store_mem_pct = 10;
 
 static void reform_and_rewrite_tuple(HeapTuple tuple,
-									 Relation OldHeap, Relation NewHeap,
-									 Datum *values, bool *isnull, RewriteState rwstate);
+                                     Relation OldHeap, Relation NewHeap,
+                                     Datum *values, bool *isnull, RewriteState rwstate);
 
 static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
 								   HeapTuple tuple,
@@ -1773,242 +1778,422 @@ heapam_index_build_range_scan(Relation heapRelation,
 	return reltuples;
 }
 
+/*
+ * Calculate set difference (relative complement) of main and aux
+ * sets.
+ *
+ * All records which are present in auxiliary tuplesort but not in
+ * main are added to the store.
+ *
+ * In set theory notation store = aux - main or store = aux / main.
+ *
+ * returns number of items added to store
+ */
+static int64
+heapam_index_validate_tuplesort_difference(Tuplesortstate *main,
+										   Tuplesortstate *aux,
+										   Tuplestorestate *store)
+{
+	int64		num = 0;
+	/* state variables for the merge */
+	ItemPointer	indexcursor = NULL,
+					auxindexcursor = NULL;
+	ItemPointerData decoded,
+					auxdecoded;
+	bool			tuplesort_empty = false,
+					auxtuplesort_empty = false;
+
+	/* Initialize pointers. */
+	ItemPointerSetInvalid(&decoded);
+	ItemPointerSetInvalid(&auxdecoded);
+
+	/*
+	 * Main loop: we step through the auxiliary sort (auxState->tuplesort),
+	 * which holds TIDs that must compared to those from the "main" sort
+	 * (state->tuplesort).
+	 */
+	while (!auxtuplesort_empty)
+	{
+		Datum		ts_val;
+		bool		ts_isnull;
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Attempt to fetch the next TID from the auxiliary sort. If it's
+		 * empty, we set auxindexcursor to NULL.
+		 */
+		auxtuplesort_empty = !tuplesort_getdatum(aux, true,
+												 false, &ts_val, &ts_isnull,
+												 NULL);
+		Assert(auxtuplesort_empty || !ts_isnull);
+		if (!auxtuplesort_empty)
+		{
+			itemptr_decode(&auxdecoded, DatumGetInt64(ts_val));
+			auxindexcursor = &auxdecoded;
+		}
+		else
+		{
+			auxindexcursor = NULL;
+		}
+
+		/*
+		 * If the auxiliary sort is not yet empty, we now try to synchronize
+		 * the "main" sort cursor (indexcursor) with auxindexcursor. We advance
+		 * the main sort cursor until we've reached or passed the auxiliary TID.
+		 */
+		if (!auxtuplesort_empty)
+		{
+			/*
+			 * Move the main sort forward while:
+			 *   (1) It's not exhausted (tuplesort_empty == false), and
+			 *   (2) Either indexcursor is NULL (first iteration) or
+			 *       indexcursor < auxindexcursor in TID order.
+			 */
+			while (!tuplesort_empty && (indexcursor == NULL || /* null on first time here */
+						ItemPointerCompare(indexcursor, auxindexcursor) < 0))
+			{
+				/*
+				 * Get the next TID from the main sort. If it's empty,
+				 * we set indexcursor to NULL.
+				 */
+				tuplesort_empty = !tuplesort_getdatum(main, true,
+													  false, &ts_val, &ts_isnull,
+													  NULL);
+				Assert(tuplesort_empty || !ts_isnull);
+
+				if (!tuplesort_empty)
+				{
+					itemptr_decode(&decoded, DatumGetInt64(ts_val));
+					indexcursor = &decoded;
+				}
+				else
+				{
+					indexcursor = NULL;
+				}
+
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * Now, if either:
+			 *  - the main sort is empty, or
+			 *  - indexcursor > auxindexcursor,
+			 *
+			 * then auxindexcursor identifies a TID that doesn't appear in
+			 * the main sort. We likely need to insert it
+			 * into the target index if it’s visible in the heap.
+			 */
+			if (tuplesort_empty || ItemPointerCompare(indexcursor, auxindexcursor) > 0)
+			{
+				tuplestore_putdatum(store, Int64GetDatum(itemptr_encode(auxindexcursor)));
+				num++;
+			}
+		}
+	}
+
+	return num;
+}
+
+typedef struct ValidateIndexScanState
+{
+	Tuplestorestate		*store;
+	BlockNumber			prev_block_number;
+	OffsetNumber		prev_offset_number;
+} ValidateIndexScanState;
+
+/*
+ * This is ReadStreamBlockNumberCB implementation which works as follows:
+ *
+ * 1) It iterates over a sorted tuplestore, where each element is an encoded
+ *    ItemPointer
+ *
+ * 2) It returns the current BlockNumber and collects all OffsetNumbers
+ *    for that block in per_buffer_data.
+ *
+ * 3) Once the code encounters a new BlockNumber, it stops reading more
+ *    offsets and saves the OffsetNumber of the new block for the next call.
+ *
+ * 4) The list of offsets for a block is always terminated with InvalidOffsetNumber.
+ *
+ * This function is intended to be repeatedly called, each time returning
+ * the next block and its corresponding set of offsets.
+ */
+static BlockNumber
+heapam_index_validate_scan_read_stream_next(
+								  ReadStream *stream,
+								  void *void_callback_private_data,
+								  void *void_per_buffer_data
+								  )
+{
+	bool should_free;
+	Datum datum;
+	BlockNumber result = InvalidBlockNumber;
+	int i = 0;
+
+	/*
+	 * Retrieve the specialized callback state and the output buffer.
+	 * callback_private_data keeps track of the previous block and offset
+	 * from a prior invocation, if any.
+	 */
+	ValidateIndexScanState *callback_private_data = void_callback_private_data;
+	OffsetNumber *per_buffer_data = void_per_buffer_data;
+
+	/*
+	 * If there is a "leftover" offset number from the previous invocation,
+	 * it means we had switched to a new block in the middle of the last call.
+	 * We place that leftover offset number into the buffer first.
+	 */
+	if (callback_private_data->prev_offset_number != InvalidOffsetNumber)
+	{
+		Assert(callback_private_data->prev_block_number != InvalidBlockNumber);
+		/*
+		 * 'result' is the block number to return. We set it to the block
+		 * from the previous leftover offset.
+		 */
+		result = callback_private_data->prev_block_number;
+		/* Place leftover offset number in the output buffer. */
+		per_buffer_data[i++] = callback_private_data->prev_offset_number;
+		/*
+		 * Clear the leftover offset number so it won't be reused unless
+		 * we encounter another block change.
+		 */
+		callback_private_data->prev_offset_number = InvalidOffsetNumber;
+	}
+
+	/*
+	 * Read from the tuplestore until we either run out of tuples or we
+	 * encounter a block change. For each tuple:
+	 *
+	 *   1) Decode its block/offset from the Datum.
+	 *   2) If it's the first time in this call (prev_block_number == InvalidBlockNumber),
+	 *      initialize prev_block_number.
+	 *   3) If the block number matches the current block, collect the offset.
+	 *   4) If the block number differs, save that offset as leftover and break
+	 *      so that the next call can handle the new block.
+	 */
+	while (tuplestore_getdatum(callback_private_data->store, true, &should_free, &datum))
+	{
+		BlockNumber next_block_number;
+		ItemPointerData next_data;
+
+		/* Decode the datum into an ItemPointer (block + offset). */
+		itemptr_decode(&next_data, DatumGetInt64(datum));
+		next_block_number = ItemPointerGetBlockNumber(&next_data);
+
+		/*
+		 * If we haven't set a block number yet this round, initialize it
+		 * using the first tuple we read.
+		 */
+		if (callback_private_data->prev_block_number == InvalidBlockNumber)
+			callback_private_data->prev_block_number = next_block_number;
+
+		/*
+		 * Always set the result to be the "current" block number
+		 * we are filling offsets for.
+		 */
+		result = callback_private_data->prev_block_number;
+
+		/*
+		 * If this tuple is from the same block, just store its offset
+		 * in our per_buffer_data array.
+		 */
+		if (next_block_number == callback_private_data->prev_block_number)
+		{
+			per_buffer_data[i++] = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+		}
+		else
+		{
+			/*
+			 * If the block just changed, store the offset of the new block
+			 * as leftover for the next invocation and break out.
+			 */
+			callback_private_data->prev_block_number = next_block_number;
+			callback_private_data->prev_offset_number = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+
+			/* Break to let the next call handle the new block. */
+			break;
+		}
+	}
+
+	/*
+	 * Terminate the list of offsets for this block with an InvalidOffsetNumber.
+	 */
+	per_buffer_data[i] = InvalidOffsetNumber;
+	return result;
+}
+
 static void
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
 						   Snapshot snapshot,
-						   ValidateIndexState *state)
+						   ValidateIndexState *state,
+						   ValidateIndexState *auxState)
 {
-	TableScanDesc scan;
-	HeapScanDesc hscan;
-	HeapTuple	heapTuple;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-	BlockNumber previous_blkno = InvalidBlockNumber;
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
+
+	TupleTableSlot  *slot;
+	EState			*estate;
+	ExprContext		*econtext;
+	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	int64			num_to_check;
+	Tuplestorestate *tuples_for_check;
+	ValidateIndexScanState callback_private_data;
+
+	Buffer buf;
+	OffsetNumber *tuples;
+	ReadStream *read_stream;
+
+	/* Use a percentage of maintenance_work_mem for tuple store. */
+	int		store_work_mem_part = maintenance_work_mem * debug_cic_validate_store_mem_pct / 100;
+
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
 	/*
 	 * sanity checks
 	 */
 	Assert(OidIsValid(indexRelation->rd_rel->relam));
 
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
+	num_to_check = heapam_index_validate_tuplesort_difference(state->tuplesort,
+														 auxState->tuplesort,
+														 tuples_for_check);
+
+	/* It is our responsibility to close tuple sort as fast as we can */
+	tuplesort_end(state->tuplesort);
+	tuplesort_end(auxState->tuplesort);
+
+	state->tuplesort = auxState->tuplesort = NULL;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+									&TTSOpsBufferHeapTuple);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
 
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+	callback_private_data.prev_block_number = InvalidBlockNumber;
+	callback_private_data.store = tuples_for_check;
+	callback_private_data.prev_offset_number = InvalidOffsetNumber;
 
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	scan = table_beginscan_strat(heapRelation,	/* relation */
-								 snapshot,	/* snapshot */
-								 0, /* number of keys */
-								 NULL,	/* scan key */
-								 true,	/* buffer access strategy OK */
-								 false);	/* syncscan not OK */
-	hscan = (HeapScanDesc) scan;
+	read_stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING,
+														 bstrategy,
+														 heapRelation, MAIN_FORKNUM,
+														 heapam_index_validate_scan_read_stream_next,
+														 &callback_private_data,
+														 (MaxHeapTuplesPerPage + 1) * sizeof(OffsetNumber));
 
-	pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_TOTAL,
-								 hscan->rs_nblocks);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_TOTAL, num_to_check);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE, 0);
 
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while ((buf = read_stream_next_buffer(read_stream, (void **) &tuples)) != InvalidBuffer)
 	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
+		HeapTupleData	heap_tuple_data[MaxHeapTuplesPerPage];
+		int i;
+		OffsetNumber off;
+		BlockNumber block_number;
 
 		CHECK_FOR_INTERRUPTS();
 
-		state->htups += 1;
+		LockBuffer(buf, BUFFER_LOCK_SHARE);
+		block_number = BufferGetBlockNumber(buf);
 
-		if ((previous_blkno == InvalidBlockNumber) ||
-			(hscan->rs_cblock != previous_blkno))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
-										 hscan->rs_cblock);
-			previous_blkno = hscan->rs_cblock;
+			ItemPointerData tid;
+			bool		all_dead, found;
+			ItemPointerSet(&tid, block_number, off);
+
+			found = heap_hot_search_buffer(&tid, heapRelation, buf, snapshot,
+										   &heap_tuple_data[i], &all_dead, true);
+			if (!found)
+				ItemPointerSetInvalid(&heap_tuple_data[i].t_self);
+			i++;
+			state->htups += 1;
 		}
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		/*
-		 * As commented in table_index_build_scan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
+		 * It is safe to access tuple data after releasing the buffer lock
+		 * because the buffer pin is still held, and the only operation that
+		 * could physically move tuple data on the page is
+		 * PageRepairFragmentation via heap_page_prune.  VACUUM conflicts with
+		 * CIC (both take ShareUpdateExclusiveLock), and opportunistic pruning
+		 * from concurrent DML cannot affect root tuples we are referencing.
 		 */
-		if (hscan->rs_cblock != root_blkno)
-		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
 		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
+		 * No predicate evaluation is needed here: the auxiliary STIR index
+		 * only contains TIDs for tuples that already satisfied the partial
+		 * index predicate at DML time (checked in ExecInsertIndexTuples).
 		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
+			if (ItemPointerIsValid(&heap_tuple_data[i].t_self))
 			{
+				ItemPointerData root_tid;
+				ItemPointerSet(&root_tid, block_number, off);
+
+				/* Reset the per-tuple memory context for the next fetch. */
+				MemoryContextReset(econtext->ecxt_per_tuple_memory);
+				ExecStoreBufferHeapTuple(&heap_tuple_data[i], slot, buf);
+
+				/* Compute the key values and null flags for this tuple. */
+				FormIndexDatum(indexInfo,
+							   slot,
+							   estate,
+							   values,
+							   isnull);
+
 				/*
-				 * Remember index items seen earlier on the current heap page
+				 * Insert the tuple into the target index.
 				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+				index_insert(indexRelation,
+							 values,
+							 isnull,
+							 &root_tid, /* insert root tuple */
+							 heapRelation,
+							 indexInfo->ii_Unique ?
+							 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+							 false,
+							 indexInfo);
+
+				state->tups_inserted += 1;
 			}
 
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  false, &ts_val, &ts_isnull,
-												  NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_TUPLES_DONE, 1);
+			i++;
 		}
 
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 false,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
+		ReleaseBuffer(buf);
 	}
 
-	table_endscan(scan);
-
 	ExecDropSingleTupleTableSlot(slot);
 
 	FreeExecutorState(estate);
 
+	read_stream_end(read_stream);
+	tuplestore_end(tuples_for_check);
+
+	FreeAccessStrategy(bstrategy);
+
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0ceeda1fdd9..f6d0ac3f784 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -715,11 +715,16 @@ UpdateIndexRelation(Oid indexoid,
  *			already exists.
  *		INDEX_CREATE_PARTITIONED:
  *			create a partitioned index (table must be partitioned)
+ *		INDEX_CREATE_AUXILIARY:
+ *			mark index as auxiliary index
  * constr_flags: flags passed to index_constraint_create
  *		(only if INDEX_CREATE_ADD_CONSTRAINT is set)
  * allow_system_table_mods: allow table to be a system catalog
  * is_internal: if true, post creation hook for new index
  * constraintId: if not NULL, receives OID of created constraint
+ * relpersistence: persistence level to use for index. In most of the
+ *		cases it should be equal to the persistence level of the table,
+ *		auxiliary indexes are only exception here.
  *
  * Returns the OID of the created index.
  */
@@ -760,6 +765,7 @@ index_create(Relation heapRelation,
 	bool		invalid = (flags & INDEX_CREATE_INVALID) != 0;
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
+	bool		auxiliary = (flags & INDEX_CREATE_AUXILIARY) != 0;
 	char		relkind;
 	TransactionId relfrozenxid;
 	MultiXactId relminmxid;
@@ -785,7 +791,10 @@ index_create(Relation heapRelation,
 	namespaceId = RelationGetNamespace(heapRelation);
 	shared_relation = heapRelation->rd_rel->relisshared;
 	mapped_relation = RelationIsMapped(heapRelation);
-	relpersistence = heapRelation->rd_rel->relpersistence;
+	if (auxiliary)
+		relpersistence = RELPERSISTENCE_UNLOGGED; /* aux indexes are always unlogged */
+	else
+		relpersistence = heapRelation->rd_rel->relpersistence;
 
 	/*
 	 * check parameters
@@ -793,6 +802,11 @@ index_create(Relation heapRelation,
 	if (indexInfo->ii_NumIndexAttrs < 1)
 		elog(ERROR, "must index at least one column");
 
+	if (indexInfo->ii_Am == STIR_AM_OID && !auxiliary)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("user-defined indexes with STIR access method are not supported")));
+
 	if (!allow_system_table_mods &&
 		IsSystemRelation(heapRelation) &&
 		IsNormalProcessingMode())
@@ -1398,20 +1412,24 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							false,	/* not ready for inserts */
 							true,
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							false);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
 	 * index information.  All this information will be used for the index
 	 * creation.
 	 */
-	for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
 	{
 		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
-		Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
 
-		indexColNames = lappend(indexColNames, NameStr(att->attname));
-		newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+		{
+			Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+			indexColNames = lappend(indexColNames, NameStr(att->attname));
+			newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		}
 	}
 
 	/* Extract opclass options for each attribute */
@@ -1473,6 +1491,157 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	return newIndexId;
 }
 
+/*
+ * index_concurrently_create_aux
+ *
+ * Create concurrently an auxiliary index based on the definition of the one
+ * provided by caller.  The index is inserted into catalogs and needs to be
+ * built later on. This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ */
+Oid
+index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
+							   Oid tablespaceOid, const char *newName)
+{
+	Relation	indexRelation;
+	IndexInfo  *oldInfo,
+			*newInfo;
+	Oid			newIndexId = InvalidOid;
+	HeapTuple	indexTuple;
+
+	List	   *indexColNames = NIL;
+	List	   *indexExprs = NIL;
+	List	   *indexPreds = NIL;
+
+	Oid *auxOpclassIds;
+	int16 *auxColoptions;
+
+	indexRelation = index_open(mainIndexId, RowExclusiveLock);
+
+	/* The new index needs some information from the old index */
+	oldInfo = BuildIndexInfo(indexRelation);
+
+	/*
+	 * Build of an auxiliary index with exclusion constraints is not
+	 * supported.
+	 */
+	if (oldInfo->ii_ExclusionOps != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("auxiliary index creation for exclusion constraints is not supported")));
+
+	/* Get the array of class and column options IDs from index info */
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(mainIndexId));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", mainIndexId);
+
+
+	/*
+	 * Fetch the list of expressions and predicates directly from the
+	 * catalogs.  This cannot rely on the information from IndexInfo of the
+	 * old index as these have been flattened for the planner.
+	 */
+	if (oldInfo->ii_Expressions != NIL)
+	{
+		Datum		exprDatum;
+		char	   *exprString;
+
+		exprDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indexprs);
+		exprString = TextDatumGetCString(exprDatum);
+		indexExprs = (List *) stringToNode(exprString);
+		pfree(exprString);
+	}
+	if (oldInfo->ii_Predicate != NIL)
+	{
+		Datum		predDatum;
+		char	   *predString;
+
+		predDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indpred);
+		predString = TextDatumGetCString(predDatum);
+		indexPreds = (List *) stringToNode(predString);
+
+		/* Also convert to implicit-AND format */
+		indexPreds = make_ands_implicit((Expr *) indexPreds);
+		pfree(predString);
+	}
+
+	/*
+	 * Build the index information for the new index.  Note that rebuild of
+	 * indexes with exclusion constraints is not supported, hence there is no
+	 * need to fill all the ii_Exclusion* fields.
+	 */
+	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
+							oldInfo->ii_NumIndexKeyAttrs,
+							STIR_AM_OID, /* special AM for aux indexes */
+							indexExprs,
+							indexPreds,
+							false,	/* aux index are not unique */
+							oldInfo->ii_NullsNotDistinct,
+							false,	/* not ready for inserts */
+							true,
+							false,	/* aux are not summarizing */
+							false,	/* aux are not without overlaps */
+							true	/* auxiliary */);
+
+	/*
+	 * Extract the list of column names and the column numbers for the new
+	 * index information.  All this information will be used for the index
+	 * creation.
+	 */
+	{
+		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
+
+		for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+		{
+			Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+			indexColNames = lappend(indexColNames, NameStr(att->attname));
+			newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		}
+	}
+
+	auxOpclassIds = palloc0(sizeof(Oid) * newInfo->ii_NumIndexAttrs);
+	auxColoptions = palloc0(sizeof(int16) * newInfo->ii_NumIndexAttrs);
+
+	/* Fill with "any ops" */
+	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
+	{
+		auxOpclassIds[i] = ANY_STIR_OPS_OID;
+		auxColoptions[i] = 0;
+	}
+
+	newIndexId = index_create(heapRelation,
+							  newName,
+							  InvalidOid,    /* indexRelationId */
+							  InvalidOid,    /* parentIndexRelid */
+							  InvalidOid,    /* parentConstraintId */
+							  InvalidRelFileNumber, /* relFileNumber */
+							  newInfo,
+							  indexColNames,
+							  STIR_AM_OID,
+							  tablespaceOid,
+							  indexRelation->rd_indcollation,
+							  auxOpclassIds,
+							  NULL,
+							  auxColoptions,
+							  NULL,
+							  (Datum) 0,
+							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT | INDEX_CREATE_AUXILIARY,
+							  0,
+							  true, /* allow table to be a system catalog? */
+							  false,    /* is_internal? */
+							  NULL);
+
+	/* Close the relations used and clean up */
+	index_close(indexRelation, NoLock);
+	ReleaseSysCache(indexTuple);
+
+	return newIndexId;
+}
+
 /*
  * index_concurrently_build
  *
@@ -2453,7 +2622,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2513,7 +2683,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3289,12 +3460,21 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * We do a concurrent index build by first inserting the catalog entry for the
  * index via index_create(), marking it not indisready and not indisvalid.
+ * Then we create special auxiliary index the same way. It based on STIR AM.
  * Then we commit our transaction and start a new one, then we wait for all
  * transactions that could have been modifying the table to terminate.  Now
- * we know that any subsequently-started transactions will see the index and
+ * we know that any subsequently-started transactions will see indexes and
  * honor its constraints on HOT updates; so while existing HOT-chains might
  * be broken with respect to the index, no currently live tuple will have an
- * incompatible HOT update done to it.  We now build the index normally via
+ * incompatible HOT update done to it.
+ *
+ * After that, we build the auxiliary index. It is fast operation without any actual
+ * table scan. As result, we have empty STIR index. We commit transaction and
+ * again wait for all transactions that could have been modifying the table
+ * to terminate. At that moment all new tuples are going to be inserted into
+ * auxiliary index.
+ *
+ * We now build the index normally via
  * index_build(), while holding a weak lock that allows concurrent
  * insert/update/delete.  Also, we index only tuples that are valid
  * as of the start of the scan (see table_index_build_scan), whereas a normal
@@ -3304,14 +3484,17 @@ IndexCheckExclusion(Relation heapRelation,
  * bogus unique-index failures due to concurrent UPDATEs (we might see
  * different versions of the same row as being valid when we pass over them,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
- * does not contain any tuples added to the table while we built the index.
+ * does not contain any tuples added to the table while we built the index
+ * (but these tuples contained in auxiliary index).
  *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
- * commit the second transaction and start a third.  Again we wait for all
+ * commit the third transaction and start a fourth.  Again we wait for all
  * transactions that could have been modifying the table to terminate.  Now
  * we know that any subsequently-started transactions will see the index and
- * insert their new tuples into it.  We then take a new reference snapshot
- * which is passed to validate_index().  Any tuples that are valid according
+ * insert their new tuples into it. At the same moment we clear "indisready" for
+ * auxiliary index, since it is no more required to be updated.
+ *
+ * We then take a new reference snapshot, any tuples that are valid according
  * to this snap, but are not in the index, must be added to the index.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
@@ -3319,12 +3502,14 @@ IndexCheckExclusion(Relation heapRelation,
  * that might care about them before we mark the index valid.)
  *
  * validate_index() works by first gathering all the TIDs currently in the
- * index, using a bulkdelete callback that just stores the TIDs and doesn't
+ * indexes, using a bulkdelete callback that just stores the TIDs and doesn't
  * ever say "delete it".  (This should be faster than a plain indexscan;
  * also, not all index AMs support full-index indexscan.)  Then we sort the
- * TIDs, and finally scan the table doing a "merge join" against the TID list
- * to see which tuples are missing from the index.  Thus we will ensure that
- * all tuples valid according to the reference snapshot are in the index.
+ * TIDs of both auxiliary and target indexes, and doing a "merge join" against
+ * the TID lists to see which tuples from auxiliary index are missing from the
+ * target index.  Thus we will ensure that all tuples valid according to the
+ * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
  * tuple that is already dead or is in process of being deleted, and we
@@ -3342,22 +3527,26 @@ IndexCheckExclusion(Relation heapRelation,
  * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
  * transactions will be able to use it for queries.
  *
- * Doing two full table scans is a brute-force strategy.  We could try to be
- * cleverer, eg storing new tuples in a special area of the table (perhaps
- * making the table append-only by setting use_fsm).  However that would
- * add yet more locking issues.
+ * Also, some actions to concurrent drop the auxiliary index are performed.
  */
 void
-validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 {
 	Relation	heapRelation,
-				indexRelation;
+				indexRelation,
+				auxIndexRelation;
 	IndexInfo  *indexInfo;
-	IndexVacuumInfo ivinfo;
-	ValidateIndexState state;
+	IndexVacuumInfo ivinfo, auxivinfo;
+	ValidateIndexState state, auxState;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	/* Use 80% of maintenance_work_mem to target index sorting and
+	 * 10% rest for auxiliary.
+	 *
+	 * Rest 10% will be used for tuplestore later. */
+	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
+	int			aux_work_mem_part = maintenance_work_mem / 10;
 
 	{
 		const int	progress_index[] = {
@@ -3390,6 +3579,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	RestrictSearchPath();
 
 	indexRelation = index_open(indexId, RowExclusiveLock);
+	auxIndexRelation = index_open(auxIndexId, RowExclusiveLock);
 
 	/*
 	 * Fetch info needed for index_insert.  (You might think this should be
@@ -3414,15 +3604,49 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.strategy = NULL;
 	ivinfo.validate_index = true;
 
+	/*
+	 * Copy all info to auxiliary info, changing only relation.
+	 */
+	auxivinfo = ivinfo;
+	auxivinfo.index = auxIndexRelation;
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
 	 * is a pass-by-reference type on all platforms, whereas int8 is
 	 * pass-by-value on most platforms.
 	 */
+	auxState.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
+										   InvalidOid, false,
+										   aux_work_mem_part,
+										   NULL, TUPLESORT_NONE);
+	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
+
+	(void) index_bulk_delete(&auxivinfo, NULL,
+							 validate_index_callback, &auxState);
+	/* If aux index is empty, merge may be skipped */
+	if (auxState.itups == 0)
+	{
+		tuplesort_end(auxState.tuplesort);
+		auxState.tuplesort = NULL;
+
+		/* Roll back any GUC changes executed by index functions */
+		AtEOXact_GUC(false, save_nestlevel);
+
+		/* Restore userid and security context */
+		SetUserIdAndSecContext(save_userid, save_sec_context);
+
+		/* Close rels, but keep locks */
+		index_close(auxIndexRelation, NoLock);
+		index_close(indexRelation, NoLock);
+		table_close(heapRelation, NoLock);
+
+		return;
+	}
+
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 											InvalidOid, false,
-											maintenance_work_mem,
+											(int) main_work_mem_part,
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
@@ -3445,27 +3669,30 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	tuplesort_performsort(auxState.tuplesort);
 
 	/*
-	 * Now scan the heap and "merge" it with the index
+	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN);
+								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
 	table_index_validate_scan(heapRelation,
 							  indexRelation,
 							  indexInfo,
 							  snapshot,
-							  &state);
+							  &state,
+							  &auxState);
 
-	/* Done with tuplesort object */
-	tuplesort_end(state.tuplesort);
+	/* Tuple sort closed by table_index_validate_scan */
+	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
 
 	/* Make sure to release resources cached in indexInfo (if needed). */
 	index_insert_cleanup(indexRelation, indexInfo);
 
 	elog(DEBUG2,
-		 "validate_index found %.0f heap tuples, %.0f index tuples; inserted %.0f missing tuples",
-		 state.htups, state.itups, state.tups_inserted);
+		 "validate_index fetched %.0f heap tuples, %.0f index tuples;"
+						" %.0f aux index tuples; inserted %.0f missing tuples",
+		 state.htups, state.itups, auxState.itups, state.tups_inserted);
 
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
@@ -3474,6 +3701,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 
 	/* Close rels, but keep locks */
+	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
 }
@@ -3534,6 +3762,12 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			Assert(!indexForm->indisvalid);
 			indexForm->indisvalid = true;
 			break;
+		case INDEX_DROP_CLEAR_READY:
+			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
+			Assert(indexForm->indisready);
+			Assert(!indexForm->indisvalid);
+			indexForm->indisready = false;
+			break;
 		case INDEX_DROP_CLEAR_VALID:
 
 			/*
@@ -3805,6 +4039,13 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
+	/* Auxiliary indexes are not allowed to be rebuilt */
+	if (indexInfo->ii_Auxiliary)
+		ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("reindex of auxiliary index \"%s\" not supported",
+					RelationGetRelationName(iRel))));
+
 	/* Suppress use of the target index while rebuilding it */
 	SetReindexProcessing(heapId, indexId);
 
@@ -4047,6 +4288,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
+		Oid			indexAm = get_rel_relam(indexOid);
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4072,6 +4314,18 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
+		if (indexAm == STIR_AM_OID)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+							get_namespace_name(indexNamespaceId),
+							get_rel_name(indexOid))));
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			continue;
+		}
+
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e54018004db..08634c43ea6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1388,16 +1388,17 @@ CREATE VIEW pg_stat_progress_create_index AS
                       END AS command,
         CASE S.param10 WHEN 0 THEN 'initializing'
                        WHEN 1 THEN 'waiting for writers before build'
-                       WHEN 2 THEN 'building index' ||
+                       WHEN 2 THEN 'waiting for writers to use auxiliary index'
+                       WHEN 3 THEN 'building index' ||
                            COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)),
                                     '')
-                       WHEN 3 THEN 'waiting for writers before validation'
-                       WHEN 4 THEN 'index validation: scanning index'
-                       WHEN 5 THEN 'index validation: sorting tuples'
-                       WHEN 6 THEN 'index validation: scanning table'
-                       WHEN 7 THEN 'waiting for old snapshots'
-                       WHEN 8 THEN 'waiting for readers before marking dead'
-                       WHEN 9 THEN 'waiting for readers before dropping'
+                       WHEN 4 THEN 'waiting for writers before validation'
+                       WHEN 5 THEN 'index validation: scanning index'
+                       WHEN 6 THEN 'index validation: sorting tuples'
+                       WHEN 7 THEN 'index validation: merging indexes'
+                       WHEN 8 THEN 'waiting for old snapshots'
+                       WHEN 9 THEN 'waiting for readers before marking dead'
+                       WHEN 10 THEN 'waiting for readers before dropping'
                        END as phase,
         S.param4 AS lockers_total,
         S.param5 AS lockers_done,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 373e8234794..e06353f3fde 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -183,6 +183,7 @@ CheckIndexCompatible(Oid oldId,
 					 bool isWithoutOverlaps)
 {
 	bool		isconstraint;
+	bool		isauxiliary;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
 	Oid		   *opclassIds;
@@ -233,6 +234,7 @@ CheckIndexCompatible(Oid oldId,
 
 	amcanorder = amRoutine->amcanorder;
 	amsummarizing = amRoutine->amsummarizing;
+	isauxiliary = accessMethodId == STIR_AM_OID;
 
 	/*
 	 * Compute the operator classes, collations, and exclusion operators for
@@ -244,7 +246,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing,
+							  isWithoutOverlaps, isauxiliary);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -557,6 +560,7 @@ DefineIndex(ParseState *pstate,
 {
 	bool		concurrent;
 	char	   *indexRelationName;
+	char	   *auxIndexRelationName = NULL;
 	char	   *accessMethodName;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
@@ -566,6 +570,7 @@ DefineIndex(ParseState *pstate,
 	Oid			namespaceId;
 	Oid			tablespaceId;
 	Oid			createdConstraintId = InvalidOid;
+	Oid			auxIndexRelationId = InvalidOid;
 	List	   *indexColNames;
 	List	   *allIndexParams;
 	Relation	rel;
@@ -587,6 +592,7 @@ DefineIndex(ParseState *pstate,
 	int			numberOfKeyAttributes;
 	TransactionId limitXmin;
 	ObjectAddress address;
+	ObjectAddress auxAddress;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
@@ -837,6 +843,15 @@ DefineIndex(ParseState *pstate,
 											stmt->excludeOpNames,
 											stmt->primary,
 											stmt->isconstraint);
+	/*
+	 * Select name for auxiliary index
+	 */
+	if (concurrent)
+		auxIndexRelationName = ChooseRelationName(indexRelationName,
+												  NULL,
+												  "ccaux",
+												  namespaceId,
+												  false);
 
 	/*
 	 * look up the access method, verify it can handle the requested features
@@ -931,7 +946,8 @@ DefineIndex(ParseState *pstate,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  false);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -1603,6 +1619,16 @@ DefineIndex(ParseState *pstate,
 		return address;
 	}
 
+	/*
+	 * In case of concurrent build - create auxiliary index record.
+	 */
+	if (concurrent)
+	{
+		auxIndexRelationId = index_concurrently_create_aux(rel, indexRelationId,
+											tablespaceId, auxIndexRelationName);
+		ObjectAddressSet(auxAddress, RelationRelationId, auxIndexRelationId);
+	}
+
 	AtEOXact_GUC(false, root_save_nestlevel);
 	SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 
@@ -1631,11 +1657,11 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * For a concurrent build, it's important to make the catalog entries
 	 * visible to other transactions before we start to build the index. That
-	 * will prevent them from making incompatible HOT updates.  The new index
-	 * will be marked not indisready and not indisvalid, so that no one else
-	 * tries to either insert into it or use it for queries.
+	 * will prevent them from making incompatible HOT updates. New indexes
+	 * (main and auxiliary) will be marked not indisready and not indisvalid,
+	 * so that no one else tries to either insert into it or use it for queries.
 	 *
-	 * We must commit our current transaction so that the index becomes
+	 * We must commit our current transaction so that the indexes becomes
 	 * visible; then start another.  Note that all the data structures we just
 	 * built are lost in the commit.  The only data we keep past here are the
 	 * relation IDs.
@@ -1645,7 +1671,7 @@ DefineIndex(ParseState *pstate,
 	 * cannot block, even if someone else is waiting for access, because we
 	 * already have the same lock within our transaction.
 	 *
-	 * Note: we don't currently bother with a session lock on the index,
+	 * Note: we don't currently bother with a session lock on the indexes,
 	 * because there are no operations that could change its state while we
 	 * hold lock on the parent table.  This might need to change later.
 	 */
@@ -1684,7 +1710,7 @@ DefineIndex(ParseState *pstate,
 	 * with the old list of indexes.  Use ShareLock to consider running
 	 * transactions that hold locks that permit writing to the table.  Note we
 	 * do not need to worry about xacts that open the table for writing after
-	 * this point; they will see the new index when they open it.
+	 * this point; they will see the new indexes when they open it.
 	 *
 	 * Note: the reason we use actual lock acquisition here, rather than just
 	 * checking the ProcArray and sleeping, is that deadlock is possible if
@@ -1696,14 +1722,44 @@ DefineIndex(ParseState *pstate,
 
 	/*
 	 * At this moment we are sure that there are no transactions with the
-	 * table open for write that don't have this new index in their list of
+	 * table open for write that don't have this new indexes in their list of
 	 * indexes.  We have waited out all the existing transactions and any new
-	 * transaction will have the new index in its list, but the index is still
-	 * marked as "not-ready-for-inserts".  The index is consulted while
+	 * transaction will have both new indexes in its list, but indexes are still
+	 * marked as "not-ready-for-inserts". The indexes are consulted while
 	 * deciding HOT-safety though.  This arrangement ensures that no new HOT
 	 * chains can be created where the new tuple and the old tuple in the
 	 * chain have different index keys.
 	 *
+	 * Now call build on auxiliary index. Index will be created empty without
+	 * any actual heap scan, but marked as "ready-for-inserts". The goal of
+	 * that index is accumulate new tuples while main index is actually built.
+	 */
+
+	/* Set ActiveSnapshot since functions in the indexes may need it */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	index_concurrently_build(tableId, auxIndexRelationId);
+	/* we can do away with our snapshot */
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Now we need to ensure there are no transactions with the auxiliary index
+	 * marked as "not-ready-for-inserts".
+	 */
+	WaitForLockers(heaplocktag, ShareLock, true);
+
+	/*
+	 * At this moment we are sure that all new tuples in table are inserted into
+	 * the auxiliary index. Now it is time to build the target index itself.
+	 *
 	 * We now take a new snapshot, and build the index using all tuples that
 	 * are visible in this snapshot.  We can be sure that any HOT updates to
 	 * these tuples will be compatible with the index, since any updates made
@@ -1738,9 +1794,28 @@ DefineIndex(ParseState *pstate,
 	 * the index marked as read-only for updates.
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForLockers(heaplocktag, ShareLock, true);
 
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/*
+	 * Now target index is marked as "ready" for all transactions. So, auxiliary
+	 * index is no longer needed. So, start removing process by reverting "ready"
+	 * flag.
+	 */
+	index_set_state_flags(auxIndexRelationId, INDEX_DROP_CLEAR_READY);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * Now take the "reference snapshot" that will be used by validate_index()
 	 * to filter candidate tuples.  Beware!  There might still be snapshots in
@@ -1758,24 +1833,14 @@ DefineIndex(ParseState *pstate,
 	 */
 	snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	PushActiveSnapshot(snapshot);
-
 	/*
-	 * Scan the index and the heap, insert any missing index entries.
-	 */
-	validate_index(tableId, indexRelationId, snapshot);
-
-	/*
-	 * Drop the reference snapshot.  We must do this before waiting out other
-	 * snapshot holders, else we will deadlock against other processes also
-	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
-	 * they must wait for.  But first, save the snapshot's xmin to use as
-	 * limitXmin for GetCurrentVirtualXIDs().
+	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
+	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
 	limitXmin = snapshot->xmin;
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
-
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1802,7 +1867,7 @@ DefineIndex(ParseState *pstate,
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForOlderSnapshots(limitXmin, true);
 
 	/*
@@ -1827,6 +1892,53 @@ DefineIndex(ParseState *pstate,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+	/* Now wait for all transaction to see auxiliary as "non-ready for inserts" */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/* Now it is time to mark auxiliary index as dead */
+	index_concurrently_set_dead(tableId, auxIndexRelationId);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_6);
+	/* Now wait for all transaction to ignore auxiliary because it is dead */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Drop auxiliary index.
+	 *
+	 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
+	 * right lock level.
+	 */
+	performDeletion(&auxAddress, DROP_RESTRICT,
+							 PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
 
 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
@@ -3598,6 +3710,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	typedef struct ReindexIndexInfo
 	{
 		Oid			indexId;
+		Oid			auxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -3703,8 +3816,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
 														   ShareUpdateExclusiveLock);
+					IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-					if (!indexRelation->rd_index->indisvalid)
+
+					if (indexInfo->ii_Auxiliary)
+						ereport(WARNING,(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+									get_namespace_name(get_rel_namespace(cellOid)),
+									get_rel_name(cellOid))));
+					else if (!indexRelation->rd_index->indisvalid)
 						ereport(WARNING,
 								(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 								 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3756,8 +3876,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
 															   ShareUpdateExclusiveLock);
+						IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-						if (!indexRelation->rd_index->indisvalid)
+						if (indexInfo->ii_Auxiliary)
+							ereport(WARNING,
+									(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+									 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+											get_namespace_name(get_rel_namespace(cellOid)),
+											get_rel_name(cellOid))));
+						else if (!indexRelation->rd_index->indisvalid)
 							ereport(WARNING,
 									(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 									 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3818,6 +3945,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot reindex invalid index on TOAST table")));
 
+				/* Auxiliary indexes are not allowed to be rebuilt */
+				if (get_rel_relam(relationOid) == STIR_AM_OID)
+					ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("reindex of auxiliary index \"%s\" not supported",
+								get_rel_name(relationOid))));
+
 				/*
 				 * Check if parent relation can be locked and if it exists,
 				 * this needs to be done at this stage as the list of indexes
@@ -3921,15 +4055,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
+		char	   *auxConcurrentName;
 		ReindexIndexInfo *idx = lfirst(lc);
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
+		Oid			auxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
 		int			save_sec_context;
 		int			save_nestlevel;
 		Relation	newIndexRel;
+		Relation	auxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -3980,6 +4117,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 											"ccnew",
 											get_rel_namespace(indexRel->rd_index->indrelid),
 											false);
+		auxConcurrentName = ChooseRelationName(get_rel_name(idx->indexId),
+											NULL,
+											"ccaux",
+											get_rel_namespace(indexRel->rd_index->indrelid),
+											false);
 
 		/* Choose the new tablespace, indexes of toast tables are not moved */
 		if (OidIsValid(params->tablespaceOid) &&
@@ -3993,12 +4135,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 													idx->indexId,
 													tablespaceid,
 													concurrentName);
+		auxIndexId = index_concurrently_create_aux(heapRel,
+												   newIndexId,
+												   tablespaceid,
+												   auxConcurrentName);
 
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
+		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4007,6 +4154,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
+		newidx->auxIndexId = auxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4025,10 +4173,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		lockrelid = palloc_object(LockRelId);
+		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
 		/* Roll back any GUC changes executed by index functions */
@@ -4109,13 +4261,60 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * doing that, wait until no running transactions could have the table of
 	 * the index open with the old list of indexes.  See "phase 2" in
 	 * DefineIndex() for more details.
+	*/
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * Now build all auxiliary indexes and mark them as "ready-for-inserts".
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		StartTransactionCommand();
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/* Set ActiveSnapshot since functions in the indexes may need it */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Build auxiliary index, it is fast - without any actual heap scan, just an empty index. */
+		index_concurrently_build(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+
+	StartTransactionCommand();
+
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Wait until all auxiliary indexes are taken into account by all
+	 * transactions.
+	 */
 	WaitForLockersMultiple(lockTags, ShareLock, true);
 	CommitTransactionCommand();
 
+	/* Now it is time to perform target index build. */
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4162,6 +4361,41 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * need to set the PROC_IN_SAFE_IC flag here.
 	 */
 
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * At this moment all target indexes are marked as "ready-to-insert". So,
+	 * we are free to start process of dropping auxiliary indexes.
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+		StartTransactionCommand();
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		PopActiveSnapshot();
+
+		CommitTransactionCommand();
+	}
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -4169,12 +4403,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * were created during the previous phase.  See "phase 3" in DefineIndex()
 	 * for more details.
 	 */
-
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
-	WaitForLockersMultiple(lockTags, ShareLock, true);
-	CommitTransactionCommand();
-
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4212,7 +4440,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -4241,7 +4469,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * there's no need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-									 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+									 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 		WaitForOlderSnapshots(limitXmin, true);
 
 		CommitTransactionCommand();
@@ -4332,14 +4560,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
-	 * Mark the old indexes as dead.  First we must wait until no running
-	 * transaction could be using the index for a query.  See also
+	 * Mark the old and auxiliary indexes as dead. First we must wait until no
+	 * running transaction could be using the index for a query.  See also
 	 * index_drop() for more details.
 	 */
 
 	INJECTION_POINT("reindex-relation-concurrently-before-set-dead", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	foreach(lc, indexIds)
@@ -4364,6 +4592,28 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PopActiveSnapshot();
 	}
 
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+	}
+
 	/* Commit this transaction to make the updates visible. */
 	CommitTransactionCommand();
 	StartTransactionCommand();
@@ -4377,11 +4627,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
-	 * Drop the old indexes.
+	 * Drop the old and auxiliary indexes.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_6);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -4401,6 +4651,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			add_exact_object_address(&object, objects);
 		}
 
+		foreach(lc, newIndexIds)
+		{
+			ReindexIndexInfo *idx = lfirst(lc);
+			ObjectAddress object;
+
+			object.classId = RelationRelationId;
+			object.objectId = idx->auxIndexId;
+			object.objectSubId = 0;
+
+			add_exact_object_address(&object, objects);
+		}
+
 		/*
 		 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
 		 * right lock level.
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 5359dab1176..84f7cf9824e 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, bool auxiliary)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -850,6 +850,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Concurrent = concurrent;
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
+	n->ii_Auxiliary = auxiliary;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
@@ -875,7 +876,6 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
-	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 0a862693fcd..a80ee4fb03f 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -631,6 +631,15 @@
   boot_val => 'DEFAULT_ASSERT_ENABLED',
 },
 
+{ name => 'debug_cic_validate_store_mem_pct', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Percentage of maintenance_work_mem used for CIC validation tuplestore.',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'debug_cic_validate_store_mem_pct',
+  boot_val => '10',
+  min => '1',
+  max => '90',
+},
+
 { name => 'debug_copy_parse_plan_trees', type => 'bool', context => 'PGC_SUSET', group => 'DEVELOPER_OPTIONS',
   short_desc => 'Set this to force all parse and plan trees to be passed through copyObject(), to facilitate catching errors and omissions in copyObject().',
   flags => 'GUC_NOT_IN_SAMPLE',
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 57892152957..3705e21b588 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -731,7 +731,8 @@ typedef struct TableAmRoutine
 										Relation index_rel,
 										IndexInfo *index_info,
 										Snapshot snapshot,
-										ValidateIndexState *state);
+										ValidateIndexState *state,
+										ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1882,19 +1883,24 @@ table_index_build_range_scan(Relation table_rel,
  * table_index_validate_scan - second table scan for concurrent index build
  *
  * See validate_index() for an explanation.
+ *
+ * Note: it is responsibility of that function to close sortstates in
+ * both `state` and `auxstate`.
  */
 static inline void
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
 						  Snapshot snapshot,
-						  ValidateIndexState *state)
+						  ValidateIndexState *state,
+						  ValidateIndexState *auxstate)
 {
 	table_rel->rd_tableam->index_validate_scan(table_rel,
 											   index_rel,
 											   index_info,
 											   snapshot,
-											   state);
+											   state,
+											   auxstate);
 }
 
 
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index a38e95bc0eb..378701b19f1 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -31,6 +31,7 @@ typedef enum
 {
 	INDEX_CREATE_SET_READY,
 	INDEX_CREATE_SET_VALID,
+	INDEX_DROP_CLEAR_READY,
 	INDEX_DROP_CLEAR_VALID,
 	INDEX_DROP_SET_DEAD,
 } IndexStateFlagsAction;
@@ -71,6 +72,7 @@ extern void index_check_primary_key(Relation heapRel,
 #define	INDEX_CREATE_IF_NOT_EXISTS			(1 << 4)
 #define	INDEX_CREATE_PARTITIONED			(1 << 5)
 #define INDEX_CREATE_INVALID				(1 << 6)
+#define INDEX_CREATE_AUXILIARY				(1 << 7)
 
 extern Oid	index_create(Relation heapRelation,
 						 const char *indexRelationName,
@@ -106,6 +108,11 @@ extern Oid	index_concurrently_create_copy(Relation heapRelation,
 										   Oid tablespaceOid,
 										   const char *newName);
 
+extern Oid	index_concurrently_create_aux(Relation heapRelation,
+										  Oid mainIndexId,
+										  Oid tablespaceOid,
+										  const char *newName);
+
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
 
@@ -151,7 +158,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
+extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 9c40772706c..8e5f98c6fad 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -117,14 +117,15 @@
 
 /* Phases of CREATE INDEX (as advertised via PROGRESS_CREATEIDX_PHASE) */
 #define PROGRESS_CREATEIDX_PHASE_WAIT_1			1
-#define PROGRESS_CREATEIDX_PHASE_BUILD			2
-#define PROGRESS_CREATEIDX_PHASE_WAIT_2			3
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	4
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		5
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN	6
-#define PROGRESS_CREATEIDX_PHASE_WAIT_3			7
+#define PROGRESS_CREATEIDX_PHASE_WAIT_2			2
+#define PROGRESS_CREATEIDX_PHASE_BUILD			3
+#define PROGRESS_CREATEIDX_PHASE_WAIT_3			4
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	5
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		6
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE	7
 #define PROGRESS_CREATEIDX_PHASE_WAIT_4			8
 #define PROGRESS_CREATEIDX_PHASE_WAIT_5			9
+#define PROGRESS_CREATEIDX_PHASE_WAIT_6			10
 
 /*
  * Subphases of CREATE INDEX, for index_build.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 04f29748be7..eea3f818a86 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -268,6 +268,7 @@ extern PGDLLIMPORT bool allowSystemTableMods;
 extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
+extern PGDLLIMPORT int debug_cic_validate_store_mem_pct;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
 
 /*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index bf54d39feb0..cd7f1eb0592 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,8 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								bool auxiliary);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 55538c4c41e..d1723f47e89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1437,6 +1437,7 @@ DETAIL:  Key (f1)=(b) already exists.
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
 ERROR:  could not create unique index "concur_index3"
 DETAIL:  Key (f2)=(b) is duplicated.
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -3211,6 +3212,7 @@ INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
 ERROR:  could not create unique index "concur_reindex_ind5"
 DETAIL:  Key (c1)=(1) is duplicated.
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
@@ -3223,8 +3225,10 @@ DETAIL:  Key (c1)=(1) is duplicated.
  c1     | integer |           |          | 
 Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1) INVALID
+    "concur_reindex_ind5_ccaux" stir (c1) INVALID
     "concur_reindex_ind5_ccnew" UNIQUE, btree (c1) INVALID
 
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -3252,6 +3256,44 @@ Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1)
 
 DROP TABLE concur_reindex_tab4;
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+ERROR:  relation "concur_reindex_tab4" does not exist
+LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+                    ^
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
+HINT:  Use DROP INDEX or REINDEX INDEX.
+WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index f50868ca6a6..b34009f868c 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1585,10 +1585,11 @@ select indexrelid::regclass, indisvalid,
 --------------------------------+------------+-----------------------+-------------------------------
  parted_isvalid_idx             | f          | parted_isvalid_tab    | 
  parted_isvalid_idx_11          | f          | parted_isvalid_tab_11 | parted_isvalid_tab_1_expr_idx
+ parted_isvalid_idx_11_ccaux    | f          | parted_isvalid_tab_11 | 
  parted_isvalid_tab_12_expr_idx | t          | parted_isvalid_tab_12 | parted_isvalid_tab_1_expr_idx
  parted_isvalid_tab_1_expr_idx  | f          | parted_isvalid_tab_1  | parted_isvalid_idx
  parted_isvalid_tab_2_expr_idx  | t          | parted_isvalid_tab_2  | parted_isvalid_idx
-(5 rows)
+(6 rows)
 
 drop table parted_isvalid_tab;
 -- Check state of replica indexes when attaching a partition.
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2b3cf6d8569..b01fa1e61e3 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2064,14 +2064,15 @@ pg_stat_progress_create_index| SELECT s.pid,
         CASE s.param10
             WHEN 0 THEN 'initializing'::text
             WHEN 1 THEN 'waiting for writers before build'::text
-            WHEN 2 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
-            WHEN 3 THEN 'waiting for writers before validation'::text
-            WHEN 4 THEN 'index validation: scanning index'::text
-            WHEN 5 THEN 'index validation: sorting tuples'::text
-            WHEN 6 THEN 'index validation: scanning table'::text
-            WHEN 7 THEN 'waiting for old snapshots'::text
-            WHEN 8 THEN 'waiting for readers before marking dead'::text
-            WHEN 9 THEN 'waiting for readers before dropping'::text
+            WHEN 2 THEN 'waiting for writers to use auxiliary index'::text
+            WHEN 3 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
+            WHEN 4 THEN 'waiting for writers before validation'::text
+            WHEN 5 THEN 'index validation: scanning index'::text
+            WHEN 6 THEN 'index validation: sorting tuples'::text
+            WHEN 7 THEN 'index validation: merging indexes'::text
+            WHEN 8 THEN 'waiting for old snapshots'::text
+            WHEN 9 THEN 'waiting for readers before marking dead'::text
+            WHEN 10 THEN 'waiting for readers before dropping'::text
             ELSE NULL::text
         END AS phase,
     s.param4 AS lockers_total,
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 82e4062a215..c2c1b031527 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -503,6 +503,7 @@ CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS concur_index2 ON concur_heap(f1);
 INSERT INTO concur_heap VALUES ('b','x');
 -- check if constraint is enforced properly at build time
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -1315,10 +1316,12 @@ CREATE TABLE concur_reindex_tab4 (c1 int);
 INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 -- This trick creates an invalid index.
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -1330,6 +1333,24 @@ REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
 DROP TABLE concur_reindex_tab4;
 
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+\d aux_index_tab5
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+DROP TABLE aux_index_tab5;
+
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
-- 
2.53.0



  [application/x-patch] v33-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch (31.7K, 5-v33-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch)
  download | inline diff:
From f063c0e1813519757a201a361d1b483719ca5a8e Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 14:36:31 +0100
Subject: [PATCH v33 5/7] Track and drop auxiliary indexes in DROP/REINDEX

During concurrent index operations, auxiliary indexes may be left as orphaned objects when errors occur (junk auxiliary indexes).

This patch improves the handling of such auxiliary indexes:
- add auxiliaryForIndexId parameter to index_create() to track dependencies between main and auxiliary indexes
- automatically drop auxiliary indexes when the main index is dropped
- delete junk auxiliary indexes properly during REINDEX operations
---
 doc/src/sgml/ref/create_index.sgml         |  14 ++-
 doc/src/sgml/ref/reindex.sgml              |   8 +-
 src/backend/catalog/dependency.c           |   2 +-
 src/backend/catalog/index.c                |  78 +++++++++++----
 src/backend/catalog/pg_depend.c            |  62 ++++++++++++
 src/backend/catalog/toasting.c             |   1 +
 src/backend/commands/indexcmds.c           |  37 +++++++-
 src/backend/commands/tablecmds.c           |  52 +++++++++-
 src/backend/nodes/makefuncs.c              |   3 +-
 src/include/catalog/dependency.h           |   1 +
 src/include/nodes/execnodes.h              |   2 +
 src/include/nodes/makefuncs.h              |   2 +-
 src/test/regress/expected/create_index.out | 105 +++++++++++++++++++--
 src/test/regress/sql/create_index.sql      |  57 ++++++++++-
 14 files changed, 380 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 12c88587a79..406c02e866e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -668,10 +668,16 @@ Indexes:
     "idx_ccaux" stir (col) INVALID
 </programlisting>
 
-    The recommended recovery
-    method in such cases is to drop these indexes and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
-    to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
+    The recommended recovery method in such cases is to drop the index with
+    <command>DROP INDEX</command>. The auxiliary index (suffixed with
+    <literal>_ccaux</literal>) will be automatically dropped when the main
+    index is dropped. After dropping the indexes, you can try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to
+    rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>,
+    which will also handle cleanup of any invalid auxiliary indexes.)
+    If the only invalid index is one suffixed <literal>_ccaux</literal>,
+    the recommended recovery method is just <literal>DROP INDEX</literal>
+    for that index.
    </para>
 
    <para>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 9e0248261ae..ac9cfec5c55 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -476,11 +476,15 @@ Indexes:
     <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
     recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
-    then attempt <command>REINDEX CONCURRENTLY</command> again.
+    then attempt <command>REINDEX CONCURRENTLY</command> again. The auxiliary index
+    (suffixed with <literal>_ccaux</literal>) will be automatically dropped
+    along with its main index.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
     the recommended recovery method is to just drop said index, since the
-    rebuild proper has been successful.
+    rebuild proper has been successful. If the only
+    invalid index is one suffixed <literal>_ccaux</literal>, the recommended
+    recovery method is just <literal>DROP INDEX</literal> for that index.
     A nonzero number may be appended to the suffix of the invalid index
     names to keep them unique, like <literal>_ccnew1</literal>,
     <literal>_ccold2</literal>, etc.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fdb8e67e1f5..c6941fb19d1 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -292,7 +292,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.  (Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object, 0);
+	AcquireDeletionLock(object, flags);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f6d0ac3f784..aaf0b30ff9d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -776,6 +776,8 @@ index_create(Relation heapRelation,
 		   ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
 	/* partitioned indexes must never be "built" by themselves */
 	Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
+	/* ii_AuxiliaryForIndexId and INDEX_CREATE_AUXILIARY are required both or neither */
+	Assert(OidIsValid(indexInfo->ii_AuxiliaryForIndexId) == auxiliary);
 
 	relkind = partitioned ? RELKIND_PARTITIONED_INDEX : RELKIND_INDEX;
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -1181,6 +1183,15 @@ index_create(Relation heapRelation,
 			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_SEC);
 		}
 
+		/*
+		 * Record dependency on the main index in case of auxiliary index.
+		 */
+		if (OidIsValid(indexInfo->ii_AuxiliaryForIndexId))
+		{
+			ObjectAddressSet(referenced, RelationRelationId, indexInfo->ii_AuxiliaryForIndexId);
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+		}
+
 		/* placeholder for normal dependencies */
 		addrs = new_object_addresses();
 
@@ -1413,7 +1424,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							true,
 							indexRelation->rd_indam->amsummarizing,
 							oldInfo->ii_WithoutOverlaps,
-							false);
+							false,
+							InvalidOid);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -1584,7 +1596,8 @@ index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
 							true,
 							false,	/* aux are not summarizing */
 							false,	/* aux are not without overlaps */
-							true	/* auxiliary */);
+							true	/* auxiliary */,
+							mainIndexId /* auxiliaryForIndexId */);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -2623,7 +2636,8 @@ BuildIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid /* auxiliary_for_index_id is set only during build */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2684,7 +2698,8 @@ BuildDummyIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3763,8 +3778,11 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			indexForm->indisvalid = true;
 			break;
 		case INDEX_DROP_CLEAR_READY:
-			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
-			Assert(indexForm->indisready);
+			/*
+			 * Clear indisready during a CREATE INDEX CONCURRENTLY sequence.
+			 * indisready may already be false if the CIC failed before
+			 * index_concurrently_build had a chance to set it.
+			 */
 			Assert(!indexForm->indisvalid);
 			indexForm->indisready = false;
 			break;
@@ -3849,6 +3867,7 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 				heapRelation;
 	Oid			heapId;
 	Oid			save_userid;
+	Oid			junkAuxIndexId;
 	int			save_sec_context;
 	int			save_nestlevel;
 	IndexInfo  *indexInfo;
@@ -3905,6 +3924,19 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
 	}
 
+	/* Check for the auxiliary index for that index, it needs to be dropped */
+	junkAuxIndexId = get_auxiliary_index(indexId);
+	if (OidIsValid(junkAuxIndexId))
+	{
+		ObjectAddress object;
+		object.classId = RelationRelationId;
+		object.objectId = junkAuxIndexId;
+		object.objectSubId = 0;
+		performDeletion(&object, DROP_RESTRICT,
+								 PERFORM_DELETION_INTERNAL |
+								 PERFORM_DELETION_QUIETLY);
+	}
+
 	/*
 	 * Open the target index relation and get an exclusive lock on it, to
 	 * ensure that no one else is touching this particular index.
@@ -4193,7 +4225,8 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 {
 	Relation	rel;
 	Oid			toast_relid;
-	List	   *indexIds;
+	List	   *indexIds,
+			   *auxIndexIds = NIL;
 	char		persistence;
 	bool		result = false;
 	ListCell   *indexId;
@@ -4282,13 +4315,30 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	else
 		persistence = rel->rd_rel->relpersistence;
 
+	foreach(indexId, indexIds)
+	{
+		Oid			indexOid = lfirst_oid(indexId);
+		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* All STIR indexes are auxiliary indexes */
+		if (indexAm == STIR_AM_OID)
+		{
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			auxIndexIds = lappend_oid(auxIndexIds, indexOid);
+		}
+	}
+
 	/* Reindex all the indexes. */
 	i = 1;
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
-		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* Auxiliary indexes are going to be dropped during main index rebuild */
+		if (list_member_oid(auxIndexIds, indexOid))
+			continue;
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4314,18 +4364,6 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
-		if (indexAm == STIR_AM_OID)
-		{
-			ereport(WARNING,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
-							get_namespace_name(indexNamespaceId),
-							get_rel_name(indexOid))));
-			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
-				RemoveReindexPending(indexOid);
-			continue;
-		}
-
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 07c2d41c189..deacd2f7c95 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -20,6 +20,7 @@
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/pg_am_d.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_extension.h"
@@ -1108,6 +1109,67 @@ get_index_constraint(Oid indexId)
 	return constraintId;
 }
 
+/*
+ * get_auxiliary_index
+ *		Given the OID of an index, return the OID of its auxiliary
+ *		index, or InvalidOid if there is no auxiliary index.
+ */
+Oid
+get_auxiliary_index(Oid indexId)
+{
+	Oid			auxiliaryIndexOid = InvalidOid;
+	Relation	depRel;
+	ScanKeyData key[3];
+	SysScanDesc scan;
+	HeapTuple	tup;
+
+	/* Search the dependency table for the index */
+	depRel = table_open(DependRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+				Anum_pg_depend_refclassid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationRelationId));
+	ScanKeyInit(&key[1],
+				Anum_pg_depend_refobjid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indexId));
+	ScanKeyInit(&key[2],
+				Anum_pg_depend_refobjsubid,
+				BTEqualStrategyNumber, F_INT4EQ,
+				Int32GetDatum(0));
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+							  NULL, 3, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend deprec = (Form_pg_depend) GETSTRUCT(tup);
+
+		/*
+		 * Look for an AUTO dependency on a STIR index.  There can be at most
+		 * one STIR auxiliary per index, so we stop at the first match.
+		 * Transitive auxiliaries (e.g. ccnew_ccaux from a failed REINDEX
+		 * CONCURRENTLY) are found by calling this with the ccnew OID, and
+		 * are also cleaned up automatically via cascading AUTO dependency
+		 * when the intermediate index is dropped.
+		 */
+		if (deprec->classid == RelationRelationId &&
+			(deprec->deptype == DEPENDENCY_AUTO) &&
+			get_rel_relkind(deprec->objid) == RELKIND_INDEX &&
+			get_rel_relam(deprec->objid) == STIR_AM_OID)
+		{
+			auxiliaryIndexOid = deprec->objid;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(depRel, AccessShareLock);
+
+	return auxiliaryIndexOid;
+}
+
 /*
  * get_index_ref_constraints
  *		Given the OID of an index, return the OID of all foreign key
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index d7ea86b2805..f428dcdf10f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -315,6 +315,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
 	indexInfo->ii_Auxiliary = false;
+	indexInfo->ii_AuxiliaryForIndexId = InvalidOid;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index e06353f3fde..0709e4f986b 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -247,7 +247,7 @@ CheckIndexCompatible(Oid oldId,
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
 							  false, false, amsummarizing,
-							  isWithoutOverlaps, isauxiliary);
+							  isWithoutOverlaps, isauxiliary, InvalidOid);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -947,7 +947,8 @@ DefineIndex(ParseState *pstate,
 							  concurrent,
 							  amissummarizing,
 							  stmt->iswithoutoverlaps,
-							  false);
+							  false,
+							  InvalidOid);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -3711,6 +3712,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		Oid			indexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -4060,6 +4062,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
@@ -4067,6 +4070,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		int			save_nestlevel;
 		Relation	newIndexRel;
 		Relation	auxIndexRel;
+		Relation	junkAuxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -4140,12 +4144,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 												   tablespaceid,
 												   auxConcurrentName);
 
+		/* Search for auxiliary index for reindexed index, to drop it */
+		junkAuxIndexId = get_auxiliary_index(idx->indexId);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
+		if (OidIsValid(junkAuxIndexId))
+			junkAuxIndexRel = index_open(junkAuxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4155,6 +4164,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
 		newidx->auxIndexId = auxIndexId;
+		newidx->junkAuxIndexId = junkAuxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4176,10 +4186,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		if (OidIsValid(junkAuxIndexId))
+		{
+			lockrelid = palloc_object(LockRelId);
+			*lockrelid = junkAuxIndexRel->rd_lockInfo.lockRelId;
+			relationLocks = lappend(relationLocks, lockrelid);
+		}
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		if (OidIsValid(junkAuxIndexId))
+			index_close(junkAuxIndexRel, NoLock);
 		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
@@ -4368,7 +4386,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 	/*
 	 * At this moment all target indexes are marked as "ready-to-insert". So,
-	 * we are free to start process of dropping auxiliary indexes.
+	 * we are free to start process of dropping auxiliary indexes - including
+	 * junk indexes detected earlier.
 	 */
 	foreach(lc, newIndexIds)
 	{
@@ -4391,6 +4410,9 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 */
 		PushActiveSnapshot(GetTransactionSnapshot());
 		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		/* Ensure the junk index is marked as non-ready */
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_set_state_flags(newidx->junkAuxIndexId, INDEX_DROP_CLEAR_READY);
 		PopActiveSnapshot();
 
 		CommitTransactionCommand();
@@ -4610,6 +4632,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_concurrently_set_dead(newidx->tableId, newidx->junkAuxIndexId);
 
 		PopActiveSnapshot();
 	}
@@ -4661,6 +4685,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			object.objectSubId = 0;
 
 			add_exact_object_address(&object, objects);
+
+			if (OidIsValid(idx->junkAuxIndexId))
+			{
+				object.objectId = idx->junkAuxIndexId;
+				object.objectSubId = 0;
+				add_exact_object_address(&object, objects);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8b4ebc6f226..24171a1f165 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1567,6 +1567,8 @@ RemoveRelations(DropStmt *drop)
 	ListCell   *cell;
 	int			flags = 0;
 	LOCKMODE	lockmode = AccessExclusiveLock;
+	MemoryContext private_context,
+				  oldcontext;
 
 	/* DROP CONCURRENTLY uses a weaker lock, and has some restrictions */
 	if (drop->concurrent)
@@ -1631,9 +1633,20 @@ RemoveRelations(DropStmt *drop)
 			relkind = 0;		/* keep compiler quiet */
 			break;
 	}
+	/*
+	 * Create a memory context that will survive forced transaction commits we
+	 * may need to do below (in case of concurrent index drop).
+	 * Since it is a child of PortalContext, it will go away eventually even if
+	 * we suffer an error; there's no need for special abort cleanup logic.
+	 */
+	private_context = AllocSetContextCreate(PortalContext,
+											"RemoveRelations",
+											ALLOCSET_SMALL_SIZES);
 
+	oldcontext = MemoryContextSwitchTo(private_context);
 	/* Lock and validate each relation; build a list of object addresses */
 	objects = new_object_addresses();
+	MemoryContextSwitchTo(oldcontext);
 
 	foreach(cell, drop->objects)
 	{
@@ -1685,6 +1698,38 @@ RemoveRelations(DropStmt *drop)
 			flags |= PERFORM_DELETION_CONCURRENTLY;
 		}
 
+		/*
+		 * Concurrent index drop requires it to be the first transaction. But in
+		 * case we have junk auxiliary index - we want to drop it too (and also
+		 * in a concurrent way). In this case perform silent internal deletion
+		 * of auxiliary index, and restore transaction state. It is fine to do it
+		 * in the loop because there is only single element in drop->objects.
+		 */
+		if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
+			state.actual_relkind == RELKIND_INDEX)
+		{
+			Oid junkAuxIndexOid = get_auxiliary_index(relOid);
+			if (OidIsValid(junkAuxIndexOid))
+			{
+				ObjectAddress object;
+				object.classId = RelationRelationId;
+				object.objectId = junkAuxIndexOid;
+				object.objectSubId = 0;
+				performDeletion(&object, DROP_RESTRICT,
+										 PERFORM_DELETION_CONCURRENTLY |
+										 PERFORM_DELETION_INTERNAL |
+										 PERFORM_DELETION_QUIETLY);
+				CommitTransactionCommand();
+				MemoryContextDelete(private_context);
+
+				/* And start again - now without auxiliary index. */
+				StartTransactionCommand();
+				PushActiveSnapshot(GetTransactionSnapshot());
+				RemoveRelations(drop);
+				return;
+			}
+		}
+
 		/*
 		 * Concurrent index drop cannot be used with partitioned indexes,
 		 * either.
@@ -1713,12 +1758,17 @@ RemoveRelations(DropStmt *drop)
 		obj.objectId = relOid;
 		obj.objectSubId = 0;
 
+		oldcontext = MemoryContextSwitchTo(private_context);
 		add_exact_object_address(&obj, objects);
+		MemoryContextSwitchTo(oldcontext);
 	}
 
+	/* Deletion may involve multiple commits, so, switch to memory context */
+	oldcontext = MemoryContextSwitchTo(private_context);
 	performMultipleDeletions(objects, drop->behavior, flags);
+	MemoryContextSwitchTo(oldcontext);
 
-	free_object_addresses(objects);
+	MemoryContextDelete(private_context);
 }
 
 /*
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 84f7cf9824e..c54748ff644 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps, bool auxiliary)
+			  bool withoutoverlaps, bool auxiliary, Oid auxiliary_for_index_id)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -851,6 +851,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
 	n->ii_Auxiliary = auxiliary;
+	n->ii_AuxiliaryForIndexId = auxiliary_for_index_id;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 2f3c1eae3c7..6ae210c584e 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -193,6 +193,7 @@ extern List *getOwnedSequences(Oid relid);
 extern Oid	getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok);
 
 extern Oid	get_index_constraint(Oid indexId);
+extern Oid	get_auxiliary_index(Oid indexId);
 
 extern List *get_index_ref_constraints(Oid indexId);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 74efa237212..136dddbbf11 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -229,6 +229,8 @@ typedef struct IndexInfo
 	int			ii_ParallelWorkers;
 	/* is auxiliary for concurrent index build? */
 	bool		ii_Auxiliary;
+	/* if creating an auxiliary index, the OID of the main index; otherwise InvalidOid. */
+	Oid			ii_AuxiliaryForIndexId;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index cd7f1eb0592..3a704781c8b 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -100,7 +100,7 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
 								bool summarizing, bool withoutoverlaps,
-								bool auxiliary);
+								bool auxiliary, Oid auxiliary_for_index_id);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index d1723f47e89..2d6abb15a89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3279,20 +3279,109 @@ ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-ERROR:  relation "concur_reindex_tab4" does not exist
-LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-                    ^
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
-ERROR:  could not create unique index "aux_index_ind6"
-DETAIL:  Key (c1)=(1) is duplicated.
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
 WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
 HINT:  Use DROP INDEX or REINDEX INDEX.
 WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
 NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+-- Make sure it is still exists
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
+DROP INDEX aux_index_ind6;
+ERROR:  index "aux_index_ind6" does not exist
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index c2c1b031527..fd96d80abbc 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1344,11 +1344,62 @@ REINDEX INDEX aux_index_ind6_ccaux;
 -- Concurrently also
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
+-- Make sure it is still exists
+\d aux_index_tab5
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.53.0



  [application/x-patch] v33-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch (36.6K, 6-v33-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch)
  download | inline diff:
From 5fddbc07e84d02a73ed43d0c958ee32d65b0ed33 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sun, 11 Jan 2026 19:27:52 +0300
Subject: [PATCH v33 2/7] Add STIR access method and flags related to auxiliary
 indexes

This patch provides infrastructure for following enhancements to concurrent index builds by:
- ii_Auxiliary in IndexInfo: indicates that an index is an auxiliary index used during concurrent index build
- validate_index in IndexVacuumInfo: set if index_bulk_delete called during the validation phase of concurrent index build
- STIR (Short-Term Index Replacement) access method is introduced, intended solely for short-lived, auxiliary usage

STIR functions are designed as an ephemeral helper during concurrent index builds, temporarily storing TIDs without providing the full features of a typical access method. As such, it raises warnings or errors when accessed outside its specialized usage path.

Planned to be used in following commits.
---
 contrib/pgstattuple/pgstattuple.c        |   3 +
 src/backend/access/Makefile              |   1 +
 src/backend/access/heap/vacuumlazy.c     |   2 +
 src/backend/access/meson.build           |   1 +
 src/backend/access/stir/Makefile         |  18 +
 src/backend/access/stir/meson.build      |   5 +
 src/backend/access/stir/stir.c           | 567 +++++++++++++++++++++++
 src/backend/catalog/index.c              |   1 +
 src/backend/catalog/toasting.c           |   1 +
 src/backend/commands/analyze.c           |   1 +
 src/backend/commands/vacuumparallel.c    |   1 +
 src/backend/nodes/makefuncs.c            |   1 +
 src/include/access/genam.h               |   1 +
 src/include/access/reloptions.h          |   3 +-
 src/include/access/stir.h                | 110 +++++
 src/include/catalog/pg_am.dat            |   3 +
 src/include/catalog/pg_opclass.dat       |   4 +
 src/include/catalog/pg_opfamily.dat      |   2 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/nodes/execnodes.h            |   7 +-
 src/include/utils/index_selfuncs.h       |   8 +
 src/test/regress/expected/amutils.out    |   8 +-
 src/test/regress/expected/opr_sanity.out |   7 +-
 src/test/regress/expected/psql.out       |  24 +-
 24 files changed, 765 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/stir/Makefile
 create mode 100644 src/backend/access/stir/meson.build
 create mode 100644 src/backend/access/stir/stir.c
 create mode 100644 src/include/access/stir.h

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 6a7f8cb4a7c..5b5984e3aa2 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -285,6 +285,9 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 			case SPGIST_AM_OID:
 				err = "spgist index";
 				break;
+			case STIR_AM_OID:
+				err = "stir index";
+				break;
 			case BRIN_AM_OID:
 				err = "brin index";
 				break;
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index e88d72ea039..ebbcfa90715 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -19,6 +19,7 @@ SUBDIRS	    = \
 	nbtree \
 	rmgrdesc \
 	spgist \
+	stir \
 	sequence \
 	table \
 	tablesample \
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 88c71cd85b6..19cfdfd2640 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3012,6 +3012,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
@@ -3063,6 +3064,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
diff --git a/src/backend/access/meson.build b/src/backend/access/meson.build
index 5fd18de74f9..7219c65f365 100644
--- a/src/backend/access/meson.build
+++ b/src/backend/access/meson.build
@@ -11,6 +11,7 @@ subdir('nbtree')
 subdir('rmgrdesc')
 subdir('sequence')
 subdir('spgist')
+subdir('stir')
 subdir('table')
 subdir('tablesample')
 subdir('transam')
diff --git a/src/backend/access/stir/Makefile b/src/backend/access/stir/Makefile
new file mode 100644
index 00000000000..8785dab37bd
--- /dev/null
+++ b/src/backend/access/stir/Makefile
@@ -0,0 +1,18 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/stir
+#
+# IDENTIFICATION
+#    src/backend/access/stir/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/stir
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	stir.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/stir/meson.build b/src/backend/access/stir/meson.build
new file mode 100644
index 00000000000..4b7ad15346c
--- /dev/null
+++ b/src/backend/access/stir/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+backend_sources += files(
+	'stir.c',
+)
diff --git a/src/backend/access/stir/stir.c b/src/backend/access/stir/stir.c
new file mode 100644
index 00000000000..932590d9ccb
--- /dev/null
+++ b/src/backend/access/stir/stir.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.c
+ *	  Implementation of Short-Term Index Replacement.
+ *
+ * STIR is a specialized access method type designed for temporary storage
+ * of TID values during concurrent index build operations.
+ *
+ * The typical lifecycle of a STIR index is:
+ * 1. created as an auxiliary index for CIC/RIC
+ * 2. accepts inserts for a period
+ * 3. stirbulkdelete called during index validation phase
+ * 4. gets dropped
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/stir/stir.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amvalidate.h"
+#include "access/htup_details.h"
+#include "access/stir.h"
+#include "access/tableam.h"
+#include "catalog/index.h"
+#include "catalog/pg_amop.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_opfamily.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "utils/catcache.h"
+#include "utils/fmgrprotos.h"
+#include "utils/index_selfuncs.h"
+#include "utils/memutils.h"
+#include "utils/regproc.h"
+#include "utils/syscache.h"
+
+/*
+ * Stir handler function: return IndexAmRoutine with access method parameters
+ * and callbacks.
+ */
+Datum
+stirhandler(PG_FUNCTION_ARGS)
+{
+	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
+
+	/* Set STIR-specific strategy and procedure numbers */
+	amroutine->amstrategies = STIR_NSTRATEGIES;
+	amroutine->amsupport = STIR_NPROC;
+	amroutine->amoptsprocnum = STIR_OPTIONS_PROC;
+
+	/* STIR doesn't support most index operations */
+	amroutine->amcanorder = false;
+	amroutine->amcanorderbyop = false;
+	amroutine->amcanbackward = false;
+	amroutine->amcanunique = false;
+	amroutine->amcanmulticol = true;
+	amroutine->amoptionalkey = true;
+	amroutine->amsearcharray = false;
+	amroutine->amsearchnulls = false;
+	amroutine->amstorage = false;
+	amroutine->amclusterable = false;
+	amroutine->ampredlocks = false;
+	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
+	amroutine->amcaninclude = true;
+	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL;
+	amroutine->amkeytype = InvalidOid;
+
+	/* Set up function callbacks */
+	amroutine->ambuild = stirbuild;
+	amroutine->ambuildempty = stirbuildempty;
+	amroutine->aminsert = stirinsert;
+	amroutine->aminsertcleanup = NULL;
+	amroutine->ambulkdelete = stirbulkdelete;
+	amroutine->amvacuumcleanup = stirvacuumcleanup;
+	amroutine->amcanreturn = NULL;
+	amroutine->amcostestimate = stircostestimate;
+	amroutine->amoptions = stiroptions;
+	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
+	amroutine->amvalidate = stirvalidate;
+	amroutine->amadjustmembers = NULL;
+	amroutine->ambeginscan = stirbeginscan;
+	amroutine->amrescan = stirrescan;
+	amroutine->amgettuple = NULL;
+	amroutine->amgetbitmap = NULL;
+	amroutine->amendscan = stirendscan;
+	amroutine->ammarkpos = NULL;
+	amroutine->amrestrpos = NULL;
+	amroutine->amestimateparallelscan = NULL;
+	amroutine->aminitparallelscan = NULL;
+	amroutine->amparallelrescan = NULL;
+
+	PG_RETURN_POINTER(amroutine);
+}
+
+/*
+ * Validates operator class for STIR index.
+ *
+ * STIR is not a real index, so validate may be skipped.
+ * But we do it just for consistency.
+ */
+bool
+stirvalidate(Oid opclassoid)
+{
+	bool result = true;
+	HeapTuple classtup;
+	Form_pg_opclass classform;
+	Oid opfamilyoid;
+	HeapTuple familytup;
+	Form_pg_opfamily familyform;
+	char *opfamilyname;
+	CatCList *oprlist;
+	int i;
+
+	/* Fetch opclass information */
+	classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
+	if (!HeapTupleIsValid(classtup))
+		elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
+	classform = (Form_pg_opclass) GETSTRUCT(classtup);
+
+	opfamilyoid = classform->opcfamily;
+
+	/* Fetch opfamily information */
+	familytup = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfamilyoid));
+	if (!HeapTupleIsValid(familytup))
+		elog(ERROR, "cache lookup failed for operator family %u", opfamilyoid);
+	familyform = (Form_pg_opfamily) GETSTRUCT(familytup);
+
+	opfamilyname = NameStr(familyform->opfname);
+
+	/* Fetch all operators and support functions of the opfamily */
+	oprlist = SearchSysCacheList1(AMOPSTRATEGY, ObjectIdGetDatum(opfamilyoid));
+
+	/* Check individual operators */
+	for (i = 0; i < oprlist->n_members; i++)
+	{
+		HeapTuple oprtup = &oprlist->members[i]->tuple;
+		Form_pg_amop oprform = (Form_pg_amop) GETSTRUCT(oprtup);
+
+		/* Check it's allowed strategy for stir */
+		if (oprform->amopstrategy < 1 ||
+			oprform->amopstrategy > STIR_NSTRATEGIES)
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with invalid strategy number %d",
+					        opfamilyname,
+					        format_operator(oprform->amopopr),
+					        oprform->amopstrategy)));
+			result = false;
+		}
+
+		/* stir doesn't support ORDER BY operators */
+		if (oprform->amoppurpose != AMOP_SEARCH ||
+			OidIsValid(oprform->amopsortfamily))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains invalid ORDER BY specification for operator %s",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+
+		/* Check operator signature --- same for all stir strategies */
+		if (!check_amop_signature(oprform->amopopr, BOOLOID,
+		                          oprform->amoplefttype,
+		                          oprform->amoprighttype))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with wrong signature",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+	}
+
+	ReleaseCatCacheList(oprlist);
+	ReleaseSysCache(familytup);
+	ReleaseSysCache(classtup);
+
+	return result;
+}
+
+/*
+ * Initialize meta-page of a STIR index.
+ * The skipInserts flag determines if new inserts will be accepted or skipped.
+ */
+void
+StirFillMetapage(Relation index, Page metaPage, bool skipInserts)
+{
+	StirMetaPageData *metadata;
+
+	StirInitPage(metaPage, STIR_META);
+	metadata = StirPageGetMeta(metaPage);
+	memset(metadata, 0, sizeof(StirMetaPageData));
+	metadata->magicNumber = STIR_MAGIC_NUMBER;
+	metadata->skipInserts = skipInserts;
+	((PageHeader) metaPage)->pd_lower = ((char *) metadata + sizeof(StirMetaPageData)) - (char *) metaPage;
+}
+
+/*
+ * Create and initialize the metapage for a STIR index.
+ * This is called during index creation.
+ */
+void
+StirInitMetapage(Relation index, ForkNumber forknum)
+{
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	/*
+	 * Make a new page; since it is the first page it should be associated with
+	 * block number 0 (STIR_METAPAGE_BLKNO).  No need to hold the extension
+	 * lock because there cannot be concurrent inserters yet.
+	 */
+	metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	Assert(BufferGetBlockNumber(metaBuffer) == STIR_METAPAGE_BLKNO);
+
+	metaPage = BufferGetPage(metaBuffer);
+	StirFillMetapage(index, metaPage, forknum == INIT_FORKNUM);
+
+	MarkBufferDirty(metaBuffer);
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * Initialize any page of a stir index.
+ */
+void
+StirInitPage(Page page, uint16 flags)
+{
+	StirPageOpaque opaque;
+
+	PageInit(page, BLCKSZ, sizeof(StirPageOpaqueData));
+
+	opaque = StirPageGetOpaque(page);
+	opaque->flags = flags;
+	opaque->stir_page_id = STIR_PAGE_ID;
+}
+
+/*
+ * Add a tuple to a STIR page. Returns false if the tuple doesn't fit.
+ * The tuple is added to the end of the page.
+ */
+static bool
+StirPageAddItem(Page page, StirTuple *tuple)
+{
+	StirTuple *itup;
+	StirPageOpaque opaque;
+	char *ptr;
+
+	/* We shouldn't be pointed to an invalid page */
+	Assert(!PageIsNew(page));
+
+	/* Does the new tuple fit on the page? */
+	if (StirPageGetFreeSpace(page) < sizeof(StirTuple))
+		return false;
+
+	/* Copy a new tuple to the end of the page */
+	opaque = StirPageGetOpaque(page);
+	itup = StirPageGetTuple(page, opaque->maxoff + 1);
+	memcpy(itup, tuple, sizeof(StirTuple));
+
+	/* Adjust maxoff and pd_lower */
+	opaque->maxoff++;
+	ptr = (char *) StirPageGetTuple(page, opaque->maxoff + 1);
+	((PageHeader) page)->pd_lower = ptr - page;
+
+	/* Assert we didn't overrun available space */
+	Assert(((PageHeader) page)->pd_lower <= ((PageHeader) page)->pd_upper);
+	return true;
+}
+
+/*
+ * Insert a new tuple into a STIR index.
+ */
+bool
+stirinsert(Relation index, Datum *values, bool *isnull,
+		  ItemPointer ht_ctid, Relation heapRel,
+		  IndexUniqueCheck checkUnique,
+		  bool indexUnchanged,
+		  struct IndexInfo *indexInfo)
+{
+	StirTuple itup;
+	StirMetaPageData *metaData;
+	Buffer buffer,
+			metaBuffer;
+	Page page;
+	BlockNumber blkNo;
+
+	itup.heapPtr = *ht_ctid;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+
+	for (;;)
+	{
+		LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if inserts are allowed */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+		blkNo = metaData->lastBlkNo;
+		/* Don't hold metabuffer lock while doing insert */
+		LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+
+		if (blkNo > 0)
+		{
+			buffer = ReadBuffer(index, blkNo);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+			page = BufferGetPage(buffer);
+
+			Assert(!PageIsNew(page));
+
+			/* Try to add tuple to the existing page */
+			if (StirPageAddItem(page, &itup))
+			{
+				/* Success!  Apply the change, clean up, and exit */
+				MarkBufferDirty(buffer);
+
+				UnlockReleaseBuffer(buffer);
+				ReleaseBuffer(metaBuffer);
+				return false;
+			}
+
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Need to add a new page - get exclusive lock on meta-page */
+		LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+
+		/* Re-check after acquiring exclusive lock */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+
+		/* Check if another backend already extended the index */
+		if (blkNo != metaData->lastBlkNo)
+		{
+			Assert(blkNo < metaData->lastBlkNo);
+			/* Someone else inserted the new page into the index, let's try again */
+			LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+			continue;
+		}
+		else
+		{
+			/* Must extend the file */
+			buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
+									   EB_LOCK_FIRST);
+			page = BufferGetPage(buffer);
+
+			StirInitPage(page, 0);
+
+			if (!StirPageAddItem(page, &itup))
+			{
+				/* We shouldn't be here since we're inserting to an empty page */
+				elog(ERROR, "could not add new stir tuple to empty page");
+			}
+
+			/* Update meta-page with new last block number */
+			metaData->lastBlkNo = BufferGetBlockNumber(buffer);
+
+			MarkBufferDirty(metaBuffer);
+			MarkBufferDirty(buffer);
+
+			UnlockReleaseBuffer(buffer);
+			UnlockReleaseBuffer(metaBuffer);
+
+			return false;
+		}
+	}
+}
+
+/*
+ * STIR doesn't support scans - these functions all error out
+ */
+IndexScanDesc
+stirbeginscan(Relation r, int nkeys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void
+stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+		  ScanKey orderbys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void stirendscan(IndexScanDesc scan)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+/*
+ * Build a STIR index - only allowed for auxiliary indexes.
+ * Just initializes the meta-page without any heap scans.
+ */
+IndexBuildResult *
+stirbuild(Relation heap, Relation index,
+						   struct IndexInfo *indexInfo)
+{
+	IndexBuildResult *result;
+
+	if (!indexInfo->ii_Auxiliary)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("Building STIR indexes is not supported")));
+
+	StirInitMetapage(index, MAIN_FORKNUM);
+
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	result->heap_tuples = 0;
+	result->index_tuples = 0;
+	return result;
+}
+
+void stirbuildempty(Relation index)
+{
+	StirInitMetapage(index, INIT_FORKNUM);
+}
+
+IndexBulkDeleteResult *
+stirbulkdelete(IndexVacuumInfo *info,
+									 IndexBulkDeleteResult *stats,
+									 IndexBulkDeleteCallback callback,
+									 void *callback_state)
+{
+	Relation index = info->index;
+	BlockNumber blkno, npages;
+	Buffer buffer;
+	Page page;
+
+	/*
+	 * For normal VACUUM, mark to skip inserts and warn about an index drop
+	 * needed.  In practice this path is not reachable during CREATE INDEX
+	 * CONCURRENTLY because the table-level locks held by CIC prevent concurrent
+	 * VACUUM from opening the auxiliary index.  It can only be reached if a
+	 * leftover STIR index somehow survives after a failed CIC and a later
+	 * VACUUM encounters it.
+	 */
+	if (!info->validate_index)
+	{
+		StirMarkAsSkipInserts(index);
+
+		ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+		return NULL;
+	}
+
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	/*
+	 * Iterate over the pages. We don't care about concurrently added pages,
+	 * because the index is marked as not-ready for that moment and the index is not
+	 * used for insert.
+	 */
+	npages = RelationGetNumberOfBlocks(index);
+	for (blkno = STIR_HEAD_BLKNO; blkno < npages; blkno++)
+	{
+		StirTuple *itup, *itupEnd;
+
+		vacuum_delay_point(false);
+
+		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
+									RBM_NORMAL, info->strategy);
+
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buffer);
+
+		if (PageIsNew(page))
+		{
+			UnlockReleaseBuffer(buffer);
+			continue;
+		}
+
+		itup = StirPageGetTuple(page, FirstOffsetNumber);
+		itupEnd = StirPageGetTuple(page, OffsetNumberNext(StirPageGetMaxOffset(page)));
+		while (itup < itupEnd)
+		{
+			/* Do we have to delete this tuple? */
+			if (callback(&itup->heapPtr, callback_state))
+			{
+				ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("we never delete in stir")));
+			}
+
+			itup = StirPageGetNextTuple(itup);
+		}
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	return stats;
+}
+
+/*
+ * Mark a STIR index to skip future inserts
+ */
+void
+StirMarkAsSkipInserts(Relation index)
+{
+	StirMetaPageData *metaData;
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	metaPage = BufferGetPage(metaBuffer);
+	metaData = StirPageGetMeta(metaPage);
+
+	if (!metaData->skipInserts)
+	{
+		metaData->skipInserts = true;
+		MarkBufferDirty(metaBuffer);
+	}
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * As with stirbulkdelete, this is not reachable during a normal CIC due to
+ * table-level locking.  It serves as a safety net for leftover STIR indexes
+ * from failed concurrent index builds.
+ */
+IndexBulkDeleteResult *
+stirvacuumcleanup(IndexVacuumInfo *info,
+				  IndexBulkDeleteResult *stats)
+{
+	StirMarkAsSkipInserts(info->index);
+	ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+	return NULL;
+}
+
+bytea *
+stiroptions(Datum reloptions, bool validate)
+{
+	return NULL;
+}
+
+void
+stircostestimate(PlannerInfo *root, IndexPath *path,
+					 double loop_count, Cost *indexStartupCost,
+					 Cost *indexTotalCost, Selectivity *indexSelectivity,
+					 double *indexCorrelation, double *indexPages)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1ccfa687f05..0ceeda1fdd9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3412,6 +3412,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
+	ivinfo.validate_index = true;
 
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 4aa52a4bd25..d7ea86b2805 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -314,6 +314,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_ParallelWorkers = 0;
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
+	indexInfo->ii_Auxiliary = false;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 49a5cdf579c..cbeb49050cd 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -726,6 +726,7 @@ do_analyze_rel(Relation onerel, const VacuumParams *params,
 			ivinfo.message_level = elevel;
 			ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
 			ivinfo.strategy = vac_strategy;
+			ivinfo.validate_index = false;
 
 			stats = index_vacuum_cleanup(&ivinfo, NULL);
 
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 77834b96a21..1671c3c2196 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -896,6 +896,7 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
 	ivinfo.estimated_count = pvs->shared->estimated_count;
 	ivinfo.num_heap_tuples = pvs->shared->reltuples;
 	ivinfo.strategy = pvs->bstrategy;
+	ivinfo.validate_index = false;
 
 	/* Update error traceback information */
 	pvs->indname = pstrdup(RelationGetRelationName(indrel));
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 3cd35c5c457..5359dab1176 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -875,6 +875,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
+	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index b69320a7fc8..1a5de4b691a 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -58,6 +58,7 @@ typedef struct IndexVacuumInfo
 	bool		estimated_count;	/* num_heap_tuples is an estimate */
 	int			message_level;	/* ereport level for progress messages */
 	double		num_heap_tuples;	/* tuples remaining in heap */
+	bool		validate_index; /* validating concurrently built index? */
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index e8cb7f7a627..7f3f08a70ac 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -51,8 +51,9 @@ typedef enum relopt_kind
 	RELOPT_KIND_VIEW = (1 << 9),
 	RELOPT_KIND_BRIN = (1 << 10),
 	RELOPT_KIND_PARTITIONED = (1 << 11),
+	RELOPT_KIND_STIR = (1 << 12),
 	/* if you add a new kind, make sure you update "last_default" too */
-	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_STIR,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/access/stir.h b/src/include/access/stir.h
new file mode 100644
index 00000000000..b08cf4d4ef0
--- /dev/null
+++ b/src/include/access/stir.h
@@ -0,0 +1,110 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.h
+ *	  header file for postgres stir access method implementation.
+ *
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * src/include/access/stir.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef STIR_H
+#define STIR_H
+
+#include "access/amapi.h"
+#include "nodes/pathnodes.h"
+#include "storage/bufpage.h"
+
+/* Support procedures numbers */
+#define STIR_NPROC				0
+
+/* Scan strategies */
+#define STIR_NSTRATEGIES		1
+
+#define STIR_OPTIONS_PROC				0
+
+/* Macros for accessing stir page structures */
+#define StirPageGetOpaque(page) ((StirPageOpaque) PageGetSpecialPointer(page))
+#define StirPageGetMaxOffset(page) (StirPageGetOpaque(page)->maxoff)
+#define StirPageIsMeta(page) \
+	((StirPageGetOpaque(page)->flags & STIR_META) != 0)
+#define StirPageGetTuple(page, offset) \
+	((StirTuple *)(PageGetContents(page) \
+		+ sizeof(StirTuple) * ((offset) - 1)))
+#define StirPageGetNextTuple(tuple) \
+	((StirTuple *)((char *)(tuple) + sizeof(StirTuple)))
+
+
+
+/* Preserved page numbers */
+#define STIR_METAPAGE_BLKNO	(0)
+#define STIR_HEAD_BLKNO		(1) /* first data page */
+
+
+/* Opaque for stir pages */
+typedef struct StirPageOpaqueData
+{
+	OffsetNumber maxoff;		/* number of index tuples on the page */
+	uint16		flags;			/* see bit definitions below */
+	uint16		stir_page_id;	/* for identification of STIR indexes */
+} StirPageOpaqueData;
+
+/* Stir page flags */
+#define STIR_META		(1<<0)
+
+typedef StirPageOpaqueData *StirPageOpaque;
+
+#define STIR_PAGE_ID		0xFF84
+
+/* Metadata of stir index */
+typedef struct StirMetaPageData
+{
+	uint32		magicNumber;
+	BlockNumber	lastBlkNo;
+	bool		skipInserts;	/* should we just exit without any inserts? */
+} StirMetaPageData;
+
+/* Magic number to distinguish stir pages from others */
+#define STIR_MAGIC_NUMBER (0xDBAC0DEF)
+
+#define StirPageGetMeta(page)	((StirMetaPageData *) PageGetContents(page))
+
+typedef struct StirTuple
+{
+	ItemPointerData heapPtr;
+} StirTuple;
+
+#define StirPageGetFreeSpace(page) \
+	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+		- StirPageGetMaxOffset(page) * (sizeof(StirTuple)) \
+		- MAXALIGN(sizeof(StirPageOpaqueData)))
+
+extern void StirFillMetapage(Relation index, Page metaPage, bool skipInserts);
+extern void StirInitMetapage(Relation index, ForkNumber forknum);
+extern void StirInitPage(Page page, uint16 flags);
+extern void StirMarkAsSkipInserts(Relation index);
+
+/* index access method interface functions */
+extern bool stirvalidate(Oid opclassoid);
+extern bool stirinsert(Relation index, Datum *values, bool *isnull,
+					 ItemPointer ht_ctid, Relation heapRel,
+					 IndexUniqueCheck checkUnique,
+					 bool indexUnchanged,
+					 struct IndexInfo *indexInfo);
+extern IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys);
+extern void stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+					 ScanKey orderbys, int norderbys);
+extern void stirendscan(IndexScanDesc scan);
+extern IndexBuildResult *stirbuild(Relation heap, Relation index,
+								 struct IndexInfo *indexInfo);
+extern void stirbuildempty(Relation index);
+extern IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+										   IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+										   void *callback_state);
+extern IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+											  IndexBulkDeleteResult *stats);
+extern bytea *stiroptions(Datum reloptions, bool validate);
+
+#endif			/* STIR_H */
diff --git a/src/include/catalog/pg_am.dat b/src/include/catalog/pg_am.dat
index 46d361047fe..8bd2c2b46ba 100644
--- a/src/include/catalog/pg_am.dat
+++ b/src/include/catalog/pg_am.dat
@@ -33,5 +33,8 @@
 { oid => '3580', oid_symbol => 'BRIN_AM_OID',
   descr => 'block range index (BRIN) access method',
   amname => 'brin', amhandler => 'brinhandler', amtype => 'i' },
+{ oid => '5555', oid_symbol => 'STIR_AM_OID',
+  descr => 'short term index replacement access method',
+  amname => 'stir', amhandler => 'stirhandler', amtype => 'i' },
 
 ]
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index df170b80840..a3457e749db 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -492,4 +492,8 @@
 
 # no brin opclass for the geometric types except box
 
+# allow any types for STIR
+{ opcmethod => 'stir', oid_symbol => 'ANY_STIR_OPS_OID', opcname => 'stir_ops',
+  opcfamily => 'stir/any_ops', opcintype => 'any'},
+
 ]
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 7a027c4810e..6ffc20a061c 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -308,5 +308,7 @@
   opfmethod => 'hash', opfname => 'multirange_ops' },
 { oid => '6158',
   opfmethod => 'gist', opfname => 'multirange_ops' },
+{ oid => '5558',
+  opfmethod => 'stir', opfname => 'any_ops' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3579cec5744..242808d0402 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -935,6 +935,10 @@
   proname => 'brinhandler', provolatile => 'v',
   prorettype => 'index_am_handler', proargtypes => 'internal',
   prosrc => 'brinhandler' },
+{ oid => '5556', descr => 'short term index replacement access method handler',
+  proname => 'stirhandler', provolatile => 'v',
+  prorettype => 'index_am_handler', proargtypes => 'internal',
+  prosrc => 'stirhandler' },
 { oid => '3952', descr => 'brin: standalone scan new table pages',
   proname => 'brin_summarize_new_values', provolatile => 'v',
   proparallel => 'u', prorettype => 'int4', proargtypes => 'regclass',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 684e398f824..74efa237212 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -166,8 +166,8 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
- * during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
+ * are used only during index build; they're conventionally zeroed otherwise.
  * ----------------
  */
 typedef struct IndexInfo
@@ -227,7 +227,8 @@ typedef struct IndexInfo
 	bool		ii_WithoutOverlaps;
 	/* # of workers requested (excludes leader) */
 	int			ii_ParallelWorkers;
-
+	/* is auxiliary for concurrent index build? */
+	bool		ii_Auxiliary;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/utils/index_selfuncs.h b/src/include/utils/index_selfuncs.h
index 74793a1a19d..bf0e30dabe9 100644
--- a/src/include/utils/index_selfuncs.h
+++ b/src/include/utils/index_selfuncs.h
@@ -62,6 +62,14 @@ extern void spgcostestimate(struct PlannerInfo *root,
 							Selectivity *indexSelectivity,
 							double *indexCorrelation,
 							double *indexPages);
+extern void stircostestimate(struct PlannerInfo *root,
+							struct IndexPath *path,
+							double loop_count,
+							Cost *indexStartupCost,
+							Cost *indexTotalCost,
+							Selectivity *indexSelectivity,
+							double *indexCorrelation,
+							double *indexPages);
 extern void gincostestimate(struct PlannerInfo *root,
 							struct IndexPath *path,
 							double loop_count,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index 7ab6113c619..92c033a2010 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -173,7 +173,13 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
  spgist | can_exclude   | t
  spgist | can_include   | t
  spgist | bogus         | 
-(36 rows)
+ stir   | can_order     | f
+ stir   | can_unique    | f
+ stir   | can_multi_col | t
+ stir   | can_exclude   | f
+ stir   | can_include   | t
+ stir   | bogus         | 
+(42 rows)
 
 --
 -- additional checks for pg_index_column_has_property
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 6ff4d7ee901..9259679eea2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2129,9 +2129,10 @@ FROM pg_opclass AS c1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS a1
                  WHERE a1.amopfamily = c1.opcfamily
                    AND binary_coercible(c1.opcintype, a1.amoplefttype));
- opcname | opcfamily 
----------+-----------
-(0 rows)
+ opcname  | opcfamily 
+----------+-----------
+ stir_ops |      5558
+(1 row)
 
 -- Check that each operator listed in pg_amop has an associated opclass,
 -- that is one whose opcintype matches oprleft (possibly by coercion).
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index c8f3932edf0..ecc2c2a6049 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5171,7 +5171,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA *
 List of access methods
@@ -5185,7 +5186,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA h*
 List of access methods
@@ -5210,9 +5212,9 @@ List of access methods
 
 \dA: extra argument "bar" ignored
 \dA+
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5221,12 +5223,13 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ *
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5235,7 +5238,8 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ h*
                      List of access methods
-- 
2.53.0



  [application/x-patch] v33-0007-Refresh-snapshot-periodically-during-index-valid.patch (27.0K, 7-v33-0007-Refresh-snapshot-periodically-during-index-valid.patch)
  download | inline diff:
From e36be8d13923aee871a8bc970d88ae903e56956a Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 21 Apr 2025 14:11:53 +0200
Subject: [PATCH v33 7/7] Refresh snapshot periodically during index validation

Enhances validation phase of concurrently built indexes by periodically refreshing snapshots rather than using a single reference snapshot. This addresses issues with xmin propagation during long-running validations.

The validation now takes a fresh snapshot every few pages, allowing the xmin horizon to advance. This restores feature of commit d9d076222f5b, which was reverted in commit e28bb8851969. New STIR-based approach does not depend on single reference snapshot anymore.
---
 src/backend/access/heap/README.HOT         |  4 +-
 src/backend/access/heap/heapam_handler.c   | 77 +++++++++++++++++++++-
 src/backend/access/spgist/spgvacuum.c      | 12 +++-
 src/backend/catalog/index.c                | 73 +++++++++++++++-----
 src/backend/commands/indexcmds.c           | 50 ++------------
 src/backend/utils/misc/guc_parameters.dat  |  9 +++
 src/include/access/tableam.h               | 25 ++++---
 src/include/access/transam.h               | 15 +++++
 src/include/catalog/index.h                |  2 +-
 src/include/miscadmin.h                    |  1 +
 src/test/regress/expected/create_index.out |  3 +
 src/test/regress/sql/create_index.sql      |  4 ++
 12 files changed, 192 insertions(+), 83 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index b1c797517ee..382fe1723a5 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -401,12 +401,12 @@ live tuple.
 We mark the index open for inserts (but still not ready for reads) then
 we again wait for transactions which have the table open.  Then validate
 the index.  This searches for tuples missing from the index in auxiliary
-index, and inserts any missing ones if they are visible to reference snapshot.
+index, and inserts any missing ones if they are visible to a fresh snapshot.
 Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
-the second reference snapshot is finished.  This ensures that nobody is
+the latest used snapshot is finished.  This ensures that nobody is
 alive any longer who could need to see any tuples that might be missing
 from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3a04453ff5d..836fd83c4a2 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -53,6 +53,9 @@
 /* GUC: percentage of maintenance_work_mem for CIC validation tuplestore */
 int			debug_cic_validate_store_mem_pct = 10;
 
+/* GUC: refresh snapshot every N pages during CIC validation (0 = disable) */
+int			debug_cic_validate_snapshot_pages = 4096;
+
 static void reform_and_rewrite_tuple(HeapTuple tuple,
                                      Relation OldHeap, Relation NewHeap,
                                      Datum *values, bool *isnull, RewriteState rwstate);
@@ -2030,24 +2033,35 @@ heapam_index_validate_scan_read_stream_next(
 	return result;
 }
 
-static void
+static TransactionId
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
-						   Snapshot snapshot,
 						   ValidateIndexState *state,
 						   ValidateIndexState *auxState)
 {
+	TransactionId limitXmin;
+
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Snapshot		snapshot;
 	TupleTableSlot  *slot;
 	EState			*estate;
 	ExprContext		*econtext;
 	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	int64			num_to_check;
+	int64			page_read_counter = 1; /* set to 1 to skip snapshot reset at start */
 	Tuplestorestate *tuples_for_check;
+
+	/*
+	 * Under REPEATABLE READ or SERIALIZABLE (possible via
+	 * default_transaction_isolation), GetLatestSnapshot() returns the
+	 * transaction-level snapshot and xmin stays pinned.  Periodic snapshot
+	 * refresh is pointless in that case, so skip it.
+	 */
+	bool		reset_snapshot = XactIsoLevel <= XACT_READ_COMMITTED;
 	ValidateIndexScanState callback_private_data;
 
 	Buffer buf;
@@ -2057,6 +2071,8 @@ heapam_index_validate_scan(Relation heapRelation,
 	/* Use a percentage of maintenance_work_mem for tuple store. */
 	int		store_work_mem_part = maintenance_work_mem * debug_cic_validate_store_mem_pct / 100;
 
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
@@ -2065,6 +2081,12 @@ heapam_index_validate_scan(Relation heapRelation,
 	 */
 	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
+	PopActiveSnapshot();
+	InvalidateCatalogSnapshot();
+
+	Assert(!reset_snapshot || !HaveRegisteredOrActiveSnapshot());
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+
 	/*
 	 * sanity checks
 	 */
@@ -2080,6 +2102,29 @@ heapam_index_validate_scan(Relation heapRelation,
 
 	state->tuplesort = auxState->tuplesort = NULL;
 
+	/*
+	 * Now take the first snapshot that will be used to filter candidate
+	 * tuples. We are going to replace it by newer snapshot every so often
+	 * to propagate horizon.
+	 *
+	 * Beware!  There might still be snapshots in use that treat some transaction
+	 * as in-progress that our temporary snapshot treats as committed.
+	 *
+	 * If such a recently-committed transaction deleted tuples in the table,
+	 * we will not include them in the index; yet those transactions which
+	 * see the deleting one as still-in-progress will expect such tuples to
+	 * be there once we mark the index as valid.
+	 *
+	 * We solve this by waiting for all endangered transactions to exit before
+	 * we mark the index as valid, for that reason limitXmin is supported.
+	 *
+	 * We also set ActiveSnapshot to this snap, since functions in indexes may
+	 * need a snapshot.
+	 */
+	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	PushActiveSnapshot(snapshot);
+	limitXmin = snapshot->xmin;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
@@ -2113,6 +2158,7 @@ heapam_index_validate_scan(Relation heapRelation,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		block_number = BufferGetBlockNumber(buf);
+		page_read_counter++;
 
 		i = 0;
 		while ((off = tuples[i]) != InvalidOffsetNumber)
@@ -2183,6 +2229,21 @@ heapam_index_validate_scan(Relation heapRelation,
 		}
 
 		ReleaseBuffer(buf);
+		if (reset_snapshot &&
+			debug_cic_validate_snapshot_pages > 0 &&
+			page_read_counter % debug_cic_validate_snapshot_pages == 0)
+		{
+			PopActiveSnapshot();
+			UnregisterSnapshot(snapshot);
+			/* to make sure we propagate xmin */
+			InvalidateCatalogSnapshot();
+			Assert(!TransactionIdIsValid(MyProc->xmin));
+
+			snapshot = RegisterSnapshot(GetLatestSnapshot());
+			PushActiveSnapshot(snapshot);
+			/* Advance limitXmin so we wait for all snapshots seen so far */
+			limitXmin = TransactionIdNewer(limitXmin, snapshot->xmin);
+		}
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -2192,11 +2253,23 @@ heapam_index_validate_scan(Relation heapRelation,
 	read_stream_end(read_stream);
 	tuplestore_end(tuples_for_check);
 
+	/*
+	 * Drop the latest snapshot.  We must do this before waiting out other
+	 * snapshot holders, else we will deadlock against other processes also
+	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
+	 * they must wait for.
+	 */
+	PopActiveSnapshot();
+	UnregisterSnapshot(snapshot);
+	InvalidateCatalogSnapshot();
+	Assert(!reset_snapshot || MyProc->xmin == InvalidTransactionId);
 	FreeAccessStrategy(bstrategy);
 
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
+
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index c461f8dc02d..ef192fb99c2 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -191,14 +191,16 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 			 * Add target TID to pending list if the redirection could have
 			 * happened since VACUUM started.  (If xid is invalid, assume it
 			 * must have happened before VACUUM started, since REINDEX
-			 * CONCURRENTLY locks out VACUUM.)
+			 * CONCURRENTLY locks out VACUUM, if myXmin is invalid it is
+			 * validation scan.)
 			 *
 			 * Note: we could make a tighter test by seeing if the xid is
 			 * "running" according to the active snapshot; but snapmgr.c
 			 * doesn't currently export a suitable API, and it's not entirely
 			 * clear that a tighter test is worth the cycles anyway.
 			 */
-			if (TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
+			if (!TransactionIdIsValid(bds->myXmin) ||
+					TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
 				spgAddPendingTID(bds, &dt->pointer);
 		}
 		else
@@ -808,7 +810,6 @@ spgvacuumscan(spgBulkDeleteState *bds)
 	/* Finish setting up spgBulkDeleteState */
 	initSpGistState(&bds->spgstate, index);
 	bds->pendingList = NULL;
-	bds->myXmin = GetActiveSnapshot()->xmin;
 	bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
 	/*
@@ -959,6 +960,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	bds.stats = stats;
 	bds.callback = callback;
 	bds.callback_state = callback_state;
+	if (info->validate_index)
+		bds.myXmin = InvalidTransactionId;
+	else
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 	spgvacuumscan(&bds);
 
@@ -999,6 +1004,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		bds.stats = stats;
 		bds.callback = dummy_callback;
 		bds.callback_state = NULL;
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 		spgvacuumscan(&bds);
 	}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index d79047fb284..9f96c902150 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -69,6 +69,7 @@
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -3518,8 +3519,9 @@ IndexCheckExclusion(Relation heapRelation,
  * insert their new tuples into it. At the same moment we clear "indisready" for
  * auxiliary index, since it is no more required to be updated.
  *
- * We then take a new reference snapshot, any tuples that are valid according
- * to this snap, but are not in the index, must be added to the index.
+ * We then take a new snapshot, any tuples that are valid according
+ * to this snap, but are not in the index, must be added to the index. In
+ * order to propagate xmin we reset that snapshot every so often.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
  * the snap need not be indexed, because we will wait out all transactions
@@ -3532,7 +3534,7 @@ IndexCheckExclusion(Relation heapRelation,
  * TIDs of both auxiliary and target indexes, and doing a "merge join" against
  * the TID lists to see which tuples from auxiliary index are missing from the
  * target index.  Thus we will ensure that all tuples valid according to the
- * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * latest snapshot are in the index. Notice we need to do bulkdelete in the
  * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
@@ -3545,21 +3547,24 @@ IndexCheckExclusion(Relation heapRelation,
  * before it declares a uniqueness error.
  *
  * After completing validate_index(), we wait until all transactions that
- * were alive at the time of the reference snapshot are gone; this is
- * necessary to be sure there are none left with a transaction snapshot
- * older than the reference (and hence possibly able to see tuples we did
- * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
- * transactions will be able to use it for queries.
+ * were alive at the time of the latest snapshot used during validation are
+ * gone; this is necessary to be sure there are none left with a transaction
+ * snapshot older than that (and hence possibly able to see tuples we did
+ * not index).  The snapshot is periodically refreshed during the heap scan
+ * to propagate the xmin horizon, so limitXmin tracks the most recent one.
+ * Then we mark the index "indisvalid" and commit.  Subsequent transactions
+ * will be able to use it for queries.
  *
  * Also, some actions to concurrent drop the auxiliary index are performed.
  */
-void
-validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
+TransactionId
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId)
 {
 	Relation	heapRelation,
 				indexRelation,
 				auxIndexRelation;
 	IndexInfo  *indexInfo;
+	TransactionId limitXmin;
 	IndexVacuumInfo ivinfo, auxivinfo;
 	ValidateIndexState state, auxState;
 	Oid			save_userid;
@@ -3572,6 +3577,16 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
 	int			aux_work_mem_part = maintenance_work_mem / 10;
 
+	/*
+	 * Under REPEATABLE READ or SERIALIZABLE (possible via
+	 * default_transaction_isolation), GetLatestSnapshot() returns the
+	 * transaction-level snapshot and xmin stays pinned.  Periodic snapshot
+	 * refresh is pointless in that case, so skip it.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	bool		reset_snapshot = XactIsoLevel <= XACT_READ_COMMITTED;
+#endif
+
 	{
 		const int	progress_index[] = {
 			PROGRESS_CREATEIDX_PHASE,
@@ -3609,8 +3624,12 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	 * Fetch info needed for index_insert.  (You might think this should be
 	 * passed in from DefineIndex, but its copy is long gone due to having
 	 * been built in a previous transaction.)
+	 *
+	 * We might need snapshot for index expressions or predicates.
 	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	indexInfo = BuildIndexInfo(indexRelation);
+	PopActiveSnapshot();
 
 	/* mark build is concurrent just for consistency */
 	indexInfo->ii_Concurrent = true;
@@ -3646,6 +3665,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 										   NULL, TUPLESORT_NONE);
 	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	(void) index_bulk_delete(&auxivinfo, NULL,
 							 validate_index_callback, &auxState);
 	/* If aux index is empty, merge may be skipped */
@@ -3665,7 +3687,13 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		index_close(indexRelation, NoLock);
 		table_close(heapRelation, NoLock);
 
-		return;
+		PushActiveSnapshot(GetTransactionSnapshot());
+		limitXmin = GetActiveSnapshot()->xmin;
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+		return limitXmin;
 	}
 
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
@@ -3674,6 +3702,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	/* ambulkdelete updates progress metrics */
 	(void) index_bulk_delete(&ivinfo, NULL,
 							 validate_index_callback, &state);
@@ -3693,19 +3724,24 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	tuplesort_performsort(auxState.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
 
 	/*
 	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
-	table_index_validate_scan(heapRelation,
-							  indexRelation,
-							  indexInfo,
-							  snapshot,
-							  &state,
-							  &auxState);
+	limitXmin = table_index_validate_scan(heapRelation,
+										  indexRelation,
+										  indexInfo,
+										  &state,
+										  &auxState);
 
 	/* Tuple sort closed by table_index_validate_scan */
 	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
@@ -3728,6 +3764,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
+
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 0709e4f986b..a2eb434e20c 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -596,7 +596,6 @@ DefineIndex(ParseState *pstate,
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -1816,32 +1815,11 @@ DefineIndex(ParseState *pstate,
 	/* Tell concurrent index builds to ignore us, if index qualifies */
 	if (safe_index)
 		set_indexsafe_procflags();
-
-	/*
-	 * Now take the "reference snapshot" that will be used by validate_index()
-	 * to filter candidate tuples.  Beware!  There might still be snapshots in
-	 * use that treat some transaction as in-progress that our reference
-	 * snapshot treats as committed.  If such a recently-committed transaction
-	 * deleted tuples in the table, we will not include them in the index; yet
-	 * those transactions which see the deleting one as still-in-progress will
-	 * expect such tuples to be there once we mark the index as valid.
-	 *
-	 * We solve this by waiting for all endangered transactions to exit before
-	 * we mark the index as valid.
-	 *
-	 * We also set ActiveSnapshot to this snap, since functions in indexes may
-	 * need a snapshot.
-	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
-	PushActiveSnapshot(snapshot);
 	/*
 	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
-	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
-	limitXmin = snapshot->xmin;
+	limitXmin = validate_index(tableId, indexRelationId, auxIndexRelationId);
 
-	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1863,8 +1841,8 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * The index is now valid in the sense that it contains all currently
 	 * interesting tuples.  But since it might not contain tuples deleted just
-	 * before the reference snap was taken, we have to wait out any
-	 * transactions that might have older snapshots.
+	 * before the last snapshot during validating was taken, we have to wait
+	 * out any transactions that might have older snapshots.
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
@@ -4429,7 +4407,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
-		Snapshot	snapshot;
 
 		StartTransactionCommand();
 
@@ -4444,13 +4421,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		if (newidx->safe)
 			set_indexsafe_procflags();
 
-		/*
-		 * Take the "reference snapshot" that will be used by validate_index()
-		 * to filter candidate tuples.
-		 */
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
-		PushActiveSnapshot(snapshot);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
@@ -4462,16 +4432,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
-
-		/*
-		 * We can now do away with our active snapshot, we still need to save
-		 * the xmin limit to wait for older snapshots.
-		 */
-		limitXmin = snapshot->xmin;
-
-		PopActiveSnapshot();
-		UnregisterSnapshot(snapshot);
+		limitXmin = validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId);
+		Assert(!TransactionIdIsValid(MyProc->xmin));
 
 		/*
 		 * To ensure no deadlocks, we must commit and start yet another
@@ -4484,7 +4446,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		/*
 		 * The index is now valid in the sense that it contains all currently
 		 * interesting tuples.  But since it might not contain tuples deleted
-		 * just before the reference snap was taken, we have to wait out any
+		 * just before the latest snap was taken, we have to wait out any
 		 * transactions that might have older snapshots.
 		 *
 		 * Because we don't take a snapshot or Xid in this transaction,
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index a80ee4fb03f..be29cf3ba5a 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -631,6 +631,15 @@
   boot_val => 'DEFAULT_ASSERT_ENABLED',
 },
 
+{ name => 'debug_cic_validate_snapshot_pages', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Refresh snapshot every N pages during CIC validation (0 to disable).',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'debug_cic_validate_snapshot_pages',
+  boot_val => '4096',
+  min => '0',
+  max => '1000000',
+},
+
 { name => 'debug_cic_validate_store_mem_pct', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
   short_desc => 'Percentage of maintenance_work_mem used for CIC validation tuplestore.',
   flags => 'GUC_NOT_IN_SAMPLE',
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 3705e21b588..49cea7ceef7 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -727,12 +727,11 @@ typedef struct TableAmRoutine
 										   TableScanDesc scan);
 
 	/* see table_index_validate_scan for reference about parameters */
-	void		(*index_validate_scan) (Relation table_rel,
-										Relation index_rel,
-										IndexInfo *index_info,
-										Snapshot snapshot,
-										ValidateIndexState *state,
-										ValidateIndexState *aux_state);
+	TransactionId		(*index_validate_scan) (Relation table_rel,
+												Relation index_rel,
+												IndexInfo *index_info,
+												ValidateIndexState *state,
+												ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1887,20 +1886,18 @@ table_index_build_range_scan(Relation table_rel,
  * Note: it is responsibility of that function to close sortstates in
  * both `state` and `auxstate`.
  */
-static inline void
+static inline TransactionId
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
-						  Snapshot snapshot,
 						  ValidateIndexState *state,
 						  ValidateIndexState *auxstate)
 {
-	table_rel->rd_tableam->index_validate_scan(table_rel,
-											   index_rel,
-											   index_info,
-											   snapshot,
-											   state,
-											   auxstate);
+	return table_rel->rd_tableam->index_validate_scan(table_rel,
+													  index_rel,
+													  index_info,
+													  state,
+													  auxstate);
 }
 
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 6fa91bfcdc0..b33084cb91a 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -417,6 +417,21 @@ NormalTransactionIdOlder(TransactionId a, TransactionId b)
 	return b;
 }
 
+/* return the newer of the two IDs */
+static inline TransactionId
+TransactionIdNewer(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdFollows(a, b))
+		return a;
+	return b;
+}
+
 /* return the newer of the two IDs */
 static inline FullTransactionId
 FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 378701b19f1..e928876c459 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -158,7 +158,7 @@ extern void index_build(Relation heapRelation,
 						bool isreindex,
 						bool parallel);
 
-extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
+extern TransactionId validate_index(Oid heapId, Oid indexId, Oid auxIndexId);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index eea3f818a86..f8c27e0dc63 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -269,6 +269,7 @@ extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
 extern PGDLLIMPORT int debug_cic_validate_store_mem_pct;
+extern PGDLLIMPORT int debug_cic_validate_snapshot_pages;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2d6abb15a89..758c5884ff5 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3382,6 +3382,9 @@ DROP INDEX aux_index_ind6;
 --------+---------+-----------+----------+---------
  c1     | integer |           |          | 
 
+SET default_transaction_isolation = 'repeatable read';
+CREATE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+SET default_transaction_isolation = 'read committed';
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index fd96d80abbc..65dd58b947d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1400,6 +1400,10 @@ DROP INDEX aux_index_ind6;
 -- Make sure auxiliary index dropped too
 \d aux_index_tab5
 
+SET default_transaction_isolation = 'repeatable read';
+CREATE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+SET default_transaction_isolation = 'read committed';
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.53.0



  [application/x-patch] v33-0006-Optimize-auxiliary-index-handling.patch (3.0K, 8-v33-0006-Optimize-auxiliary-index-handling.patch)
  download | inline diff:
From 39fffc2d2fa5d98edac9e290a24a8395842124c7 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 30 Dec 2024 16:37:12 +0100
Subject: [PATCH v33 6/7] Optimize auxiliary index handling

Skip unnecessary computations for auxiliary indices by:
- in the index-insert path, detect auxiliary indexes and bypass Datum value computation
- set indexUnchanged=false for auxiliary indices to avoid redundant checks

These optimizations reduce overhead during concurrent index operations.
---
 src/backend/catalog/index.c         | 9 +++++++++
 src/backend/executor/execIndexing.c | 5 ++++-
 src/include/nodes/execnodes.h       | 6 ++++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index aaf0b30ff9d..d79047fb284 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2923,6 +2923,15 @@ FormIndexDatum(IndexInfo *indexInfo,
 	ListCell   *indexpr_item;
 	int			i;
 
+	/* Auxiliary index does not need any values to be computed */
+	if (unlikely(indexInfo->ii_Auxiliary))
+	{
+		Assert(indexInfo->ii_Am == STIR_AM_OID);
+		memset(values, 0, sizeof(Datum) * indexInfo->ii_NumIndexAttrs);
+		memset(isnull, true, sizeof(bool) * indexInfo->ii_NumIndexAttrs);
+		return;
+	}
+
 	if (indexInfo->ii_Expressions != NIL &&
 		indexInfo->ii_ExpressionsState == NIL)
 	{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 4363e154c0f..84e99d653ec 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -438,8 +438,11 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		 * There's definitely going to be an index_insert() call for this
 		 * index.  If we're being called as part of an UPDATE statement,
 		 * consider if the 'indexUnchanged' = true hint should be passed.
+		 *
+		 * For auxiliary indexes, always pass false to skip value comparison checks,
+		 * since auxiliary indexes only store TIDs and don't track value changes.
 		 */
-		indexUnchanged = ((flags & EIIT_IS_UPDATE) &&
+		indexUnchanged = ((flags & EIIT_IS_UPDATE) && !indexInfo->ii_Auxiliary &&
 						  index_unchanged_by_update(resultRelInfo,
 													estate,
 													indexInfo,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 136dddbbf11..69441685ddb 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -166,8 +166,10 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
- * are used only during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
+ * during index build; they're conventionally zeroed otherwise.  ii_Auxiliary
+ * is also used during retail inserts to skip datum formation for auxiliary
+ * indexes.
  * ----------------
  */
 typedef struct IndexInfo
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-04-06 18:21  Mihail Nikalayeu <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Mihail Nikalayeu @ 2026-04-06 18:21 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; +Cc: Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Rebased once again.


Attachments:

  [application/octet-stream] v34-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch (36.6K, 2-v34-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch)
  download | inline diff:
From 84805f7f3c1b97941ef7ecaefcbc20c78aaca97b Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sun, 11 Jan 2026 19:27:52 +0300
Subject: [PATCH v34 2/7] Add STIR access method and flags related to auxiliary
 indexes

This patch provides infrastructure for following enhancements to concurrent index builds by:
- ii_Auxiliary in IndexInfo: indicates that an index is an auxiliary index used during concurrent index build
- validate_index in IndexVacuumInfo: set if index_bulk_delete called during the validation phase of concurrent index build
- STIR (Short-Term Index Replacement) access method is introduced, intended solely for short-lived, auxiliary usage

STIR functions are designed as an ephemeral helper during concurrent index builds, temporarily storing TIDs without providing the full features of a typical access method. As such, it raises warnings or errors when accessed outside its specialized usage path.

Planned to be used in following commits.
---
 contrib/pgstattuple/pgstattuple.c        |   3 +
 src/backend/access/Makefile              |   1 +
 src/backend/access/heap/vacuumlazy.c     |   2 +
 src/backend/access/meson.build           |   1 +
 src/backend/access/stir/Makefile         |  18 +
 src/backend/access/stir/meson.build      |   5 +
 src/backend/access/stir/stir.c           | 567 +++++++++++++++++++++++
 src/backend/catalog/index.c              |   1 +
 src/backend/catalog/toasting.c           |   1 +
 src/backend/commands/analyze.c           |   1 +
 src/backend/commands/vacuumparallel.c    |   1 +
 src/backend/nodes/makefuncs.c            |   1 +
 src/include/access/genam.h               |   1 +
 src/include/access/reloptions.h          |   3 +-
 src/include/access/stir.h                | 110 +++++
 src/include/catalog/pg_am.dat            |   3 +
 src/include/catalog/pg_opclass.dat       |   4 +
 src/include/catalog/pg_opfamily.dat      |   2 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/nodes/execnodes.h            |   7 +-
 src/include/utils/index_selfuncs.h       |   8 +
 src/test/regress/expected/amutils.out    |   8 +-
 src/test/regress/expected/opr_sanity.out |   7 +-
 src/test/regress/expected/psql.out       |  24 +-
 24 files changed, 765 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/stir/Makefile
 create mode 100644 src/backend/access/stir/meson.build
 create mode 100644 src/backend/access/stir/stir.c
 create mode 100644 src/include/access/stir.h

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 6a7f8cb4a7c..5b5984e3aa2 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -285,6 +285,9 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 			case SPGIST_AM_OID:
 				err = "spgist index";
 				break;
+			case STIR_AM_OID:
+				err = "stir index";
+				break;
 			case BRIN_AM_OID:
 				err = "brin index";
 				break;
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index e88d72ea039..ebbcfa90715 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -19,6 +19,7 @@ SUBDIRS	    = \
 	nbtree \
 	rmgrdesc \
 	spgist \
+	stir \
 	sequence \
 	table \
 	tablesample \
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 88c71cd85b6..19cfdfd2640 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3012,6 +3012,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
@@ -3063,6 +3064,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
diff --git a/src/backend/access/meson.build b/src/backend/access/meson.build
index 5fd18de74f9..7219c65f365 100644
--- a/src/backend/access/meson.build
+++ b/src/backend/access/meson.build
@@ -11,6 +11,7 @@ subdir('nbtree')
 subdir('rmgrdesc')
 subdir('sequence')
 subdir('spgist')
+subdir('stir')
 subdir('table')
 subdir('tablesample')
 subdir('transam')
diff --git a/src/backend/access/stir/Makefile b/src/backend/access/stir/Makefile
new file mode 100644
index 00000000000..8785dab37bd
--- /dev/null
+++ b/src/backend/access/stir/Makefile
@@ -0,0 +1,18 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/stir
+#
+# IDENTIFICATION
+#    src/backend/access/stir/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/stir
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	stir.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/stir/meson.build b/src/backend/access/stir/meson.build
new file mode 100644
index 00000000000..4b7ad15346c
--- /dev/null
+++ b/src/backend/access/stir/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+backend_sources += files(
+	'stir.c',
+)
diff --git a/src/backend/access/stir/stir.c b/src/backend/access/stir/stir.c
new file mode 100644
index 00000000000..932590d9ccb
--- /dev/null
+++ b/src/backend/access/stir/stir.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.c
+ *	  Implementation of Short-Term Index Replacement.
+ *
+ * STIR is a specialized access method type designed for temporary storage
+ * of TID values during concurrent index build operations.
+ *
+ * The typical lifecycle of a STIR index is:
+ * 1. created as an auxiliary index for CIC/RIC
+ * 2. accepts inserts for a period
+ * 3. stirbulkdelete called during index validation phase
+ * 4. gets dropped
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/stir/stir.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amvalidate.h"
+#include "access/htup_details.h"
+#include "access/stir.h"
+#include "access/tableam.h"
+#include "catalog/index.h"
+#include "catalog/pg_amop.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_opfamily.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "utils/catcache.h"
+#include "utils/fmgrprotos.h"
+#include "utils/index_selfuncs.h"
+#include "utils/memutils.h"
+#include "utils/regproc.h"
+#include "utils/syscache.h"
+
+/*
+ * Stir handler function: return IndexAmRoutine with access method parameters
+ * and callbacks.
+ */
+Datum
+stirhandler(PG_FUNCTION_ARGS)
+{
+	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
+
+	/* Set STIR-specific strategy and procedure numbers */
+	amroutine->amstrategies = STIR_NSTRATEGIES;
+	amroutine->amsupport = STIR_NPROC;
+	amroutine->amoptsprocnum = STIR_OPTIONS_PROC;
+
+	/* STIR doesn't support most index operations */
+	amroutine->amcanorder = false;
+	amroutine->amcanorderbyop = false;
+	amroutine->amcanbackward = false;
+	amroutine->amcanunique = false;
+	amroutine->amcanmulticol = true;
+	amroutine->amoptionalkey = true;
+	amroutine->amsearcharray = false;
+	amroutine->amsearchnulls = false;
+	amroutine->amstorage = false;
+	amroutine->amclusterable = false;
+	amroutine->ampredlocks = false;
+	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
+	amroutine->amcaninclude = true;
+	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL;
+	amroutine->amkeytype = InvalidOid;
+
+	/* Set up function callbacks */
+	amroutine->ambuild = stirbuild;
+	amroutine->ambuildempty = stirbuildempty;
+	amroutine->aminsert = stirinsert;
+	amroutine->aminsertcleanup = NULL;
+	amroutine->ambulkdelete = stirbulkdelete;
+	amroutine->amvacuumcleanup = stirvacuumcleanup;
+	amroutine->amcanreturn = NULL;
+	amroutine->amcostestimate = stircostestimate;
+	amroutine->amoptions = stiroptions;
+	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
+	amroutine->amvalidate = stirvalidate;
+	amroutine->amadjustmembers = NULL;
+	amroutine->ambeginscan = stirbeginscan;
+	amroutine->amrescan = stirrescan;
+	amroutine->amgettuple = NULL;
+	amroutine->amgetbitmap = NULL;
+	amroutine->amendscan = stirendscan;
+	amroutine->ammarkpos = NULL;
+	amroutine->amrestrpos = NULL;
+	amroutine->amestimateparallelscan = NULL;
+	amroutine->aminitparallelscan = NULL;
+	amroutine->amparallelrescan = NULL;
+
+	PG_RETURN_POINTER(amroutine);
+}
+
+/*
+ * Validates operator class for STIR index.
+ *
+ * STIR is not a real index, so validate may be skipped.
+ * But we do it just for consistency.
+ */
+bool
+stirvalidate(Oid opclassoid)
+{
+	bool result = true;
+	HeapTuple classtup;
+	Form_pg_opclass classform;
+	Oid opfamilyoid;
+	HeapTuple familytup;
+	Form_pg_opfamily familyform;
+	char *opfamilyname;
+	CatCList *oprlist;
+	int i;
+
+	/* Fetch opclass information */
+	classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
+	if (!HeapTupleIsValid(classtup))
+		elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
+	classform = (Form_pg_opclass) GETSTRUCT(classtup);
+
+	opfamilyoid = classform->opcfamily;
+
+	/* Fetch opfamily information */
+	familytup = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfamilyoid));
+	if (!HeapTupleIsValid(familytup))
+		elog(ERROR, "cache lookup failed for operator family %u", opfamilyoid);
+	familyform = (Form_pg_opfamily) GETSTRUCT(familytup);
+
+	opfamilyname = NameStr(familyform->opfname);
+
+	/* Fetch all operators and support functions of the opfamily */
+	oprlist = SearchSysCacheList1(AMOPSTRATEGY, ObjectIdGetDatum(opfamilyoid));
+
+	/* Check individual operators */
+	for (i = 0; i < oprlist->n_members; i++)
+	{
+		HeapTuple oprtup = &oprlist->members[i]->tuple;
+		Form_pg_amop oprform = (Form_pg_amop) GETSTRUCT(oprtup);
+
+		/* Check it's allowed strategy for stir */
+		if (oprform->amopstrategy < 1 ||
+			oprform->amopstrategy > STIR_NSTRATEGIES)
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with invalid strategy number %d",
+					        opfamilyname,
+					        format_operator(oprform->amopopr),
+					        oprform->amopstrategy)));
+			result = false;
+		}
+
+		/* stir doesn't support ORDER BY operators */
+		if (oprform->amoppurpose != AMOP_SEARCH ||
+			OidIsValid(oprform->amopsortfamily))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains invalid ORDER BY specification for operator %s",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+
+		/* Check operator signature --- same for all stir strategies */
+		if (!check_amop_signature(oprform->amopopr, BOOLOID,
+		                          oprform->amoplefttype,
+		                          oprform->amoprighttype))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with wrong signature",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+	}
+
+	ReleaseCatCacheList(oprlist);
+	ReleaseSysCache(familytup);
+	ReleaseSysCache(classtup);
+
+	return result;
+}
+
+/*
+ * Initialize meta-page of a STIR index.
+ * The skipInserts flag determines if new inserts will be accepted or skipped.
+ */
+void
+StirFillMetapage(Relation index, Page metaPage, bool skipInserts)
+{
+	StirMetaPageData *metadata;
+
+	StirInitPage(metaPage, STIR_META);
+	metadata = StirPageGetMeta(metaPage);
+	memset(metadata, 0, sizeof(StirMetaPageData));
+	metadata->magicNumber = STIR_MAGIC_NUMBER;
+	metadata->skipInserts = skipInserts;
+	((PageHeader) metaPage)->pd_lower = ((char *) metadata + sizeof(StirMetaPageData)) - (char *) metaPage;
+}
+
+/*
+ * Create and initialize the metapage for a STIR index.
+ * This is called during index creation.
+ */
+void
+StirInitMetapage(Relation index, ForkNumber forknum)
+{
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	/*
+	 * Make a new page; since it is the first page it should be associated with
+	 * block number 0 (STIR_METAPAGE_BLKNO).  No need to hold the extension
+	 * lock because there cannot be concurrent inserters yet.
+	 */
+	metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	Assert(BufferGetBlockNumber(metaBuffer) == STIR_METAPAGE_BLKNO);
+
+	metaPage = BufferGetPage(metaBuffer);
+	StirFillMetapage(index, metaPage, forknum == INIT_FORKNUM);
+
+	MarkBufferDirty(metaBuffer);
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * Initialize any page of a stir index.
+ */
+void
+StirInitPage(Page page, uint16 flags)
+{
+	StirPageOpaque opaque;
+
+	PageInit(page, BLCKSZ, sizeof(StirPageOpaqueData));
+
+	opaque = StirPageGetOpaque(page);
+	opaque->flags = flags;
+	opaque->stir_page_id = STIR_PAGE_ID;
+}
+
+/*
+ * Add a tuple to a STIR page. Returns false if the tuple doesn't fit.
+ * The tuple is added to the end of the page.
+ */
+static bool
+StirPageAddItem(Page page, StirTuple *tuple)
+{
+	StirTuple *itup;
+	StirPageOpaque opaque;
+	char *ptr;
+
+	/* We shouldn't be pointed to an invalid page */
+	Assert(!PageIsNew(page));
+
+	/* Does the new tuple fit on the page? */
+	if (StirPageGetFreeSpace(page) < sizeof(StirTuple))
+		return false;
+
+	/* Copy a new tuple to the end of the page */
+	opaque = StirPageGetOpaque(page);
+	itup = StirPageGetTuple(page, opaque->maxoff + 1);
+	memcpy(itup, tuple, sizeof(StirTuple));
+
+	/* Adjust maxoff and pd_lower */
+	opaque->maxoff++;
+	ptr = (char *) StirPageGetTuple(page, opaque->maxoff + 1);
+	((PageHeader) page)->pd_lower = ptr - page;
+
+	/* Assert we didn't overrun available space */
+	Assert(((PageHeader) page)->pd_lower <= ((PageHeader) page)->pd_upper);
+	return true;
+}
+
+/*
+ * Insert a new tuple into a STIR index.
+ */
+bool
+stirinsert(Relation index, Datum *values, bool *isnull,
+		  ItemPointer ht_ctid, Relation heapRel,
+		  IndexUniqueCheck checkUnique,
+		  bool indexUnchanged,
+		  struct IndexInfo *indexInfo)
+{
+	StirTuple itup;
+	StirMetaPageData *metaData;
+	Buffer buffer,
+			metaBuffer;
+	Page page;
+	BlockNumber blkNo;
+
+	itup.heapPtr = *ht_ctid;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+
+	for (;;)
+	{
+		LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if inserts are allowed */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+		blkNo = metaData->lastBlkNo;
+		/* Don't hold metabuffer lock while doing insert */
+		LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+
+		if (blkNo > 0)
+		{
+			buffer = ReadBuffer(index, blkNo);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+			page = BufferGetPage(buffer);
+
+			Assert(!PageIsNew(page));
+
+			/* Try to add tuple to the existing page */
+			if (StirPageAddItem(page, &itup))
+			{
+				/* Success!  Apply the change, clean up, and exit */
+				MarkBufferDirty(buffer);
+
+				UnlockReleaseBuffer(buffer);
+				ReleaseBuffer(metaBuffer);
+				return false;
+			}
+
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Need to add a new page - get exclusive lock on meta-page */
+		LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+
+		/* Re-check after acquiring exclusive lock */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+
+		/* Check if another backend already extended the index */
+		if (blkNo != metaData->lastBlkNo)
+		{
+			Assert(blkNo < metaData->lastBlkNo);
+			/* Someone else inserted the new page into the index, let's try again */
+			LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+			continue;
+		}
+		else
+		{
+			/* Must extend the file */
+			buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
+									   EB_LOCK_FIRST);
+			page = BufferGetPage(buffer);
+
+			StirInitPage(page, 0);
+
+			if (!StirPageAddItem(page, &itup))
+			{
+				/* We shouldn't be here since we're inserting to an empty page */
+				elog(ERROR, "could not add new stir tuple to empty page");
+			}
+
+			/* Update meta-page with new last block number */
+			metaData->lastBlkNo = BufferGetBlockNumber(buffer);
+
+			MarkBufferDirty(metaBuffer);
+			MarkBufferDirty(buffer);
+
+			UnlockReleaseBuffer(buffer);
+			UnlockReleaseBuffer(metaBuffer);
+
+			return false;
+		}
+	}
+}
+
+/*
+ * STIR doesn't support scans - these functions all error out
+ */
+IndexScanDesc
+stirbeginscan(Relation r, int nkeys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void
+stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+		  ScanKey orderbys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void stirendscan(IndexScanDesc scan)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+/*
+ * Build a STIR index - only allowed for auxiliary indexes.
+ * Just initializes the meta-page without any heap scans.
+ */
+IndexBuildResult *
+stirbuild(Relation heap, Relation index,
+						   struct IndexInfo *indexInfo)
+{
+	IndexBuildResult *result;
+
+	if (!indexInfo->ii_Auxiliary)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("Building STIR indexes is not supported")));
+
+	StirInitMetapage(index, MAIN_FORKNUM);
+
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	result->heap_tuples = 0;
+	result->index_tuples = 0;
+	return result;
+}
+
+void stirbuildempty(Relation index)
+{
+	StirInitMetapage(index, INIT_FORKNUM);
+}
+
+IndexBulkDeleteResult *
+stirbulkdelete(IndexVacuumInfo *info,
+									 IndexBulkDeleteResult *stats,
+									 IndexBulkDeleteCallback callback,
+									 void *callback_state)
+{
+	Relation index = info->index;
+	BlockNumber blkno, npages;
+	Buffer buffer;
+	Page page;
+
+	/*
+	 * For normal VACUUM, mark to skip inserts and warn about an index drop
+	 * needed.  In practice this path is not reachable during CREATE INDEX
+	 * CONCURRENTLY because the table-level locks held by CIC prevent concurrent
+	 * VACUUM from opening the auxiliary index.  It can only be reached if a
+	 * leftover STIR index somehow survives after a failed CIC and a later
+	 * VACUUM encounters it.
+	 */
+	if (!info->validate_index)
+	{
+		StirMarkAsSkipInserts(index);
+
+		ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+		return NULL;
+	}
+
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	/*
+	 * Iterate over the pages. We don't care about concurrently added pages,
+	 * because the index is marked as not-ready for that moment and the index is not
+	 * used for insert.
+	 */
+	npages = RelationGetNumberOfBlocks(index);
+	for (blkno = STIR_HEAD_BLKNO; blkno < npages; blkno++)
+	{
+		StirTuple *itup, *itupEnd;
+
+		vacuum_delay_point(false);
+
+		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
+									RBM_NORMAL, info->strategy);
+
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buffer);
+
+		if (PageIsNew(page))
+		{
+			UnlockReleaseBuffer(buffer);
+			continue;
+		}
+
+		itup = StirPageGetTuple(page, FirstOffsetNumber);
+		itupEnd = StirPageGetTuple(page, OffsetNumberNext(StirPageGetMaxOffset(page)));
+		while (itup < itupEnd)
+		{
+			/* Do we have to delete this tuple? */
+			if (callback(&itup->heapPtr, callback_state))
+			{
+				ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("we never delete in stir")));
+			}
+
+			itup = StirPageGetNextTuple(itup);
+		}
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	return stats;
+}
+
+/*
+ * Mark a STIR index to skip future inserts
+ */
+void
+StirMarkAsSkipInserts(Relation index)
+{
+	StirMetaPageData *metaData;
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	metaPage = BufferGetPage(metaBuffer);
+	metaData = StirPageGetMeta(metaPage);
+
+	if (!metaData->skipInserts)
+	{
+		metaData->skipInserts = true;
+		MarkBufferDirty(metaBuffer);
+	}
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * As with stirbulkdelete, this is not reachable during a normal CIC due to
+ * table-level locking.  It serves as a safety net for leftover STIR indexes
+ * from failed concurrent index builds.
+ */
+IndexBulkDeleteResult *
+stirvacuumcleanup(IndexVacuumInfo *info,
+				  IndexBulkDeleteResult *stats)
+{
+	StirMarkAsSkipInserts(info->index);
+	ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+	return NULL;
+}
+
+bytea *
+stiroptions(Datum reloptions, bool validate)
+{
+	return NULL;
+}
+
+void
+stircostestimate(PlannerInfo *root, IndexPath *path,
+					 double loop_count, Cost *indexStartupCost,
+					 Cost *indexTotalCost, Selectivity *indexSelectivity,
+					 double *indexCorrelation, double *indexPages)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9407c357f27..cc067e58d36 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3432,6 +3432,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
+	ivinfo.validate_index = true;
 
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 4aa52a4bd25..d7ea86b2805 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -314,6 +314,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_ParallelWorkers = 0;
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
+	indexInfo->ii_Auxiliary = false;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 49a5cdf579c..cbeb49050cd 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -726,6 +726,7 @@ do_analyze_rel(Relation onerel, const VacuumParams *params,
 			ivinfo.message_level = elevel;
 			ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
 			ivinfo.strategy = vac_strategy;
+			ivinfo.validate_index = false;
 
 			stats = index_vacuum_cleanup(&ivinfo, NULL);
 
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 77834b96a21..1671c3c2196 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -896,6 +896,7 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
 	ivinfo.estimated_count = pvs->shared->estimated_count;
 	ivinfo.num_heap_tuples = pvs->shared->reltuples;
 	ivinfo.strategy = pvs->bstrategy;
+	ivinfo.validate_index = false;
 
 	/* Update error traceback information */
 	pvs->indname = pstrdup(RelationGetRelationName(indrel));
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 3cd35c5c457..5359dab1176 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -875,6 +875,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
+	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index b69320a7fc8..1a5de4b691a 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -58,6 +58,7 @@ typedef struct IndexVacuumInfo
 	bool		estimated_count;	/* num_heap_tuples is an estimate */
 	int			message_level;	/* ereport level for progress messages */
 	double		num_heap_tuples;	/* tuples remaining in heap */
+	bool		validate_index; /* validating concurrently built index? */
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index e8cb7f7a627..7f3f08a70ac 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -51,8 +51,9 @@ typedef enum relopt_kind
 	RELOPT_KIND_VIEW = (1 << 9),
 	RELOPT_KIND_BRIN = (1 << 10),
 	RELOPT_KIND_PARTITIONED = (1 << 11),
+	RELOPT_KIND_STIR = (1 << 12),
 	/* if you add a new kind, make sure you update "last_default" too */
-	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_STIR,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/access/stir.h b/src/include/access/stir.h
new file mode 100644
index 00000000000..b08cf4d4ef0
--- /dev/null
+++ b/src/include/access/stir.h
@@ -0,0 +1,110 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.h
+ *	  header file for postgres stir access method implementation.
+ *
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * src/include/access/stir.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef STIR_H
+#define STIR_H
+
+#include "access/amapi.h"
+#include "nodes/pathnodes.h"
+#include "storage/bufpage.h"
+
+/* Support procedures numbers */
+#define STIR_NPROC				0
+
+/* Scan strategies */
+#define STIR_NSTRATEGIES		1
+
+#define STIR_OPTIONS_PROC				0
+
+/* Macros for accessing stir page structures */
+#define StirPageGetOpaque(page) ((StirPageOpaque) PageGetSpecialPointer(page))
+#define StirPageGetMaxOffset(page) (StirPageGetOpaque(page)->maxoff)
+#define StirPageIsMeta(page) \
+	((StirPageGetOpaque(page)->flags & STIR_META) != 0)
+#define StirPageGetTuple(page, offset) \
+	((StirTuple *)(PageGetContents(page) \
+		+ sizeof(StirTuple) * ((offset) - 1)))
+#define StirPageGetNextTuple(tuple) \
+	((StirTuple *)((char *)(tuple) + sizeof(StirTuple)))
+
+
+
+/* Preserved page numbers */
+#define STIR_METAPAGE_BLKNO	(0)
+#define STIR_HEAD_BLKNO		(1) /* first data page */
+
+
+/* Opaque for stir pages */
+typedef struct StirPageOpaqueData
+{
+	OffsetNumber maxoff;		/* number of index tuples on the page */
+	uint16		flags;			/* see bit definitions below */
+	uint16		stir_page_id;	/* for identification of STIR indexes */
+} StirPageOpaqueData;
+
+/* Stir page flags */
+#define STIR_META		(1<<0)
+
+typedef StirPageOpaqueData *StirPageOpaque;
+
+#define STIR_PAGE_ID		0xFF84
+
+/* Metadata of stir index */
+typedef struct StirMetaPageData
+{
+	uint32		magicNumber;
+	BlockNumber	lastBlkNo;
+	bool		skipInserts;	/* should we just exit without any inserts? */
+} StirMetaPageData;
+
+/* Magic number to distinguish stir pages from others */
+#define STIR_MAGIC_NUMBER (0xDBAC0DEF)
+
+#define StirPageGetMeta(page)	((StirMetaPageData *) PageGetContents(page))
+
+typedef struct StirTuple
+{
+	ItemPointerData heapPtr;
+} StirTuple;
+
+#define StirPageGetFreeSpace(page) \
+	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+		- StirPageGetMaxOffset(page) * (sizeof(StirTuple)) \
+		- MAXALIGN(sizeof(StirPageOpaqueData)))
+
+extern void StirFillMetapage(Relation index, Page metaPage, bool skipInserts);
+extern void StirInitMetapage(Relation index, ForkNumber forknum);
+extern void StirInitPage(Page page, uint16 flags);
+extern void StirMarkAsSkipInserts(Relation index);
+
+/* index access method interface functions */
+extern bool stirvalidate(Oid opclassoid);
+extern bool stirinsert(Relation index, Datum *values, bool *isnull,
+					 ItemPointer ht_ctid, Relation heapRel,
+					 IndexUniqueCheck checkUnique,
+					 bool indexUnchanged,
+					 struct IndexInfo *indexInfo);
+extern IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys);
+extern void stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+					 ScanKey orderbys, int norderbys);
+extern void stirendscan(IndexScanDesc scan);
+extern IndexBuildResult *stirbuild(Relation heap, Relation index,
+								 struct IndexInfo *indexInfo);
+extern void stirbuildempty(Relation index);
+extern IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+										   IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+										   void *callback_state);
+extern IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+											  IndexBulkDeleteResult *stats);
+extern bytea *stiroptions(Datum reloptions, bool validate);
+
+#endif			/* STIR_H */
diff --git a/src/include/catalog/pg_am.dat b/src/include/catalog/pg_am.dat
index 46d361047fe..8bd2c2b46ba 100644
--- a/src/include/catalog/pg_am.dat
+++ b/src/include/catalog/pg_am.dat
@@ -33,5 +33,8 @@
 { oid => '3580', oid_symbol => 'BRIN_AM_OID',
   descr => 'block range index (BRIN) access method',
   amname => 'brin', amhandler => 'brinhandler', amtype => 'i' },
+{ oid => '5555', oid_symbol => 'STIR_AM_OID',
+  descr => 'short term index replacement access method',
+  amname => 'stir', amhandler => 'stirhandler', amtype => 'i' },
 
 ]
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index df170b80840..a3457e749db 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -492,4 +492,8 @@
 
 # no brin opclass for the geometric types except box
 
+# allow any types for STIR
+{ opcmethod => 'stir', oid_symbol => 'ANY_STIR_OPS_OID', opcname => 'stir_ops',
+  opcfamily => 'stir/any_ops', opcintype => 'any'},
+
 ]
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 7a027c4810e..6ffc20a061c 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -308,5 +308,7 @@
   opfmethod => 'hash', opfname => 'multirange_ops' },
 { oid => '6158',
   opfmethod => 'gist', opfname => 'multirange_ops' },
+{ oid => '5558',
+  opfmethod => 'stir', opfname => 'any_ops' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3ea17fc5629..d4644b0b5ef 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -935,6 +935,10 @@
   proname => 'brinhandler', provolatile => 'v',
   prorettype => 'index_am_handler', proargtypes => 'internal',
   prosrc => 'brinhandler' },
+{ oid => '5556', descr => 'short term index replacement access method handler',
+  proname => 'stirhandler', provolatile => 'v',
+  prorettype => 'index_am_handler', proargtypes => 'internal',
+  prosrc => 'stirhandler' },
 { oid => '3952', descr => 'brin: standalone scan new table pages',
   proname => 'brin_summarize_new_values', provolatile => 'v',
   proparallel => 'u', prorettype => 'int4', proargtypes => 'regclass',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3ecae7552fc..ecaf82f2afa 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -169,8 +169,8 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
- * during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
+ * are used only during index build; they're conventionally zeroed otherwise.
  * ----------------
  */
 typedef struct IndexInfo
@@ -230,7 +230,8 @@ typedef struct IndexInfo
 	bool		ii_WithoutOverlaps;
 	/* # of workers requested (excludes leader) */
 	int			ii_ParallelWorkers;
-
+	/* is auxiliary for concurrent index build? */
+	bool		ii_Auxiliary;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/utils/index_selfuncs.h b/src/include/utils/index_selfuncs.h
index 74793a1a19d..bf0e30dabe9 100644
--- a/src/include/utils/index_selfuncs.h
+++ b/src/include/utils/index_selfuncs.h
@@ -62,6 +62,14 @@ extern void spgcostestimate(struct PlannerInfo *root,
 							Selectivity *indexSelectivity,
 							double *indexCorrelation,
 							double *indexPages);
+extern void stircostestimate(struct PlannerInfo *root,
+							struct IndexPath *path,
+							double loop_count,
+							Cost *indexStartupCost,
+							Cost *indexTotalCost,
+							Selectivity *indexSelectivity,
+							double *indexCorrelation,
+							double *indexPages);
 extern void gincostestimate(struct PlannerInfo *root,
 							struct IndexPath *path,
 							double loop_count,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index 7ab6113c619..92c033a2010 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -173,7 +173,13 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
  spgist | can_exclude   | t
  spgist | can_include   | t
  spgist | bogus         | 
-(36 rows)
+ stir   | can_order     | f
+ stir   | can_unique    | f
+ stir   | can_multi_col | t
+ stir   | can_exclude   | f
+ stir   | can_include   | t
+ stir   | bogus         | 
+(42 rows)
 
 --
 -- additional checks for pg_index_column_has_property
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index cfdc6b1a17a..cc947194aa7 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2131,9 +2131,10 @@ FROM pg_opclass AS c1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS a1
                  WHERE a1.amopfamily = c1.opcfamily
                    AND binary_coercible(c1.opcintype, a1.amoplefttype));
- opcname | opcfamily 
----------+-----------
-(0 rows)
+ opcname  | opcfamily 
+----------+-----------
+ stir_ops |      5558
+(1 row)
 
 -- Check that each operator listed in pg_amop has an associated opclass,
 -- that is one whose opcintype matches oprleft (possibly by coercion).
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index c8f3932edf0..ecc2c2a6049 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5171,7 +5171,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA *
 List of access methods
@@ -5185,7 +5186,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA h*
 List of access methods
@@ -5210,9 +5212,9 @@ List of access methods
 
 \dA: extra argument "bar" ignored
 \dA+
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5221,12 +5223,13 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ *
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5235,7 +5238,8 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ h*
                      List of access methods
-- 
2.43.0



  [application/octet-stream] v34-0001-Add-stress-tests-for-concurrent-index-builds.patch (12.5K, 3-v34-0001-Add-stress-tests-for-concurrent-index-builds.patch)
  download | inline diff:
From c4f389d4c181939bfab95f50b47ebe866252ffa7 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v34 1/7] Add stress tests for concurrent index builds

Introduce stress tests for concurrent index operations:
- test concurrent inserts/updates during CREATE/REINDEX INDEX CONCURRENTLY
- cover various index types (btree, gin, gist, brin, hash, spgist)
- test unique and non-unique indexes
- test with expressions and predicates
- test both parallel and non-parallel operations

These tests verify the behavior of the following commits.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 293 ++++++++++++++++++++++++++++++++
 2 files changed, 294 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 592cef74ecb..51a62dccb7b 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..dd7a1eff0ef
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,293 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+use constant STRESS_PGBENCH_CLIENTS => 30;
+use constant STRESS_PGBENCH_JOBS => 8;
+use constant STRESS_PGBENCH_TRANSACTIONS => 10000;
+use constant STRESS_MAX_SLEEP_MS => 10;
+
+use constant DEFAULT_PGBENCH_CLIENTS => 15;
+use constant DEFAULT_PGBENCH_JOBS => 4;
+use constant DEFAULT_PGBENCH_TRANSACTIONS => 500;
+use constant DEFAULT_MAX_SLEEP_MS => 1;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my $node;
+my $pg_test_extra = $ENV{PG_TEST_EXTRA} // '';
+my $is_stress = $pg_test_extra =~ /\bstress\b/ ? 1 : 0;
+my $pgbench_clients =
+  $is_stress ? STRESS_PGBENCH_CLIENTS : DEFAULT_PGBENCH_CLIENTS;
+my $pgbench_jobs = $is_stress ? STRESS_PGBENCH_JOBS : DEFAULT_PGBENCH_JOBS;
+my $pgbench_transactions =
+  $is_stress ? STRESS_PGBENCH_TRANSACTIONS : DEFAULT_PGBENCH_TRANSACTIONS;
+my $max_sleep_ms = $is_stress ? STRESS_MAX_SLEEP_MS : DEFAULT_MAX_SLEEP_MS;
+my $pgbench_options = sprintf(
+	'--no-vacuum --client=%d --jobs=%d --exit-on-abort --transactions=%d',
+	$pgbench_clients,
+	$pgbench_jobs,
+	$pgbench_transactions);
+my $no_hot = $is_stress ? int(rand(2)) : 0;
+
+print(
+		sprintf(
+		'settings: PG_TEST_EXTRA=%s stress=%d clients=%d jobs=%d transactions=%d max_sleep_ms=%d no_hot=%d',
+		defined($ENV{PG_TEST_EXTRA})
+		? ($pg_test_extra eq '' ? '(empty)' : $pg_test_extra)
+		: '(undef)',
+		$is_stress,
+		$pgbench_clients,
+		$pgbench_jobs,
+		$pgbench_transactions,
+		$max_sleep_ms,
+		$no_hot));
+print "\n";
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf', 'maintenance_work_mem = 32MB'); # to avoid OOM
+$node->append_conf('postgresql.conf', 'shared_buffers = 32MB'); # to avoid OOM
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE UNLOGGED TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp,
+								ia int4[], p point)));
+
+if ($no_hot) { $node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);)); }
+
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC in different options concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => sprintf(q(
+			SET debug_parallel_query = off; -- this is because predicate_stable implementation
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\set use_rr random(0, 9)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :use_rr = 0
+						SET default_transaction_isolation = 'repeatable read';
+					\endif
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+					RESET default_transaction_isolation;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for unique BTREE',
+	{
+		'concurrent_ops_unique_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\set use_rr random(0, 9)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :use_rr = 0
+						SET default_transaction_isolation = 'repeatable read';
+					\endif
+					CREATE UNIQUE INDEX CONCURRENTLY new_idx ON tbl(i);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+					RESET default_transaction_isolation;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIN with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN',
+	{
+		'concurrent_ops_gin_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\set use_rr random(0, 9)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :use_rr = 0
+						SET default_transaction_isolation = 'repeatable read';
+					\endif
+					CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIN (ia);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					DROP INDEX CONCURRENTLY new_idx;
+					RESET default_transaction_isolation;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIST/BRIN/HASH/SPGIST index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_other_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\set use_rr random(0, 9)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :use_rr = 0
+						SET default_transaction_isolation = 'repeatable read';
+					\endif
+					\set variant random(0, 3)
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIST (p);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING BRIN (updated_at);
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING HASH (updated_at);
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING SPGIST (p);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					DROP INDEX CONCURRENTLY new_idx;
+					RESET default_transaction_isolation;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+		});
+
+$node->stop;
+done_testing();
-- 
2.43.0



  [application/octet-stream] v34-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch (21.0K, 4-v34-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch)
  download | inline diff:
From 1337fb6930d365238c82113a611620758082cd8d Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 12 Jan 2026 00:57:56 +0300
Subject: [PATCH v34 3/7] Add Datum storage support to tuplestore Extend
 tuplestore to store individual Datum values

This support enables usages of tuplestore for non-tuple data (TIDs) in the next commit.
---
 src/backend/utils/sort/tuplestore.c | 367 +++++++++++++++++++++++-----
 src/include/utils/tuplestore.h      |  33 +--
 2 files changed, 327 insertions(+), 73 deletions(-)

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index f9e2d95186a..2a9b25bd238 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1,16 +1,19 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.c
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
+ *
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
  * Also, it is possible to support multiple independent read pointers.
  *
  * A temporary file is used to handle the data if it exceeds the
@@ -61,6 +64,8 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "storage/buffile.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
 #include "utils/tuplestore.h"
@@ -116,16 +121,15 @@ struct Tuplestorestate
 	BufFile    *myfile;			/* underlying file, or NULL if none */
 	MemoryContext context;		/* memory context for holding tuples */
 	ResourceOwner resowner;		/* resowner for holding temp files */
+	Oid			datumType;		/* InvalidOid or oid of Datum's to be stored */
+	int16		datumTypeLen;	/* typelen of that Datum */
+	bool		datumTypeByVal; /* by-value of that Datum */
 
 	/*
 	 * These function pointers decouple the routines that must know what kind
 	 * of tuple we are handling from the routines that don't need to know it.
 	 * They are set up by the tuplestore_begin_xxx routines.
 	 *
-	 * (Although tuplestore.c currently only supports heap tuples, I've copied
-	 * this part of tuplesort.c so that extension to other kinds of objects
-	 * will be easy if it's ever needed.)
-	 *
 	 * Function to copy a supplied input tuple into palloc'd space. (NB: we
 	 * assume that a single pfree() is enough to release the tuple later, so
 	 * the representation must be "flat" in one palloc chunk.) state->availMem
@@ -150,6 +154,12 @@ struct Tuplestorestate
 	 */
 	void	   *(*readtup) (Tuplestorestate *state, unsigned int len);
 
+	/*
+	 * Function to get length of tuple from tape. Used to provide 'len' argument
+	 * for readtup (see above).
+	 */
+	unsigned int(*lentup) (Tuplestorestate *state, bool eofOK);
+
 	/*
 	 * This array holds pointers to tuples in memory if we are in state INMEM.
 	 * In states WRITEFILE and READFILE it's not used.
@@ -186,6 +196,7 @@ struct Tuplestorestate
 #define COPYTUP(state,tup)	((*(state)->copytup) (state, tup))
 #define WRITETUP(state,tup) ((*(state)->writetup) (state, tup))
 #define READTUP(state,len)	((*(state)->readtup) (state, len))
+#define LENTUP(state,eofOK)	((*(state)->lentup) (state, eofOK))
 #define LACKMEM(state)		((state)->availMem < 0)
 #define USEMEM(state,amt)	((state)->availMem -= (amt))
 #define FREEMEM(state,amt)	((state)->availMem += (amt))
@@ -194,9 +205,9 @@ struct Tuplestorestate
  *
  * NOTES about on-tape representation of tuples:
  *
- * We require the first "unsigned int" of a stored tuple to be the total size
- * on-tape of the tuple, including itself (so it is never zero).
- * The remainder of the stored tuple
+ * In case of tuples we use first "unsigned int" of a stored tuple
+ * to be the total size on-tape of the tuple, including itself
+ * (so it is never zero). The remainder of the stored tuple
  * may or may not match the in-memory representation of the tuple ---
  * any conversion needed is the job of the writetup and readtup routines.
  *
@@ -207,10 +218,13 @@ struct Tuplestorestate
  * state->backward is not set, the write/read routines may omit the extra
  * length word.
  *
- * writetup is expected to write both length words as well as the tuple
+ * In the case of Datum with constant length, both "unsigned int" are omitted.
+ *
+ * writetup is expected to write both length words and the tuple
  * data.  When readtup is called, the tape is positioned just after the
- * front length word; readtup must read the tuple data and advance past
- * the back length word (if present).
+ * front length word (if it is not omitted like in case of content-size Datum);
+ * readtup must read the tuple data and advance past the back length word
+ * (if present).
  *
  * The write/read routines can make use of the tuple description data
  * stored in the Tuplestorestate record, if needed. They are also expected
@@ -242,11 +256,16 @@ static Tuplestorestate *tuplestore_begin_common(int eflags,
 static void tuplestore_puttuple_common(Tuplestorestate *state, void *tuple);
 static void dumptuples(Tuplestorestate *state);
 static void tuplestore_updatemax(Tuplestorestate *state);
-static unsigned int getlen(Tuplestorestate *state, bool eofOK);
+
+static unsigned int lentup_heap(Tuplestorestate *state, bool eofOK);
 static void *copytup_heap(Tuplestorestate *state, void *tup);
 static void writetup_heap(Tuplestorestate *state, void *tup);
 static void *readtup_heap(Tuplestorestate *state, unsigned int len);
 
+static unsigned int lentup_datum(Tuplestorestate *state, bool eofOK);
+static void *copytup_datum(Tuplestorestate *state, void *datum);
+static void writetup_datum(Tuplestorestate *state, void *datum);
+static void *readtup_datum(Tuplestorestate *state, unsigned int len);
 
 /*
  *		tuplestore_begin_xxx
@@ -269,6 +288,12 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
 	state->allowedMem = maxKBytes * (int64) 1024;
 	state->availMem = state->allowedMem;
 	state->myfile = NULL;
+	/*
+	 * Set Datum related data to invalid by default.
+	 */
+	state->datumType = InvalidOid;
+	state->datumTypeLen = 0;
+	state->datumTypeByVal = false;
 
 	/*
 	 * The palloc/pfree pattern for tuple memory is in a FIFO pattern.  A
@@ -346,6 +371,37 @@ tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
 	state->copytup = copytup_heap;
 	state->writetup = writetup_heap;
 	state->readtup = readtup_heap;
+	state->lentup = lentup_heap;
+
+	return state;
+}
+
+/*
+ * The same as tuplestore_begin_heap but create store for Datum values.
+ */
+Tuplestorestate *
+tuplestore_begin_datum(Oid datumType, bool randomAccess, bool interXact, int maxKBytes)
+{
+	Tuplestorestate *state;
+	int			eflags;
+
+	/*
+	 * This interpretation of the meaning of randomAccess is compatible with
+	 * the pre-8.3 behavior of tuplestores.
+	 */
+	eflags = randomAccess ?
+		(EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :
+		(EXEC_FLAG_REWIND);
+
+	state = tuplestore_begin_common(eflags, interXact, maxKBytes);
+	state->datumType = datumType;
+	get_typlenbyval(state->datumType, &state->datumTypeLen, &state->datumTypeByVal);
+	Assert(!(state->datumTypeByVal && randomAccess));
+
+	state->copytup = copytup_datum;
+	state->writetup = writetup_datum;
+	state->readtup = readtup_datum;
+	state->lentup = lentup_datum;
 
 	return state;
 }
@@ -444,16 +500,19 @@ tuplestore_clear(Tuplestorestate *state)
 	{
 		int64		availMem = state->availMem;
 
-		/*
-		 * Below, we reset the memory context for storing tuples.  To save
-		 * from having to always call GetMemoryChunkSpace() on all stored
-		 * tuples, we adjust the availMem to forget all the tuples and just
-		 * recall USEMEM for the space used by the memtuples array.  Here we
-		 * just Assert that's correct and the memory tracking hasn't gone
-		 * wrong anywhere.
-		 */
-		for (i = state->memtupdeleted; i < state->memtupcount; i++)
-			availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			/*
+			 * Below, we reset the memory context for storing tuples.  To save
+			 * from having to always call GetMemoryChunkSpace() on all stored
+			 * tuples, we adjust the availMem to forget all the tuples and just
+			 * recall USEMEM for the space used by the memtuples array.  Here we
+			 * just Assert that's correct and the memory tracking hasn't gone
+			 * wrong anywhere.
+			 */
+			for (i = state->memtupdeleted; i < state->memtupcount; i++)
+				availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		}
 
 		availMem += GetMemoryChunkSpace(state->memtuples);
 
@@ -777,6 +836,25 @@ tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple)
 	MemoryContextSwitchTo(oldcxt);
 }
 
+/*
+ * Like tuplestore_puttupleslot but for single Datum.
+ */
+void
+tuplestore_putdatum(Tuplestorestate *state, Datum datum)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
+
+	/*
+	 * Copy the Datum.  (Must do this even in WRITEFILE case.  Note that
+	 * COPYTUP includes USEMEM, so we needn't do that here.)
+	 */
+	datum = PointerGetDatum(COPYTUP(state, DatumGetPointer(datum)));
+
+	tuplestore_puttuple_common(state, DatumGetPointer(datum));
+
+	MemoryContextSwitchTo(oldcxt);
+}
+
 /*
  * Similar to tuplestore_puttuple(), but work from values + nulls arrays.
  * This avoids an extra tuple-construction operation.
@@ -1028,10 +1106,10 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 			pg_fallthrough;
 
 		case TSS_READFILE:
-			*should_free = true;
+			*should_free = !state->datumTypeByVal;
 			if (forward)
 			{
-				if ((tuplen = getlen(state, true)) != 0)
+				if ((tuplen = LENTUP(state, true)) != 0)
 				{
 					tup = READTUP(state, tuplen);
 					return tup;
@@ -1043,6 +1121,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				}
 			}
 
+			Assert(!state->datumTypeByVal);
 			/*
 			 * Backward.
 			 *
@@ -1060,7 +1139,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				Assert(!state->truncated);
 				return NULL;
 			}
-			tuplen = getlen(state, false);
+			tuplen = LENTUP(state, false);
 
 			if (readptr->eof_reached)
 			{
@@ -1091,7 +1170,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 					Assert(!state->truncated);
 					return NULL;
 				}
-				tuplen = getlen(state, false);
+				tuplen = LENTUP(state, false);
 			}
 
 			/*
@@ -1153,6 +1232,41 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 	}
 }
 
+bool
+tuplestore_getdatum(Tuplestorestate *state, bool forward,
+					bool *should_free, Datum *result)
+{
+	Datum datum;
+	*should_free = false;
+
+	datum = (Datum) tuplestore_gettuple(state, forward, should_free);
+
+	/* For by-value datum we may receive zero as valid value. */
+	if (state->datumTypeByVal)
+	{
+		/* Return false only on EOF */
+		if (state->readptrs[state->activeptr].eof_reached)
+		{
+			*result = PointerGetDatum(NULL);
+			return false;
+		}
+
+		*result = datum;
+		return true;
+	}
+
+	if (datum)
+	{
+		*result = datum;
+		return true;
+	}
+	else
+	{
+		*result = PointerGetDatum(NULL);
+		return false;
+	}
+}
+
 /*
  * tuplestore_gettupleslot_force - exported function to fetch a tuple
  *
@@ -1205,10 +1319,20 @@ tuplestore_advance(Tuplestorestate *state, bool forward)
 			pfree(tuple);
 		return true;
 	}
-	else
+
+	/*
+	 * A NULL return normally means end-of-data, but for by-value datum
+	 * stores a valid zero-valued datum (e.g., false, 0) is indistinguishable
+	 * from NULL via pointer check.  Use eof_reached to distinguish.
+	 */
+	if (state->datumTypeByVal)
 	{
-		return false;
+		TSReadPointer *readptr = &state->readptrs[state->activeptr];
+
+		return !readptr->eof_reached;
 	}
+
+	return false;
 }
 
 /*
@@ -1271,7 +1395,13 @@ tuplestore_skiptuples(Tuplestorestate *state, int64 ntuples, bool forward)
 				tuple = tuplestore_gettuple(state, forward, &should_free);
 
 				if (tuple == NULL)
-					return false;
+				{
+					/* See tuplestore_advance for why pointer check is insufficient */
+					if (!state->datumTypeByVal ||
+						state->readptrs[state->activeptr].eof_reached)
+						return false;
+					continue;
+				}
 				if (should_free)
 					pfree(tuple);
 				CHECK_FOR_INTERRUPTS();
@@ -1505,8 +1635,11 @@ tuplestore_trim(Tuplestorestate *state)
 	/* Release no-longer-needed tuples */
 	for (i = state->memtupdeleted; i < nremove; i++)
 	{
-		FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
-		pfree(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
+			pfree(state->memtuples[i]);
+		}
 		state->memtuples[i] = NULL;
 		/* As in dumptuples(), increment memtupdeleted synchronously */
 		state->memtupdeleted++;
@@ -1603,25 +1736,6 @@ tuplestore_in_memory(Tuplestorestate *state)
 	return (state->status == TSS_INMEM);
 }
 
-
-/*
- * Tape interface routines
- */
-
-static unsigned int
-getlen(Tuplestorestate *state, bool eofOK)
-{
-	unsigned int len;
-	size_t		nbytes;
-
-	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
-	if (nbytes == 0)
-		return 0;
-	else
-		return len;
-}
-
-
 /*
  * Routines specialized for HeapTuple case
  *
@@ -1632,6 +1746,19 @@ getlen(Tuplestorestate *state, bool eofOK)
  * to write that separately.
  */
 
+static unsigned int
+lentup_heap(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	else
+		return len;
+}
+
 static void *
 copytup_heap(Tuplestorestate *state, void *tup)
 {
@@ -1678,3 +1805,127 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 		BufFileReadExact(state->myfile, &tuplen, sizeof(tuplen));
 	return tuple;
 }
+
+/*
+ * Routines specialized for Datum case.
+ *
+ * Handles both fixed and variable-length Datums efficiently:
+ * - Fixed-length and Variable-length includes length prefix (and suffix if backward scan)
+ * - By-value types handled inline without extra copying, storing single extra byte
+ *   XXX: consider refactoring to avoid it, currently need it for correct rewind logic
+ */
+
+static unsigned int
+lentup_datum(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	Assert(state->datumType != InvalidOid);
+
+	if (state->datumTypeByVal)
+	{
+		uint8	junk;
+		nbytes = BufFileReadMaybeEOF(state->myfile, &junk, sizeof(uint8), eofOK);
+		if (nbytes == 0)
+			return 0;
+		Assert(junk == (uint8) state->datumTypeLen);
+		return state->datumTypeLen;
+	}
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	return len;
+}
+
+static void *
+copytup_datum(Tuplestorestate *state, void *datum)
+{
+	Datum d;
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+		return DatumGetPointer(PointerGetDatum(datum));
+
+	if (datum == NULL)
+		return NULL;
+
+	d = datumCopy(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+	USEMEM(state, GetMemoryChunkSpace(DatumGetPointer(d)));
+	return DatumGetPointer(d);
+}
+
+static void
+writetup_datum(Tuplestorestate *state, void *datum)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		uint8 junk = state->datumTypeLen; /* overflow is ok */
+		Datum v;
+		Assert(state->datumTypeLen > 0);
+
+		/* just marker byte used to track the end of data for rewind logic */
+		BufFileWrite(state->myfile, &junk, sizeof(junk));
+		store_att_byval(&v, PointerGetDatum(datum), state->datumTypeLen);
+		BufFileWrite(state->myfile, &v, state->datumTypeLen);
+		Assert(!state->backward);
+	}
+	else
+	{
+		unsigned int size;
+		unsigned int tuplen;
+
+		if (state->datumTypeLen < 0)
+			size = datumGetSize(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+		else
+			size = state->datumTypeLen;
+
+		/*
+		 * Include sizeof(unsigned int) in the stored length, matching the
+		 * convention used by writetup_heap.  The backward-scan seek
+		 * arithmetic in tuplestore_gettuple assumes this.
+		 */
+		tuplen = size + sizeof(unsigned int);
+		BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		BufFileWrite(state->myfile, datum, size);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		FREEMEM(state, GetMemoryChunkSpace(datum));
+		pfree(datum);
+	}
+}
+
+static void *
+readtup_datum(Tuplestorestate *state, unsigned int len)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		Datum datum = 0;
+
+		Assert(state->datumTypeLen > 0);
+		Assert(len == state->datumTypeLen);
+		BufFileReadExact(state->myfile, &datum, state->datumTypeLen);
+
+		Assert(!state->backward);
+		return DatumGetPointer(fetch_att(&datum, true, state->datumTypeLen));
+	}
+	else
+	{
+		unsigned int datalen = len - sizeof(unsigned int);
+		void *data = palloc(datalen);
+
+		BufFileReadExact(state->myfile, data, datalen);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileReadExact(state->myfile, &len, sizeof(len));
+
+		return data;
+	}
+}
diff --git a/src/include/utils/tuplestore.h b/src/include/utils/tuplestore.h
index f638b96e156..e16d9a3d352 100644
--- a/src/include/utils/tuplestore.h
+++ b/src/include/utils/tuplestore.h
@@ -1,17 +1,18 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.h
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
- * Also, it is possible to support multiple independent read pointers.
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
  * A temporary file is used to handle the data if it exceeds the
  * space limit specified by the caller.
@@ -39,14 +40,13 @@
  */
 typedef struct Tuplestorestate Tuplestorestate;
 
-/*
- * Currently we only need to store MinimalTuples, but it would be easy
- * to support the same behavior for IndexTuples and/or bare Datums.
- */
-
 extern Tuplestorestate *tuplestore_begin_heap(bool randomAccess,
 											  bool interXact,
 											  int maxKBytes);
+extern Tuplestorestate *tuplestore_begin_datum(Oid datumType,
+											   bool randomAccess,
+											   bool interXact,
+											   int maxKBytes);
 
 extern void tuplestore_set_eflags(Tuplestorestate *state, int eflags);
 
@@ -55,6 +55,7 @@ extern void tuplestore_puttupleslot(Tuplestorestate *state,
 extern void tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple);
 extern void tuplestore_putvalues(Tuplestorestate *state, TupleDesc tdesc,
 								 const Datum *values, const bool *isnull);
+extern void tuplestore_putdatum(Tuplestorestate *state, Datum datum);
 
 extern int	tuplestore_alloc_read_pointer(Tuplestorestate *state, int eflags);
 
@@ -72,6 +73,8 @@ extern bool tuplestore_in_memory(Tuplestorestate *state);
 
 extern bool tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 									bool copy, TupleTableSlot *slot);
+extern bool tuplestore_getdatum(Tuplestorestate *state, bool forward,
+								bool *should_free, Datum *result);
 
 extern bool tuplestore_gettupleslot_force(Tuplestorestate *state, bool forward,
 										  bool copy, TupleTableSlot *slot);
-- 
2.43.0



  [application/octet-stream] v34-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch (98.1K, 5-v34-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch)
  download | inline diff:
From 412f7e850d93f4ce27e7955bfca1076c4d0862bd Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 15:03:10 +0100
Subject: [PATCH v34 4/7] Use auxiliary indexes for concurrent index operations

Replace the second table full scan in concurrent index builds with an auxiliary index approach:
- create a STIR auxiliary index with the same predicate (if exists) as in main index
- use it to track tuples inserted during the first phase
- merge auxiliary index with main index during validation to catch up new index with any tuples missed during the first phase
- automatically drop auxiliary when main index is ready

To merge main and auxiliary indexes:
- index_bulk_delete called for both, TIDs put into tuplesort
- both tuplesort are being sorted
- both tuplesort scanned with two pointers looking for the TIDs present in auxiliary index, but absent in main one
- all such TIDs are put into tuplestore
- all TIDs in tuplestore are fetched using the stream, tuplestore used in heapam_index_validate_scan_read_stream_next to provide the next page to prefetch
- if fetched tuple is alive - it is inserted into the main index

This eliminates the need for a second full table scan during validation, improving performance, especially for large tables. Affects both CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY operations.
---
 doc/src/sgml/monitoring.sgml               |  26 +-
 doc/src/sgml/ref/create_index.sgml         |  34 +-
 doc/src/sgml/ref/reindex.sgml              |  40 +-
 src/backend/access/heap/README.HOT         |  13 +-
 src/backend/access/heap/heapam_handler.c   | 561 ++++++++++++++-------
 src/backend/catalog/index.c                | 322 ++++++++++--
 src/backend/catalog/system_views.sql       |  17 +-
 src/backend/commands/indexcmds.c           | 345 +++++++++++--
 src/backend/nodes/makefuncs.c              |   4 +-
 src/backend/utils/misc/guc_parameters.dat  |   9 +
 src/include/access/tableam.h               |  12 +-
 src/include/catalog/index.h                |   9 +-
 src/include/commands/progress.h            |  13 +-
 src/include/miscadmin.h                    |   1 +
 src/include/nodes/makefuncs.h              |   3 +-
 src/test/regress/expected/create_index.out |  42 ++
 src/test/regress/expected/indexing.out     |   3 +-
 src/test/regress/expected/rules.out        |  17 +-
 src/test/regress/sql/create_index.sql      |  21 +
 19 files changed, 1156 insertions(+), 336 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 312374da5e0..a4186e8a22f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6792,6 +6792,18 @@ FROM pg_stat_get_backend_idset() AS backendid;
        information for this phase.
       </entry>
      </row>
+     <row>
+      <entry><literal>waiting for writers to use auxiliary index</literal></entry>
+      <entry>
+       <command>CREATE INDEX CONCURRENTLY</command> or <command>REINDEX CONCURRENTLY</command> is waiting for transactions
+       with write locks that can potentially see the table to finish, to ensure use of auxiliary index for new tuples in
+       future transactions.
+       This phase is skipped when not in concurrent mode.
+       Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
+       and <structname>current_locker_pid</structname> contain the progress
+       information for this phase.
+      </entry>
+     </row>
      <row>
       <entry><literal>building index</literal></entry>
       <entry>
@@ -6832,13 +6844,12 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </entry>
      </row>
      <row>
-      <entry><literal>index validation: scanning table</literal></entry>
+      <entry><literal>index validation: merging indexes</literal></entry>
       <entry>
-       <command>CREATE INDEX CONCURRENTLY</command> is scanning the table
-       to validate the index tuples collected in the previous two phases.
+       <command>CREATE INDEX CONCURRENTLY</command> is merging content of auxiliary index with the target index.
        This phase is skipped when not in concurrent mode.
-       Columns <structname>blocks_total</structname> (set to the total size of the table)
-       and <structname>blocks_done</structname> contain the progress information for this phase.
+       Columns <structname>tuples_total</structname> (set to the number of tuples to be merged)
+       and <structname>tuples_done</structname> contain the progress information for this phase.
       </entry>
      </row>
      <row>
@@ -6855,8 +6866,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
      <row>
       <entry><literal>waiting for readers before marking dead</literal></entry>
       <entry>
-       <command>REINDEX CONCURRENTLY</command> is waiting for transactions
-       with read locks on the table to finish, before marking the old index dead.
+       <command>CREATE INDEX CONCURRENTLY</command> is waiting for transactions
+        with read locks on the table to finish, before marking the auxiliary index as dead.
+       <command>REINDEX CONCURRENTLY</command> is also waiting before marking the old index as dead.
        This phase is skipped when not in concurrent mode.
        Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
        and <structname>current_locker_pid</structname> contain the progress
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index bb7505d171b..12c88587a79 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -620,10 +620,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     out writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>CREATE INDEX</command>.
     When this option is used,
-    <productname>PostgreSQL</productname> must perform two scans of the table, and in
-    addition it must wait for all existing transactions that could potentially
-    modify or use the index to terminate.  Thus
-    this method requires more total work than a standard index build and takes
+    <productname>PostgreSQL</productname> must perform table scan followed by
+    validation phase, and in addition it must wait for all existing transactions
+    that could potentially modify or use the index to terminate.  Thus
+    this method requires more total work than a standard index build and may take
     significantly longer to complete.  However, since it allows normal
     operations to continue while the index is built, this method is useful for
     adding new indexes in a production environment.  Of course, the extra CPU
@@ -631,14 +631,14 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </para>
 
    <para>
-    In a concurrent index build, the index is actually entered as an
-    <quote>invalid</quote> index into
-    the system catalogs in one transaction, then two table scans occur in
-    two more transactions.  Before each table scan, the index build must
+    In a concurrent index build, the main and auxiliary indexes are actually
+    entered as an <quote>invalid</quote> index into
+    the system catalogs in one transaction, then two phases occur in
+    multiple transactions.  Before each phase, the index build must
     wait for existing transactions that have modified the table to terminate.
-    After the second scan, the index build must wait for any transactions
+    After the second phase, the index build must wait for any transactions
     that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-    scan to terminate, including transactions used by any phase of concurrent
+    phase to terminate, including transactions used by any phase of concurrent
     index builds on other tables, if the indexes involved are partial or have
     columns that are not simple column references.
     Then finally the index can be marked <quote>valid</quote> and ready for use,
@@ -651,10 +651,11 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    <para>
     If a problem arises while scanning the table, such as a deadlock or a
     uniqueness violation in a unique index, the <command>CREATE INDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> index. This index
-    will be ignored for querying purposes because it might be incomplete;
-    however it will still consume update overhead. The <application>psql</application>
-    <command>\d</command> command will report such an index as <literal>INVALID</literal>:
+    command will fail but leave behind an <quote>invalid</quote> index and its
+    associated auxiliary index. These indexes
+    will be ignored for querying purposes because they might be incomplete;
+    however they will still consume update overhead. The <application>psql</application>
+    <command>\d</command> command will report such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -664,11 +665,12 @@ postgres=# \d tab
  col    | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     The recommended recovery
-    method in such cases is to drop the index and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
+    method in such cases is to drop these indexes and try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
     to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
    </para>
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 185cd75ca30..9e0248261ae 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -368,9 +368,8 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <productname>PostgreSQL</productname> supports rebuilding indexes with minimum locking
     of writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>REINDEX</command>. When this option
-    is used, <productname>PostgreSQL</productname> must perform two scans of the table
-    for each index that needs to be rebuilt and wait for termination of
-    all existing transactions that could potentially use the index.
+    is used, <productname>PostgreSQL</productname> must perform several steps to ensure data
+    consistency while allowing normal operations to continue.
     This method requires more total work than a standard index
     rebuild and takes significantly longer to complete as it needs to wait
     for unfinished transactions that might modify the index. However, since
@@ -388,7 +387,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <orderedlist>
      <listitem>
       <para>
-       A new transient index definition is added to the catalog
+       A new transient index definition and an auxiliary index are added to the catalog
        <literal>pg_index</literal>.  This definition will be used to replace
        the old index.  A <literal>SHARE UPDATE EXCLUSIVE</literal> lock at
        session level is taken on the indexes being reindexed as well as their
@@ -398,7 +397,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       A first pass to build the index is done for each new index.  Once the
+       The auxiliary index is marked as "ready for inserts", making
+       it visible to other sessions. This index efficiently tracks all new
+       tuples during the reindex process.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The new main index is built by scanning the table.  Once the
        index is built, its flag <literal>pg_index.indisready</literal> is
        switched to <quote>true</quote> to make it ready for inserts, making it
        visible to other sessions once the transaction that performed the build
@@ -409,9 +416,9 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       Then a second pass is performed to add tuples that were added while the
-       first pass was running.  This step is also done in a separate
-       transaction for each index.
+       A validation phase merges any missing entries from the auxiliary index
+       into the main index, ensuring all concurrent changes are captured.
+       This step is also done in a separate transaction for each index.
       </para>
      </listitem>
 
@@ -428,7 +435,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes have <literal>pg_index.indisready</literal> switched to
+       The old and auxiliary indexes have <literal>pg_index.indisready</literal> switched to
        <quote>false</quote> to prevent any new tuple insertions, after waiting
        for running queries that might reference the old index to complete.
       </para>
@@ -436,7 +443,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes are dropped.  The <literal>SHARE UPDATE
+       The old and auxiliary indexes are dropped.  The <literal>SHARE UPDATE
        EXCLUSIVE</literal> session locks for the indexes and the table are
        released.
       </para>
@@ -447,11 +454,11 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
    <para>
     If a problem arises while rebuilding the indexes, such as a
     uniqueness violation in a unique index, the <command>REINDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> new index in addition to
-    the pre-existing one. This index will be ignored for querying purposes
-    because it might be incomplete; however it will still consume update
+    command will fail but leave behind an <quote>invalid</quote> new index and its auxiliary index in addition to
+    the pre-existing one. These indexes will be ignored for querying purposes
+    because they might be incomplete; however they will still consume update
     overhead. The <application>psql</application> <command>\d</command> command will report
-    such an index as <literal>INVALID</literal>:
+    such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -462,12 +469,13 @@ postgres=# \d tab
 Indexes:
     "idx" btree (col)
     "idx_ccnew" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     If the index marked <literal>INVALID</literal> is suffixed
-    <literal>_ccnew</literal>, then it corresponds to the transient
+    <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
-    recovery method is to drop it using <literal>DROP INDEX</literal>,
+    recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
     then attempt <command>REINDEX CONCURRENTLY</command> again.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 74e407f375a..b1c797517ee 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -375,6 +375,11 @@ constraint on which updates can be HOT.  Other transactions must include
 such an index when determining HOT-safety of updates, even though they
 must ignore it for both insertion and searching purposes.
 
+Also, special auxiliary index is created the same way. It is marked as
+"ready for inserts" without any actual table scan. Its purpose is to collect
+new tuples inserted into table while our target index is still "not ready
+for inserts".
+
 We must do this to avoid making incorrect index entries.  For example,
 suppose we are building an index on column X and we make an index entry for
 a non-HOT tuple with X=1.  Then some other backend, unaware that X is an
@@ -394,10 +399,10 @@ entry at the root of the HOT-update chain but we use the key value from the
 live tuple.
 
 We mark the index open for inserts (but still not ready for reads) then
-we again wait for transactions which have the table open.  Then we take
-a second reference snapshot and validate the index.  This searches for
-tuples missing from the index, and inserts any missing ones.  Again,
-the index entries have to have TIDs equal to HOT-chain root TIDs, but
+we again wait for transactions which have the table open.  Then validate
+the index.  This searches for tuples missing from the index in auxiliary
+index, and inserts any missing ones if they are visible to reference snapshot.
+Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 07f07188d46..a3474925d61 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -42,15 +42,20 @@
 #include "storage/lmgr.h"
 #include "storage/lock.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/tuplesort.h"
+#include "utils/tuplestore.h"
+
+/* GUC: percentage of maintenance_work_mem for CIC validation tuplestore */
+int			debug_cic_validate_store_mem_pct = 10;
 
 static void reform_and_rewrite_tuple(HeapTuple tuple,
-									 Relation OldHeap, Relation NewHeap,
-									 Datum *values, bool *isnull, RewriteState rwstate);
+                                     Relation OldHeap, Relation NewHeap,
+                                     Datum *values, bool *isnull, RewriteState rwstate);
 
 static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
 								   HeapTuple tuple,
@@ -1665,242 +1670,422 @@ heapam_index_build_range_scan(Relation heapRelation,
 	return reltuples;
 }
 
+/*
+ * Calculate set difference (relative complement) of main and aux
+ * sets.
+ *
+ * All records which are present in auxiliary tuplesort but not in
+ * main are added to the store.
+ *
+ * In set theory notation store = aux - main or store = aux / main.
+ *
+ * returns number of items added to store
+ */
+static int64
+heapam_index_validate_tuplesort_difference(Tuplesortstate *main,
+										   Tuplesortstate *aux,
+										   Tuplestorestate *store)
+{
+	int64		num = 0;
+	/* state variables for the merge */
+	ItemPointer	indexcursor = NULL,
+					auxindexcursor = NULL;
+	ItemPointerData decoded,
+					auxdecoded;
+	bool			tuplesort_empty = false,
+					auxtuplesort_empty = false;
+
+	/* Initialize pointers. */
+	ItemPointerSetInvalid(&decoded);
+	ItemPointerSetInvalid(&auxdecoded);
+
+	/*
+	 * Main loop: we step through the auxiliary sort (auxState->tuplesort),
+	 * which holds TIDs that must compared to those from the "main" sort
+	 * (state->tuplesort).
+	 */
+	while (!auxtuplesort_empty)
+	{
+		Datum		ts_val;
+		bool		ts_isnull;
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Attempt to fetch the next TID from the auxiliary sort. If it's
+		 * empty, we set auxindexcursor to NULL.
+		 */
+		auxtuplesort_empty = !tuplesort_getdatum(aux, true,
+												 false, &ts_val, &ts_isnull,
+												 NULL);
+		Assert(auxtuplesort_empty || !ts_isnull);
+		if (!auxtuplesort_empty)
+		{
+			itemptr_decode(&auxdecoded, DatumGetInt64(ts_val));
+			auxindexcursor = &auxdecoded;
+		}
+		else
+		{
+			auxindexcursor = NULL;
+		}
+
+		/*
+		 * If the auxiliary sort is not yet empty, we now try to synchronize
+		 * the "main" sort cursor (indexcursor) with auxindexcursor. We advance
+		 * the main sort cursor until we've reached or passed the auxiliary TID.
+		 */
+		if (!auxtuplesort_empty)
+		{
+			/*
+			 * Move the main sort forward while:
+			 *   (1) It's not exhausted (tuplesort_empty == false), and
+			 *   (2) Either indexcursor is NULL (first iteration) or
+			 *       indexcursor < auxindexcursor in TID order.
+			 */
+			while (!tuplesort_empty && (indexcursor == NULL || /* null on first time here */
+						ItemPointerCompare(indexcursor, auxindexcursor) < 0))
+			{
+				/*
+				 * Get the next TID from the main sort. If it's empty,
+				 * we set indexcursor to NULL.
+				 */
+				tuplesort_empty = !tuplesort_getdatum(main, true,
+													  false, &ts_val, &ts_isnull,
+													  NULL);
+				Assert(tuplesort_empty || !ts_isnull);
+
+				if (!tuplesort_empty)
+				{
+					itemptr_decode(&decoded, DatumGetInt64(ts_val));
+					indexcursor = &decoded;
+				}
+				else
+				{
+					indexcursor = NULL;
+				}
+
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * Now, if either:
+			 *  - the main sort is empty, or
+			 *  - indexcursor > auxindexcursor,
+			 *
+			 * then auxindexcursor identifies a TID that doesn't appear in
+			 * the main sort. We likely need to insert it
+			 * into the target index if it’s visible in the heap.
+			 */
+			if (tuplesort_empty || ItemPointerCompare(indexcursor, auxindexcursor) > 0)
+			{
+				tuplestore_putdatum(store, Int64GetDatum(itemptr_encode(auxindexcursor)));
+				num++;
+			}
+		}
+	}
+
+	return num;
+}
+
+typedef struct ValidateIndexScanState
+{
+	Tuplestorestate		*store;
+	BlockNumber			prev_block_number;
+	OffsetNumber		prev_offset_number;
+} ValidateIndexScanState;
+
+/*
+ * This is ReadStreamBlockNumberCB implementation which works as follows:
+ *
+ * 1) It iterates over a sorted tuplestore, where each element is an encoded
+ *    ItemPointer
+ *
+ * 2) It returns the current BlockNumber and collects all OffsetNumbers
+ *    for that block in per_buffer_data.
+ *
+ * 3) Once the code encounters a new BlockNumber, it stops reading more
+ *    offsets and saves the OffsetNumber of the new block for the next call.
+ *
+ * 4) The list of offsets for a block is always terminated with InvalidOffsetNumber.
+ *
+ * This function is intended to be repeatedly called, each time returning
+ * the next block and its corresponding set of offsets.
+ */
+static BlockNumber
+heapam_index_validate_scan_read_stream_next(
+								  ReadStream *stream,
+								  void *void_callback_private_data,
+								  void *void_per_buffer_data
+								  )
+{
+	bool should_free;
+	Datum datum;
+	BlockNumber result = InvalidBlockNumber;
+	int i = 0;
+
+	/*
+	 * Retrieve the specialized callback state and the output buffer.
+	 * callback_private_data keeps track of the previous block and offset
+	 * from a prior invocation, if any.
+	 */
+	ValidateIndexScanState *callback_private_data = void_callback_private_data;
+	OffsetNumber *per_buffer_data = void_per_buffer_data;
+
+	/*
+	 * If there is a "leftover" offset number from the previous invocation,
+	 * it means we had switched to a new block in the middle of the last call.
+	 * We place that leftover offset number into the buffer first.
+	 */
+	if (callback_private_data->prev_offset_number != InvalidOffsetNumber)
+	{
+		Assert(callback_private_data->prev_block_number != InvalidBlockNumber);
+		/*
+		 * 'result' is the block number to return. We set it to the block
+		 * from the previous leftover offset.
+		 */
+		result = callback_private_data->prev_block_number;
+		/* Place leftover offset number in the output buffer. */
+		per_buffer_data[i++] = callback_private_data->prev_offset_number;
+		/*
+		 * Clear the leftover offset number so it won't be reused unless
+		 * we encounter another block change.
+		 */
+		callback_private_data->prev_offset_number = InvalidOffsetNumber;
+	}
+
+	/*
+	 * Read from the tuplestore until we either run out of tuples or we
+	 * encounter a block change. For each tuple:
+	 *
+	 *   1) Decode its block/offset from the Datum.
+	 *   2) If it's the first time in this call (prev_block_number == InvalidBlockNumber),
+	 *      initialize prev_block_number.
+	 *   3) If the block number matches the current block, collect the offset.
+	 *   4) If the block number differs, save that offset as leftover and break
+	 *      so that the next call can handle the new block.
+	 */
+	while (tuplestore_getdatum(callback_private_data->store, true, &should_free, &datum))
+	{
+		BlockNumber next_block_number;
+		ItemPointerData next_data;
+
+		/* Decode the datum into an ItemPointer (block + offset). */
+		itemptr_decode(&next_data, DatumGetInt64(datum));
+		next_block_number = ItemPointerGetBlockNumber(&next_data);
+
+		/*
+		 * If we haven't set a block number yet this round, initialize it
+		 * using the first tuple we read.
+		 */
+		if (callback_private_data->prev_block_number == InvalidBlockNumber)
+			callback_private_data->prev_block_number = next_block_number;
+
+		/*
+		 * Always set the result to be the "current" block number
+		 * we are filling offsets for.
+		 */
+		result = callback_private_data->prev_block_number;
+
+		/*
+		 * If this tuple is from the same block, just store its offset
+		 * in our per_buffer_data array.
+		 */
+		if (next_block_number == callback_private_data->prev_block_number)
+		{
+			per_buffer_data[i++] = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+		}
+		else
+		{
+			/*
+			 * If the block just changed, store the offset of the new block
+			 * as leftover for the next invocation and break out.
+			 */
+			callback_private_data->prev_block_number = next_block_number;
+			callback_private_data->prev_offset_number = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+
+			/* Break to let the next call handle the new block. */
+			break;
+		}
+	}
+
+	/*
+	 * Terminate the list of offsets for this block with an InvalidOffsetNumber.
+	 */
+	per_buffer_data[i] = InvalidOffsetNumber;
+	return result;
+}
+
 static void
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
 						   Snapshot snapshot,
-						   ValidateIndexState *state)
+						   ValidateIndexState *state,
+						   ValidateIndexState *auxState)
 {
-	TableScanDesc scan;
-	HeapScanDesc hscan;
-	HeapTuple	heapTuple;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-	BlockNumber previous_blkno = InvalidBlockNumber;
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
+
+	TupleTableSlot  *slot;
+	EState			*estate;
+	ExprContext		*econtext;
+	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	int64			num_to_check;
+	Tuplestorestate *tuples_for_check;
+	ValidateIndexScanState callback_private_data;
+
+	Buffer buf;
+	OffsetNumber *tuples;
+	ReadStream *read_stream;
+
+	/* Use a percentage of maintenance_work_mem for tuple store. */
+	int		store_work_mem_part = maintenance_work_mem * debug_cic_validate_store_mem_pct / 100;
+
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
 	/*
 	 * sanity checks
 	 */
 	Assert(OidIsValid(indexRelation->rd_rel->relam));
 
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
+	num_to_check = heapam_index_validate_tuplesort_difference(state->tuplesort,
+														 auxState->tuplesort,
+														 tuples_for_check);
+
+	/* It is our responsibility to close tuple sort as fast as we can */
+	tuplesort_end(state->tuplesort);
+	tuplesort_end(auxState->tuplesort);
+
+	state->tuplesort = auxState->tuplesort = NULL;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+									&TTSOpsBufferHeapTuple);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
 
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+	callback_private_data.prev_block_number = InvalidBlockNumber;
+	callback_private_data.store = tuples_for_check;
+	callback_private_data.prev_offset_number = InvalidOffsetNumber;
 
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	scan = table_beginscan_strat(heapRelation,	/* relation */
-								 snapshot,	/* snapshot */
-								 0, /* number of keys */
-								 NULL,	/* scan key */
-								 true,	/* buffer access strategy OK */
-								 false);	/* syncscan not OK */
-	hscan = (HeapScanDesc) scan;
+	read_stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING,
+														 bstrategy,
+														 heapRelation, MAIN_FORKNUM,
+														 heapam_index_validate_scan_read_stream_next,
+														 &callback_private_data,
+														 (MaxHeapTuplesPerPage + 1) * sizeof(OffsetNumber));
 
-	pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_TOTAL,
-								 hscan->rs_nblocks);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_TOTAL, num_to_check);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE, 0);
 
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while ((buf = read_stream_next_buffer(read_stream, (void **) &tuples)) != InvalidBuffer)
 	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
+		HeapTupleData	heap_tuple_data[MaxHeapTuplesPerPage];
+		int i;
+		OffsetNumber off;
+		BlockNumber block_number;
 
 		CHECK_FOR_INTERRUPTS();
 
-		state->htups += 1;
+		LockBuffer(buf, BUFFER_LOCK_SHARE);
+		block_number = BufferGetBlockNumber(buf);
 
-		if ((previous_blkno == InvalidBlockNumber) ||
-			(hscan->rs_cblock != previous_blkno))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
-										 hscan->rs_cblock);
-			previous_blkno = hscan->rs_cblock;
+			ItemPointerData tid;
+			bool		all_dead, found;
+			ItemPointerSet(&tid, block_number, off);
+
+			found = heap_hot_search_buffer(&tid, heapRelation, buf, snapshot,
+										   &heap_tuple_data[i], &all_dead, true);
+			if (!found)
+				ItemPointerSetInvalid(&heap_tuple_data[i].t_self);
+			i++;
+			state->htups += 1;
 		}
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		/*
-		 * As commented in table_index_build_scan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
+		 * It is safe to access tuple data after releasing the buffer lock
+		 * because the buffer pin is still held, and the only operation that
+		 * could physically move tuple data on the page is
+		 * PageRepairFragmentation via heap_page_prune.  VACUUM conflicts with
+		 * CIC (both take ShareUpdateExclusiveLock), and opportunistic pruning
+		 * from concurrent DML cannot affect root tuples we are referencing.
 		 */
-		if (hscan->rs_cblock != root_blkno)
-		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
 		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
+		 * No predicate evaluation is needed here: the auxiliary STIR index
+		 * only contains TIDs for tuples that already satisfied the partial
+		 * index predicate at DML time (checked in ExecInsertIndexTuples).
 		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
+			if (ItemPointerIsValid(&heap_tuple_data[i].t_self))
 			{
+				ItemPointerData root_tid;
+				ItemPointerSet(&root_tid, block_number, off);
+
+				/* Reset the per-tuple memory context for the next fetch. */
+				MemoryContextReset(econtext->ecxt_per_tuple_memory);
+				ExecStoreBufferHeapTuple(&heap_tuple_data[i], slot, buf);
+
+				/* Compute the key values and null flags for this tuple. */
+				FormIndexDatum(indexInfo,
+							   slot,
+							   estate,
+							   values,
+							   isnull);
+
 				/*
-				 * Remember index items seen earlier on the current heap page
+				 * Insert the tuple into the target index.
 				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+				index_insert(indexRelation,
+							 values,
+							 isnull,
+							 &root_tid, /* insert root tuple */
+							 heapRelation,
+							 indexInfo->ii_Unique ?
+							 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+							 false,
+							 indexInfo);
+
+				state->tups_inserted += 1;
 			}
 
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  false, &ts_val, &ts_isnull,
-												  NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_TUPLES_DONE, 1);
+			i++;
 		}
 
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 false,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
+		ReleaseBuffer(buf);
 	}
 
-	table_endscan(scan);
-
 	ExecDropSingleTupleTableSlot(slot);
 
 	FreeExecutorState(estate);
 
+	read_stream_end(read_stream);
+	tuplestore_end(tuples_for_check);
+
+	FreeAccessStrategy(bstrategy);
+
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index cc067e58d36..b1417ec05c6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -715,6 +715,8 @@ UpdateIndexRelation(Oid indexoid,
  *			already exists.
  *		INDEX_CREATE_PARTITIONED:
  *			create a partitioned index (table must be partitioned)
+ *		INDEX_CREATE_AUXILIARY:
+ *			mark index as auxiliary index
  *		INDEX_CREATE_SUPPRESS_PROGRESS:
  *			don't report progress during the index build.
  *
@@ -723,6 +725,9 @@ UpdateIndexRelation(Oid indexoid,
  * allow_system_table_mods: allow table to be a system catalog
  * is_internal: if true, post creation hook for new index
  * constraintId: if not NULL, receives OID of created constraint
+ * relpersistence: persistence level to use for index. In most of the
+ *		cases it should be equal to the persistence level of the table,
+ *		auxiliary indexes are only exception here.
  *
  * Returns the OID of the created index.
  */
@@ -763,6 +768,7 @@ index_create(Relation heapRelation,
 	bool		invalid = (flags & INDEX_CREATE_INVALID) != 0;
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
+	bool		auxiliary = (flags & INDEX_CREATE_AUXILIARY) != 0;
 	bool		progress = (flags & INDEX_CREATE_SUPPRESS_PROGRESS) == 0;
 	char		relkind;
 	TransactionId relfrozenxid;
@@ -789,7 +795,10 @@ index_create(Relation heapRelation,
 	namespaceId = RelationGetNamespace(heapRelation);
 	shared_relation = heapRelation->rd_rel->relisshared;
 	mapped_relation = RelationIsMapped(heapRelation);
-	relpersistence = heapRelation->rd_rel->relpersistence;
+	if (auxiliary)
+		relpersistence = RELPERSISTENCE_UNLOGGED; /* aux indexes are always unlogged */
+	else
+		relpersistence = heapRelation->rd_rel->relpersistence;
 
 	/*
 	 * check parameters
@@ -797,6 +806,11 @@ index_create(Relation heapRelation,
 	if (indexInfo->ii_NumIndexAttrs < 1)
 		elog(ERROR, "must index at least one column");
 
+	if (indexInfo->ii_Am == STIR_AM_OID && !auxiliary)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("user-defined indexes with STIR access method are not supported")));
+
 	if (!allow_system_table_mods &&
 		IsSystemRelation(heapRelation) &&
 		IsNormalProcessingMode())
@@ -1402,7 +1416,8 @@ index_create_copy(Relation heapRelation, uint16 flags,
 							!concurrently,	/* isready */
 							concurrently,	/* concurrent */
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							false);
 
 	/* fetch exclusion constraint info if any */
 	if (indexRelation->rd_index->indisexclusion)
@@ -1422,13 +1437,16 @@ index_create_copy(Relation heapRelation, uint16 flags,
 	 * index information.  All this information will be used for the index
 	 * creation.
 	 */
-	for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
 	{
 		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
-		Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
 
-		indexColNames = lappend(indexColNames, NameStr(att->attname));
-		newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+		{
+			Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+			indexColNames = lappend(indexColNames, NameStr(att->attname));
+			newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		}
 	}
 
 	/* Extract opclass options for each attribute */
@@ -1490,6 +1508,157 @@ index_create_copy(Relation heapRelation, uint16 flags,
 	return newIndexId;
 }
 
+/*
+ * index_concurrently_create_aux
+ *
+ * Create concurrently an auxiliary index based on the definition of the one
+ * provided by caller.  The index is inserted into catalogs and needs to be
+ * built later on. This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ */
+Oid
+index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
+							   Oid tablespaceOid, const char *newName)
+{
+	Relation	indexRelation;
+	IndexInfo  *oldInfo,
+			*newInfo;
+	Oid			newIndexId = InvalidOid;
+	HeapTuple	indexTuple;
+
+	List	   *indexColNames = NIL;
+	List	   *indexExprs = NIL;
+	List	   *indexPreds = NIL;
+
+	Oid *auxOpclassIds;
+	int16 *auxColoptions;
+
+	indexRelation = index_open(mainIndexId, RowExclusiveLock);
+
+	/* The new index needs some information from the old index */
+	oldInfo = BuildIndexInfo(indexRelation);
+
+	/*
+	 * Build of an auxiliary index with exclusion constraints is not
+	 * supported.
+	 */
+	if (oldInfo->ii_ExclusionOps != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("auxiliary index creation for exclusion constraints is not supported")));
+
+	/* Get the array of class and column options IDs from index info */
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(mainIndexId));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", mainIndexId);
+
+
+	/*
+	 * Fetch the list of expressions and predicates directly from the
+	 * catalogs.  This cannot rely on the information from IndexInfo of the
+	 * old index as these have been flattened for the planner.
+	 */
+	if (oldInfo->ii_Expressions != NIL)
+	{
+		Datum		exprDatum;
+		char	   *exprString;
+
+		exprDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indexprs);
+		exprString = TextDatumGetCString(exprDatum);
+		indexExprs = (List *) stringToNode(exprString);
+		pfree(exprString);
+	}
+	if (oldInfo->ii_Predicate != NIL)
+	{
+		Datum		predDatum;
+		char	   *predString;
+
+		predDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indpred);
+		predString = TextDatumGetCString(predDatum);
+		indexPreds = (List *) stringToNode(predString);
+
+		/* Also convert to implicit-AND format */
+		indexPreds = make_ands_implicit((Expr *) indexPreds);
+		pfree(predString);
+	}
+
+	/*
+	 * Build the index information for the new index.  Note that rebuild of
+	 * indexes with exclusion constraints is not supported, hence there is no
+	 * need to fill all the ii_Exclusion* fields.
+	 */
+	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
+							oldInfo->ii_NumIndexKeyAttrs,
+							STIR_AM_OID, /* special AM for aux indexes */
+							indexExprs,
+							indexPreds,
+							false,	/* aux index are not unique */
+							oldInfo->ii_NullsNotDistinct,
+							false,	/* not ready for inserts */
+							true,
+							false,	/* aux are not summarizing */
+							false,	/* aux are not without overlaps */
+							true	/* auxiliary */);
+
+	/*
+	 * Extract the list of column names and the column numbers for the new
+	 * index information.  All this information will be used for the index
+	 * creation.
+	 */
+	{
+		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
+
+		for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+		{
+			Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+			indexColNames = lappend(indexColNames, NameStr(att->attname));
+			newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		}
+	}
+
+	auxOpclassIds = palloc0(sizeof(Oid) * newInfo->ii_NumIndexAttrs);
+	auxColoptions = palloc0(sizeof(int16) * newInfo->ii_NumIndexAttrs);
+
+	/* Fill with "any ops" */
+	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
+	{
+		auxOpclassIds[i] = ANY_STIR_OPS_OID;
+		auxColoptions[i] = 0;
+	}
+
+	newIndexId = index_create(heapRelation,
+							  newName,
+							  InvalidOid,    /* indexRelationId */
+							  InvalidOid,    /* parentIndexRelid */
+							  InvalidOid,    /* parentConstraintId */
+							  InvalidRelFileNumber, /* relFileNumber */
+							  newInfo,
+							  indexColNames,
+							  STIR_AM_OID,
+							  tablespaceOid,
+							  indexRelation->rd_indcollation,
+							  auxOpclassIds,
+							  NULL,
+							  auxColoptions,
+							  NULL,
+							  (Datum) 0,
+							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT | INDEX_CREATE_AUXILIARY,
+							  0,
+							  true, /* allow table to be a system catalog? */
+							  false,    /* is_internal? */
+							  NULL);
+
+	/* Close the relations used and clean up */
+	index_close(indexRelation, NoLock);
+	ReleaseSysCache(indexTuple);
+
+	return newIndexId;
+}
+
 /*
  * index_concurrently_build
  *
@@ -2470,7 +2639,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2530,7 +2700,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3309,12 +3480,21 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * We do a concurrent index build by first inserting the catalog entry for the
  * index via index_create(), marking it not indisready and not indisvalid.
+ * Then we create special auxiliary index the same way. It based on STIR AM.
  * Then we commit our transaction and start a new one, then we wait for all
  * transactions that could have been modifying the table to terminate.  Now
- * we know that any subsequently-started transactions will see the index and
+ * we know that any subsequently-started transactions will see indexes and
  * honor its constraints on HOT updates; so while existing HOT-chains might
  * be broken with respect to the index, no currently live tuple will have an
- * incompatible HOT update done to it.  We now build the index normally via
+ * incompatible HOT update done to it.
+ *
+ * After that, we build the auxiliary index. It is fast operation without any actual
+ * table scan. As result, we have empty STIR index. We commit transaction and
+ * again wait for all transactions that could have been modifying the table
+ * to terminate. At that moment all new tuples are going to be inserted into
+ * auxiliary index.
+ *
+ * We now build the index normally via
  * index_build(), while holding a weak lock that allows concurrent
  * insert/update/delete.  Also, we index only tuples that are valid
  * as of the start of the scan (see table_index_build_scan), whereas a normal
@@ -3324,14 +3504,17 @@ IndexCheckExclusion(Relation heapRelation,
  * bogus unique-index failures due to concurrent UPDATEs (we might see
  * different versions of the same row as being valid when we pass over them,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
- * does not contain any tuples added to the table while we built the index.
+ * does not contain any tuples added to the table while we built the index
+ * (but these tuples contained in auxiliary index).
  *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
- * commit the second transaction and start a third.  Again we wait for all
+ * commit the third transaction and start a fourth.  Again we wait for all
  * transactions that could have been modifying the table to terminate.  Now
  * we know that any subsequently-started transactions will see the index and
- * insert their new tuples into it.  We then take a new reference snapshot
- * which is passed to validate_index().  Any tuples that are valid according
+ * insert their new tuples into it. At the same moment we clear "indisready" for
+ * auxiliary index, since it is no more required to be updated.
+ *
+ * We then take a new reference snapshot, any tuples that are valid according
  * to this snap, but are not in the index, must be added to the index.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
@@ -3339,12 +3522,14 @@ IndexCheckExclusion(Relation heapRelation,
  * that might care about them before we mark the index valid.)
  *
  * validate_index() works by first gathering all the TIDs currently in the
- * index, using a bulkdelete callback that just stores the TIDs and doesn't
+ * indexes, using a bulkdelete callback that just stores the TIDs and doesn't
  * ever say "delete it".  (This should be faster than a plain indexscan;
  * also, not all index AMs support full-index indexscan.)  Then we sort the
- * TIDs, and finally scan the table doing a "merge join" against the TID list
- * to see which tuples are missing from the index.  Thus we will ensure that
- * all tuples valid according to the reference snapshot are in the index.
+ * TIDs of both auxiliary and target indexes, and doing a "merge join" against
+ * the TID lists to see which tuples from auxiliary index are missing from the
+ * target index.  Thus we will ensure that all tuples valid according to the
+ * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
  * tuple that is already dead or is in process of being deleted, and we
@@ -3362,22 +3547,26 @@ IndexCheckExclusion(Relation heapRelation,
  * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
  * transactions will be able to use it for queries.
  *
- * Doing two full table scans is a brute-force strategy.  We could try to be
- * cleverer, eg storing new tuples in a special area of the table (perhaps
- * making the table append-only by setting use_fsm).  However that would
- * add yet more locking issues.
+ * Also, some actions to concurrent drop the auxiliary index are performed.
  */
 void
-validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 {
 	Relation	heapRelation,
-				indexRelation;
+				indexRelation,
+				auxIndexRelation;
 	IndexInfo  *indexInfo;
-	IndexVacuumInfo ivinfo;
-	ValidateIndexState state;
+	IndexVacuumInfo ivinfo, auxivinfo;
+	ValidateIndexState state, auxState;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	/* Use 80% of maintenance_work_mem to target index sorting and
+	 * 10% rest for auxiliary.
+	 *
+	 * Rest 10% will be used for tuplestore later. */
+	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
+	int			aux_work_mem_part = maintenance_work_mem / 10;
 
 	{
 		const int	progress_index[] = {
@@ -3410,6 +3599,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	RestrictSearchPath();
 
 	indexRelation = index_open(indexId, RowExclusiveLock);
+	auxIndexRelation = index_open(auxIndexId, RowExclusiveLock);
 
 	/*
 	 * Fetch info needed for index_insert.  (You might think this should be
@@ -3434,15 +3624,49 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.strategy = NULL;
 	ivinfo.validate_index = true;
 
+	/*
+	 * Copy all info to auxiliary info, changing only relation.
+	 */
+	auxivinfo = ivinfo;
+	auxivinfo.index = auxIndexRelation;
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
 	 * is a pass-by-reference type on all platforms, whereas int8 is
 	 * pass-by-value on most platforms.
 	 */
+	auxState.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
+										   InvalidOid, false,
+										   aux_work_mem_part,
+										   NULL, TUPLESORT_NONE);
+	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
+
+	(void) index_bulk_delete(&auxivinfo, NULL,
+							 validate_index_callback, &auxState);
+	/* If aux index is empty, merge may be skipped */
+	if (auxState.itups == 0)
+	{
+		tuplesort_end(auxState.tuplesort);
+		auxState.tuplesort = NULL;
+
+		/* Roll back any GUC changes executed by index functions */
+		AtEOXact_GUC(false, save_nestlevel);
+
+		/* Restore userid and security context */
+		SetUserIdAndSecContext(save_userid, save_sec_context);
+
+		/* Close rels, but keep locks */
+		index_close(auxIndexRelation, NoLock);
+		index_close(indexRelation, NoLock);
+		table_close(heapRelation, NoLock);
+
+		return;
+	}
+
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 											InvalidOid, false,
-											maintenance_work_mem,
+											(int) main_work_mem_part,
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
@@ -3465,27 +3689,30 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	tuplesort_performsort(auxState.tuplesort);
 
 	/*
-	 * Now scan the heap and "merge" it with the index
+	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN);
+								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
 	table_index_validate_scan(heapRelation,
 							  indexRelation,
 							  indexInfo,
 							  snapshot,
-							  &state);
+							  &state,
+							  &auxState);
 
-	/* Done with tuplesort object */
-	tuplesort_end(state.tuplesort);
+	/* Tuple sort closed by table_index_validate_scan */
+	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
 
 	/* Make sure to release resources cached in indexInfo (if needed). */
 	index_insert_cleanup(indexRelation, indexInfo);
 
 	elog(DEBUG2,
-		 "validate_index found %.0f heap tuples, %.0f index tuples; inserted %.0f missing tuples",
-		 state.htups, state.itups, state.tups_inserted);
+		 "validate_index fetched %.0f heap tuples, %.0f index tuples;"
+						" %.0f aux index tuples; inserted %.0f missing tuples",
+		 state.htups, state.itups, auxState.itups, state.tups_inserted);
 
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
@@ -3494,6 +3721,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 
 	/* Close rels, but keep locks */
+	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
 }
@@ -3554,6 +3782,12 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			Assert(!indexForm->indisvalid);
 			indexForm->indisvalid = true;
 			break;
+		case INDEX_DROP_CLEAR_READY:
+			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
+			Assert(indexForm->indisready);
+			Assert(!indexForm->indisvalid);
+			indexForm->indisready = false;
+			break;
 		case INDEX_DROP_CLEAR_VALID:
 
 			/*
@@ -3825,6 +4059,13 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
+	/* Auxiliary indexes are not allowed to be rebuilt */
+	if (indexInfo->ii_Auxiliary)
+		ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("reindex of auxiliary index \"%s\" not supported",
+					RelationGetRelationName(iRel))));
+
 	/* Suppress use of the target index while rebuilding it */
 	SetReindexProcessing(heapId, indexId);
 
@@ -4067,6 +4308,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
+		Oid			indexAm = get_rel_relam(indexOid);
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4092,6 +4334,18 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
+		if (indexAm == STIR_AM_OID)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+							get_namespace_name(indexNamespaceId),
+							get_rel_name(indexOid))));
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			continue;
+		}
+
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index eba25aa3e4d..5dcd318012e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1388,16 +1388,17 @@ CREATE VIEW pg_stat_progress_create_index AS
                       END AS command,
         CASE S.param10 WHEN 0 THEN 'initializing'
                        WHEN 1 THEN 'waiting for writers before build'
-                       WHEN 2 THEN 'building index' ||
+                       WHEN 2 THEN 'waiting for writers to use auxiliary index'
+                       WHEN 3 THEN 'building index' ||
                            COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)),
                                     '')
-                       WHEN 3 THEN 'waiting for writers before validation'
-                       WHEN 4 THEN 'index validation: scanning index'
-                       WHEN 5 THEN 'index validation: sorting tuples'
-                       WHEN 6 THEN 'index validation: scanning table'
-                       WHEN 7 THEN 'waiting for old snapshots'
-                       WHEN 8 THEN 'waiting for readers before marking dead'
-                       WHEN 9 THEN 'waiting for readers before dropping'
+                       WHEN 4 THEN 'waiting for writers before validation'
+                       WHEN 5 THEN 'index validation: scanning index'
+                       WHEN 6 THEN 'index validation: sorting tuples'
+                       WHEN 7 THEN 'index validation: merging indexes'
+                       WHEN 8 THEN 'waiting for old snapshots'
+                       WHEN 9 THEN 'waiting for readers before marking dead'
+                       WHEN 10 THEN 'waiting for readers before dropping'
                        END as phase,
         S.param4 AS lockers_total,
         S.param5 AS lockers_done,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9ab74c8df0a..2d7b6b7eb8b 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -183,6 +183,7 @@ CheckIndexCompatible(Oid oldId,
 					 bool isWithoutOverlaps)
 {
 	bool		isconstraint;
+	bool		isauxiliary;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
 	Oid		   *opclassIds;
@@ -233,6 +234,7 @@ CheckIndexCompatible(Oid oldId,
 
 	amcanorder = amRoutine->amcanorder;
 	amsummarizing = amRoutine->amsummarizing;
+	isauxiliary = accessMethodId == STIR_AM_OID;
 
 	/*
 	 * Compute the operator classes, collations, and exclusion operators for
@@ -244,7 +246,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing,
+							  isWithoutOverlaps, isauxiliary);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -557,6 +560,7 @@ DefineIndex(ParseState *pstate,
 {
 	bool		concurrent;
 	char	   *indexRelationName;
+	char	   *auxIndexRelationName = NULL;
 	char	   *accessMethodName;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
@@ -566,6 +570,7 @@ DefineIndex(ParseState *pstate,
 	Oid			namespaceId;
 	Oid			tablespaceId;
 	Oid			createdConstraintId = InvalidOid;
+	Oid			auxIndexRelationId = InvalidOid;
 	List	   *indexColNames;
 	List	   *allIndexParams;
 	Relation	rel;
@@ -587,6 +592,7 @@ DefineIndex(ParseState *pstate,
 	int			numberOfKeyAttributes;
 	TransactionId limitXmin;
 	ObjectAddress address;
+	ObjectAddress auxAddress;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
@@ -837,6 +843,15 @@ DefineIndex(ParseState *pstate,
 											stmt->excludeOpNames,
 											stmt->primary,
 											stmt->isconstraint);
+	/*
+	 * Select name for auxiliary index
+	 */
+	if (concurrent)
+		auxIndexRelationName = ChooseRelationName(indexRelationName,
+												  NULL,
+												  "ccaux",
+												  namespaceId,
+												  false);
 
 	/*
 	 * look up the access method, verify it can handle the requested features
@@ -931,7 +946,8 @@ DefineIndex(ParseState *pstate,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  false);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -1603,6 +1619,16 @@ DefineIndex(ParseState *pstate,
 		return address;
 	}
 
+	/*
+	 * In case of concurrent build - create auxiliary index record.
+	 */
+	if (concurrent)
+	{
+		auxIndexRelationId = index_concurrently_create_aux(rel, indexRelationId,
+											tablespaceId, auxIndexRelationName);
+		ObjectAddressSet(auxAddress, RelationRelationId, auxIndexRelationId);
+	}
+
 	AtEOXact_GUC(false, root_save_nestlevel);
 	SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 
@@ -1631,11 +1657,11 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * For a concurrent build, it's important to make the catalog entries
 	 * visible to other transactions before we start to build the index. That
-	 * will prevent them from making incompatible HOT updates.  The new index
-	 * will be marked not indisready and not indisvalid, so that no one else
-	 * tries to either insert into it or use it for queries.
+	 * will prevent them from making incompatible HOT updates. New indexes
+	 * (main and auxiliary) will be marked not indisready and not indisvalid,
+	 * so that no one else tries to either insert into it or use it for queries.
 	 *
-	 * We must commit our current transaction so that the index becomes
+	 * We must commit our current transaction so that the indexes becomes
 	 * visible; then start another.  Note that all the data structures we just
 	 * built are lost in the commit.  The only data we keep past here are the
 	 * relation IDs.
@@ -1645,7 +1671,7 @@ DefineIndex(ParseState *pstate,
 	 * cannot block, even if someone else is waiting for access, because we
 	 * already have the same lock within our transaction.
 	 *
-	 * Note: we don't currently bother with a session lock on the index,
+	 * Note: we don't currently bother with a session lock on the indexes,
 	 * because there are no operations that could change its state while we
 	 * hold lock on the parent table.  This might need to change later.
 	 */
@@ -1684,7 +1710,7 @@ DefineIndex(ParseState *pstate,
 	 * with the old list of indexes.  Use ShareLock to consider running
 	 * transactions that hold locks that permit writing to the table.  Note we
 	 * do not need to worry about xacts that open the table for writing after
-	 * this point; they will see the new index when they open it.
+	 * this point; they will see the new indexes when they open it.
 	 *
 	 * Note: the reason we use actual lock acquisition here, rather than just
 	 * checking the ProcArray and sleeping, is that deadlock is possible if
@@ -1696,14 +1722,44 @@ DefineIndex(ParseState *pstate,
 
 	/*
 	 * At this moment we are sure that there are no transactions with the
-	 * table open for write that don't have this new index in their list of
+	 * table open for write that don't have this new indexes in their list of
 	 * indexes.  We have waited out all the existing transactions and any new
-	 * transaction will have the new index in its list, but the index is still
-	 * marked as "not-ready-for-inserts".  The index is consulted while
+	 * transaction will have both new indexes in its list, but indexes are still
+	 * marked as "not-ready-for-inserts". The indexes are consulted while
 	 * deciding HOT-safety though.  This arrangement ensures that no new HOT
 	 * chains can be created where the new tuple and the old tuple in the
 	 * chain have different index keys.
 	 *
+	 * Now call build on auxiliary index. Index will be created empty without
+	 * any actual heap scan, but marked as "ready-for-inserts". The goal of
+	 * that index is accumulate new tuples while main index is actually built.
+	 */
+
+	/* Set ActiveSnapshot since functions in the indexes may need it */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	index_concurrently_build(tableId, auxIndexRelationId);
+	/* we can do away with our snapshot */
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Now we need to ensure there are no transactions with the auxiliary index
+	 * marked as "not-ready-for-inserts".
+	 */
+	WaitForLockers(heaplocktag, ShareLock, true);
+
+	/*
+	 * At this moment we are sure that all new tuples in table are inserted into
+	 * the auxiliary index. Now it is time to build the target index itself.
+	 *
 	 * We now take a new snapshot, and build the index using all tuples that
 	 * are visible in this snapshot.  We can be sure that any HOT updates to
 	 * these tuples will be compatible with the index, since any updates made
@@ -1738,9 +1794,28 @@ DefineIndex(ParseState *pstate,
 	 * the index marked as read-only for updates.
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForLockers(heaplocktag, ShareLock, true);
 
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/*
+	 * Now target index is marked as "ready" for all transactions. So, auxiliary
+	 * index is no longer needed. So, start removing process by reverting "ready"
+	 * flag.
+	 */
+	index_set_state_flags(auxIndexRelationId, INDEX_DROP_CLEAR_READY);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * Now take the "reference snapshot" that will be used by validate_index()
 	 * to filter candidate tuples.  Beware!  There might still be snapshots in
@@ -1758,24 +1833,14 @@ DefineIndex(ParseState *pstate,
 	 */
 	snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	PushActiveSnapshot(snapshot);
-
 	/*
-	 * Scan the index and the heap, insert any missing index entries.
-	 */
-	validate_index(tableId, indexRelationId, snapshot);
-
-	/*
-	 * Drop the reference snapshot.  We must do this before waiting out other
-	 * snapshot holders, else we will deadlock against other processes also
-	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
-	 * they must wait for.  But first, save the snapshot's xmin to use as
-	 * limitXmin for GetCurrentVirtualXIDs().
+	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
+	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
 	limitXmin = snapshot->xmin;
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
-
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1802,7 +1867,7 @@ DefineIndex(ParseState *pstate,
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForOlderSnapshots(limitXmin, true);
 
 	/*
@@ -1827,6 +1892,53 @@ DefineIndex(ParseState *pstate,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+	/* Now wait for all transaction to see auxiliary as "non-ready for inserts" */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/* Now it is time to mark auxiliary index as dead */
+	index_concurrently_set_dead(tableId, auxIndexRelationId);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_6);
+	/* Now wait for all transaction to ignore auxiliary because it is dead */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Drop auxiliary index.
+	 *
+	 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
+	 * right lock level.
+	 */
+	performDeletion(&auxAddress, DROP_RESTRICT,
+							 PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
 
 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
@@ -3598,6 +3710,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	typedef struct ReindexIndexInfo
 	{
 		Oid			indexId;
+		Oid			auxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -3703,8 +3816,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
 														   ShareUpdateExclusiveLock);
+					IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-					if (!indexRelation->rd_index->indisvalid)
+
+					if (indexInfo->ii_Auxiliary)
+						ereport(WARNING,(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+									get_namespace_name(get_rel_namespace(cellOid)),
+									get_rel_name(cellOid))));
+					else if (!indexRelation->rd_index->indisvalid)
 						ereport(WARNING,
 								(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 								 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3756,8 +3876,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
 															   ShareUpdateExclusiveLock);
+						IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-						if (!indexRelation->rd_index->indisvalid)
+						if (indexInfo->ii_Auxiliary)
+							ereport(WARNING,
+									(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+									 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+											get_namespace_name(get_rel_namespace(cellOid)),
+											get_rel_name(cellOid))));
+						else if (!indexRelation->rd_index->indisvalid)
 							ereport(WARNING,
 									(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 									 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3818,6 +3945,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot reindex invalid index on TOAST table")));
 
+				/* Auxiliary indexes are not allowed to be rebuilt */
+				if (get_rel_relam(relationOid) == STIR_AM_OID)
+					ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("reindex of auxiliary index \"%s\" not supported",
+								get_rel_name(relationOid))));
+
 				/*
 				 * Check if parent relation can be locked and if it exists,
 				 * this needs to be done at this stage as the list of indexes
@@ -3921,15 +4055,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
+		char	   *auxConcurrentName;
 		ReindexIndexInfo *idx = lfirst(lc);
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
+		Oid			auxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
 		int			save_sec_context;
 		int			save_nestlevel;
 		Relation	newIndexRel;
+		Relation	auxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -3980,6 +4117,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 											"ccnew",
 											get_rel_namespace(indexRel->rd_index->indrelid),
 											false);
+		auxConcurrentName = ChooseRelationName(get_rel_name(idx->indexId),
+											NULL,
+											"ccaux",
+											get_rel_namespace(indexRel->rd_index->indrelid),
+											false);
 
 		/* Choose the new tablespace, indexes of toast tables are not moved */
 		if (OidIsValid(params->tablespaceOid) &&
@@ -3997,11 +4139,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 									   tablespaceid,
 									   concurrentName);
 
+		auxIndexId = index_concurrently_create_aux(heapRel,
+												   newIndexId,
+												   tablespaceid,
+												   auxConcurrentName);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
+		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4010,6 +4158,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
+		newidx->auxIndexId = auxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4028,10 +4177,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		lockrelid = palloc_object(LockRelId);
+		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
 		/* Roll back any GUC changes executed by index functions */
@@ -4112,13 +4265,60 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * doing that, wait until no running transactions could have the table of
 	 * the index open with the old list of indexes.  See "phase 2" in
 	 * DefineIndex() for more details.
+	*/
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * Now build all auxiliary indexes and mark them as "ready-for-inserts".
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		StartTransactionCommand();
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/* Set ActiveSnapshot since functions in the indexes may need it */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Build auxiliary index, it is fast - without any actual heap scan, just an empty index. */
+		index_concurrently_build(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+
+	StartTransactionCommand();
+
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Wait until all auxiliary indexes are taken into account by all
+	 * transactions.
+	 */
 	WaitForLockersMultiple(lockTags, ShareLock, true);
 	CommitTransactionCommand();
 
+	/* Now it is time to perform target index build. */
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4165,6 +4365,41 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * need to set the PROC_IN_SAFE_IC flag here.
 	 */
 
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * At this moment all target indexes are marked as "ready-to-insert". So,
+	 * we are free to start process of dropping auxiliary indexes.
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+		StartTransactionCommand();
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		PopActiveSnapshot();
+
+		CommitTransactionCommand();
+	}
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -4172,12 +4407,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * were created during the previous phase.  See "phase 3" in DefineIndex()
 	 * for more details.
 	 */
-
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
-	WaitForLockersMultiple(lockTags, ShareLock, true);
-	CommitTransactionCommand();
-
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4215,7 +4444,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -4244,7 +4473,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * there's no need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-									 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+									 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 		WaitForOlderSnapshots(limitXmin, true);
 
 		CommitTransactionCommand();
@@ -4335,14 +4564,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
-	 * Mark the old indexes as dead.  First we must wait until no running
-	 * transaction could be using the index for a query.  See also
+	 * Mark the old and auxiliary indexes as dead. First we must wait until no
+	 * running transaction could be using the index for a query.  See also
 	 * index_drop() for more details.
 	 */
 
 	INJECTION_POINT("reindex-relation-concurrently-before-set-dead", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	foreach(lc, indexIds)
@@ -4367,6 +4596,28 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PopActiveSnapshot();
 	}
 
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+	}
+
 	/* Commit this transaction to make the updates visible. */
 	CommitTransactionCommand();
 	StartTransactionCommand();
@@ -4380,11 +4631,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
-	 * Drop the old indexes.
+	 * Drop the old and auxiliary indexes.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_6);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -4404,6 +4655,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			add_exact_object_address(&object, objects);
 		}
 
+		foreach(lc, newIndexIds)
+		{
+			ReindexIndexInfo *idx = lfirst(lc);
+			ObjectAddress object;
+
+			object.classId = RelationRelationId;
+			object.objectId = idx->auxIndexId;
+			object.objectSubId = 0;
+
+			add_exact_object_address(&object, objects);
+		}
+
 		/*
 		 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
 		 * right lock level.
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 5359dab1176..84f7cf9824e 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, bool auxiliary)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -850,6 +850,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Concurrent = concurrent;
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
+	n->ii_Auxiliary = auxiliary;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
@@ -875,7 +876,6 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
-	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 7a8a5d0764c..4f8761de6b9 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -632,6 +632,15 @@
   boot_val => 'DEFAULT_ASSERT_ENABLED',
 },
 
+{ name => 'debug_cic_validate_store_mem_pct', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Percentage of maintenance_work_mem used for CIC validation tuplestore.',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'debug_cic_validate_store_mem_pct',
+  boot_val => '10',
+  min => '1',
+  max => '90',
+},
+
 { name => 'debug_copy_parse_plan_trees', type => 'bool', context => 'PGC_SUSET', group => 'DEVELOPER_OPTIONS',
   short_desc => 'Set this to force all parse and plan trees to be passed through copyObject(), to facilitate catching errors and omissions in copyObject().',
   flags => 'GUC_NOT_IN_SAMPLE',
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 4647785fd35..fafca930aae 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -738,7 +738,8 @@ typedef struct TableAmRoutine
 										Relation index_rel,
 										IndexInfo *index_info,
 										Snapshot snapshot,
-										ValidateIndexState *state);
+										ValidateIndexState *state,
+										ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1892,19 +1893,24 @@ table_index_build_range_scan(Relation table_rel,
  * table_index_validate_scan - second table scan for concurrent index build
  *
  * See validate_index() for an explanation.
+ *
+ * Note: it is responsibility of that function to close sortstates in
+ * both `state` and `auxstate`.
  */
 static inline void
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
 						  Snapshot snapshot,
-						  ValidateIndexState *state)
+						  ValidateIndexState *state,
+						  ValidateIndexState *auxstate)
 {
 	table_rel->rd_tableam->index_validate_scan(table_rel,
 											   index_rel,
 											   index_info,
 											   snapshot,
-											   state);
+											   state,
+											   auxstate);
 }
 
 
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9aee8226347..3239e5c716f 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -31,6 +31,7 @@ typedef enum
 {
 	INDEX_CREATE_SET_READY,
 	INDEX_CREATE_SET_VALID,
+	INDEX_DROP_CLEAR_READY,
 	INDEX_DROP_CLEAR_VALID,
 	INDEX_DROP_SET_DEAD,
 } IndexStateFlagsAction;
@@ -72,6 +73,7 @@ extern void index_check_primary_key(Relation heapRel,
 #define	INDEX_CREATE_PARTITIONED			(1 << 5)
 #define INDEX_CREATE_INVALID				(1 << 6)
 #define INDEX_CREATE_SUPPRESS_PROGRESS		(1 << 7)
+#define INDEX_CREATE_AUXILIARY				(1 << 8)
 
 extern Oid	index_create(Relation heapRelation,
 						 const char *indexRelationName,
@@ -106,6 +108,11 @@ extern Oid	index_create_copy(Relation heapRelation, uint16 flags,
 							  Oid oldIndexId, Oid tablespaceOid,
 							  const char *newName);
 
+extern Oid	index_concurrently_create_aux(Relation heapRelation,
+										  Oid mainIndexId,
+										  Oid tablespaceOid,
+										  const char *newName);
+
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
 
@@ -152,7 +159,7 @@ extern void index_build(Relation heapRelation,
 						bool parallel,
 						bool progress);
 
-extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
+extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 67948667a97..35990693f39 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -117,14 +117,15 @@
 
 /* Phases of CREATE INDEX (as advertised via PROGRESS_CREATEIDX_PHASE) */
 #define PROGRESS_CREATEIDX_PHASE_WAIT_1			1
-#define PROGRESS_CREATEIDX_PHASE_BUILD			2
-#define PROGRESS_CREATEIDX_PHASE_WAIT_2			3
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	4
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		5
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN	6
-#define PROGRESS_CREATEIDX_PHASE_WAIT_3			7
+#define PROGRESS_CREATEIDX_PHASE_WAIT_2			2
+#define PROGRESS_CREATEIDX_PHASE_BUILD			3
+#define PROGRESS_CREATEIDX_PHASE_WAIT_3			4
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	5
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		6
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE	7
 #define PROGRESS_CREATEIDX_PHASE_WAIT_4			8
 #define PROGRESS_CREATEIDX_PHASE_WAIT_5			9
+#define PROGRESS_CREATEIDX_PHASE_WAIT_6			10
 
 /*
  * Subphases of CREATE INDEX, for index_build.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 7277c37e779..7ea643b7f80 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -268,6 +268,7 @@ extern PGDLLIMPORT bool allowSystemTableMods;
 extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
+extern PGDLLIMPORT int debug_cic_validate_store_mem_pct;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
 
 /*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index bf54d39feb0..cd7f1eb0592 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,8 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								bool auxiliary);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 55538c4c41e..d1723f47e89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1437,6 +1437,7 @@ DETAIL:  Key (f1)=(b) already exists.
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
 ERROR:  could not create unique index "concur_index3"
 DETAIL:  Key (f2)=(b) is duplicated.
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -3211,6 +3212,7 @@ INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
 ERROR:  could not create unique index "concur_reindex_ind5"
 DETAIL:  Key (c1)=(1) is duplicated.
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
@@ -3223,8 +3225,10 @@ DETAIL:  Key (c1)=(1) is duplicated.
  c1     | integer |           |          | 
 Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1) INVALID
+    "concur_reindex_ind5_ccaux" stir (c1) INVALID
     "concur_reindex_ind5_ccnew" UNIQUE, btree (c1) INVALID
 
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -3252,6 +3256,44 @@ Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1)
 
 DROP TABLE concur_reindex_tab4;
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+ERROR:  relation "concur_reindex_tab4" does not exist
+LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+                    ^
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
+HINT:  Use DROP INDEX or REINDEX INDEX.
+WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index f50868ca6a6..b34009f868c 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1585,10 +1585,11 @@ select indexrelid::regclass, indisvalid,
 --------------------------------+------------+-----------------------+-------------------------------
  parted_isvalid_idx             | f          | parted_isvalid_tab    | 
  parted_isvalid_idx_11          | f          | parted_isvalid_tab_11 | parted_isvalid_tab_1_expr_idx
+ parted_isvalid_idx_11_ccaux    | f          | parted_isvalid_tab_11 | 
  parted_isvalid_tab_12_expr_idx | t          | parted_isvalid_tab_12 | parted_isvalid_tab_1_expr_idx
  parted_isvalid_tab_1_expr_idx  | f          | parted_isvalid_tab_1  | parted_isvalid_idx
  parted_isvalid_tab_2_expr_idx  | t          | parted_isvalid_tab_2  | parted_isvalid_idx
-(5 rows)
+(6 rows)
 
 drop table parted_isvalid_tab;
 -- Check state of replica indexes when attaching a partition.
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 81a73c426d2..ea52f0725c3 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2064,14 +2064,15 @@ pg_stat_progress_create_index| SELECT s.pid,
         CASE s.param10
             WHEN 0 THEN 'initializing'::text
             WHEN 1 THEN 'waiting for writers before build'::text
-            WHEN 2 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
-            WHEN 3 THEN 'waiting for writers before validation'::text
-            WHEN 4 THEN 'index validation: scanning index'::text
-            WHEN 5 THEN 'index validation: sorting tuples'::text
-            WHEN 6 THEN 'index validation: scanning table'::text
-            WHEN 7 THEN 'waiting for old snapshots'::text
-            WHEN 8 THEN 'waiting for readers before marking dead'::text
-            WHEN 9 THEN 'waiting for readers before dropping'::text
+            WHEN 2 THEN 'waiting for writers to use auxiliary index'::text
+            WHEN 3 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
+            WHEN 4 THEN 'waiting for writers before validation'::text
+            WHEN 5 THEN 'index validation: scanning index'::text
+            WHEN 6 THEN 'index validation: sorting tuples'::text
+            WHEN 7 THEN 'index validation: merging indexes'::text
+            WHEN 8 THEN 'waiting for old snapshots'::text
+            WHEN 9 THEN 'waiting for readers before marking dead'::text
+            WHEN 10 THEN 'waiting for readers before dropping'::text
             ELSE NULL::text
         END AS phase,
     s.param4 AS lockers_total,
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 82e4062a215..c2c1b031527 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -503,6 +503,7 @@ CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS concur_index2 ON concur_heap(f1);
 INSERT INTO concur_heap VALUES ('b','x');
 -- check if constraint is enforced properly at build time
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -1315,10 +1316,12 @@ CREATE TABLE concur_reindex_tab4 (c1 int);
 INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 -- This trick creates an invalid index.
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -1330,6 +1333,24 @@ REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
 DROP TABLE concur_reindex_tab4;
 
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+\d aux_index_tab5
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+DROP TABLE aux_index_tab5;
+
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
-- 
2.43.0



  [application/octet-stream] v34-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch (31.7K, 6-v34-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch)
  download | inline diff:
From aecd02fb2d73787706687add200069f57216d1d8 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 14:36:31 +0100
Subject: [PATCH v34 5/7] Track and drop auxiliary indexes in DROP/REINDEX

During concurrent index operations, auxiliary indexes may be left as orphaned objects when errors occur (junk auxiliary indexes).

This patch improves the handling of such auxiliary indexes:
- add auxiliaryForIndexId parameter to index_create() to track dependencies between main and auxiliary indexes
- automatically drop auxiliary indexes when the main index is dropped
- delete junk auxiliary indexes properly during REINDEX operations
---
 doc/src/sgml/ref/create_index.sgml         |  14 ++-
 doc/src/sgml/ref/reindex.sgml              |   8 +-
 src/backend/catalog/dependency.c           |   2 +-
 src/backend/catalog/index.c                |  78 +++++++++++----
 src/backend/catalog/pg_depend.c            |  62 ++++++++++++
 src/backend/catalog/toasting.c             |   1 +
 src/backend/commands/indexcmds.c           |  37 +++++++-
 src/backend/commands/tablecmds.c           |  52 +++++++++-
 src/backend/nodes/makefuncs.c              |   3 +-
 src/include/catalog/dependency.h           |   1 +
 src/include/nodes/execnodes.h              |   2 +
 src/include/nodes/makefuncs.h              |   2 +-
 src/test/regress/expected/create_index.out | 105 +++++++++++++++++++--
 src/test/regress/sql/create_index.sql      |  57 ++++++++++-
 14 files changed, 380 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 12c88587a79..406c02e866e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -668,10 +668,16 @@ Indexes:
     "idx_ccaux" stir (col) INVALID
 </programlisting>
 
-    The recommended recovery
-    method in such cases is to drop these indexes and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
-    to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
+    The recommended recovery method in such cases is to drop the index with
+    <command>DROP INDEX</command>. The auxiliary index (suffixed with
+    <literal>_ccaux</literal>) will be automatically dropped when the main
+    index is dropped. After dropping the indexes, you can try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to
+    rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>,
+    which will also handle cleanup of any invalid auxiliary indexes.)
+    If the only invalid index is one suffixed <literal>_ccaux</literal>,
+    the recommended recovery method is just <literal>DROP INDEX</literal>
+    for that index.
    </para>
 
    <para>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 9e0248261ae..ac9cfec5c55 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -476,11 +476,15 @@ Indexes:
     <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
     recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
-    then attempt <command>REINDEX CONCURRENTLY</command> again.
+    then attempt <command>REINDEX CONCURRENTLY</command> again. The auxiliary index
+    (suffixed with <literal>_ccaux</literal>) will be automatically dropped
+    along with its main index.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
     the recommended recovery method is to just drop said index, since the
-    rebuild proper has been successful.
+    rebuild proper has been successful. If the only
+    invalid index is one suffixed <literal>_ccaux</literal>, the recommended
+    recovery method is just <literal>DROP INDEX</literal> for that index.
     A nonzero number may be appended to the suffix of the invalid index
     names to keep them unique, like <literal>_ccnew1</literal>,
     <literal>_ccold2</literal>, etc.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fdb8e67e1f5..c6941fb19d1 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -292,7 +292,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.  (Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object, 0);
+	AcquireDeletionLock(object, flags);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b1417ec05c6..9136dfc7c73 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -780,6 +780,8 @@ index_create(Relation heapRelation,
 		   ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
 	/* partitioned indexes must never be "built" by themselves */
 	Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
+	/* ii_AuxiliaryForIndexId and INDEX_CREATE_AUXILIARY are required both or neither */
+	Assert(OidIsValid(indexInfo->ii_AuxiliaryForIndexId) == auxiliary);
 
 	relkind = partitioned ? RELKIND_PARTITIONED_INDEX : RELKIND_INDEX;
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -1185,6 +1187,15 @@ index_create(Relation heapRelation,
 			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_SEC);
 		}
 
+		/*
+		 * Record dependency on the main index in case of auxiliary index.
+		 */
+		if (OidIsValid(indexInfo->ii_AuxiliaryForIndexId))
+		{
+			ObjectAddressSet(referenced, RelationRelationId, indexInfo->ii_AuxiliaryForIndexId);
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+		}
+
 		/* placeholder for normal dependencies */
 		addrs = new_object_addresses();
 
@@ -1417,7 +1428,8 @@ index_create_copy(Relation heapRelation, uint16 flags,
 							concurrently,	/* concurrent */
 							indexRelation->rd_indam->amsummarizing,
 							oldInfo->ii_WithoutOverlaps,
-							false);
+							false,
+							InvalidOid);
 
 	/* fetch exclusion constraint info if any */
 	if (indexRelation->rd_index->indisexclusion)
@@ -1601,7 +1613,8 @@ index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
 							true,
 							false,	/* aux are not summarizing */
 							false,	/* aux are not without overlaps */
-							true	/* auxiliary */);
+							true	/* auxiliary */,
+							mainIndexId /* auxiliaryForIndexId */);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -2640,7 +2653,8 @@ BuildIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid /* auxiliary_for_index_id is set only during build */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2701,7 +2715,8 @@ BuildDummyIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3783,8 +3798,11 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			indexForm->indisvalid = true;
 			break;
 		case INDEX_DROP_CLEAR_READY:
-			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
-			Assert(indexForm->indisready);
+			/*
+			 * Clear indisready during a CREATE INDEX CONCURRENTLY sequence.
+			 * indisready may already be false if the CIC failed before
+			 * index_concurrently_build had a chance to set it.
+			 */
 			Assert(!indexForm->indisvalid);
 			indexForm->indisready = false;
 			break;
@@ -3869,6 +3887,7 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 				heapRelation;
 	Oid			heapId;
 	Oid			save_userid;
+	Oid			junkAuxIndexId;
 	int			save_sec_context;
 	int			save_nestlevel;
 	IndexInfo  *indexInfo;
@@ -3925,6 +3944,19 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
 	}
 
+	/* Check for the auxiliary index for that index, it needs to be dropped */
+	junkAuxIndexId = get_auxiliary_index(indexId);
+	if (OidIsValid(junkAuxIndexId))
+	{
+		ObjectAddress object;
+		object.classId = RelationRelationId;
+		object.objectId = junkAuxIndexId;
+		object.objectSubId = 0;
+		performDeletion(&object, DROP_RESTRICT,
+								 PERFORM_DELETION_INTERNAL |
+								 PERFORM_DELETION_QUIETLY);
+	}
+
 	/*
 	 * Open the target index relation and get an exclusive lock on it, to
 	 * ensure that no one else is touching this particular index.
@@ -4213,7 +4245,8 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 {
 	Relation	rel;
 	Oid			toast_relid;
-	List	   *indexIds;
+	List	   *indexIds,
+			   *auxIndexIds = NIL;
 	char		persistence;
 	bool		result = false;
 	ListCell   *indexId;
@@ -4302,13 +4335,30 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	else
 		persistence = rel->rd_rel->relpersistence;
 
+	foreach(indexId, indexIds)
+	{
+		Oid			indexOid = lfirst_oid(indexId);
+		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* All STIR indexes are auxiliary indexes */
+		if (indexAm == STIR_AM_OID)
+		{
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			auxIndexIds = lappend_oid(auxIndexIds, indexOid);
+		}
+	}
+
 	/* Reindex all the indexes. */
 	i = 1;
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
-		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* Auxiliary indexes are going to be dropped during main index rebuild */
+		if (list_member_oid(auxIndexIds, indexOid))
+			continue;
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4334,18 +4384,6 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
-		if (indexAm == STIR_AM_OID)
-		{
-			ereport(WARNING,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
-							get_namespace_name(indexNamespaceId),
-							get_rel_name(indexOid))));
-			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
-				RemoveReindexPending(indexOid);
-			continue;
-		}
-
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 07c2d41c189..deacd2f7c95 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -20,6 +20,7 @@
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/pg_am_d.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_extension.h"
@@ -1108,6 +1109,67 @@ get_index_constraint(Oid indexId)
 	return constraintId;
 }
 
+/*
+ * get_auxiliary_index
+ *		Given the OID of an index, return the OID of its auxiliary
+ *		index, or InvalidOid if there is no auxiliary index.
+ */
+Oid
+get_auxiliary_index(Oid indexId)
+{
+	Oid			auxiliaryIndexOid = InvalidOid;
+	Relation	depRel;
+	ScanKeyData key[3];
+	SysScanDesc scan;
+	HeapTuple	tup;
+
+	/* Search the dependency table for the index */
+	depRel = table_open(DependRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+				Anum_pg_depend_refclassid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationRelationId));
+	ScanKeyInit(&key[1],
+				Anum_pg_depend_refobjid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indexId));
+	ScanKeyInit(&key[2],
+				Anum_pg_depend_refobjsubid,
+				BTEqualStrategyNumber, F_INT4EQ,
+				Int32GetDatum(0));
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+							  NULL, 3, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend deprec = (Form_pg_depend) GETSTRUCT(tup);
+
+		/*
+		 * Look for an AUTO dependency on a STIR index.  There can be at most
+		 * one STIR auxiliary per index, so we stop at the first match.
+		 * Transitive auxiliaries (e.g. ccnew_ccaux from a failed REINDEX
+		 * CONCURRENTLY) are found by calling this with the ccnew OID, and
+		 * are also cleaned up automatically via cascading AUTO dependency
+		 * when the intermediate index is dropped.
+		 */
+		if (deprec->classid == RelationRelationId &&
+			(deprec->deptype == DEPENDENCY_AUTO) &&
+			get_rel_relkind(deprec->objid) == RELKIND_INDEX &&
+			get_rel_relam(deprec->objid) == STIR_AM_OID)
+		{
+			auxiliaryIndexOid = deprec->objid;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(depRel, AccessShareLock);
+
+	return auxiliaryIndexOid;
+}
+
 /*
  * get_index_ref_constraints
  *		Given the OID of an index, return the OID of all foreign key
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index d7ea86b2805..f428dcdf10f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -315,6 +315,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
 	indexInfo->ii_Auxiliary = false;
+	indexInfo->ii_AuxiliaryForIndexId = InvalidOid;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2d7b6b7eb8b..46c4ccc6789 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -247,7 +247,7 @@ CheckIndexCompatible(Oid oldId,
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
 							  false, false, amsummarizing,
-							  isWithoutOverlaps, isauxiliary);
+							  isWithoutOverlaps, isauxiliary, InvalidOid);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -947,7 +947,8 @@ DefineIndex(ParseState *pstate,
 							  concurrent,
 							  amissummarizing,
 							  stmt->iswithoutoverlaps,
-							  false);
+							  false,
+							  InvalidOid);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -3711,6 +3712,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		Oid			indexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -4060,6 +4062,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
@@ -4067,6 +4070,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		int			save_nestlevel;
 		Relation	newIndexRel;
 		Relation	auxIndexRel;
+		Relation	junkAuxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -4144,12 +4148,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 												   tablespaceid,
 												   auxConcurrentName);
 
+		/* Search for auxiliary index for reindexed index, to drop it */
+		junkAuxIndexId = get_auxiliary_index(idx->indexId);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
+		if (OidIsValid(junkAuxIndexId))
+			junkAuxIndexRel = index_open(junkAuxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4159,6 +4168,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
 		newidx->auxIndexId = auxIndexId;
+		newidx->junkAuxIndexId = junkAuxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4180,10 +4190,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		if (OidIsValid(junkAuxIndexId))
+		{
+			lockrelid = palloc_object(LockRelId);
+			*lockrelid = junkAuxIndexRel->rd_lockInfo.lockRelId;
+			relationLocks = lappend(relationLocks, lockrelid);
+		}
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		if (OidIsValid(junkAuxIndexId))
+			index_close(junkAuxIndexRel, NoLock);
 		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
@@ -4372,7 +4390,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 	/*
 	 * At this moment all target indexes are marked as "ready-to-insert". So,
-	 * we are free to start process of dropping auxiliary indexes.
+	 * we are free to start process of dropping auxiliary indexes - including
+	 * junk indexes detected earlier.
 	 */
 	foreach(lc, newIndexIds)
 	{
@@ -4395,6 +4414,9 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 */
 		PushActiveSnapshot(GetTransactionSnapshot());
 		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		/* Ensure the junk index is marked as non-ready */
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_set_state_flags(newidx->junkAuxIndexId, INDEX_DROP_CLEAR_READY);
 		PopActiveSnapshot();
 
 		CommitTransactionCommand();
@@ -4614,6 +4636,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_concurrently_set_dead(newidx->tableId, newidx->junkAuxIndexId);
 
 		PopActiveSnapshot();
 	}
@@ -4665,6 +4689,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			object.objectSubId = 0;
 
 			add_exact_object_address(&object, objects);
+
+			if (OidIsValid(idx->junkAuxIndexId))
+			{
+				object.objectId = idx->junkAuxIndexId;
+				object.objectSubId = 0;
+				add_exact_object_address(&object, objects);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0ce2e81f9c2..e2309b6a1ba 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1567,6 +1567,8 @@ RemoveRelations(DropStmt *drop)
 	ListCell   *cell;
 	int			flags = 0;
 	LOCKMODE	lockmode = AccessExclusiveLock;
+	MemoryContext private_context,
+				  oldcontext;
 
 	/* DROP CONCURRENTLY uses a weaker lock, and has some restrictions */
 	if (drop->concurrent)
@@ -1631,9 +1633,20 @@ RemoveRelations(DropStmt *drop)
 			relkind = 0;		/* keep compiler quiet */
 			break;
 	}
+	/*
+	 * Create a memory context that will survive forced transaction commits we
+	 * may need to do below (in case of concurrent index drop).
+	 * Since it is a child of PortalContext, it will go away eventually even if
+	 * we suffer an error; there's no need for special abort cleanup logic.
+	 */
+	private_context = AllocSetContextCreate(PortalContext,
+											"RemoveRelations",
+											ALLOCSET_SMALL_SIZES);
 
+	oldcontext = MemoryContextSwitchTo(private_context);
 	/* Lock and validate each relation; build a list of object addresses */
 	objects = new_object_addresses();
+	MemoryContextSwitchTo(oldcontext);
 
 	foreach(cell, drop->objects)
 	{
@@ -1685,6 +1698,38 @@ RemoveRelations(DropStmt *drop)
 			flags |= PERFORM_DELETION_CONCURRENTLY;
 		}
 
+		/*
+		 * Concurrent index drop requires it to be the first transaction. But in
+		 * case we have junk auxiliary index - we want to drop it too (and also
+		 * in a concurrent way). In this case perform silent internal deletion
+		 * of auxiliary index, and restore transaction state. It is fine to do it
+		 * in the loop because there is only single element in drop->objects.
+		 */
+		if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
+			state.actual_relkind == RELKIND_INDEX)
+		{
+			Oid junkAuxIndexOid = get_auxiliary_index(relOid);
+			if (OidIsValid(junkAuxIndexOid))
+			{
+				ObjectAddress object;
+				object.classId = RelationRelationId;
+				object.objectId = junkAuxIndexOid;
+				object.objectSubId = 0;
+				performDeletion(&object, DROP_RESTRICT,
+										 PERFORM_DELETION_CONCURRENTLY |
+										 PERFORM_DELETION_INTERNAL |
+										 PERFORM_DELETION_QUIETLY);
+				CommitTransactionCommand();
+				MemoryContextDelete(private_context);
+
+				/* And start again - now without auxiliary index. */
+				StartTransactionCommand();
+				PushActiveSnapshot(GetTransactionSnapshot());
+				RemoveRelations(drop);
+				return;
+			}
+		}
+
 		/*
 		 * Concurrent index drop cannot be used with partitioned indexes,
 		 * either.
@@ -1713,12 +1758,17 @@ RemoveRelations(DropStmt *drop)
 		obj.objectId = relOid;
 		obj.objectSubId = 0;
 
+		oldcontext = MemoryContextSwitchTo(private_context);
 		add_exact_object_address(&obj, objects);
+		MemoryContextSwitchTo(oldcontext);
 	}
 
+	/* Deletion may involve multiple commits, so, switch to memory context */
+	oldcontext = MemoryContextSwitchTo(private_context);
 	performMultipleDeletions(objects, drop->behavior, flags);
+	MemoryContextSwitchTo(oldcontext);
 
-	free_object_addresses(objects);
+	MemoryContextDelete(private_context);
 }
 
 /*
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 84f7cf9824e..c54748ff644 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps, bool auxiliary)
+			  bool withoutoverlaps, bool auxiliary, Oid auxiliary_for_index_id)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -851,6 +851,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
 	n->ii_Auxiliary = auxiliary;
+	n->ii_AuxiliaryForIndexId = auxiliary_for_index_id;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 2f3c1eae3c7..6ae210c584e 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -193,6 +193,7 @@ extern List *getOwnedSequences(Oid relid);
 extern Oid	getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok);
 
 extern Oid	get_index_constraint(Oid indexId);
+extern Oid	get_auxiliary_index(Oid indexId);
 
 extern List *get_index_ref_constraints(Oid indexId);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ecaf82f2afa..f1605e00cdc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -232,6 +232,8 @@ typedef struct IndexInfo
 	int			ii_ParallelWorkers;
 	/* is auxiliary for concurrent index build? */
 	bool		ii_Auxiliary;
+	/* if creating an auxiliary index, the OID of the main index; otherwise InvalidOid. */
+	Oid			ii_AuxiliaryForIndexId;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index cd7f1eb0592..3a704781c8b 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -100,7 +100,7 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
 								bool summarizing, bool withoutoverlaps,
-								bool auxiliary);
+								bool auxiliary, Oid auxiliary_for_index_id);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index d1723f47e89..2d6abb15a89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3279,20 +3279,109 @@ ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-ERROR:  relation "concur_reindex_tab4" does not exist
-LINE 1: DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
-                    ^
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
-ERROR:  could not create unique index "aux_index_ind6"
-DETAIL:  Key (c1)=(1) is duplicated.
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
 WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
 HINT:  Use DROP INDEX or REINDEX INDEX.
 WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
 NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+-- Make sure it is still exists
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
+DROP INDEX aux_index_ind6;
+ERROR:  index "aux_index_ind6" does not exist
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index c2c1b031527..fd96d80abbc 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1344,11 +1344,62 @@ REINDEX INDEX aux_index_ind6_ccaux;
 -- Concurrently also
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 -- This makes the previous failure go away, so the index can become valid.
-DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
+-- Make sure it is still exists
+\d aux_index_tab5
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.43.0



  [application/octet-stream] v34-0006-Optimize-auxiliary-index-handling.patch (3.0K, 7-v34-0006-Optimize-auxiliary-index-handling.patch)
  download | inline diff:
From 17191fcaeb5d38fbb2f4181e04814c62df6d771d Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 30 Dec 2024 16:37:12 +0100
Subject: [PATCH v34 6/7] Optimize auxiliary index handling

Skip unnecessary computations for auxiliary indices by:
- in the index-insert path, detect auxiliary indexes and bypass Datum value computation
- set indexUnchanged=false for auxiliary indices to avoid redundant checks

These optimizations reduce overhead during concurrent index operations.
---
 src/backend/catalog/index.c         | 9 +++++++++
 src/backend/executor/execIndexing.c | 5 ++++-
 src/include/nodes/execnodes.h       | 6 ++++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9136dfc7c73..4edf68aced2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2940,6 +2940,15 @@ FormIndexDatum(IndexInfo *indexInfo,
 	ListCell   *indexpr_item;
 	int			i;
 
+	/* Auxiliary index does not need any values to be computed */
+	if (unlikely(indexInfo->ii_Auxiliary))
+	{
+		Assert(indexInfo->ii_Am == STIR_AM_OID);
+		memset(values, 0, sizeof(Datum) * indexInfo->ii_NumIndexAttrs);
+		memset(isnull, true, sizeof(bool) * indexInfo->ii_NumIndexAttrs);
+		return;
+	}
+
 	if (indexInfo->ii_Expressions != NIL &&
 		indexInfo->ii_ExpressionsState == NIL)
 	{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 4363e154c0f..84e99d653ec 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -438,8 +438,11 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		 * There's definitely going to be an index_insert() call for this
 		 * index.  If we're being called as part of an UPDATE statement,
 		 * consider if the 'indexUnchanged' = true hint should be passed.
+		 *
+		 * For auxiliary indexes, always pass false to skip value comparison checks,
+		 * since auxiliary indexes only store TIDs and don't track value changes.
 		 */
-		indexUnchanged = ((flags & EIIT_IS_UPDATE) &&
+		indexUnchanged = ((flags & EIIT_IS_UPDATE) && !indexInfo->ii_Auxiliary &&
 						  index_unchanged_by_update(resultRelInfo,
 													estate,
 													indexInfo,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f1605e00cdc..62f797bc197 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -169,8 +169,10 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
- * are used only during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
+ * during index build; they're conventionally zeroed otherwise.  ii_Auxiliary
+ * is also used during retail inserts to skip datum formation for auxiliary
+ * indexes.
  * ----------------
  */
 typedef struct IndexInfo
-- 
2.43.0



  [application/octet-stream] v34-0007-Refresh-snapshot-periodically-during-index-valid.patch (27.1K, 8-v34-0007-Refresh-snapshot-periodically-during-index-valid.patch)
  download | inline diff:
From 62a289139fb151848a5db5d89d87ed024ce6b5f2 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 21 Apr 2025 14:11:53 +0200
Subject: [PATCH v34 7/7] Refresh snapshot periodically during index validation

Enhances validation phase of concurrently built indexes by periodically refreshing snapshots rather than using a single reference snapshot. This addresses issues with xmin propagation during long-running validations.

The validation now takes a fresh snapshot every few pages, allowing the xmin horizon to advance. This restores feature of commit d9d076222f5b, which was reverted in commit e28bb8851969. New STIR-based approach does not depend on single reference snapshot anymore.
---
 src/backend/access/heap/README.HOT         |  4 +-
 src/backend/access/heap/heapam_handler.c   | 77 +++++++++++++++++++++-
 src/backend/access/spgist/spgvacuum.c      | 12 +++-
 src/backend/catalog/index.c                | 73 +++++++++++++++-----
 src/backend/commands/indexcmds.c           | 52 +++------------
 src/backend/utils/misc/guc_parameters.dat  |  9 +++
 src/include/access/tableam.h               | 25 ++++---
 src/include/access/transam.h               | 15 +++++
 src/include/catalog/index.h                |  2 +-
 src/include/miscadmin.h                    |  1 +
 src/test/regress/expected/create_index.out |  3 +
 src/test/regress/sql/create_index.sql      |  4 ++
 12 files changed, 194 insertions(+), 83 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index b1c797517ee..382fe1723a5 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -401,12 +401,12 @@ live tuple.
 We mark the index open for inserts (but still not ready for reads) then
 we again wait for transactions which have the table open.  Then validate
 the index.  This searches for tuples missing from the index in auxiliary
-index, and inserts any missing ones if they are visible to reference snapshot.
+index, and inserts any missing ones if they are visible to a fresh snapshot.
 Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
-the second reference snapshot is finished.  This ensures that nobody is
+the latest used snapshot is finished.  This ensures that nobody is
 alive any longer who could need to see any tuples that might be missing
 from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index a3474925d61..8a9d94b1edd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -53,6 +53,9 @@
 /* GUC: percentage of maintenance_work_mem for CIC validation tuplestore */
 int			debug_cic_validate_store_mem_pct = 10;
 
+/* GUC: refresh snapshot every N pages during CIC validation (0 = disable) */
+int			debug_cic_validate_snapshot_pages = 4096;
+
 static void reform_and_rewrite_tuple(HeapTuple tuple,
                                      Relation OldHeap, Relation NewHeap,
                                      Datum *values, bool *isnull, RewriteState rwstate);
@@ -1922,24 +1925,35 @@ heapam_index_validate_scan_read_stream_next(
 	return result;
 }
 
-static void
+static TransactionId
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
-						   Snapshot snapshot,
 						   ValidateIndexState *state,
 						   ValidateIndexState *auxState)
 {
+	TransactionId limitXmin;
+
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Snapshot		snapshot;
 	TupleTableSlot  *slot;
 	EState			*estate;
 	ExprContext		*econtext;
 	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	int64			num_to_check;
+	int64			page_read_counter = 1; /* set to 1 to skip snapshot reset at start */
 	Tuplestorestate *tuples_for_check;
+
+	/*
+	 * Under REPEATABLE READ or SERIALIZABLE (possible via
+	 * default_transaction_isolation), GetLatestSnapshot() returns the
+	 * transaction-level snapshot and xmin stays pinned.  Periodic snapshot
+	 * refresh is pointless in that case, so skip it.
+	 */
+	bool		reset_snapshot = XactIsoLevel <= XACT_READ_COMMITTED;
 	ValidateIndexScanState callback_private_data;
 
 	Buffer buf;
@@ -1949,6 +1963,8 @@ heapam_index_validate_scan(Relation heapRelation,
 	/* Use a percentage of maintenance_work_mem for tuple store. */
 	int		store_work_mem_part = maintenance_work_mem * debug_cic_validate_store_mem_pct / 100;
 
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
@@ -1957,6 +1973,12 @@ heapam_index_validate_scan(Relation heapRelation,
 	 */
 	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
+	PopActiveSnapshot();
+	InvalidateCatalogSnapshot();
+
+	Assert(!reset_snapshot || !HaveRegisteredOrActiveSnapshot());
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+
 	/*
 	 * sanity checks
 	 */
@@ -1972,6 +1994,29 @@ heapam_index_validate_scan(Relation heapRelation,
 
 	state->tuplesort = auxState->tuplesort = NULL;
 
+	/*
+	 * Now take the first snapshot that will be used to filter candidate
+	 * tuples. We are going to replace it by newer snapshot every so often
+	 * to propagate horizon.
+	 *
+	 * Beware!  There might still be snapshots in use that treat some transaction
+	 * as in-progress that our temporary snapshot treats as committed.
+	 *
+	 * If such a recently-committed transaction deleted tuples in the table,
+	 * we will not include them in the index; yet those transactions which
+	 * see the deleting one as still-in-progress will expect such tuples to
+	 * be there once we mark the index as valid.
+	 *
+	 * We solve this by waiting for all endangered transactions to exit before
+	 * we mark the index as valid, for that reason limitXmin is supported.
+	 *
+	 * We also set ActiveSnapshot to this snap, since functions in indexes may
+	 * need a snapshot.
+	 */
+	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	PushActiveSnapshot(snapshot);
+	limitXmin = snapshot->xmin;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
@@ -2005,6 +2050,7 @@ heapam_index_validate_scan(Relation heapRelation,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		block_number = BufferGetBlockNumber(buf);
+		page_read_counter++;
 
 		i = 0;
 		while ((off = tuples[i]) != InvalidOffsetNumber)
@@ -2075,6 +2121,21 @@ heapam_index_validate_scan(Relation heapRelation,
 		}
 
 		ReleaseBuffer(buf);
+		if (reset_snapshot &&
+			debug_cic_validate_snapshot_pages > 0 &&
+			page_read_counter % debug_cic_validate_snapshot_pages == 0)
+		{
+			PopActiveSnapshot();
+			UnregisterSnapshot(snapshot);
+			/* to make sure we propagate xmin */
+			InvalidateCatalogSnapshot();
+			Assert(!TransactionIdIsValid(MyProc->xmin));
+
+			snapshot = RegisterSnapshot(GetLatestSnapshot());
+			PushActiveSnapshot(snapshot);
+			/* Advance limitXmin so we wait for all snapshots seen so far */
+			limitXmin = TransactionIdNewer(limitXmin, snapshot->xmin);
+		}
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -2084,11 +2145,23 @@ heapam_index_validate_scan(Relation heapRelation,
 	read_stream_end(read_stream);
 	tuplestore_end(tuples_for_check);
 
+	/*
+	 * Drop the latest snapshot.  We must do this before waiting out other
+	 * snapshot holders, else we will deadlock against other processes also
+	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
+	 * they must wait for.
+	 */
+	PopActiveSnapshot();
+	UnregisterSnapshot(snapshot);
+	InvalidateCatalogSnapshot();
+	Assert(!reset_snapshot || MyProc->xmin == InvalidTransactionId);
 	FreeAccessStrategy(bstrategy);
 
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
+
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index c461f8dc02d..ef192fb99c2 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -191,14 +191,16 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 			 * Add target TID to pending list if the redirection could have
 			 * happened since VACUUM started.  (If xid is invalid, assume it
 			 * must have happened before VACUUM started, since REINDEX
-			 * CONCURRENTLY locks out VACUUM.)
+			 * CONCURRENTLY locks out VACUUM, if myXmin is invalid it is
+			 * validation scan.)
 			 *
 			 * Note: we could make a tighter test by seeing if the xid is
 			 * "running" according to the active snapshot; but snapmgr.c
 			 * doesn't currently export a suitable API, and it's not entirely
 			 * clear that a tighter test is worth the cycles anyway.
 			 */
-			if (TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
+			if (!TransactionIdIsValid(bds->myXmin) ||
+					TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
 				spgAddPendingTID(bds, &dt->pointer);
 		}
 		else
@@ -808,7 +810,6 @@ spgvacuumscan(spgBulkDeleteState *bds)
 	/* Finish setting up spgBulkDeleteState */
 	initSpGistState(&bds->spgstate, index);
 	bds->pendingList = NULL;
-	bds->myXmin = GetActiveSnapshot()->xmin;
 	bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
 	/*
@@ -959,6 +960,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	bds.stats = stats;
 	bds.callback = callback;
 	bds.callback_state = callback_state;
+	if (info->validate_index)
+		bds.myXmin = InvalidTransactionId;
+	else
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 	spgvacuumscan(&bds);
 
@@ -999,6 +1004,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		bds.stats = stats;
 		bds.callback = dummy_callback;
 		bds.callback_state = NULL;
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 		spgvacuumscan(&bds);
 	}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4edf68aced2..49adcb152cf 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -69,6 +69,7 @@
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -3538,8 +3539,9 @@ IndexCheckExclusion(Relation heapRelation,
  * insert their new tuples into it. At the same moment we clear "indisready" for
  * auxiliary index, since it is no more required to be updated.
  *
- * We then take a new reference snapshot, any tuples that are valid according
- * to this snap, but are not in the index, must be added to the index.
+ * We then take a new snapshot, any tuples that are valid according
+ * to this snap, but are not in the index, must be added to the index. In
+ * order to propagate xmin we reset that snapshot every so often.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
  * the snap need not be indexed, because we will wait out all transactions
@@ -3552,7 +3554,7 @@ IndexCheckExclusion(Relation heapRelation,
  * TIDs of both auxiliary and target indexes, and doing a "merge join" against
  * the TID lists to see which tuples from auxiliary index are missing from the
  * target index.  Thus we will ensure that all tuples valid according to the
- * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * latest snapshot are in the index. Notice we need to do bulkdelete in the
  * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
@@ -3565,21 +3567,24 @@ IndexCheckExclusion(Relation heapRelation,
  * before it declares a uniqueness error.
  *
  * After completing validate_index(), we wait until all transactions that
- * were alive at the time of the reference snapshot are gone; this is
- * necessary to be sure there are none left with a transaction snapshot
- * older than the reference (and hence possibly able to see tuples we did
- * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
- * transactions will be able to use it for queries.
+ * were alive at the time of the latest snapshot used during validation are
+ * gone; this is necessary to be sure there are none left with a transaction
+ * snapshot older than that (and hence possibly able to see tuples we did
+ * not index).  The snapshot is periodically refreshed during the heap scan
+ * to propagate the xmin horizon, so limitXmin tracks the most recent one.
+ * Then we mark the index "indisvalid" and commit.  Subsequent transactions
+ * will be able to use it for queries.
  *
  * Also, some actions to concurrent drop the auxiliary index are performed.
  */
-void
-validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
+TransactionId
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId)
 {
 	Relation	heapRelation,
 				indexRelation,
 				auxIndexRelation;
 	IndexInfo  *indexInfo;
+	TransactionId limitXmin;
 	IndexVacuumInfo ivinfo, auxivinfo;
 	ValidateIndexState state, auxState;
 	Oid			save_userid;
@@ -3592,6 +3597,16 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
 	int			aux_work_mem_part = maintenance_work_mem / 10;
 
+	/*
+	 * Under REPEATABLE READ or SERIALIZABLE (possible via
+	 * default_transaction_isolation), GetLatestSnapshot() returns the
+	 * transaction-level snapshot and xmin stays pinned.  Periodic snapshot
+	 * refresh is pointless in that case, so skip it.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	bool		reset_snapshot = XactIsoLevel <= XACT_READ_COMMITTED;
+#endif
+
 	{
 		const int	progress_index[] = {
 			PROGRESS_CREATEIDX_PHASE,
@@ -3629,8 +3644,12 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	 * Fetch info needed for index_insert.  (You might think this should be
 	 * passed in from DefineIndex, but its copy is long gone due to having
 	 * been built in a previous transaction.)
+	 *
+	 * We might need snapshot for index expressions or predicates.
 	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	indexInfo = BuildIndexInfo(indexRelation);
+	PopActiveSnapshot();
 
 	/* mark build is concurrent just for consistency */
 	indexInfo->ii_Concurrent = true;
@@ -3666,6 +3685,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 										   NULL, TUPLESORT_NONE);
 	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	(void) index_bulk_delete(&auxivinfo, NULL,
 							 validate_index_callback, &auxState);
 	/* If aux index is empty, merge may be skipped */
@@ -3685,7 +3707,13 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		index_close(indexRelation, NoLock);
 		table_close(heapRelation, NoLock);
 
-		return;
+		PushActiveSnapshot(GetTransactionSnapshot());
+		limitXmin = GetActiveSnapshot()->xmin;
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+		return limitXmin;
 	}
 
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
@@ -3694,6 +3722,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	/* ambulkdelete updates progress metrics */
 	(void) index_bulk_delete(&ivinfo, NULL,
 							 validate_index_callback, &state);
@@ -3713,19 +3744,24 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	tuplesort_performsort(auxState.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
 
 	/*
 	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
-	table_index_validate_scan(heapRelation,
-							  indexRelation,
-							  indexInfo,
-							  snapshot,
-							  &state,
-							  &auxState);
+	limitXmin = table_index_validate_scan(heapRelation,
+										  indexRelation,
+										  indexInfo,
+										  &state,
+										  &auxState);
 
 	/* Tuple sort closed by table_index_validate_scan */
 	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
@@ -3748,6 +3784,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
+
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 46c4ccc6789..a700068f8a2 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -596,7 +596,6 @@ DefineIndex(ParseState *pstate,
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -1816,32 +1815,11 @@ DefineIndex(ParseState *pstate,
 	/* Tell concurrent index builds to ignore us, if index qualifies */
 	if (safe_index)
 		set_indexsafe_procflags();
-
-	/*
-	 * Now take the "reference snapshot" that will be used by validate_index()
-	 * to filter candidate tuples.  Beware!  There might still be snapshots in
-	 * use that treat some transaction as in-progress that our reference
-	 * snapshot treats as committed.  If such a recently-committed transaction
-	 * deleted tuples in the table, we will not include them in the index; yet
-	 * those transactions which see the deleting one as still-in-progress will
-	 * expect such tuples to be there once we mark the index as valid.
-	 *
-	 * We solve this by waiting for all endangered transactions to exit before
-	 * we mark the index as valid.
-	 *
-	 * We also set ActiveSnapshot to this snap, since functions in indexes may
-	 * need a snapshot.
-	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
-	PushActiveSnapshot(snapshot);
 	/*
 	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
-	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
-	limitXmin = snapshot->xmin;
+	limitXmin = validate_index(tableId, indexRelationId, auxIndexRelationId);
 
-	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1863,8 +1841,8 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * The index is now valid in the sense that it contains all currently
 	 * interesting tuples.  But since it might not contain tuples deleted just
-	 * before the reference snap was taken, we have to wait out any
-	 * transactions that might have older snapshots.
+	 * before the last snapshot during validating was taken, we have to wait
+	 * out any transactions that might have older snapshots.
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
@@ -4433,7 +4411,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
-		Snapshot	snapshot;
 
 		StartTransactionCommand();
 
@@ -4448,13 +4425,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		if (newidx->safe)
 			set_indexsafe_procflags();
 
-		/*
-		 * Take the "reference snapshot" that will be used by validate_index()
-		 * to filter candidate tuples.
-		 */
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
-		PushActiveSnapshot(snapshot);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
@@ -4466,16 +4436,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
-
-		/*
-		 * We can now do away with our active snapshot, we still need to save
-		 * the xmin limit to wait for older snapshots.
-		 */
-		limitXmin = snapshot->xmin;
-
-		PopActiveSnapshot();
-		UnregisterSnapshot(snapshot);
+		limitXmin = validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId);
 
 		/*
 		 * To ensure no deadlocks, we must commit and start yet another
@@ -4485,10 +4446,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		CommitTransactionCommand();
 		StartTransactionCommand();
 
+		/* We should now definitely not be advertising any xmin. */
+		Assert(!TransactionIdIsValid(MyProc->xmin));
+
 		/*
 		 * The index is now valid in the sense that it contains all currently
 		 * interesting tuples.  But since it might not contain tuples deleted
-		 * just before the reference snap was taken, we have to wait out any
+		 * just before the latest snap was taken, we have to wait out any
 		 * transactions that might have older snapshots.
 		 *
 		 * Because we don't take a snapshot or Xid in this transaction,
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 4f8761de6b9..c566b6040c9 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -632,6 +632,15 @@
   boot_val => 'DEFAULT_ASSERT_ENABLED',
 },
 
+{ name => 'debug_cic_validate_snapshot_pages', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Refresh snapshot every N pages during CIC validation (0 to disable).',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'debug_cic_validate_snapshot_pages',
+  boot_val => '4096',
+  min => '0',
+  max => '1000000',
+},
+
 { name => 'debug_cic_validate_store_mem_pct', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
   short_desc => 'Percentage of maintenance_work_mem used for CIC validation tuplestore.',
   flags => 'GUC_NOT_IN_SAMPLE',
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index fafca930aae..033237a9ce4 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -734,12 +734,11 @@ typedef struct TableAmRoutine
 										   TableScanDesc scan);
 
 	/* see table_index_validate_scan for reference about parameters */
-	void		(*index_validate_scan) (Relation table_rel,
-										Relation index_rel,
-										IndexInfo *index_info,
-										Snapshot snapshot,
-										ValidateIndexState *state,
-										ValidateIndexState *aux_state);
+	TransactionId		(*index_validate_scan) (Relation table_rel,
+												Relation index_rel,
+												IndexInfo *index_info,
+												ValidateIndexState *state,
+												ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1897,20 +1896,18 @@ table_index_build_range_scan(Relation table_rel,
  * Note: it is responsibility of that function to close sortstates in
  * both `state` and `auxstate`.
  */
-static inline void
+static inline TransactionId
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
-						  Snapshot snapshot,
 						  ValidateIndexState *state,
 						  ValidateIndexState *auxstate)
 {
-	table_rel->rd_tableam->index_validate_scan(table_rel,
-											   index_rel,
-											   index_info,
-											   snapshot,
-											   state,
-											   auxstate);
+	return table_rel->rd_tableam->index_validate_scan(table_rel,
+													  index_rel,
+													  index_info,
+													  state,
+													  auxstate);
 }
 
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 55a4ab26b34..923aadbab43 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -415,6 +415,21 @@ NormalTransactionIdOlder(TransactionId a, TransactionId b)
 	return b;
 }
 
+/* return the newer of the two IDs */
+static inline TransactionId
+TransactionIdNewer(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdFollows(a, b))
+		return a;
+	return b;
+}
+
 /* return the newer of the two IDs */
 static inline FullTransactionId
 FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 3239e5c716f..def7352a859 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -159,7 +159,7 @@ extern void index_build(Relation heapRelation,
 						bool parallel,
 						bool progress);
 
-extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
+extern TransactionId validate_index(Oid heapId, Oid indexId, Oid auxIndexId);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 7ea643b7f80..8b3e7b21da1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -269,6 +269,7 @@ extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
 extern PGDLLIMPORT int debug_cic_validate_store_mem_pct;
+extern PGDLLIMPORT int debug_cic_validate_snapshot_pages;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2d6abb15a89..758c5884ff5 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3382,6 +3382,9 @@ DROP INDEX aux_index_ind6;
 --------+---------+-----------+----------+---------
  c1     | integer |           |          | 
 
+SET default_transaction_isolation = 'repeatable read';
+CREATE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+SET default_transaction_isolation = 'read committed';
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index fd96d80abbc..65dd58b947d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1400,6 +1400,10 @@ DROP INDEX aux_index_ind6;
 -- Make sure auxiliary index dropped too
 \d aux_index_tab5
 
+SET default_transaction_isolation = 'repeatable read';
+CREATE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+SET default_transaction_isolation = 'read committed';
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-04-07 01:42  Josh Kupershmidt <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Josh Kupershmidt @ 2026-04-07 01:42 UTC (permalink / raw)
  To: Mihail Nikalayeu <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Hi,

I was interested in this feature, and took an initial look through the
patch. Sorry in advance that I'm missing some previous context from the
thread's history, I'm starting fresh here.

A few initial notes from looking at the v34 patches:

Usability and docs:
 * We're leaving behind two invalid indexes now that the user has to figure
out how to drop in case of an error - that seems like it could be confusing
for the user. Could we have some better way (error handler,
background worker) try to perform this cleanup automatically? If not, we
should at least tell the user clearly in the error message that both
invalid indexes are left behind (i.e. "idx" and "idx_ccaux" in the example)
 * Docs are inconsistent or confusing about whether there's one or two
indexes left behind in case of error - e.g. "command will fail but leave
behind *an* invalid index and its associated auxiliary index" - somewhat
implying there is only one invalid index, and somehow the auxiliary index
is valid?
 * Similarly, when the doc mentions e.g. "drop the index" - it's not
necessarily clear which index we're talking about when there are two
invalid indexes left behind that the user will see with `\d`
 * It would be nice to guard against users trying arbitrary CREATE INDEX
... USING stir(...) with a clear error

Few behavior notes and questions:
 * One of the testcases (line 2478 of patch 0004) does `DELETE FROM
concur_reindex_tab4 WHERE c1 = 1;` but the table `concur_reindex_tab4`
looks like it has been dropped a few lines above that?
 * The StirPageGetFreeSpace macro from patch 0002 reads
`StirPageGetMaxOffset(page)` which seems like it could cause an unsafe read
of opaque->maxoff if used on the metapage
 * A comment explains "No predicate evaluation is needed here" , i.e. we
are skipping predicate evaluation in the validation scan step, assuming
that the auxiliary index contains only qualifying TIDs. Is this really
bulletproof for e.g. partial indexes which may no longer satisfy the
predicate at the time of the validation scan due to conflicting HOT updates?

Thanks
Josh

On Mon, Apr 6, 2026 at 2:22 PM Mihail Nikalayeu <[email protected]>
wrote:

> Rebased once again.
>

^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-04-07 23:19  Mihail Nikalayeu <[email protected]>
  parent: Josh Kupershmidt <[email protected]>
  0 siblings, 2 replies; 10+ messages in thread

From: Mihail Nikalayeu @ 2026-04-07 23:19 UTC (permalink / raw)
  To: Josh Kupershmidt <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Hello, Josh!

Your review looks a bit LLM-generated, but anyway - thanks for review! :)
Especially because at least one point seems to be valid.

> We're leaving behind two invalid indexes now that the user has to figure
> out how to drop in case of an error - that seems like it could be confusing for the user.
> Could we have some better way (error handler, background worker) try to perform this cleanup automatically?
> If not, we should at least tell the user clearly in the error message that both
> invalid indexes are left behind (i.e. "idx" and "idx_ccaux" in the example)

Commit 0005 adds automatic dropping of auxiliary indexes when the
original index is reindexed or dropped. Also, documentation reflects
the ccaux index (similar to ccnew).

> Docs are inconsistent or confusing about whether there's one or two indexes left behind in case of error
> - e.g. "command will fail but leave behind *an* invalid index and its associated auxiliary index"
> somewhat implying there is only one invalid index, and somehow the auxiliary index is valid?

Auxiliary index is never marked as valid; I'm not sure we need to
highlight it here. Or do you have an idea how to rephrase?

> Similarly, when the doc mentions e.g. "drop the index" - it's not necessarily clear which index
> we're talking about when there are two invalid indexes left behind that the user will see with `\d`

In one commit it says: "method in such cases is to drop these indexes
and try again to perform".
After 0005 "The auxiliary index (suffixed with
<literal>_ccaux</literal>) will be automatically dropped when the main
index is dropped".
It seems clear to me, but feel free to provide your variant.

>  * It would be nice to guard against users trying arbitrary CREATE INDEX ... USING stir(...) with a clear error

It will fail with "Building STIR indexes is not supported".

> One of the testcases (line 2478 of patch 0004) does `DELETE FROM concur_reindex_tab4 WHERE c1 = 1;`
> but the table `concur_reindex_tab4` looks like it has been dropped a few lines above that?

Hm, yep, I'll fix it.

> The StirPageGetFreeSpace macro from patch 0002 reads `StirPageGetMaxOffset(page)`
> which seems like it could cause an unsafe read of opaque->maxoff if used on the metapage

But it was never used for the metapage.

> A comment explains "No predicate evaluation is needed here" , i.e. we are skipping predicate
> evaluation in the validation scan step, assuming that the
> auxiliary index contains only qualifying TIDs. Is this really bulletproof for e.g. partial indexes which may
> no longer satisfy the predicate at the time of the validation scan due to conflicting HOT updates?

Conflicting HOT updates are not possible because the catalog contains
the new index definition from the start of the process.
Or do you mean a different scenario?

Best regards,
Mikhail.





^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-04-11 16:56  Mihail Nikalayeu <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  1 sibling, 0 replies; 10+ messages in thread

From: Mihail Nikalayeu @ 2026-04-11 16:56 UTC (permalink / raw)
  To: Josh Kupershmidt <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

Hi!

Rebased, including fixes related to Josh's review.

Thanks!


Attachments:

  [application/octet-stream] v35-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch (97.5K, 2-v35-0004-Use-auxiliary-indexes-for-concurrent-index-opera.patch)
  download | inline diff:
From 707d09d1d73e0e1e770f4793db27d6a4370b3041 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 15:03:10 +0100
Subject: [PATCH v35 4/7] Use auxiliary indexes for concurrent index operations

Replace the second table full scan in concurrent index builds with an auxiliary index approach:
- create a STIR auxiliary index with the same predicate (if exists) as in main index
- use it to track tuples inserted during the first phase
- merge auxiliary index with main index during validation to catch up new index with any tuples missed during the first phase
- automatically drop auxiliary when main index is ready

To merge main and auxiliary indexes:
- index_bulk_delete called for both, TIDs put into tuplesort
- both tuplesort are being sorted
- both tuplesort scanned with two pointers looking for the TIDs present in auxiliary index, but absent in main one
- all such TIDs are put into tuplestore
- all TIDs in tuplestore are fetched using the stream, tuplestore used in heapam_index_validate_scan_read_stream_next to provide the next page to prefetch
- if fetched tuple is alive - it is inserted into the main index

This eliminates the need for a second full table scan during validation, improving performance, especially for large tables. Affects both CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY operations.
---
 doc/src/sgml/monitoring.sgml               |  26 +-
 doc/src/sgml/ref/create_index.sgml         |  34 +-
 doc/src/sgml/ref/reindex.sgml              |  40 +-
 src/backend/access/heap/README.HOT         |  13 +-
 src/backend/access/heap/heapam_handler.c   | 557 ++++++++++++++-------
 src/backend/catalog/index.c                | 322 ++++++++++--
 src/backend/catalog/system_views.sql       |  17 +-
 src/backend/commands/indexcmds.c           | 345 +++++++++++--
 src/backend/nodes/makefuncs.c              |   4 +-
 src/backend/utils/misc/guc_parameters.dat  |   9 +
 src/include/access/tableam.h               |  12 +-
 src/include/catalog/index.h                |   9 +-
 src/include/commands/progress.h            |  13 +-
 src/include/miscadmin.h                    |   1 +
 src/include/nodes/makefuncs.h              |   3 +-
 src/test/regress/expected/create_index.out |  35 ++
 src/test/regress/expected/indexing.out     |   3 +-
 src/test/regress/expected/rules.out        |  17 +-
 src/test/regress/sql/create_index.sql      |  21 +
 19 files changed, 1147 insertions(+), 334 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 08d5b824552..1f2cd0d6f7e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6971,6 +6971,18 @@ FROM pg_stat_get_backend_idset() AS backendid;
        information for this phase.
       </entry>
      </row>
+     <row>
+      <entry><literal>waiting for writers to use auxiliary index</literal></entry>
+      <entry>
+       <command>CREATE INDEX CONCURRENTLY</command> or <command>REINDEX CONCURRENTLY</command> is waiting for transactions
+       with write locks that can potentially see the table to finish, to ensure use of auxiliary index for new tuples in
+       future transactions.
+       This phase is skipped when not in concurrent mode.
+       Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
+       and <structname>current_locker_pid</structname> contain the progress
+       information for this phase.
+      </entry>
+     </row>
      <row>
       <entry><literal>building index</literal></entry>
       <entry>
@@ -7011,13 +7023,12 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </entry>
      </row>
      <row>
-      <entry><literal>index validation: scanning table</literal></entry>
+      <entry><literal>index validation: merging indexes</literal></entry>
       <entry>
-       <command>CREATE INDEX CONCURRENTLY</command> is scanning the table
-       to validate the index tuples collected in the previous two phases.
+       <command>CREATE INDEX CONCURRENTLY</command> is merging content of auxiliary index with the target index.
        This phase is skipped when not in concurrent mode.
-       Columns <structname>blocks_total</structname> (set to the total size of the table)
-       and <structname>blocks_done</structname> contain the progress information for this phase.
+       Columns <structname>tuples_total</structname> (set to the number of tuples to be merged)
+       and <structname>tuples_done</structname> contain the progress information for this phase.
       </entry>
      </row>
      <row>
@@ -7034,8 +7045,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
      <row>
       <entry><literal>waiting for readers before marking dead</literal></entry>
       <entry>
-       <command>REINDEX CONCURRENTLY</command> is waiting for transactions
-       with read locks on the table to finish, before marking the old index dead.
+       <command>CREATE INDEX CONCURRENTLY</command> is waiting for transactions
+       with read locks on the table to finish, before marking the auxiliary index as dead.
+       <command>REINDEX CONCURRENTLY</command> is also waiting before marking the old index as dead.
        This phase is skipped when not in concurrent mode.
        Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
        and <structname>current_locker_pid</structname> contain the progress
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index bb7505d171b..901c6cf22bc 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -620,10 +620,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     out writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>CREATE INDEX</command>.
     When this option is used,
-    <productname>PostgreSQL</productname> must perform two scans of the table, and in
-    addition it must wait for all existing transactions that could potentially
-    modify or use the index to terminate.  Thus
-    this method requires more total work than a standard index build and takes
+    <productname>PostgreSQL</productname> must perform a table scan followed by
+    a validation phase, and in addition it must wait for all existing transactions
+    that could potentially modify or use the index to terminate.  Thus
+    this method requires more total work than a standard index build and may take
     significantly longer to complete.  However, since it allows normal
     operations to continue while the index is built, this method is useful for
     adding new indexes in a production environment.  Of course, the extra CPU
@@ -631,14 +631,14 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </para>
 
    <para>
-    In a concurrent index build, the index is actually entered as an
-    <quote>invalid</quote> index into
-    the system catalogs in one transaction, then two table scans occur in
-    two more transactions.  Before each table scan, the index build must
+    In a concurrent index build, the main and auxiliary indexes are actually
+    entered as <quote>invalid</quote> indexes into
+    the system catalogs in one transaction, then two phases occur in
+    multiple transactions.  Before each phase, the index build must
     wait for existing transactions that have modified the table to terminate.
-    After the second scan, the index build must wait for any transactions
+    After the second phase, the index build must wait for any transactions
     that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-    scan to terminate, including transactions used by any phase of concurrent
+    phase to terminate, including transactions used by any phase of concurrent
     index builds on other tables, if the indexes involved are partial or have
     columns that are not simple column references.
     Then finally the index can be marked <quote>valid</quote> and ready for use,
@@ -651,10 +651,11 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    <para>
     If a problem arises while scanning the table, such as a deadlock or a
     uniqueness violation in a unique index, the <command>CREATE INDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> index. This index
-    will be ignored for querying purposes because it might be incomplete;
-    however it will still consume update overhead. The <application>psql</application>
-    <command>\d</command> command will report such an index as <literal>INVALID</literal>:
+    command will fail but leave behind two <quote>invalid</quote> indexes:
+    the main index and its associated auxiliary index. These indexes
+    will be ignored for querying purposes because they might be incomplete;
+    however they will still consume update overhead. The <application>psql</application>
+    <command>\d</command> command will report such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -664,11 +665,12 @@ postgres=# \d tab
  col    | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     The recommended recovery
-    method in such cases is to drop the index and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
+    method in such cases is to drop these indexes and try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
     to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
    </para>
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 185cd75ca30..56c9a0fe1f3 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -368,9 +368,8 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <productname>PostgreSQL</productname> supports rebuilding indexes with minimum locking
     of writes.  This method is invoked by specifying the
     <literal>CONCURRENTLY</literal> option of <command>REINDEX</command>. When this option
-    is used, <productname>PostgreSQL</productname> must perform two scans of the table
-    for each index that needs to be rebuilt and wait for termination of
-    all existing transactions that could potentially use the index.
+    is used, <productname>PostgreSQL</productname> must perform several steps to ensure data
+    consistency while allowing normal operations to continue.
     This method requires more total work than a standard index
     rebuild and takes significantly longer to complete as it needs to wait
     for unfinished transactions that might modify the index. However, since
@@ -388,7 +387,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
     <orderedlist>
      <listitem>
       <para>
-       A new transient index definition is added to the catalog
+       A new transient index definition and an auxiliary index are added to the catalog
        <literal>pg_index</literal>.  This definition will be used to replace
        the old index.  A <literal>SHARE UPDATE EXCLUSIVE</literal> lock at
        session level is taken on the indexes being reindexed as well as their
@@ -398,7 +397,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       A first pass to build the index is done for each new index.  Once the
+       The auxiliary index is marked as "ready for inserts", making
+       it visible to other sessions. This index efficiently tracks all new
+       tuples during the reindex process.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The new main index is built by scanning the table.  Once the
        index is built, its flag <literal>pg_index.indisready</literal> is
        switched to <quote>true</quote> to make it ready for inserts, making it
        visible to other sessions once the transaction that performed the build
@@ -409,9 +416,9 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       Then a second pass is performed to add tuples that were added while the
-       first pass was running.  This step is also done in a separate
-       transaction for each index.
+       A validation phase merges any missing entries from the auxiliary index
+       into the main index, ensuring all concurrent changes are captured.
+       This step is also done in a separate transaction for each index.
       </para>
      </listitem>
 
@@ -428,7 +435,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes have <literal>pg_index.indisready</literal> switched to
+       The old and auxiliary indexes have <literal>pg_index.indisready</literal> switched to
        <quote>false</quote> to prevent any new tuple insertions, after waiting
        for running queries that might reference the old index to complete.
       </para>
@@ -436,7 +443,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
 
      <listitem>
       <para>
-       The old indexes are dropped.  The <literal>SHARE UPDATE
+       The old and auxiliary indexes are dropped.  The <literal>SHARE UPDATE
        EXCLUSIVE</literal> session locks for the indexes and the table are
        released.
       </para>
@@ -447,11 +454,11 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
    <para>
     If a problem arises while rebuilding the indexes, such as a
     uniqueness violation in a unique index, the <command>REINDEX</command>
-    command will fail but leave behind an <quote>invalid</quote> new index in addition to
-    the pre-existing one. This index will be ignored for querying purposes
-    because it might be incomplete; however it will still consume update
+    command will fail but leave behind an <quote>invalid</quote> new index and an <quote>invalid</quote> auxiliary index in addition to
+    the pre-existing one. These indexes will be ignored for querying purposes
+    because they might be incomplete; however they will still consume update
     overhead. The <application>psql</application> <command>\d</command> command will report
-    such an index as <literal>INVALID</literal>:
+    such indexes as <literal>INVALID</literal>:
 
 <programlisting>
 postgres=# \d tab
@@ -462,12 +469,13 @@ postgres=# \d tab
 Indexes:
     "idx" btree (col)
     "idx_ccnew" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>
 
     If the index marked <literal>INVALID</literal> is suffixed
-    <literal>_ccnew</literal>, then it corresponds to the transient
+    <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
-    recovery method is to drop it using <literal>DROP INDEX</literal>,
+    recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
     then attempt <command>REINDEX CONCURRENTLY</command> again.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 74e407f375a..b1c797517ee 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -375,6 +375,11 @@ constraint on which updates can be HOT.  Other transactions must include
 such an index when determining HOT-safety of updates, even though they
 must ignore it for both insertion and searching purposes.
 
+Also, special auxiliary index is created the same way. It is marked as
+"ready for inserts" without any actual table scan. Its purpose is to collect
+new tuples inserted into table while our target index is still "not ready
+for inserts".
+
 We must do this to avoid making incorrect index entries.  For example,
 suppose we are building an index on column X and we make an index entry for
 a non-HOT tuple with X=1.  Then some other backend, unaware that X is an
@@ -394,10 +399,10 @@ entry at the root of the HOT-update chain but we use the key value from the
 live tuple.
 
 We mark the index open for inserts (but still not ready for reads) then
-we again wait for transactions which have the table open.  Then we take
-a second reference snapshot and validate the index.  This searches for
-tuples missing from the index, and inserts any missing ones.  Again,
-the index entries have to have TIDs equal to HOT-chain root TIDs, but
+we again wait for transactions which have the table open.  Then validate
+the index.  This searches for tuples missing from the index in auxiliary
+index, and inserts any missing ones if they are visible to reference snapshot.
+Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 20d3b46e062..8cbc4855078 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -42,11 +42,16 @@
 #include "storage/lmgr.h"
 #include "storage/lock.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/tuplesort.h"
+#include "utils/tuplestore.h"
+
+/* GUC: percentage of maintenance_work_mem for CIC validation tuplestore */
+int			debug_cic_validate_store_mem_pct = 10;
 
 static void reform_and_rewrite_tuple(HeapTuple tuple,
 									 Relation OldHeap, Relation NewHeap,
@@ -1714,242 +1719,422 @@ heapam_index_build_range_scan(Relation heapRelation,
 	return reltuples;
 }
 
+/*
+ * Calculate set difference (relative complement) of main and aux
+ * sets.
+ *
+ * All records which are present in auxiliary tuplesort but not in
+ * main are added to the store.
+ *
+ * In set theory notation store = aux - main or store = aux / main.
+ *
+ * returns number of items added to store
+ */
+static int64
+heapam_index_validate_tuplesort_difference(Tuplesortstate *main,
+										   Tuplesortstate *aux,
+										   Tuplestorestate *store)
+{
+	int64		num = 0;
+	/* state variables for the merge */
+	ItemPointer	indexcursor = NULL,
+					auxindexcursor = NULL;
+	ItemPointerData decoded,
+					auxdecoded;
+	bool			tuplesort_empty = false,
+					auxtuplesort_empty = false;
+
+	/* Initialize pointers. */
+	ItemPointerSetInvalid(&decoded);
+	ItemPointerSetInvalid(&auxdecoded);
+
+	/*
+	 * Main loop: we step through the auxiliary sort (auxState->tuplesort),
+	 * which holds TIDs that must compared to those from the "main" sort
+	 * (state->tuplesort).
+	 */
+	while (!auxtuplesort_empty)
+	{
+		Datum		ts_val;
+		bool		ts_isnull;
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Attempt to fetch the next TID from the auxiliary sort. If it's
+		 * empty, we set auxindexcursor to NULL.
+		 */
+		auxtuplesort_empty = !tuplesort_getdatum(aux, true,
+												 false, &ts_val, &ts_isnull,
+												 NULL);
+		Assert(auxtuplesort_empty || !ts_isnull);
+		if (!auxtuplesort_empty)
+		{
+			itemptr_decode(&auxdecoded, DatumGetInt64(ts_val));
+			auxindexcursor = &auxdecoded;
+		}
+		else
+		{
+			auxindexcursor = NULL;
+		}
+
+		/*
+		 * If the auxiliary sort is not yet empty, we now try to synchronize
+		 * the "main" sort cursor (indexcursor) with auxindexcursor. We advance
+		 * the main sort cursor until we've reached or passed the auxiliary TID.
+		 */
+		if (!auxtuplesort_empty)
+		{
+			/*
+			 * Move the main sort forward while:
+			 *   (1) It's not exhausted (tuplesort_empty == false), and
+			 *   (2) Either indexcursor is NULL (first iteration) or
+			 *       indexcursor < auxindexcursor in TID order.
+			 */
+			while (!tuplesort_empty && (indexcursor == NULL || /* null on first time here */
+						ItemPointerCompare(indexcursor, auxindexcursor) < 0))
+			{
+				/*
+				 * Get the next TID from the main sort. If it's empty,
+				 * we set indexcursor to NULL.
+				 */
+				tuplesort_empty = !tuplesort_getdatum(main, true,
+													  false, &ts_val, &ts_isnull,
+													  NULL);
+				Assert(tuplesort_empty || !ts_isnull);
+
+				if (!tuplesort_empty)
+				{
+					itemptr_decode(&decoded, DatumGetInt64(ts_val));
+					indexcursor = &decoded;
+				}
+				else
+				{
+					indexcursor = NULL;
+				}
+
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * Now, if either:
+			 *  - the main sort is empty, or
+			 *  - indexcursor > auxindexcursor,
+			 *
+			 * then auxindexcursor identifies a TID that doesn't appear in
+			 * the main sort. We likely need to insert it
+			 * into the target index if it’s visible in the heap.
+			 */
+			if (tuplesort_empty || ItemPointerCompare(indexcursor, auxindexcursor) > 0)
+			{
+				tuplestore_putdatum(store, Int64GetDatum(itemptr_encode(auxindexcursor)));
+				num++;
+			}
+		}
+	}
+
+	return num;
+}
+
+typedef struct ValidateIndexScanState
+{
+	Tuplestorestate		*store;
+	BlockNumber			prev_block_number;
+	OffsetNumber		prev_offset_number;
+} ValidateIndexScanState;
+
+/*
+ * This is ReadStreamBlockNumberCB implementation which works as follows:
+ *
+ * 1) It iterates over a sorted tuplestore, where each element is an encoded
+ *    ItemPointer
+ *
+ * 2) It returns the current BlockNumber and collects all OffsetNumbers
+ *    for that block in per_buffer_data.
+ *
+ * 3) Once the code encounters a new BlockNumber, it stops reading more
+ *    offsets and saves the OffsetNumber of the new block for the next call.
+ *
+ * 4) The list of offsets for a block is always terminated with InvalidOffsetNumber.
+ *
+ * This function is intended to be repeatedly called, each time returning
+ * the next block and its corresponding set of offsets.
+ */
+static BlockNumber
+heapam_index_validate_scan_read_stream_next(
+								  ReadStream *stream,
+								  void *void_callback_private_data,
+								  void *void_per_buffer_data
+								  )
+{
+	bool should_free;
+	Datum datum;
+	BlockNumber result = InvalidBlockNumber;
+	int i = 0;
+
+	/*
+	 * Retrieve the specialized callback state and the output buffer.
+	 * callback_private_data keeps track of the previous block and offset
+	 * from a prior invocation, if any.
+	 */
+	ValidateIndexScanState *callback_private_data = void_callback_private_data;
+	OffsetNumber *per_buffer_data = void_per_buffer_data;
+
+	/*
+	 * If there is a "leftover" offset number from the previous invocation,
+	 * it means we had switched to a new block in the middle of the last call.
+	 * We place that leftover offset number into the buffer first.
+	 */
+	if (callback_private_data->prev_offset_number != InvalidOffsetNumber)
+	{
+		Assert(callback_private_data->prev_block_number != InvalidBlockNumber);
+		/*
+		 * 'result' is the block number to return. We set it to the block
+		 * from the previous leftover offset.
+		 */
+		result = callback_private_data->prev_block_number;
+		/* Place leftover offset number in the output buffer. */
+		per_buffer_data[i++] = callback_private_data->prev_offset_number;
+		/*
+		 * Clear the leftover offset number so it won't be reused unless
+		 * we encounter another block change.
+		 */
+		callback_private_data->prev_offset_number = InvalidOffsetNumber;
+	}
+
+	/*
+	 * Read from the tuplestore until we either run out of tuples or we
+	 * encounter a block change. For each tuple:
+	 *
+	 *   1) Decode its block/offset from the Datum.
+	 *   2) If it's the first time in this call (prev_block_number == InvalidBlockNumber),
+	 *      initialize prev_block_number.
+	 *   3) If the block number matches the current block, collect the offset.
+	 *   4) If the block number differs, save that offset as leftover and break
+	 *      so that the next call can handle the new block.
+	 */
+	while (tuplestore_getdatum(callback_private_data->store, true, &should_free, &datum))
+	{
+		BlockNumber next_block_number;
+		ItemPointerData next_data;
+
+		/* Decode the datum into an ItemPointer (block + offset). */
+		itemptr_decode(&next_data, DatumGetInt64(datum));
+		next_block_number = ItemPointerGetBlockNumber(&next_data);
+
+		/*
+		 * If we haven't set a block number yet this round, initialize it
+		 * using the first tuple we read.
+		 */
+		if (callback_private_data->prev_block_number == InvalidBlockNumber)
+			callback_private_data->prev_block_number = next_block_number;
+
+		/*
+		 * Always set the result to be the "current" block number
+		 * we are filling offsets for.
+		 */
+		result = callback_private_data->prev_block_number;
+
+		/*
+		 * If this tuple is from the same block, just store its offset
+		 * in our per_buffer_data array.
+		 */
+		if (next_block_number == callback_private_data->prev_block_number)
+		{
+			per_buffer_data[i++] = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+		}
+		else
+		{
+			/*
+			 * If the block just changed, store the offset of the new block
+			 * as leftover for the next invocation and break out.
+			 */
+			callback_private_data->prev_block_number = next_block_number;
+			callback_private_data->prev_offset_number = ItemPointerGetOffsetNumber(&next_data);
+
+			/* Free the datum if needed. */
+			if (should_free)
+				pfree(DatumGetPointer(datum));
+
+			/* Break to let the next call handle the new block. */
+			break;
+		}
+	}
+
+	/*
+	 * Terminate the list of offsets for this block with an InvalidOffsetNumber.
+	 */
+	per_buffer_data[i] = InvalidOffsetNumber;
+	return result;
+}
+
 static void
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
 						   Snapshot snapshot,
-						   ValidateIndexState *state)
+						   ValidateIndexState *state,
+						   ValidateIndexState *auxState)
 {
-	TableScanDesc scan;
-	HeapScanDesc hscan;
-	HeapTuple	heapTuple;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-	BlockNumber previous_blkno = InvalidBlockNumber;
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
+
+	TupleTableSlot  *slot;
+	EState			*estate;
+	ExprContext		*econtext;
+	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	int64			num_to_check;
+	Tuplestorestate *tuples_for_check;
+	ValidateIndexScanState callback_private_data;
+
+	Buffer buf;
+	OffsetNumber *tuples;
+	ReadStream *read_stream;
+
+	/* Use a percentage of maintenance_work_mem for tuple store. */
+	int		store_work_mem_part = maintenance_work_mem * debug_cic_validate_store_mem_pct / 100;
+
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
 	/*
 	 * sanity checks
 	 */
 	Assert(OidIsValid(indexRelation->rd_rel->relam));
 
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
+	num_to_check = heapam_index_validate_tuplesort_difference(state->tuplesort,
+														 auxState->tuplesort,
+														 tuples_for_check);
+
+	/* It is our responsibility to close tuple sort as fast as we can */
+	tuplesort_end(state->tuplesort);
+	tuplesort_end(auxState->tuplesort);
+
+	state->tuplesort = auxState->tuplesort = NULL;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+									&TTSOpsBufferHeapTuple);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
 
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+	callback_private_data.prev_block_number = InvalidBlockNumber;
+	callback_private_data.store = tuples_for_check;
+	callback_private_data.prev_offset_number = InvalidOffsetNumber;
 
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	scan = table_beginscan_strat(heapRelation,	/* relation */
-								 snapshot,	/* snapshot */
-								 0, /* number of keys */
-								 NULL,	/* scan key */
-								 true,	/* buffer access strategy OK */
-								 false);	/* syncscan not OK */
-	hscan = (HeapScanDesc) scan;
+	read_stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING,
+														 bstrategy,
+														 heapRelation, MAIN_FORKNUM,
+														 heapam_index_validate_scan_read_stream_next,
+														 &callback_private_data,
+														 (MaxHeapTuplesPerPage + 1) * sizeof(OffsetNumber));
 
-	pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_TOTAL,
-								 hscan->rs_nblocks);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_TOTAL, num_to_check);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE, 0);
 
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while ((buf = read_stream_next_buffer(read_stream, (void **) &tuples)) != InvalidBuffer)
 	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
+		HeapTupleData	heap_tuple_data[MaxHeapTuplesPerPage];
+		int i;
+		OffsetNumber off;
+		BlockNumber block_number;
 
 		CHECK_FOR_INTERRUPTS();
 
-		state->htups += 1;
+		LockBuffer(buf, BUFFER_LOCK_SHARE);
+		block_number = BufferGetBlockNumber(buf);
 
-		if ((previous_blkno == InvalidBlockNumber) ||
-			(hscan->rs_cblock != previous_blkno))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
-										 hscan->rs_cblock);
-			previous_blkno = hscan->rs_cblock;
+			ItemPointerData tid;
+			bool		all_dead, found;
+			ItemPointerSet(&tid, block_number, off);
+
+			found = heap_hot_search_buffer(&tid, heapRelation, buf, snapshot,
+										   &heap_tuple_data[i], &all_dead, true);
+			if (!found)
+				ItemPointerSetInvalid(&heap_tuple_data[i].t_self);
+			i++;
+			state->htups += 1;
 		}
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		/*
-		 * As commented in table_index_build_scan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
+		 * It is safe to access tuple data after releasing the buffer lock
+		 * because the buffer pin is still held, and the only operation that
+		 * could physically move tuple data on the page is
+		 * PageRepairFragmentation via heap_page_prune.  VACUUM conflicts with
+		 * CIC (both take ShareUpdateExclusiveLock), and opportunistic pruning
+		 * from concurrent DML cannot affect root tuples we are referencing.
 		 */
-		if (hscan->rs_cblock != root_blkno)
-		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
 		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
+		 * No predicate evaluation is needed here: the auxiliary STIR index
+		 * only contains TIDs for tuples that already satisfied the partial
+		 * index predicate at DML time (checked in ExecInsertIndexTuples).
 		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
+		i = 0;
+		while ((off = tuples[i]) != InvalidOffsetNumber)
 		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
+			if (ItemPointerIsValid(&heap_tuple_data[i].t_self))
 			{
+				ItemPointerData root_tid;
+				ItemPointerSet(&root_tid, block_number, off);
+
+				/* Reset the per-tuple memory context for the next fetch. */
+				MemoryContextReset(econtext->ecxt_per_tuple_memory);
+				ExecStoreBufferHeapTuple(&heap_tuple_data[i], slot, buf);
+
+				/* Compute the key values and null flags for this tuple. */
+				FormIndexDatum(indexInfo,
+							   slot,
+							   estate,
+							   values,
+							   isnull);
+
 				/*
-				 * Remember index items seen earlier on the current heap page
+				 * Insert the tuple into the target index.
 				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+				index_insert(indexRelation,
+							 values,
+							 isnull,
+							 &root_tid, /* insert root tuple */
+							 heapRelation,
+							 indexInfo->ii_Unique ?
+							 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+							 false,
+							 indexInfo);
+
+				state->tups_inserted += 1;
 			}
 
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  false, &ts_val, &ts_isnull,
-												  NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_TUPLES_DONE, 1);
+			i++;
 		}
 
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 false,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
+		ReleaseBuffer(buf);
 	}
 
-	table_endscan(scan);
-
 	ExecDropSingleTupleTableSlot(slot);
 
 	FreeExecutorState(estate);
 
+	read_stream_end(read_stream);
+	tuplestore_end(tuples_for_check);
+
+	FreeAccessStrategy(bstrategy);
+
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index cc067e58d36..b1417ec05c6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -715,6 +715,8 @@ UpdateIndexRelation(Oid indexoid,
  *			already exists.
  *		INDEX_CREATE_PARTITIONED:
  *			create a partitioned index (table must be partitioned)
+ *		INDEX_CREATE_AUXILIARY:
+ *			mark index as auxiliary index
  *		INDEX_CREATE_SUPPRESS_PROGRESS:
  *			don't report progress during the index build.
  *
@@ -723,6 +725,9 @@ UpdateIndexRelation(Oid indexoid,
  * allow_system_table_mods: allow table to be a system catalog
  * is_internal: if true, post creation hook for new index
  * constraintId: if not NULL, receives OID of created constraint
+ * relpersistence: persistence level to use for index. In most of the
+ *		cases it should be equal to the persistence level of the table,
+ *		auxiliary indexes are only exception here.
  *
  * Returns the OID of the created index.
  */
@@ -763,6 +768,7 @@ index_create(Relation heapRelation,
 	bool		invalid = (flags & INDEX_CREATE_INVALID) != 0;
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
+	bool		auxiliary = (flags & INDEX_CREATE_AUXILIARY) != 0;
 	bool		progress = (flags & INDEX_CREATE_SUPPRESS_PROGRESS) == 0;
 	char		relkind;
 	TransactionId relfrozenxid;
@@ -789,7 +795,10 @@ index_create(Relation heapRelation,
 	namespaceId = RelationGetNamespace(heapRelation);
 	shared_relation = heapRelation->rd_rel->relisshared;
 	mapped_relation = RelationIsMapped(heapRelation);
-	relpersistence = heapRelation->rd_rel->relpersistence;
+	if (auxiliary)
+		relpersistence = RELPERSISTENCE_UNLOGGED; /* aux indexes are always unlogged */
+	else
+		relpersistence = heapRelation->rd_rel->relpersistence;
 
 	/*
 	 * check parameters
@@ -797,6 +806,11 @@ index_create(Relation heapRelation,
 	if (indexInfo->ii_NumIndexAttrs < 1)
 		elog(ERROR, "must index at least one column");
 
+	if (indexInfo->ii_Am == STIR_AM_OID && !auxiliary)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("user-defined indexes with STIR access method are not supported")));
+
 	if (!allow_system_table_mods &&
 		IsSystemRelation(heapRelation) &&
 		IsNormalProcessingMode())
@@ -1402,7 +1416,8 @@ index_create_copy(Relation heapRelation, uint16 flags,
 							!concurrently,	/* isready */
 							concurrently,	/* concurrent */
 							indexRelation->rd_indam->amsummarizing,
-							oldInfo->ii_WithoutOverlaps);
+							oldInfo->ii_WithoutOverlaps,
+							false);
 
 	/* fetch exclusion constraint info if any */
 	if (indexRelation->rd_index->indisexclusion)
@@ -1422,13 +1437,16 @@ index_create_copy(Relation heapRelation, uint16 flags,
 	 * index information.  All this information will be used for the index
 	 * creation.
 	 */
-	for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
 	{
 		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
-		Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
 
-		indexColNames = lappend(indexColNames, NameStr(att->attname));
-		newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+		{
+			Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+			indexColNames = lappend(indexColNames, NameStr(att->attname));
+			newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		}
 	}
 
 	/* Extract opclass options for each attribute */
@@ -1490,6 +1508,157 @@ index_create_copy(Relation heapRelation, uint16 flags,
 	return newIndexId;
 }
 
+/*
+ * index_concurrently_create_aux
+ *
+ * Create concurrently an auxiliary index based on the definition of the one
+ * provided by caller.  The index is inserted into catalogs and needs to be
+ * built later on. This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ */
+Oid
+index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
+							   Oid tablespaceOid, const char *newName)
+{
+	Relation	indexRelation;
+	IndexInfo  *oldInfo,
+			*newInfo;
+	Oid			newIndexId = InvalidOid;
+	HeapTuple	indexTuple;
+
+	List	   *indexColNames = NIL;
+	List	   *indexExprs = NIL;
+	List	   *indexPreds = NIL;
+
+	Oid *auxOpclassIds;
+	int16 *auxColoptions;
+
+	indexRelation = index_open(mainIndexId, RowExclusiveLock);
+
+	/* The new index needs some information from the old index */
+	oldInfo = BuildIndexInfo(indexRelation);
+
+	/*
+	 * Build of an auxiliary index with exclusion constraints is not
+	 * supported.
+	 */
+	if (oldInfo->ii_ExclusionOps != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("auxiliary index creation for exclusion constraints is not supported")));
+
+	/* Get the array of class and column options IDs from index info */
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(mainIndexId));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", mainIndexId);
+
+
+	/*
+	 * Fetch the list of expressions and predicates directly from the
+	 * catalogs.  This cannot rely on the information from IndexInfo of the
+	 * old index as these have been flattened for the planner.
+	 */
+	if (oldInfo->ii_Expressions != NIL)
+	{
+		Datum		exprDatum;
+		char	   *exprString;
+
+		exprDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indexprs);
+		exprString = TextDatumGetCString(exprDatum);
+		indexExprs = (List *) stringToNode(exprString);
+		pfree(exprString);
+	}
+	if (oldInfo->ii_Predicate != NIL)
+	{
+		Datum		predDatum;
+		char	   *predString;
+
+		predDatum = SysCacheGetAttrNotNull(INDEXRELID, indexTuple,
+										   Anum_pg_index_indpred);
+		predString = TextDatumGetCString(predDatum);
+		indexPreds = (List *) stringToNode(predString);
+
+		/* Also convert to implicit-AND format */
+		indexPreds = make_ands_implicit((Expr *) indexPreds);
+		pfree(predString);
+	}
+
+	/*
+	 * Build the index information for the new index.  Note that rebuild of
+	 * indexes with exclusion constraints is not supported, hence there is no
+	 * need to fill all the ii_Exclusion* fields.
+	 */
+	newInfo = makeIndexInfo(oldInfo->ii_NumIndexAttrs,
+							oldInfo->ii_NumIndexKeyAttrs,
+							STIR_AM_OID, /* special AM for aux indexes */
+							indexExprs,
+							indexPreds,
+							false,	/* aux index are not unique */
+							oldInfo->ii_NullsNotDistinct,
+							false,	/* not ready for inserts */
+							true,
+							false,	/* aux are not summarizing */
+							false,	/* aux are not without overlaps */
+							true	/* auxiliary */);
+
+	/*
+	 * Extract the list of column names and the column numbers for the new
+	 * index information.  All this information will be used for the index
+	 * creation.
+	 */
+	{
+		TupleDesc	indexTupDesc = RelationGetDescr(indexRelation);
+
+		for (int i = 0; i < oldInfo->ii_NumIndexAttrs; i++)
+		{
+			Form_pg_attribute att = TupleDescAttr(indexTupDesc, i);
+
+			indexColNames = lappend(indexColNames, NameStr(att->attname));
+			newInfo->ii_IndexAttrNumbers[i] = oldInfo->ii_IndexAttrNumbers[i];
+		}
+	}
+
+	auxOpclassIds = palloc0(sizeof(Oid) * newInfo->ii_NumIndexAttrs);
+	auxColoptions = palloc0(sizeof(int16) * newInfo->ii_NumIndexAttrs);
+
+	/* Fill with "any ops" */
+	for (int i = 0; i < newInfo->ii_NumIndexAttrs; i++)
+	{
+		auxOpclassIds[i] = ANY_STIR_OPS_OID;
+		auxColoptions[i] = 0;
+	}
+
+	newIndexId = index_create(heapRelation,
+							  newName,
+							  InvalidOid,    /* indexRelationId */
+							  InvalidOid,    /* parentIndexRelid */
+							  InvalidOid,    /* parentConstraintId */
+							  InvalidRelFileNumber, /* relFileNumber */
+							  newInfo,
+							  indexColNames,
+							  STIR_AM_OID,
+							  tablespaceOid,
+							  indexRelation->rd_indcollation,
+							  auxOpclassIds,
+							  NULL,
+							  auxColoptions,
+							  NULL,
+							  (Datum) 0,
+							  INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT | INDEX_CREATE_AUXILIARY,
+							  0,
+							  true, /* allow table to be a system catalog? */
+							  false,    /* is_internal? */
+							  NULL);
+
+	/* Close the relations used and clean up */
+	index_close(indexRelation, NoLock);
+	ReleaseSysCache(indexTuple);
+
+	return newIndexId;
+}
+
 /*
  * index_concurrently_build
  *
@@ -2470,7 +2639,8 @@ BuildIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2530,7 +2700,8 @@ BuildDummyIndexInfo(Relation index)
 					   indexStruct->indisready,
 					   false,
 					   index->rd_indam->amsummarizing,
-					   indexStruct->indisexclusion && indexStruct->indisunique);
+					   indexStruct->indisexclusion && indexStruct->indisunique,
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3309,12 +3480,21 @@ IndexCheckExclusion(Relation heapRelation,
  *
  * We do a concurrent index build by first inserting the catalog entry for the
  * index via index_create(), marking it not indisready and not indisvalid.
+ * Then we create special auxiliary index the same way. It based on STIR AM.
  * Then we commit our transaction and start a new one, then we wait for all
  * transactions that could have been modifying the table to terminate.  Now
- * we know that any subsequently-started transactions will see the index and
+ * we know that any subsequently-started transactions will see indexes and
  * honor its constraints on HOT updates; so while existing HOT-chains might
  * be broken with respect to the index, no currently live tuple will have an
- * incompatible HOT update done to it.  We now build the index normally via
+ * incompatible HOT update done to it.
+ *
+ * After that, we build the auxiliary index. It is fast operation without any actual
+ * table scan. As result, we have empty STIR index. We commit transaction and
+ * again wait for all transactions that could have been modifying the table
+ * to terminate. At that moment all new tuples are going to be inserted into
+ * auxiliary index.
+ *
+ * We now build the index normally via
  * index_build(), while holding a weak lock that allows concurrent
  * insert/update/delete.  Also, we index only tuples that are valid
  * as of the start of the scan (see table_index_build_scan), whereas a normal
@@ -3324,14 +3504,17 @@ IndexCheckExclusion(Relation heapRelation,
  * bogus unique-index failures due to concurrent UPDATEs (we might see
  * different versions of the same row as being valid when we pass over them,
  * if we used HeapTupleSatisfiesVacuum).  This leaves us with an index that
- * does not contain any tuples added to the table while we built the index.
+ * does not contain any tuples added to the table while we built the index
+ * (but these tuples contained in auxiliary index).
  *
  * Next, we mark the index "indisready" (but still not "indisvalid") and
- * commit the second transaction and start a third.  Again we wait for all
+ * commit the third transaction and start a fourth.  Again we wait for all
  * transactions that could have been modifying the table to terminate.  Now
  * we know that any subsequently-started transactions will see the index and
- * insert their new tuples into it.  We then take a new reference snapshot
- * which is passed to validate_index().  Any tuples that are valid according
+ * insert their new tuples into it. At the same moment we clear "indisready" for
+ * auxiliary index, since it is no more required to be updated.
+ *
+ * We then take a new reference snapshot, any tuples that are valid according
  * to this snap, but are not in the index, must be added to the index.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
@@ -3339,12 +3522,14 @@ IndexCheckExclusion(Relation heapRelation,
  * that might care about them before we mark the index valid.)
  *
  * validate_index() works by first gathering all the TIDs currently in the
- * index, using a bulkdelete callback that just stores the TIDs and doesn't
+ * indexes, using a bulkdelete callback that just stores the TIDs and doesn't
  * ever say "delete it".  (This should be faster than a plain indexscan;
  * also, not all index AMs support full-index indexscan.)  Then we sort the
- * TIDs, and finally scan the table doing a "merge join" against the TID list
- * to see which tuples are missing from the index.  Thus we will ensure that
- * all tuples valid according to the reference snapshot are in the index.
+ * TIDs of both auxiliary and target indexes, and doing a "merge join" against
+ * the TID lists to see which tuples from auxiliary index are missing from the
+ * target index.  Thus we will ensure that all tuples valid according to the
+ * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
  * tuple that is already dead or is in process of being deleted, and we
@@ -3362,22 +3547,26 @@ IndexCheckExclusion(Relation heapRelation,
  * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
  * transactions will be able to use it for queries.
  *
- * Doing two full table scans is a brute-force strategy.  We could try to be
- * cleverer, eg storing new tuples in a special area of the table (perhaps
- * making the table append-only by setting use_fsm).  However that would
- * add yet more locking issues.
+ * Also, some actions to concurrent drop the auxiliary index are performed.
  */
 void
-validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 {
 	Relation	heapRelation,
-				indexRelation;
+				indexRelation,
+				auxIndexRelation;
 	IndexInfo  *indexInfo;
-	IndexVacuumInfo ivinfo;
-	ValidateIndexState state;
+	IndexVacuumInfo ivinfo, auxivinfo;
+	ValidateIndexState state, auxState;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	/* Use 80% of maintenance_work_mem to target index sorting and
+	 * 10% rest for auxiliary.
+	 *
+	 * Rest 10% will be used for tuplestore later. */
+	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
+	int			aux_work_mem_part = maintenance_work_mem / 10;
 
 	{
 		const int	progress_index[] = {
@@ -3410,6 +3599,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	RestrictSearchPath();
 
 	indexRelation = index_open(indexId, RowExclusiveLock);
+	auxIndexRelation = index_open(auxIndexId, RowExclusiveLock);
 
 	/*
 	 * Fetch info needed for index_insert.  (You might think this should be
@@ -3434,15 +3624,49 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.strategy = NULL;
 	ivinfo.validate_index = true;
 
+	/*
+	 * Copy all info to auxiliary info, changing only relation.
+	 */
+	auxivinfo = ivinfo;
+	auxivinfo.index = auxIndexRelation;
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
 	 * is a pass-by-reference type on all platforms, whereas int8 is
 	 * pass-by-value on most platforms.
 	 */
+	auxState.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
+										   InvalidOid, false,
+										   aux_work_mem_part,
+										   NULL, TUPLESORT_NONE);
+	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
+
+	(void) index_bulk_delete(&auxivinfo, NULL,
+							 validate_index_callback, &auxState);
+	/* If aux index is empty, merge may be skipped */
+	if (auxState.itups == 0)
+	{
+		tuplesort_end(auxState.tuplesort);
+		auxState.tuplesort = NULL;
+
+		/* Roll back any GUC changes executed by index functions */
+		AtEOXact_GUC(false, save_nestlevel);
+
+		/* Restore userid and security context */
+		SetUserIdAndSecContext(save_userid, save_sec_context);
+
+		/* Close rels, but keep locks */
+		index_close(auxIndexRelation, NoLock);
+		index_close(indexRelation, NoLock);
+		table_close(heapRelation, NoLock);
+
+		return;
+	}
+
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 											InvalidOid, false,
-											maintenance_work_mem,
+											(int) main_work_mem_part,
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
@@ -3465,27 +3689,30 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	tuplesort_performsort(auxState.tuplesort);
 
 	/*
-	 * Now scan the heap and "merge" it with the index
+	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN);
+								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
 	table_index_validate_scan(heapRelation,
 							  indexRelation,
 							  indexInfo,
 							  snapshot,
-							  &state);
+							  &state,
+							  &auxState);
 
-	/* Done with tuplesort object */
-	tuplesort_end(state.tuplesort);
+	/* Tuple sort closed by table_index_validate_scan */
+	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
 
 	/* Make sure to release resources cached in indexInfo (if needed). */
 	index_insert_cleanup(indexRelation, indexInfo);
 
 	elog(DEBUG2,
-		 "validate_index found %.0f heap tuples, %.0f index tuples; inserted %.0f missing tuples",
-		 state.htups, state.itups, state.tups_inserted);
+		 "validate_index fetched %.0f heap tuples, %.0f index tuples;"
+						" %.0f aux index tuples; inserted %.0f missing tuples",
+		 state.htups, state.itups, auxState.itups, state.tups_inserted);
 
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
@@ -3494,6 +3721,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 
 	/* Close rels, but keep locks */
+	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
 }
@@ -3554,6 +3782,12 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			Assert(!indexForm->indisvalid);
 			indexForm->indisvalid = true;
 			break;
+		case INDEX_DROP_CLEAR_READY:
+			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
+			Assert(indexForm->indisready);
+			Assert(!indexForm->indisvalid);
+			indexForm->indisready = false;
+			break;
 		case INDEX_DROP_CLEAR_VALID:
 
 			/*
@@ -3825,6 +4059,13 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
+	/* Auxiliary indexes are not allowed to be rebuilt */
+	if (indexInfo->ii_Auxiliary)
+		ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("reindex of auxiliary index \"%s\" not supported",
+					RelationGetRelationName(iRel))));
+
 	/* Suppress use of the target index while rebuilding it */
 	SetReindexProcessing(heapId, indexId);
 
@@ -4067,6 +4308,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
+		Oid			indexAm = get_rel_relam(indexOid);
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4092,6 +4334,18 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
+		if (indexAm == STIR_AM_OID)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+							get_namespace_name(indexNamespaceId),
+							get_rel_name(indexOid))));
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			continue;
+		}
+
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 73a1c1c4670..23d292aaced 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1409,16 +1409,17 @@ CREATE VIEW pg_stat_progress_create_index AS
                       END AS command,
         CASE S.param10 WHEN 0 THEN 'initializing'
                        WHEN 1 THEN 'waiting for writers before build'
-                       WHEN 2 THEN 'building index' ||
+                       WHEN 2 THEN 'waiting for writers to use auxiliary index'
+                       WHEN 3 THEN 'building index' ||
                            COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)),
                                     '')
-                       WHEN 3 THEN 'waiting for writers before validation'
-                       WHEN 4 THEN 'index validation: scanning index'
-                       WHEN 5 THEN 'index validation: sorting tuples'
-                       WHEN 6 THEN 'index validation: scanning table'
-                       WHEN 7 THEN 'waiting for old snapshots'
-                       WHEN 8 THEN 'waiting for readers before marking dead'
-                       WHEN 9 THEN 'waiting for readers before dropping'
+                       WHEN 4 THEN 'waiting for writers before validation'
+                       WHEN 5 THEN 'index validation: scanning index'
+                       WHEN 6 THEN 'index validation: sorting tuples'
+                       WHEN 7 THEN 'index validation: merging indexes'
+                       WHEN 8 THEN 'waiting for old snapshots'
+                       WHEN 9 THEN 'waiting for readers before marking dead'
+                       WHEN 10 THEN 'waiting for readers before dropping'
                        END as phase,
         S.param4 AS lockers_total,
         S.param5 AS lockers_done,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9ab74c8df0a..2d7b6b7eb8b 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -183,6 +183,7 @@ CheckIndexCompatible(Oid oldId,
 					 bool isWithoutOverlaps)
 {
 	bool		isconstraint;
+	bool		isauxiliary;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
 	Oid		   *opclassIds;
@@ -233,6 +234,7 @@ CheckIndexCompatible(Oid oldId,
 
 	amcanorder = amRoutine->amcanorder;
 	amsummarizing = amRoutine->amsummarizing;
+	isauxiliary = accessMethodId == STIR_AM_OID;
 
 	/*
 	 * Compute the operator classes, collations, and exclusion operators for
@@ -244,7 +246,8 @@ CheckIndexCompatible(Oid oldId,
 	 */
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
-							  false, false, amsummarizing, isWithoutOverlaps);
+							  false, false, amsummarizing,
+							  isWithoutOverlaps, isauxiliary);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -557,6 +560,7 @@ DefineIndex(ParseState *pstate,
 {
 	bool		concurrent;
 	char	   *indexRelationName;
+	char	   *auxIndexRelationName = NULL;
 	char	   *accessMethodName;
 	Oid		   *typeIds;
 	Oid		   *collationIds;
@@ -566,6 +570,7 @@ DefineIndex(ParseState *pstate,
 	Oid			namespaceId;
 	Oid			tablespaceId;
 	Oid			createdConstraintId = InvalidOid;
+	Oid			auxIndexRelationId = InvalidOid;
 	List	   *indexColNames;
 	List	   *allIndexParams;
 	Relation	rel;
@@ -587,6 +592,7 @@ DefineIndex(ParseState *pstate,
 	int			numberOfKeyAttributes;
 	TransactionId limitXmin;
 	ObjectAddress address;
+	ObjectAddress auxAddress;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
@@ -837,6 +843,15 @@ DefineIndex(ParseState *pstate,
 											stmt->excludeOpNames,
 											stmt->primary,
 											stmt->isconstraint);
+	/*
+	 * Select name for auxiliary index
+	 */
+	if (concurrent)
+		auxIndexRelationName = ChooseRelationName(indexRelationName,
+												  NULL,
+												  "ccaux",
+												  namespaceId,
+												  false);
 
 	/*
 	 * look up the access method, verify it can handle the requested features
@@ -931,7 +946,8 @@ DefineIndex(ParseState *pstate,
 							  !concurrent,
 							  concurrent,
 							  amissummarizing,
-							  stmt->iswithoutoverlaps);
+							  stmt->iswithoutoverlaps,
+							  false);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -1603,6 +1619,16 @@ DefineIndex(ParseState *pstate,
 		return address;
 	}
 
+	/*
+	 * In case of concurrent build - create auxiliary index record.
+	 */
+	if (concurrent)
+	{
+		auxIndexRelationId = index_concurrently_create_aux(rel, indexRelationId,
+											tablespaceId, auxIndexRelationName);
+		ObjectAddressSet(auxAddress, RelationRelationId, auxIndexRelationId);
+	}
+
 	AtEOXact_GUC(false, root_save_nestlevel);
 	SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 
@@ -1631,11 +1657,11 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * For a concurrent build, it's important to make the catalog entries
 	 * visible to other transactions before we start to build the index. That
-	 * will prevent them from making incompatible HOT updates.  The new index
-	 * will be marked not indisready and not indisvalid, so that no one else
-	 * tries to either insert into it or use it for queries.
+	 * will prevent them from making incompatible HOT updates. New indexes
+	 * (main and auxiliary) will be marked not indisready and not indisvalid,
+	 * so that no one else tries to either insert into it or use it for queries.
 	 *
-	 * We must commit our current transaction so that the index becomes
+	 * We must commit our current transaction so that the indexes becomes
 	 * visible; then start another.  Note that all the data structures we just
 	 * built are lost in the commit.  The only data we keep past here are the
 	 * relation IDs.
@@ -1645,7 +1671,7 @@ DefineIndex(ParseState *pstate,
 	 * cannot block, even if someone else is waiting for access, because we
 	 * already have the same lock within our transaction.
 	 *
-	 * Note: we don't currently bother with a session lock on the index,
+	 * Note: we don't currently bother with a session lock on the indexes,
 	 * because there are no operations that could change its state while we
 	 * hold lock on the parent table.  This might need to change later.
 	 */
@@ -1684,7 +1710,7 @@ DefineIndex(ParseState *pstate,
 	 * with the old list of indexes.  Use ShareLock to consider running
 	 * transactions that hold locks that permit writing to the table.  Note we
 	 * do not need to worry about xacts that open the table for writing after
-	 * this point; they will see the new index when they open it.
+	 * this point; they will see the new indexes when they open it.
 	 *
 	 * Note: the reason we use actual lock acquisition here, rather than just
 	 * checking the ProcArray and sleeping, is that deadlock is possible if
@@ -1696,14 +1722,44 @@ DefineIndex(ParseState *pstate,
 
 	/*
 	 * At this moment we are sure that there are no transactions with the
-	 * table open for write that don't have this new index in their list of
+	 * table open for write that don't have this new indexes in their list of
 	 * indexes.  We have waited out all the existing transactions and any new
-	 * transaction will have the new index in its list, but the index is still
-	 * marked as "not-ready-for-inserts".  The index is consulted while
+	 * transaction will have both new indexes in its list, but indexes are still
+	 * marked as "not-ready-for-inserts". The indexes are consulted while
 	 * deciding HOT-safety though.  This arrangement ensures that no new HOT
 	 * chains can be created where the new tuple and the old tuple in the
 	 * chain have different index keys.
 	 *
+	 * Now call build on auxiliary index. Index will be created empty without
+	 * any actual heap scan, but marked as "ready-for-inserts". The goal of
+	 * that index is accumulate new tuples while main index is actually built.
+	 */
+
+	/* Set ActiveSnapshot since functions in the indexes may need it */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	index_concurrently_build(tableId, auxIndexRelationId);
+	/* we can do away with our snapshot */
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Now we need to ensure there are no transactions with the auxiliary index
+	 * marked as "not-ready-for-inserts".
+	 */
+	WaitForLockers(heaplocktag, ShareLock, true);
+
+	/*
+	 * At this moment we are sure that all new tuples in table are inserted into
+	 * the auxiliary index. Now it is time to build the target index itself.
+	 *
 	 * We now take a new snapshot, and build the index using all tuples that
 	 * are visible in this snapshot.  We can be sure that any HOT updates to
 	 * these tuples will be compatible with the index, since any updates made
@@ -1738,9 +1794,28 @@ DefineIndex(ParseState *pstate,
 	 * the index marked as read-only for updates.
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
 	WaitForLockers(heaplocktag, ShareLock, true);
 
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/*
+	 * Now target index is marked as "ready" for all transactions. So, auxiliary
+	 * index is no longer needed. So, start removing process by reverting "ready"
+	 * flag.
+	 */
+	index_set_state_flags(auxIndexRelationId, INDEX_DROP_CLEAR_READY);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
 	/*
 	 * Now take the "reference snapshot" that will be used by validate_index()
 	 * to filter candidate tuples.  Beware!  There might still be snapshots in
@@ -1758,24 +1833,14 @@ DefineIndex(ParseState *pstate,
 	 */
 	snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	PushActiveSnapshot(snapshot);
-
 	/*
-	 * Scan the index and the heap, insert any missing index entries.
-	 */
-	validate_index(tableId, indexRelationId, snapshot);
-
-	/*
-	 * Drop the reference snapshot.  We must do this before waiting out other
-	 * snapshot holders, else we will deadlock against other processes also
-	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
-	 * they must wait for.  But first, save the snapshot's xmin to use as
-	 * limitXmin for GetCurrentVirtualXIDs().
+	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
+	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
 	limitXmin = snapshot->xmin;
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
-
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1802,7 +1867,7 @@ DefineIndex(ParseState *pstate,
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 	WaitForOlderSnapshots(limitXmin, true);
 
 	/*
@@ -1827,6 +1892,53 @@ DefineIndex(ParseState *pstate,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+	/* Now wait for all transaction to see auxiliary as "non-ready for inserts" */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Updating pg_index might involve TOAST table access, so ensure we
+	 * have a valid snapshot.
+	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
+	/* Now it is time to mark auxiliary index as dead */
+	index_concurrently_set_dead(tableId, auxIndexRelationId);
+	PopActiveSnapshot();
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/* Tell concurrent index builds to ignore us, if index qualifies */
+	if (safe_index)
+		set_indexsafe_procflags();
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_6);
+	/* Now wait for all transaction to ignore auxiliary because it is dead */
+	WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+
+	CommitTransactionCommand();
+	StartTransactionCommand();
+
+	/*
+	 * Drop auxiliary index.
+	 *
+	 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
+	 * right lock level.
+	 */
+	performDeletion(&auxAddress, DROP_RESTRICT,
+							 PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
 
 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
@@ -3598,6 +3710,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	typedef struct ReindexIndexInfo
 	{
 		Oid			indexId;
+		Oid			auxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -3703,8 +3816,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
 														   ShareUpdateExclusiveLock);
+					IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-					if (!indexRelation->rd_index->indisvalid)
+
+					if (indexInfo->ii_Auxiliary)
+						ereport(WARNING,(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+									get_namespace_name(get_rel_namespace(cellOid)),
+									get_rel_name(cellOid))));
+					else if (!indexRelation->rd_index->indisvalid)
 						ereport(WARNING,
 								(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 								 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3756,8 +3876,15 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
 															   ShareUpdateExclusiveLock);
+						IndexInfo*	indexInfo = BuildDummyIndexInfo(indexRelation);
 
-						if (!indexRelation->rd_index->indisvalid)
+						if (indexInfo->ii_Auxiliary)
+							ereport(WARNING,
+									(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+									 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
+											get_namespace_name(get_rel_namespace(cellOid)),
+											get_rel_name(cellOid))));
+						else if (!indexRelation->rd_index->indisvalid)
 							ereport(WARNING,
 									(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 									 errmsg("skipping reindex of invalid index \"%s.%s\"",
@@ -3818,6 +3945,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot reindex invalid index on TOAST table")));
 
+				/* Auxiliary indexes are not allowed to be rebuilt */
+				if (get_rel_relam(relationOid) == STIR_AM_OID)
+					ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("reindex of auxiliary index \"%s\" not supported",
+								get_rel_name(relationOid))));
+
 				/*
 				 * Check if parent relation can be locked and if it exists,
 				 * this needs to be done at this stage as the list of indexes
@@ -3921,15 +4055,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	foreach(lc, indexIds)
 	{
 		char	   *concurrentName;
+		char	   *auxConcurrentName;
 		ReindexIndexInfo *idx = lfirst(lc);
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
+		Oid			auxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
 		int			save_sec_context;
 		int			save_nestlevel;
 		Relation	newIndexRel;
+		Relation	auxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -3980,6 +4117,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 											"ccnew",
 											get_rel_namespace(indexRel->rd_index->indrelid),
 											false);
+		auxConcurrentName = ChooseRelationName(get_rel_name(idx->indexId),
+											NULL,
+											"ccaux",
+											get_rel_namespace(indexRel->rd_index->indrelid),
+											false);
 
 		/* Choose the new tablespace, indexes of toast tables are not moved */
 		if (OidIsValid(params->tablespaceOid) &&
@@ -3997,11 +4139,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 									   tablespaceid,
 									   concurrentName);
 
+		auxIndexId = index_concurrently_create_aux(heapRel,
+												   newIndexId,
+												   tablespaceid,
+												   auxConcurrentName);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
+		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4010,6 +4158,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
+		newidx->auxIndexId = auxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4028,10 +4177,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		lockrelid = palloc_object(LockRelId);
+		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
 		/* Roll back any GUC changes executed by index functions */
@@ -4112,13 +4265,60 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * doing that, wait until no running transactions could have the table of
 	 * the index open with the old list of indexes.  See "phase 2" in
 	 * DefineIndex() for more details.
+	*/
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+							 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * Now build all auxiliary indexes and mark them as "ready-for-inserts".
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		StartTransactionCommand();
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/* Set ActiveSnapshot since functions in the indexes may need it */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Build auxiliary index, it is fast - without any actual heap scan, just an empty index. */
+		index_concurrently_build(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+
+	StartTransactionCommand();
+
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
+	/*
+	 * Wait until all auxiliary indexes are taken into account by all
+	 * transactions.
+	 */
 	WaitForLockersMultiple(lockTags, ShareLock, true);
 	CommitTransactionCommand();
 
+	/* Now it is time to perform target index build. */
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4165,6 +4365,41 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * need to set the PROC_IN_SAFE_IC flag here.
 	 */
 
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+	WaitForLockersMultiple(lockTags, ShareLock, true);
+	CommitTransactionCommand();
+
+	/*
+	 * At this moment all target indexes are marked as "ready-to-insert". So,
+	 * we are free to start process of dropping auxiliary indexes.
+	 */
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+		StartTransactionCommand();
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		PopActiveSnapshot();
+
+		CommitTransactionCommand();
+	}
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -4172,12 +4407,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	 * were created during the previous phase.  See "phase 3" in DefineIndex()
 	 * for more details.
 	 */
-
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
-	WaitForLockersMultiple(lockTags, ShareLock, true);
-	CommitTransactionCommand();
-
 	foreach(lc, newIndexIds)
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
@@ -4215,7 +4444,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -4244,7 +4473,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * there's no need to set the PROC_IN_SAFE_IC flag here.
 		 */
 		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-									 PROGRESS_CREATEIDX_PHASE_WAIT_3);
+									 PROGRESS_CREATEIDX_PHASE_WAIT_4);
 		WaitForOlderSnapshots(limitXmin, true);
 
 		CommitTransactionCommand();
@@ -4335,14 +4564,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 5 of REINDEX CONCURRENTLY
 	 *
-	 * Mark the old indexes as dead.  First we must wait until no running
-	 * transaction could be using the index for a query.  See also
+	 * Mark the old and auxiliary indexes as dead. First we must wait until no
+	 * running transaction could be using the index for a query.  See also
 	 * index_drop() for more details.
 	 */
 
 	INJECTION_POINT("reindex-relation-concurrently-before-set-dead", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	foreach(lc, indexIds)
@@ -4367,6 +4596,28 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PopActiveSnapshot();
 	}
 
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+
+		/*
+		 * Check for user-requested abort.  This is inside a transaction so as
+		 * xact.c does not issue a useless WARNING, and ensures that
+		 * session-level locks are cleaned up on abort.
+		 */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Updating pg_index might involve TOAST table access, so ensure we
+		 * have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+
+		PopActiveSnapshot();
+	}
+
 	/* Commit this transaction to make the updates visible. */
 	CommitTransactionCommand();
 	StartTransactionCommand();
@@ -4380,11 +4631,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	/*
 	 * Phase 6 of REINDEX CONCURRENTLY
 	 *
-	 * Drop the old indexes.
+	 * Drop the old and auxiliary indexes.
 	 */
 
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
+								 PROGRESS_CREATEIDX_PHASE_WAIT_6);
 	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -4404,6 +4655,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			add_exact_object_address(&object, objects);
 		}
 
+		foreach(lc, newIndexIds)
+		{
+			ReindexIndexInfo *idx = lfirst(lc);
+			ObjectAddress object;
+
+			object.classId = RelationRelationId;
+			object.objectId = idx->auxIndexId;
+			object.objectSubId = 0;
+
+			add_exact_object_address(&object, objects);
+		}
+
 		/*
 		 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
 		 * right lock level.
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 5359dab1176..84f7cf9824e 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps)
+			  bool withoutoverlaps, bool auxiliary)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -850,6 +850,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Concurrent = concurrent;
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
+	n->ii_Auxiliary = auxiliary;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
@@ -875,7 +876,6 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
-	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 83af594d4af..3477866d729 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -640,6 +640,15 @@
   boot_val => 'DEFAULT_ASSERT_ENABLED',
 },
 
+{ name => 'debug_cic_validate_store_mem_pct', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Percentage of maintenance_work_mem used for CIC validation tuplestore.',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'debug_cic_validate_store_mem_pct',
+  boot_val => '10',
+  min => '1',
+  max => '90',
+},
+
 { name => 'debug_copy_parse_plan_trees', type => 'bool', context => 'PGC_SUSET', group => 'DEVELOPER_OPTIONS',
   short_desc => 'Set this to force all parse and plan trees to be passed through copyObject(), to facilitate catching errors and omissions in copyObject().',
   flags => 'GUC_NOT_IN_SAMPLE',
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index c13f05d39db..da3598663bc 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -743,7 +743,8 @@ typedef struct TableAmRoutine
 										Relation index_rel,
 										IndexInfo *index_info,
 										Snapshot snapshot,
-										ValidateIndexState *state);
+										ValidateIndexState *state,
+										ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1906,19 +1907,24 @@ table_index_build_range_scan(Relation table_rel,
  * table_index_validate_scan - second table scan for concurrent index build
  *
  * See validate_index() for an explanation.
+ *
+ * Note: it is responsibility of that function to close sortstates in
+ * both `state` and `auxstate`.
  */
 static inline void
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
 						  Snapshot snapshot,
-						  ValidateIndexState *state)
+						  ValidateIndexState *state,
+						  ValidateIndexState *auxstate)
 {
 	table_rel->rd_tableam->index_validate_scan(table_rel,
 											   index_rel,
 											   index_info,
 											   snapshot,
-											   state);
+											   state,
+											   auxstate);
 }
 
 
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9aee8226347..3239e5c716f 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -31,6 +31,7 @@ typedef enum
 {
 	INDEX_CREATE_SET_READY,
 	INDEX_CREATE_SET_VALID,
+	INDEX_DROP_CLEAR_READY,
 	INDEX_DROP_CLEAR_VALID,
 	INDEX_DROP_SET_DEAD,
 } IndexStateFlagsAction;
@@ -72,6 +73,7 @@ extern void index_check_primary_key(Relation heapRel,
 #define	INDEX_CREATE_PARTITIONED			(1 << 5)
 #define INDEX_CREATE_INVALID				(1 << 6)
 #define INDEX_CREATE_SUPPRESS_PROGRESS		(1 << 7)
+#define INDEX_CREATE_AUXILIARY				(1 << 8)
 
 extern Oid	index_create(Relation heapRelation,
 						 const char *indexRelationName,
@@ -106,6 +108,11 @@ extern Oid	index_create_copy(Relation heapRelation, uint16 flags,
 							  Oid oldIndexId, Oid tablespaceOid,
 							  const char *newName);
 
+extern Oid	index_concurrently_create_aux(Relation heapRelation,
+										  Oid mainIndexId,
+										  Oid tablespaceOid,
+										  const char *newName);
+
 extern void index_concurrently_build(Oid heapRelationId,
 									 Oid indexRelationId);
 
@@ -152,7 +159,7 @@ extern void index_build(Relation heapRelation,
 						bool parallel,
 						bool progress);
 
-extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
+extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 2a12920c75f..daac9f4f34e 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -120,14 +120,15 @@
 
 /* Phases of CREATE INDEX (as advertised via PROGRESS_CREATEIDX_PHASE) */
 #define PROGRESS_CREATEIDX_PHASE_WAIT_1			1
-#define PROGRESS_CREATEIDX_PHASE_BUILD			2
-#define PROGRESS_CREATEIDX_PHASE_WAIT_2			3
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	4
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		5
-#define PROGRESS_CREATEIDX_PHASE_VALIDATE_TABLESCAN	6
-#define PROGRESS_CREATEIDX_PHASE_WAIT_3			7
+#define PROGRESS_CREATEIDX_PHASE_WAIT_2			2
+#define PROGRESS_CREATEIDX_PHASE_BUILD			3
+#define PROGRESS_CREATEIDX_PHASE_WAIT_3			4
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN	5
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_SORT		6
+#define PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE	7
 #define PROGRESS_CREATEIDX_PHASE_WAIT_4			8
 #define PROGRESS_CREATEIDX_PHASE_WAIT_5			9
+#define PROGRESS_CREATEIDX_PHASE_WAIT_6			10
 
 /*
  * Subphases of CREATE INDEX, for index_build.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 8ccdf61246b..8c2b3a9c5e7 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -271,6 +271,7 @@ extern PGDLLIMPORT bool allowSystemTableMods;
 extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
+extern PGDLLIMPORT int debug_cic_validate_store_mem_pct;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
 
 /*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index bf54d39feb0..cd7f1eb0592 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -99,7 +99,8 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								List *expressions, List *predicates,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
-								bool summarizing, bool withoutoverlaps);
+								bool summarizing, bool withoutoverlaps,
+								bool auxiliary);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 55538c4c41e..937c3b48a0f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1437,6 +1437,7 @@ DETAIL:  Key (f1)=(b) already exists.
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
 ERROR:  could not create unique index "concur_index3"
 DETAIL:  Key (f2)=(b) is duplicated.
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -3211,6 +3212,7 @@ INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
 ERROR:  could not create unique index "concur_reindex_ind5"
 DETAIL:  Key (c1)=(1) is duplicated.
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
@@ -3223,8 +3225,10 @@ DETAIL:  Key (c1)=(1) is duplicated.
  c1     | integer |           |          | 
 Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1) INVALID
+    "concur_reindex_ind5_ccaux" stir (c1) INVALID
     "concur_reindex_ind5_ccnew" UNIQUE, btree (c1) INVALID
 
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -3252,6 +3256,37 @@ Indexes:
     "concur_reindex_ind5" UNIQUE, btree (c1)
 
 DROP TABLE concur_reindex_tab4;
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index f50868ca6a6..b34009f868c 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1585,10 +1585,11 @@ select indexrelid::regclass, indisvalid,
 --------------------------------+------------+-----------------------+-------------------------------
  parted_isvalid_idx             | f          | parted_isvalid_tab    | 
  parted_isvalid_idx_11          | f          | parted_isvalid_tab_11 | parted_isvalid_tab_1_expr_idx
+ parted_isvalid_idx_11_ccaux    | f          | parted_isvalid_tab_11 | 
  parted_isvalid_tab_12_expr_idx | t          | parted_isvalid_tab_12 | parted_isvalid_tab_1_expr_idx
  parted_isvalid_tab_1_expr_idx  | f          | parted_isvalid_tab_1  | parted_isvalid_idx
  parted_isvalid_tab_2_expr_idx  | t          | parted_isvalid_tab_2  | parted_isvalid_idx
-(5 rows)
+(6 rows)
 
 drop table parted_isvalid_tab;
 -- Check state of replica indexes when attaching a partition.
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a65a5bf0c4f..9800b9f1440 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2079,14 +2079,15 @@ pg_stat_progress_create_index| SELECT s.pid,
         CASE s.param10
             WHEN 0 THEN 'initializing'::text
             WHEN 1 THEN 'waiting for writers before build'::text
-            WHEN 2 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
-            WHEN 3 THEN 'waiting for writers before validation'::text
-            WHEN 4 THEN 'index validation: scanning index'::text
-            WHEN 5 THEN 'index validation: sorting tuples'::text
-            WHEN 6 THEN 'index validation: scanning table'::text
-            WHEN 7 THEN 'waiting for old snapshots'::text
-            WHEN 8 THEN 'waiting for readers before marking dead'::text
-            WHEN 9 THEN 'waiting for readers before dropping'::text
+            WHEN 2 THEN 'waiting for writers to use auxiliary index'::text
+            WHEN 3 THEN ('building index'::text || COALESCE((': '::text || pg_indexam_progress_phasename((s.param9)::oid, s.param11)), ''::text))
+            WHEN 4 THEN 'waiting for writers before validation'::text
+            WHEN 5 THEN 'index validation: scanning index'::text
+            WHEN 6 THEN 'index validation: sorting tuples'::text
+            WHEN 7 THEN 'index validation: merging indexes'::text
+            WHEN 8 THEN 'waiting for old snapshots'::text
+            WHEN 9 THEN 'waiting for readers before marking dead'::text
+            WHEN 10 THEN 'waiting for readers before dropping'::text
             ELSE NULL::text
         END AS phase,
     s.param4 AS lockers_total,
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 82e4062a215..805d2eb8485 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -503,6 +503,7 @@ CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS concur_index2 ON concur_heap(f1);
 INSERT INTO concur_heap VALUES ('b','x');
 -- check if constraint is enforced properly at build time
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+DROP INDEX concur_index3_ccaux;
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
@@ -1315,10 +1316,12 @@ CREATE TABLE concur_reindex_tab4 (c1 int);
 INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 -- This trick creates an invalid index.
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
+DROP INDEX concur_reindex_ind5_ccaux;
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
+DROP INDEX concur_reindex_ind5_ccaux;
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
@@ -1330,6 +1333,24 @@ REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
 \d concur_reindex_tab4
 DROP TABLE concur_reindex_tab4;
 
+-- Check handling of auxiliary indexes
+CREATE TABLE aux_index_tab5 (c1 int);
+INSERT INTO aux_index_tab5 VALUES (1), (1), (2);
+-- This trick creates an invalid index and auxiliary index for it
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+\d aux_index_tab5
+-- Not allowed to reindex auxiliary index
+REINDEX INDEX aux_index_ind6_ccaux;
+-- Concurrently also
+REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex
+REINDEX TABLE aux_index_tab5;
+-- Should be skipped during concurrent reindex
+REINDEX TABLE CONCURRENTLY aux_index_tab5;
+DROP TABLE aux_index_tab5;
+
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
 -- definitions.
-- 
2.43.0



  [application/octet-stream] v35-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch (31.6K, 3-v35-0005-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch)
  download | inline diff:
From a3cb8e3e33c03904d455ac986cc0ee0be41ad0e4 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 31 Dec 2024 14:36:31 +0100
Subject: [PATCH v35 5/7] Track and drop auxiliary indexes in DROP/REINDEX

During concurrent index operations, auxiliary indexes may be left as orphaned objects when errors occur (junk auxiliary indexes).

This patch improves the handling of such auxiliary indexes:
- add auxiliaryForIndexId parameter to index_create() to track dependencies between main and auxiliary indexes
- automatically drop auxiliary indexes when the main index is dropped
- delete junk auxiliary indexes properly during REINDEX operations
---
 doc/src/sgml/ref/create_index.sgml         |  14 ++-
 doc/src/sgml/ref/reindex.sgml              |  10 +-
 src/backend/catalog/dependency.c           |   2 +-
 src/backend/catalog/index.c                |  78 ++++++++++++----
 src/backend/catalog/pg_depend.c            |  62 +++++++++++++
 src/backend/catalog/toasting.c             |   1 +
 src/backend/commands/indexcmds.c           |  37 +++++++-
 src/backend/commands/tablecmds.c           |  52 ++++++++++-
 src/backend/nodes/makefuncs.c              |   3 +-
 src/include/catalog/dependency.h           |   1 +
 src/include/nodes/execnodes.h              |   2 +
 src/include/nodes/makefuncs.h              |   2 +-
 src/test/regress/expected/create_index.out | 102 ++++++++++++++++++++-
 src/test/regress/sql/create_index.sql      |  55 ++++++++++-
 14 files changed, 382 insertions(+), 39 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 901c6cf22bc..b0407c840b3 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -668,10 +668,16 @@ Indexes:
     "idx_ccaux" stir (col) INVALID
 </programlisting>
 
-    The recommended recovery
-    method in such cases is to drop these indexes and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
-    to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
+    The recommended recovery method in such cases is to drop the main index with
+    <command>DROP INDEX</command>. The auxiliary index (suffixed with
+    <literal>_ccaux</literal>) will be automatically dropped when the main
+    index is dropped. After dropping the indexes, you can try again to perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to
+    rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>,
+    which will also handle cleanup of any invalid auxiliary indexes.)
+    If the only invalid index is one suffixed <literal>_ccaux</literal>,
+    the recommended recovery method is just <literal>DROP INDEX</literal>
+    for that index.
    </para>
 
    <para>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 56c9a0fe1f3..297b8b5fde2 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -475,12 +475,16 @@ Indexes:
     If the index marked <literal>INVALID</literal> is suffixed
     <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
     index created during the concurrent operation, and the recommended
-    recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
-    then attempt <command>REINDEX CONCURRENTLY</command> again.
+    recovery method is to drop the transient index using <literal>DROP INDEX</literal>,
+    then attempt <command>REINDEX CONCURRENTLY</command> again. The auxiliary index
+    (suffixed with <literal>_ccaux</literal>) will be automatically dropped
+    along with its main index.
     If the invalid index is instead suffixed <literal>_ccold</literal>,
     it corresponds to the original index which could not be dropped;
     the recommended recovery method is to just drop said index, since the
-    rebuild proper has been successful.
+    rebuild proper has been successful. If the only
+    invalid index is one suffixed <literal>_ccaux</literal>, the recommended
+    recovery method is just <literal>DROP INDEX</literal> for that index.
     A nonzero number may be appended to the suffix of the invalid index
     names to keep them unique, like <literal>_ccnew1</literal>,
     <literal>_ccold2</literal>, etc.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fdb8e67e1f5..c6941fb19d1 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -292,7 +292,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.  (Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object, 0);
+	AcquireDeletionLock(object, flags);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b1417ec05c6..9136dfc7c73 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -780,6 +780,8 @@ index_create(Relation heapRelation,
 		   ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
 	/* partitioned indexes must never be "built" by themselves */
 	Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
+	/* ii_AuxiliaryForIndexId and INDEX_CREATE_AUXILIARY are required both or neither */
+	Assert(OidIsValid(indexInfo->ii_AuxiliaryForIndexId) == auxiliary);
 
 	relkind = partitioned ? RELKIND_PARTITIONED_INDEX : RELKIND_INDEX;
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -1185,6 +1187,15 @@ index_create(Relation heapRelation,
 			recordDependencyOn(&myself, &referenced, DEPENDENCY_PARTITION_SEC);
 		}
 
+		/*
+		 * Record dependency on the main index in case of auxiliary index.
+		 */
+		if (OidIsValid(indexInfo->ii_AuxiliaryForIndexId))
+		{
+			ObjectAddressSet(referenced, RelationRelationId, indexInfo->ii_AuxiliaryForIndexId);
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
+		}
+
 		/* placeholder for normal dependencies */
 		addrs = new_object_addresses();
 
@@ -1417,7 +1428,8 @@ index_create_copy(Relation heapRelation, uint16 flags,
 							concurrently,	/* concurrent */
 							indexRelation->rd_indam->amsummarizing,
 							oldInfo->ii_WithoutOverlaps,
-							false);
+							false,
+							InvalidOid);
 
 	/* fetch exclusion constraint info if any */
 	if (indexRelation->rd_index->indisexclusion)
@@ -1601,7 +1613,8 @@ index_concurrently_create_aux(Relation heapRelation, Oid mainIndexId,
 							true,
 							false,	/* aux are not summarizing */
 							false,	/* aux are not without overlaps */
-							true	/* auxiliary */);
+							true	/* auxiliary */,
+							mainIndexId /* auxiliaryForIndexId */);
 
 	/*
 	 * Extract the list of column names and the column numbers for the new
@@ -2640,7 +2653,8 @@ BuildIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid /* auxiliary_for_index_id is set only during build */);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -2701,7 +2715,8 @@ BuildDummyIndexInfo(Relation index)
 					   false,
 					   index->rd_indam->amsummarizing,
 					   indexStruct->indisexclusion && indexStruct->indisunique,
-					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */);
+					   index->rd_rel->relam == STIR_AM_OID /* auxiliary iff STIR */,
+					   InvalidOid);
 
 	/* fill in attribute numbers */
 	for (i = 0; i < numAtts; i++)
@@ -3783,8 +3798,11 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
 			indexForm->indisvalid = true;
 			break;
 		case INDEX_DROP_CLEAR_READY:
-			/* Clear indisready during a CREATE INDEX CONCURRENTLY sequence */
-			Assert(indexForm->indisready);
+			/*
+			 * Clear indisready during a CREATE INDEX CONCURRENTLY sequence.
+			 * indisready may already be false if the CIC failed before
+			 * index_concurrently_build had a chance to set it.
+			 */
 			Assert(!indexForm->indisvalid);
 			indexForm->indisready = false;
 			break;
@@ -3869,6 +3887,7 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 				heapRelation;
 	Oid			heapId;
 	Oid			save_userid;
+	Oid			junkAuxIndexId;
 	int			save_sec_context;
 	int			save_nestlevel;
 	IndexInfo  *indexInfo;
@@ -3925,6 +3944,19 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 		pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
 	}
 
+	/* Check for the auxiliary index for that index, it needs to be dropped */
+	junkAuxIndexId = get_auxiliary_index(indexId);
+	if (OidIsValid(junkAuxIndexId))
+	{
+		ObjectAddress object;
+		object.classId = RelationRelationId;
+		object.objectId = junkAuxIndexId;
+		object.objectSubId = 0;
+		performDeletion(&object, DROP_RESTRICT,
+								 PERFORM_DELETION_INTERNAL |
+								 PERFORM_DELETION_QUIETLY);
+	}
+
 	/*
 	 * Open the target index relation and get an exclusive lock on it, to
 	 * ensure that no one else is touching this particular index.
@@ -4213,7 +4245,8 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 {
 	Relation	rel;
 	Oid			toast_relid;
-	List	   *indexIds;
+	List	   *indexIds,
+			   *auxIndexIds = NIL;
 	char		persistence;
 	bool		result = false;
 	ListCell   *indexId;
@@ -4302,13 +4335,30 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 	else
 		persistence = rel->rd_rel->relpersistence;
 
+	foreach(indexId, indexIds)
+	{
+		Oid			indexOid = lfirst_oid(indexId);
+		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* All STIR indexes are auxiliary indexes */
+		if (indexAm == STIR_AM_OID)
+		{
+			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
+				RemoveReindexPending(indexOid);
+			auxIndexIds = lappend_oid(auxIndexIds, indexOid);
+		}
+	}
+
 	/* Reindex all the indexes. */
 	i = 1;
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
 		Oid			indexNamespaceId = get_rel_namespace(indexOid);
-		Oid			indexAm = get_rel_relam(indexOid);
+
+		/* Auxiliary indexes are going to be dropped during main index rebuild */
+		if (list_member_oid(auxIndexIds, indexOid))
+			continue;
 
 		/*
 		 * Skip any invalid indexes on a TOAST table.  These can only be
@@ -4334,18 +4384,6 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
-		if (indexAm == STIR_AM_OID)
-		{
-			ereport(WARNING,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("skipping reindex of auxiliary index \"%s.%s\"",
-							get_namespace_name(indexNamespaceId),
-							get_rel_name(indexOid))));
-			if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
-				RemoveReindexPending(indexOid);
-			continue;
-		}
-
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 					  persistence, params);
 
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 07c2d41c189..deacd2f7c95 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -20,6 +20,7 @@
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/pg_am_d.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_extension.h"
@@ -1108,6 +1109,67 @@ get_index_constraint(Oid indexId)
 	return constraintId;
 }
 
+/*
+ * get_auxiliary_index
+ *		Given the OID of an index, return the OID of its auxiliary
+ *		index, or InvalidOid if there is no auxiliary index.
+ */
+Oid
+get_auxiliary_index(Oid indexId)
+{
+	Oid			auxiliaryIndexOid = InvalidOid;
+	Relation	depRel;
+	ScanKeyData key[3];
+	SysScanDesc scan;
+	HeapTuple	tup;
+
+	/* Search the dependency table for the index */
+	depRel = table_open(DependRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+				Anum_pg_depend_refclassid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationRelationId));
+	ScanKeyInit(&key[1],
+				Anum_pg_depend_refobjid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indexId));
+	ScanKeyInit(&key[2],
+				Anum_pg_depend_refobjsubid,
+				BTEqualStrategyNumber, F_INT4EQ,
+				Int32GetDatum(0));
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+							  NULL, 3, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend deprec = (Form_pg_depend) GETSTRUCT(tup);
+
+		/*
+		 * Look for an AUTO dependency on a STIR index.  There can be at most
+		 * one STIR auxiliary per index, so we stop at the first match.
+		 * Transitive auxiliaries (e.g. ccnew_ccaux from a failed REINDEX
+		 * CONCURRENTLY) are found by calling this with the ccnew OID, and
+		 * are also cleaned up automatically via cascading AUTO dependency
+		 * when the intermediate index is dropped.
+		 */
+		if (deprec->classid == RelationRelationId &&
+			(deprec->deptype == DEPENDENCY_AUTO) &&
+			get_rel_relkind(deprec->objid) == RELKIND_INDEX &&
+			get_rel_relam(deprec->objid) == STIR_AM_OID)
+		{
+			auxiliaryIndexOid = deprec->objid;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(depRel, AccessShareLock);
+
+	return auxiliaryIndexOid;
+}
+
 /*
  * get_index_ref_constraints
  *		Given the OID of an index, return the OID of all foreign key
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index d7ea86b2805..f428dcdf10f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -315,6 +315,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
 	indexInfo->ii_Auxiliary = false;
+	indexInfo->ii_AuxiliaryForIndexId = InvalidOid;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2d7b6b7eb8b..46c4ccc6789 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -247,7 +247,7 @@ CheckIndexCompatible(Oid oldId,
 	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
 							  accessMethodId, NIL, NIL, false, false,
 							  false, false, amsummarizing,
-							  isWithoutOverlaps, isauxiliary);
+							  isWithoutOverlaps, isauxiliary, InvalidOid);
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
 	opclassIds = palloc_array(Oid, numberOfAttributes);
@@ -947,7 +947,8 @@ DefineIndex(ParseState *pstate,
 							  concurrent,
 							  amissummarizing,
 							  stmt->iswithoutoverlaps,
-							  false);
+							  false,
+							  InvalidOid);
 
 	typeIds = palloc_array(Oid, numberOfAttributes);
 	collationIds = palloc_array(Oid, numberOfAttributes);
@@ -3711,6 +3712,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		Oid			indexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Oid			tableId;
 		Oid			amId;
 		bool		safe;		/* for set_indexsafe_procflags */
@@ -4060,6 +4062,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		ReindexIndexInfo *newidx;
 		Oid			newIndexId;
 		Oid			auxIndexId;
+		Oid			junkAuxIndexId;
 		Relation	indexRel;
 		Relation	heapRel;
 		Oid			save_userid;
@@ -4067,6 +4070,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		int			save_nestlevel;
 		Relation	newIndexRel;
 		Relation	auxIndexRel;
+		Relation	junkAuxIndexRel;
 		LockRelId  *lockrelid;
 		Oid			tablespaceid;
 
@@ -4144,12 +4148,17 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 												   tablespaceid,
 												   auxConcurrentName);
 
+		/* Search for auxiliary index for reindexed index, to drop it */
+		junkAuxIndexId = get_auxiliary_index(idx->indexId);
+
 		/*
 		 * Now open the relation of the new index, a session-level lock is
 		 * also needed on it.
 		 */
 		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 		auxIndexRel = index_open(auxIndexId, ShareUpdateExclusiveLock);
+		if (OidIsValid(junkAuxIndexId))
+			junkAuxIndexRel = index_open(junkAuxIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context
@@ -4159,6 +4168,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		newidx = palloc_object(ReindexIndexInfo);
 		newidx->indexId = newIndexId;
 		newidx->auxIndexId = auxIndexId;
+		newidx->junkAuxIndexId = junkAuxIndexId;
 		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
@@ -4180,10 +4190,18 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		lockrelid = palloc_object(LockRelId);
 		*lockrelid = auxIndexRel->rd_lockInfo.lockRelId;
 		relationLocks = lappend(relationLocks, lockrelid);
+		if (OidIsValid(junkAuxIndexId))
+		{
+			lockrelid = palloc_object(LockRelId);
+			*lockrelid = junkAuxIndexRel->rd_lockInfo.lockRelId;
+			relationLocks = lappend(relationLocks, lockrelid);
+		}
 
 		MemoryContextSwitchTo(oldcontext);
 
 		index_close(indexRel, NoLock);
+		if (OidIsValid(junkAuxIndexId))
+			index_close(junkAuxIndexRel, NoLock);
 		index_close(auxIndexRel, NoLock);
 		index_close(newIndexRel, NoLock);
 
@@ -4372,7 +4390,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 	/*
 	 * At this moment all target indexes are marked as "ready-to-insert". So,
-	 * we are free to start process of dropping auxiliary indexes.
+	 * we are free to start process of dropping auxiliary indexes - including
+	 * junk indexes detected earlier.
 	 */
 	foreach(lc, newIndexIds)
 	{
@@ -4395,6 +4414,9 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 */
 		PushActiveSnapshot(GetTransactionSnapshot());
 		index_set_state_flags(newidx->auxIndexId, INDEX_DROP_CLEAR_READY);
+		/* Ensure the junk index is marked as non-ready */
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_set_state_flags(newidx->junkAuxIndexId, INDEX_DROP_CLEAR_READY);
 		PopActiveSnapshot();
 
 		CommitTransactionCommand();
@@ -4614,6 +4636,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		index_concurrently_set_dead(newidx->tableId, newidx->auxIndexId);
+		if (OidIsValid(newidx->junkAuxIndexId))
+			index_concurrently_set_dead(newidx->tableId, newidx->junkAuxIndexId);
 
 		PopActiveSnapshot();
 	}
@@ -4665,6 +4689,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 			object.objectSubId = 0;
 
 			add_exact_object_address(&object, objects);
+
+			if (OidIsValid(idx->junkAuxIndexId))
+			{
+				object.objectId = idx->junkAuxIndexId;
+				object.objectSubId = 0;
+				add_exact_object_address(&object, objects);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eec09ba1ded..eaae8f7ca5f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1567,6 +1567,8 @@ RemoveRelations(DropStmt *drop)
 	ListCell   *cell;
 	int			flags = 0;
 	LOCKMODE	lockmode = AccessExclusiveLock;
+	MemoryContext private_context,
+				  oldcontext;
 
 	/* DROP CONCURRENTLY uses a weaker lock, and has some restrictions */
 	if (drop->concurrent)
@@ -1631,9 +1633,20 @@ RemoveRelations(DropStmt *drop)
 			relkind = 0;		/* keep compiler quiet */
 			break;
 	}
+	/*
+	 * Create a memory context that will survive forced transaction commits we
+	 * may need to do below (in case of concurrent index drop).
+	 * Since it is a child of PortalContext, it will go away eventually even if
+	 * we suffer an error; there's no need for special abort cleanup logic.
+	 */
+	private_context = AllocSetContextCreate(PortalContext,
+											"RemoveRelations",
+											ALLOCSET_SMALL_SIZES);
 
+	oldcontext = MemoryContextSwitchTo(private_context);
 	/* Lock and validate each relation; build a list of object addresses */
 	objects = new_object_addresses();
+	MemoryContextSwitchTo(oldcontext);
 
 	foreach(cell, drop->objects)
 	{
@@ -1685,6 +1698,38 @@ RemoveRelations(DropStmt *drop)
 			flags |= PERFORM_DELETION_CONCURRENTLY;
 		}
 
+		/*
+		 * Concurrent index drop requires it to be the first transaction. But in
+		 * case we have junk auxiliary index - we want to drop it too (and also
+		 * in a concurrent way). In this case perform silent internal deletion
+		 * of auxiliary index, and restore transaction state. It is fine to do it
+		 * in the loop because there is only single element in drop->objects.
+		 */
+		if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
+			state.actual_relkind == RELKIND_INDEX)
+		{
+			Oid junkAuxIndexOid = get_auxiliary_index(relOid);
+			if (OidIsValid(junkAuxIndexOid))
+			{
+				ObjectAddress object;
+				object.classId = RelationRelationId;
+				object.objectId = junkAuxIndexOid;
+				object.objectSubId = 0;
+				performDeletion(&object, DROP_RESTRICT,
+										 PERFORM_DELETION_CONCURRENTLY |
+										 PERFORM_DELETION_INTERNAL |
+										 PERFORM_DELETION_QUIETLY);
+				CommitTransactionCommand();
+				MemoryContextDelete(private_context);
+
+				/* And start again - now without auxiliary index. */
+				StartTransactionCommand();
+				PushActiveSnapshot(GetTransactionSnapshot());
+				RemoveRelations(drop);
+				return;
+			}
+		}
+
 		/*
 		 * Concurrent index drop cannot be used with partitioned indexes,
 		 * either.
@@ -1713,12 +1758,17 @@ RemoveRelations(DropStmt *drop)
 		obj.objectId = relOid;
 		obj.objectSubId = 0;
 
+		oldcontext = MemoryContextSwitchTo(private_context);
 		add_exact_object_address(&obj, objects);
+		MemoryContextSwitchTo(oldcontext);
 	}
 
+	/* Deletion may involve multiple commits, so, switch to memory context */
+	oldcontext = MemoryContextSwitchTo(private_context);
 	performMultipleDeletions(objects, drop->behavior, flags);
+	MemoryContextSwitchTo(oldcontext);
 
-	free_object_addresses(objects);
+	MemoryContextDelete(private_context);
 }
 
 /*
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 84f7cf9824e..c54748ff644 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -834,7 +834,7 @@ IndexInfo *
 makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 			  List *predicates, bool unique, bool nulls_not_distinct,
 			  bool isready, bool concurrent, bool summarizing,
-			  bool withoutoverlaps, bool auxiliary)
+			  bool withoutoverlaps, bool auxiliary, Oid auxiliary_for_index_id)
 {
 	IndexInfo  *n = makeNode(IndexInfo);
 
@@ -851,6 +851,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	n->ii_Summarizing = summarizing;
 	n->ii_WithoutOverlaps = withoutoverlaps;
 	n->ii_Auxiliary = auxiliary;
+	n->ii_AuxiliaryForIndexId = auxiliary_for_index_id;
 
 	/* summarizing indexes cannot contain non-key attributes */
 	Assert(!summarizing || (numkeyattrs == numattrs));
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 2f3c1eae3c7..6ae210c584e 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -193,6 +193,7 @@ extern List *getOwnedSequences(Oid relid);
 extern Oid	getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok);
 
 extern Oid	get_index_constraint(Oid indexId);
+extern Oid	get_auxiliary_index(Oid indexId);
 
 extern List *get_index_ref_constraints(Oid indexId);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3eaeed3c141..af58d4cf4b5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -232,6 +232,8 @@ typedef struct IndexInfo
 	int			ii_ParallelWorkers;
 	/* is auxiliary for concurrent index build? */
 	bool		ii_Auxiliary;
+	/* if creating an auxiliary index, the OID of the main index; otherwise InvalidOid. */
+	Oid			ii_AuxiliaryForIndexId;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index cd7f1eb0592..3a704781c8b 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -100,7 +100,7 @@ extern IndexInfo *makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid,
 								bool unique, bool nulls_not_distinct,
 								bool isready, bool concurrent,
 								bool summarizing, bool withoutoverlaps,
-								bool auxiliary);
+								bool auxiliary, Oid auxiliary_for_index_id);
 
 extern Node *makeStringConst(char *str, int location);
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 937c3b48a0f..2d6abb15a89 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3280,12 +3280,108 @@ REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 ERROR:  reindex of auxiliary index "aux_index_ind6_ccaux" not supported
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
-WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
+WARNING:  skipping reindex of invalid index "public.aux_index_ind6"
+HINT:  Use DROP INDEX or REINDEX INDEX.
 WARNING:  skipping reindex of auxiliary index "public.aux_index_ind6_ccaux"
+NOTICE:  table "aux_index_tab5" has no indexes that can be reindexed concurrently
+-- Make sure it is still exists
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1) INVALID
+    "aux_index_ind6_ccaux" stir (c1) INVALID
+
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+Indexes:
+    "aux_index_ind6" UNIQUE, btree (c1)
+
+DROP INDEX aux_index_ind6;
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
+DROP INDEX aux_index_ind6;
+ERROR:  index "aux_index_ind6" does not exist
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+ERROR:  could not create unique index "aux_index_ind6"
+DETAIL:  Key (c1)=(1) is duplicated.
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+           Table "public.aux_index_tab5"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ c1     | integer |           |          | 
+
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 805d2eb8485..fd96d80abbc 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1345,10 +1345,61 @@ REINDEX INDEX aux_index_ind6_ccaux;
 REINDEX INDEX CONCURRENTLY aux_index_ind6_ccaux;
 -- This makes the previous failure go away, so the index can become valid.
 DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
 -- Should be skipped during reindex
-REINDEX TABLE aux_index_tab5;
--- Should be skipped during concurrent reindex
 REINDEX TABLE CONCURRENTLY aux_index_tab5;
+-- Make sure it is still exists
+\d aux_index_tab5
+-- Should be skipped during reindex and dropped
+REINDEX TABLE aux_index_tab5;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Should be skipped during reindex and dropped
+REINDEX INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure aux index is dropped
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- This makes the previous failure go away, so the index can become valid.
+DELETE FROM aux_index_tab5 WHERE c1 = 1;
+-- Drop main index CONCURRENTLY
+DROP INDEX CONCURRENTLY aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+DROP INDEX aux_index_ind6;
+
+-- Insert duplicates again
+INSERT INTO aux_index_tab5 VALUES (1), (1);
+-- Create invalid index again
+CREATE UNIQUE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+-- Drop main index
+DROP INDEX aux_index_ind6;
+-- Make sure auxiliary index dropped too
+\d aux_index_tab5
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.43.0



  [application/octet-stream] v35-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch (21.0K, 4-v35-0003-Add-Datum-storage-support-to-tuplestore-Extend-t.patch)
  download | inline diff:
From c705ccc819f3d4ef26407f0eb7fe9e3da56f2304 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 12 Jan 2026 00:57:56 +0300
Subject: [PATCH v35 3/7] Add Datum storage support to tuplestore Extend
 tuplestore to store individual Datum values

This support enables usages of tuplestore for non-tuple data (TIDs) in the next commit.
---
 src/backend/utils/sort/tuplestore.c | 367 +++++++++++++++++++++++-----
 src/include/utils/tuplestore.h      |  33 +--
 2 files changed, 327 insertions(+), 73 deletions(-)

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index f9e2d95186a..2a9b25bd238 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1,16 +1,19 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.c
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
+ *
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
  * Also, it is possible to support multiple independent read pointers.
  *
  * A temporary file is used to handle the data if it exceeds the
@@ -61,6 +64,8 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "storage/buffile.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
 #include "utils/tuplestore.h"
@@ -116,16 +121,15 @@ struct Tuplestorestate
 	BufFile    *myfile;			/* underlying file, or NULL if none */
 	MemoryContext context;		/* memory context for holding tuples */
 	ResourceOwner resowner;		/* resowner for holding temp files */
+	Oid			datumType;		/* InvalidOid or oid of Datum's to be stored */
+	int16		datumTypeLen;	/* typelen of that Datum */
+	bool		datumTypeByVal; /* by-value of that Datum */
 
 	/*
 	 * These function pointers decouple the routines that must know what kind
 	 * of tuple we are handling from the routines that don't need to know it.
 	 * They are set up by the tuplestore_begin_xxx routines.
 	 *
-	 * (Although tuplestore.c currently only supports heap tuples, I've copied
-	 * this part of tuplesort.c so that extension to other kinds of objects
-	 * will be easy if it's ever needed.)
-	 *
 	 * Function to copy a supplied input tuple into palloc'd space. (NB: we
 	 * assume that a single pfree() is enough to release the tuple later, so
 	 * the representation must be "flat" in one palloc chunk.) state->availMem
@@ -150,6 +154,12 @@ struct Tuplestorestate
 	 */
 	void	   *(*readtup) (Tuplestorestate *state, unsigned int len);
 
+	/*
+	 * Function to get length of tuple from tape. Used to provide 'len' argument
+	 * for readtup (see above).
+	 */
+	unsigned int(*lentup) (Tuplestorestate *state, bool eofOK);
+
 	/*
 	 * This array holds pointers to tuples in memory if we are in state INMEM.
 	 * In states WRITEFILE and READFILE it's not used.
@@ -186,6 +196,7 @@ struct Tuplestorestate
 #define COPYTUP(state,tup)	((*(state)->copytup) (state, tup))
 #define WRITETUP(state,tup) ((*(state)->writetup) (state, tup))
 #define READTUP(state,len)	((*(state)->readtup) (state, len))
+#define LENTUP(state,eofOK)	((*(state)->lentup) (state, eofOK))
 #define LACKMEM(state)		((state)->availMem < 0)
 #define USEMEM(state,amt)	((state)->availMem -= (amt))
 #define FREEMEM(state,amt)	((state)->availMem += (amt))
@@ -194,9 +205,9 @@ struct Tuplestorestate
  *
  * NOTES about on-tape representation of tuples:
  *
- * We require the first "unsigned int" of a stored tuple to be the total size
- * on-tape of the tuple, including itself (so it is never zero).
- * The remainder of the stored tuple
+ * In case of tuples we use first "unsigned int" of a stored tuple
+ * to be the total size on-tape of the tuple, including itself
+ * (so it is never zero). The remainder of the stored tuple
  * may or may not match the in-memory representation of the tuple ---
  * any conversion needed is the job of the writetup and readtup routines.
  *
@@ -207,10 +218,13 @@ struct Tuplestorestate
  * state->backward is not set, the write/read routines may omit the extra
  * length word.
  *
- * writetup is expected to write both length words as well as the tuple
+ * In the case of Datum with constant length, both "unsigned int" are omitted.
+ *
+ * writetup is expected to write both length words and the tuple
  * data.  When readtup is called, the tape is positioned just after the
- * front length word; readtup must read the tuple data and advance past
- * the back length word (if present).
+ * front length word (if it is not omitted like in case of content-size Datum);
+ * readtup must read the tuple data and advance past the back length word
+ * (if present).
  *
  * The write/read routines can make use of the tuple description data
  * stored in the Tuplestorestate record, if needed. They are also expected
@@ -242,11 +256,16 @@ static Tuplestorestate *tuplestore_begin_common(int eflags,
 static void tuplestore_puttuple_common(Tuplestorestate *state, void *tuple);
 static void dumptuples(Tuplestorestate *state);
 static void tuplestore_updatemax(Tuplestorestate *state);
-static unsigned int getlen(Tuplestorestate *state, bool eofOK);
+
+static unsigned int lentup_heap(Tuplestorestate *state, bool eofOK);
 static void *copytup_heap(Tuplestorestate *state, void *tup);
 static void writetup_heap(Tuplestorestate *state, void *tup);
 static void *readtup_heap(Tuplestorestate *state, unsigned int len);
 
+static unsigned int lentup_datum(Tuplestorestate *state, bool eofOK);
+static void *copytup_datum(Tuplestorestate *state, void *datum);
+static void writetup_datum(Tuplestorestate *state, void *datum);
+static void *readtup_datum(Tuplestorestate *state, unsigned int len);
 
 /*
  *		tuplestore_begin_xxx
@@ -269,6 +288,12 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
 	state->allowedMem = maxKBytes * (int64) 1024;
 	state->availMem = state->allowedMem;
 	state->myfile = NULL;
+	/*
+	 * Set Datum related data to invalid by default.
+	 */
+	state->datumType = InvalidOid;
+	state->datumTypeLen = 0;
+	state->datumTypeByVal = false;
 
 	/*
 	 * The palloc/pfree pattern for tuple memory is in a FIFO pattern.  A
@@ -346,6 +371,37 @@ tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
 	state->copytup = copytup_heap;
 	state->writetup = writetup_heap;
 	state->readtup = readtup_heap;
+	state->lentup = lentup_heap;
+
+	return state;
+}
+
+/*
+ * The same as tuplestore_begin_heap but create store for Datum values.
+ */
+Tuplestorestate *
+tuplestore_begin_datum(Oid datumType, bool randomAccess, bool interXact, int maxKBytes)
+{
+	Tuplestorestate *state;
+	int			eflags;
+
+	/*
+	 * This interpretation of the meaning of randomAccess is compatible with
+	 * the pre-8.3 behavior of tuplestores.
+	 */
+	eflags = randomAccess ?
+		(EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :
+		(EXEC_FLAG_REWIND);
+
+	state = tuplestore_begin_common(eflags, interXact, maxKBytes);
+	state->datumType = datumType;
+	get_typlenbyval(state->datumType, &state->datumTypeLen, &state->datumTypeByVal);
+	Assert(!(state->datumTypeByVal && randomAccess));
+
+	state->copytup = copytup_datum;
+	state->writetup = writetup_datum;
+	state->readtup = readtup_datum;
+	state->lentup = lentup_datum;
 
 	return state;
 }
@@ -444,16 +500,19 @@ tuplestore_clear(Tuplestorestate *state)
 	{
 		int64		availMem = state->availMem;
 
-		/*
-		 * Below, we reset the memory context for storing tuples.  To save
-		 * from having to always call GetMemoryChunkSpace() on all stored
-		 * tuples, we adjust the availMem to forget all the tuples and just
-		 * recall USEMEM for the space used by the memtuples array.  Here we
-		 * just Assert that's correct and the memory tracking hasn't gone
-		 * wrong anywhere.
-		 */
-		for (i = state->memtupdeleted; i < state->memtupcount; i++)
-			availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			/*
+			 * Below, we reset the memory context for storing tuples.  To save
+			 * from having to always call GetMemoryChunkSpace() on all stored
+			 * tuples, we adjust the availMem to forget all the tuples and just
+			 * recall USEMEM for the space used by the memtuples array.  Here we
+			 * just Assert that's correct and the memory tracking hasn't gone
+			 * wrong anywhere.
+			 */
+			for (i = state->memtupdeleted; i < state->memtupcount; i++)
+				availMem += GetMemoryChunkSpace(state->memtuples[i]);
+		}
 
 		availMem += GetMemoryChunkSpace(state->memtuples);
 
@@ -777,6 +836,25 @@ tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple)
 	MemoryContextSwitchTo(oldcxt);
 }
 
+/*
+ * Like tuplestore_puttupleslot but for single Datum.
+ */
+void
+tuplestore_putdatum(Tuplestorestate *state, Datum datum)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
+
+	/*
+	 * Copy the Datum.  (Must do this even in WRITEFILE case.  Note that
+	 * COPYTUP includes USEMEM, so we needn't do that here.)
+	 */
+	datum = PointerGetDatum(COPYTUP(state, DatumGetPointer(datum)));
+
+	tuplestore_puttuple_common(state, DatumGetPointer(datum));
+
+	MemoryContextSwitchTo(oldcxt);
+}
+
 /*
  * Similar to tuplestore_puttuple(), but work from values + nulls arrays.
  * This avoids an extra tuple-construction operation.
@@ -1028,10 +1106,10 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 			pg_fallthrough;
 
 		case TSS_READFILE:
-			*should_free = true;
+			*should_free = !state->datumTypeByVal;
 			if (forward)
 			{
-				if ((tuplen = getlen(state, true)) != 0)
+				if ((tuplen = LENTUP(state, true)) != 0)
 				{
 					tup = READTUP(state, tuplen);
 					return tup;
@@ -1043,6 +1121,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				}
 			}
 
+			Assert(!state->datumTypeByVal);
 			/*
 			 * Backward.
 			 *
@@ -1060,7 +1139,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 				Assert(!state->truncated);
 				return NULL;
 			}
-			tuplen = getlen(state, false);
+			tuplen = LENTUP(state, false);
 
 			if (readptr->eof_reached)
 			{
@@ -1091,7 +1170,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 					Assert(!state->truncated);
 					return NULL;
 				}
-				tuplen = getlen(state, false);
+				tuplen = LENTUP(state, false);
 			}
 
 			/*
@@ -1153,6 +1232,41 @@ tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 	}
 }
 
+bool
+tuplestore_getdatum(Tuplestorestate *state, bool forward,
+					bool *should_free, Datum *result)
+{
+	Datum datum;
+	*should_free = false;
+
+	datum = (Datum) tuplestore_gettuple(state, forward, should_free);
+
+	/* For by-value datum we may receive zero as valid value. */
+	if (state->datumTypeByVal)
+	{
+		/* Return false only on EOF */
+		if (state->readptrs[state->activeptr].eof_reached)
+		{
+			*result = PointerGetDatum(NULL);
+			return false;
+		}
+
+		*result = datum;
+		return true;
+	}
+
+	if (datum)
+	{
+		*result = datum;
+		return true;
+	}
+	else
+	{
+		*result = PointerGetDatum(NULL);
+		return false;
+	}
+}
+
 /*
  * tuplestore_gettupleslot_force - exported function to fetch a tuple
  *
@@ -1205,10 +1319,20 @@ tuplestore_advance(Tuplestorestate *state, bool forward)
 			pfree(tuple);
 		return true;
 	}
-	else
+
+	/*
+	 * A NULL return normally means end-of-data, but for by-value datum
+	 * stores a valid zero-valued datum (e.g., false, 0) is indistinguishable
+	 * from NULL via pointer check.  Use eof_reached to distinguish.
+	 */
+	if (state->datumTypeByVal)
 	{
-		return false;
+		TSReadPointer *readptr = &state->readptrs[state->activeptr];
+
+		return !readptr->eof_reached;
 	}
+
+	return false;
 }
 
 /*
@@ -1271,7 +1395,13 @@ tuplestore_skiptuples(Tuplestorestate *state, int64 ntuples, bool forward)
 				tuple = tuplestore_gettuple(state, forward, &should_free);
 
 				if (tuple == NULL)
-					return false;
+				{
+					/* See tuplestore_advance for why pointer check is insufficient */
+					if (!state->datumTypeByVal ||
+						state->readptrs[state->activeptr].eof_reached)
+						return false;
+					continue;
+				}
 				if (should_free)
 					pfree(tuple);
 				CHECK_FOR_INTERRUPTS();
@@ -1505,8 +1635,11 @@ tuplestore_trim(Tuplestorestate *state)
 	/* Release no-longer-needed tuples */
 	for (i = state->memtupdeleted; i < nremove; i++)
 	{
-		FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
-		pfree(state->memtuples[i]);
+		if (!state->datumTypeByVal)
+		{
+			FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
+			pfree(state->memtuples[i]);
+		}
 		state->memtuples[i] = NULL;
 		/* As in dumptuples(), increment memtupdeleted synchronously */
 		state->memtupdeleted++;
@@ -1603,25 +1736,6 @@ tuplestore_in_memory(Tuplestorestate *state)
 	return (state->status == TSS_INMEM);
 }
 
-
-/*
- * Tape interface routines
- */
-
-static unsigned int
-getlen(Tuplestorestate *state, bool eofOK)
-{
-	unsigned int len;
-	size_t		nbytes;
-
-	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
-	if (nbytes == 0)
-		return 0;
-	else
-		return len;
-}
-
-
 /*
  * Routines specialized for HeapTuple case
  *
@@ -1632,6 +1746,19 @@ getlen(Tuplestorestate *state, bool eofOK)
  * to write that separately.
  */
 
+static unsigned int
+lentup_heap(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	else
+		return len;
+}
+
 static void *
 copytup_heap(Tuplestorestate *state, void *tup)
 {
@@ -1678,3 +1805,127 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 		BufFileReadExact(state->myfile, &tuplen, sizeof(tuplen));
 	return tuple;
 }
+
+/*
+ * Routines specialized for Datum case.
+ *
+ * Handles both fixed and variable-length Datums efficiently:
+ * - Fixed-length and Variable-length includes length prefix (and suffix if backward scan)
+ * - By-value types handled inline without extra copying, storing single extra byte
+ *   XXX: consider refactoring to avoid it, currently need it for correct rewind logic
+ */
+
+static unsigned int
+lentup_datum(Tuplestorestate *state, bool eofOK)
+{
+	unsigned int len;
+	size_t		nbytes;
+
+	Assert(state->datumType != InvalidOid);
+
+	if (state->datumTypeByVal)
+	{
+		uint8	junk;
+		nbytes = BufFileReadMaybeEOF(state->myfile, &junk, sizeof(uint8), eofOK);
+		if (nbytes == 0)
+			return 0;
+		Assert(junk == (uint8) state->datumTypeLen);
+		return state->datumTypeLen;
+	}
+
+	nbytes = BufFileReadMaybeEOF(state->myfile, &len, sizeof(len), eofOK);
+	if (nbytes == 0)
+		return 0;
+	return len;
+}
+
+static void *
+copytup_datum(Tuplestorestate *state, void *datum)
+{
+	Datum d;
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+		return DatumGetPointer(PointerGetDatum(datum));
+
+	if (datum == NULL)
+		return NULL;
+
+	d = datumCopy(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+	USEMEM(state, GetMemoryChunkSpace(DatumGetPointer(d)));
+	return DatumGetPointer(d);
+}
+
+static void
+writetup_datum(Tuplestorestate *state, void *datum)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		uint8 junk = state->datumTypeLen; /* overflow is ok */
+		Datum v;
+		Assert(state->datumTypeLen > 0);
+
+		/* just marker byte used to track the end of data for rewind logic */
+		BufFileWrite(state->myfile, &junk, sizeof(junk));
+		store_att_byval(&v, PointerGetDatum(datum), state->datumTypeLen);
+		BufFileWrite(state->myfile, &v, state->datumTypeLen);
+		Assert(!state->backward);
+	}
+	else
+	{
+		unsigned int size;
+		unsigned int tuplen;
+
+		if (state->datumTypeLen < 0)
+			size = datumGetSize(PointerGetDatum(datum), state->datumTypeByVal, state->datumTypeLen);
+		else
+			size = state->datumTypeLen;
+
+		/*
+		 * Include sizeof(unsigned int) in the stored length, matching the
+		 * convention used by writetup_heap.  The backward-scan seek
+		 * arithmetic in tuplestore_gettuple assumes this.
+		 */
+		tuplen = size + sizeof(unsigned int);
+		BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		BufFileWrite(state->myfile, datum, size);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileWrite(state->myfile, &tuplen, sizeof(tuplen));
+
+		FREEMEM(state, GetMemoryChunkSpace(datum));
+		pfree(datum);
+	}
+}
+
+static void *
+readtup_datum(Tuplestorestate *state, unsigned int len)
+{
+	Assert(state->datumType != InvalidOid);
+	if (state->datumTypeByVal)
+	{
+		Datum datum = 0;
+
+		Assert(state->datumTypeLen > 0);
+		Assert(len == state->datumTypeLen);
+		BufFileReadExact(state->myfile, &datum, state->datumTypeLen);
+
+		Assert(!state->backward);
+		return DatumGetPointer(fetch_att(&datum, true, state->datumTypeLen));
+	}
+	else
+	{
+		unsigned int datalen = len - sizeof(unsigned int);
+		void *data = palloc(datalen);
+
+		BufFileReadExact(state->myfile, data, datalen);
+
+		/* need trailing length word? */
+		if (state->backward)
+			BufFileReadExact(state->myfile, &len, sizeof(len));
+
+		return data;
+	}
+}
diff --git a/src/include/utils/tuplestore.h b/src/include/utils/tuplestore.h
index f638b96e156..e16d9a3d352 100644
--- a/src/include/utils/tuplestore.h
+++ b/src/include/utils/tuplestore.h
@@ -1,17 +1,18 @@
 /*-------------------------------------------------------------------------
  *
  * tuplestore.h
- *	  Generalized routines for temporary tuple storage.
+ *	  Generalized routines for temporary storage of tuples and Datums.
  *
- * This module handles temporary storage of tuples for purposes such
- * as Materialize nodes, hashjoin batch files, etc.  It is essentially
- * a dumbed-down version of tuplesort.c; it does no sorting of tuples
- * but can only store and regurgitate a sequence of tuples.  However,
- * because no sort is required, it is allowed to start reading the sequence
- * before it has all been written.  This is particularly useful for cursors,
- * because it allows random access within the already-scanned portion of
- * a query without having to process the underlying scan to completion.
- * Also, it is possible to support multiple independent read pointers.
+ * This module handles temporary storage of either tuples or single
+ * Datum values for purposes such as Materialize nodes, hashjoin batch
+ * files, etc. It is essentially a dumbed-down version of tuplesort.c;
+ * it does no sorting of tuples but can only store and regurgitate a sequence
+ * of tuples.  However, because no sort is required, it is allowed to start
+ * reading the sequence before it has all been written.
+ *
+ * This is particularly useful for cursors, because it allows random access
+ * within the already-scanned portion of a query without having to process
+ * the underlying scan to completion.
  *
  * A temporary file is used to handle the data if it exceeds the
  * space limit specified by the caller.
@@ -39,14 +40,13 @@
  */
 typedef struct Tuplestorestate Tuplestorestate;
 
-/*
- * Currently we only need to store MinimalTuples, but it would be easy
- * to support the same behavior for IndexTuples and/or bare Datums.
- */
-
 extern Tuplestorestate *tuplestore_begin_heap(bool randomAccess,
 											  bool interXact,
 											  int maxKBytes);
+extern Tuplestorestate *tuplestore_begin_datum(Oid datumType,
+											   bool randomAccess,
+											   bool interXact,
+											   int maxKBytes);
 
 extern void tuplestore_set_eflags(Tuplestorestate *state, int eflags);
 
@@ -55,6 +55,7 @@ extern void tuplestore_puttupleslot(Tuplestorestate *state,
 extern void tuplestore_puttuple(Tuplestorestate *state, HeapTuple tuple);
 extern void tuplestore_putvalues(Tuplestorestate *state, TupleDesc tdesc,
 								 const Datum *values, const bool *isnull);
+extern void tuplestore_putdatum(Tuplestorestate *state, Datum datum);
 
 extern int	tuplestore_alloc_read_pointer(Tuplestorestate *state, int eflags);
 
@@ -72,6 +73,8 @@ extern bool tuplestore_in_memory(Tuplestorestate *state);
 
 extern bool tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
 									bool copy, TupleTableSlot *slot);
+extern bool tuplestore_getdatum(Tuplestorestate *state, bool forward,
+								bool *should_free, Datum *result);
 
 extern bool tuplestore_gettupleslot_force(Tuplestorestate *state, bool forward,
 										  bool copy, TupleTableSlot *slot);
-- 
2.43.0



  [application/octet-stream] v35-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch (36.6K, 5-v35-0002-Add-STIR-access-method-and-flags-related-to-auxi.patch)
  download | inline diff:
From e752c9ff26e718e209dfb928bfd379c5302a1f77 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sun, 11 Jan 2026 19:27:52 +0300
Subject: [PATCH v35 2/7] Add STIR access method and flags related to auxiliary
 indexes

This patch provides infrastructure for following enhancements to concurrent index builds by:
- ii_Auxiliary in IndexInfo: indicates that an index is an auxiliary index used during concurrent index build
- validate_index in IndexVacuumInfo: set if index_bulk_delete called during the validation phase of concurrent index build
- STIR (Short-Term Index Replacement) access method is introduced, intended solely for short-lived, auxiliary usage

STIR functions are designed as an ephemeral helper during concurrent index builds, temporarily storing TIDs without providing the full features of a typical access method. As such, it raises warnings or errors when accessed outside its specialized usage path.

Planned to be used in following commits.
---
 contrib/pgstattuple/pgstattuple.c        |   3 +
 src/backend/access/Makefile              |   1 +
 src/backend/access/heap/vacuumlazy.c     |   2 +
 src/backend/access/meson.build           |   1 +
 src/backend/access/stir/Makefile         |  18 +
 src/backend/access/stir/meson.build      |   5 +
 src/backend/access/stir/stir.c           | 567 +++++++++++++++++++++++
 src/backend/catalog/index.c              |   1 +
 src/backend/catalog/toasting.c           |   1 +
 src/backend/commands/analyze.c           |   1 +
 src/backend/commands/vacuumparallel.c    |   1 +
 src/backend/nodes/makefuncs.c            |   1 +
 src/include/access/genam.h               |   1 +
 src/include/access/reloptions.h          |   3 +-
 src/include/access/stir.h                | 110 +++++
 src/include/catalog/pg_am.dat            |   3 +
 src/include/catalog/pg_opclass.dat       |   4 +
 src/include/catalog/pg_opfamily.dat      |   2 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/nodes/execnodes.h            |   7 +-
 src/include/utils/index_selfuncs.h       |   8 +
 src/test/regress/expected/amutils.out    |   8 +-
 src/test/regress/expected/opr_sanity.out |   7 +-
 src/test/regress/expected/psql.out       |  24 +-
 24 files changed, 765 insertions(+), 18 deletions(-)
 create mode 100644 src/backend/access/stir/Makefile
 create mode 100644 src/backend/access/stir/meson.build
 create mode 100644 src/backend/access/stir/stir.c
 create mode 100644 src/include/access/stir.h

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 6a7f8cb4a7c..5b5984e3aa2 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -285,6 +285,9 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 			case SPGIST_AM_OID:
 				err = "spgist index";
 				break;
+			case STIR_AM_OID:
+				err = "stir index";
+				break;
 			case BRIN_AM_OID:
 				err = "brin index";
 				break;
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index e88d72ea039..ebbcfa90715 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -19,6 +19,7 @@ SUBDIRS	    = \
 	nbtree \
 	rmgrdesc \
 	spgist \
+	stir \
 	sequence \
 	table \
 	tablesample \
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 39395aed0d5..a6ac89360fc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3024,6 +3024,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
@@ -3075,6 +3076,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	ivinfo.num_heap_tuples = reltuples;
 	ivinfo.strategy = vacrel->bstrategy;
+	ivinfo.validate_index = false;
 
 	/*
 	 * Update error traceback information.
diff --git a/src/backend/access/meson.build b/src/backend/access/meson.build
index 5fd18de74f9..7219c65f365 100644
--- a/src/backend/access/meson.build
+++ b/src/backend/access/meson.build
@@ -11,6 +11,7 @@ subdir('nbtree')
 subdir('rmgrdesc')
 subdir('sequence')
 subdir('spgist')
+subdir('stir')
 subdir('table')
 subdir('tablesample')
 subdir('transam')
diff --git a/src/backend/access/stir/Makefile b/src/backend/access/stir/Makefile
new file mode 100644
index 00000000000..8785dab37bd
--- /dev/null
+++ b/src/backend/access/stir/Makefile
@@ -0,0 +1,18 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+#    Makefile for access/stir
+#
+# IDENTIFICATION
+#    src/backend/access/stir/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/stir
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	stir.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/stir/meson.build b/src/backend/access/stir/meson.build
new file mode 100644
index 00000000000..4b7ad15346c
--- /dev/null
+++ b/src/backend/access/stir/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+backend_sources += files(
+	'stir.c',
+)
diff --git a/src/backend/access/stir/stir.c b/src/backend/access/stir/stir.c
new file mode 100644
index 00000000000..932590d9ccb
--- /dev/null
+++ b/src/backend/access/stir/stir.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.c
+ *	  Implementation of Short-Term Index Replacement.
+ *
+ * STIR is a specialized access method type designed for temporary storage
+ * of TID values during concurrent index build operations.
+ *
+ * The typical lifecycle of a STIR index is:
+ * 1. created as an auxiliary index for CIC/RIC
+ * 2. accepts inserts for a period
+ * 3. stirbulkdelete called during index validation phase
+ * 4. gets dropped
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/stir/stir.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amvalidate.h"
+#include "access/htup_details.h"
+#include "access/stir.h"
+#include "access/tableam.h"
+#include "catalog/index.h"
+#include "catalog/pg_amop.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_opfamily.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "utils/catcache.h"
+#include "utils/fmgrprotos.h"
+#include "utils/index_selfuncs.h"
+#include "utils/memutils.h"
+#include "utils/regproc.h"
+#include "utils/syscache.h"
+
+/*
+ * Stir handler function: return IndexAmRoutine with access method parameters
+ * and callbacks.
+ */
+Datum
+stirhandler(PG_FUNCTION_ARGS)
+{
+	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
+
+	/* Set STIR-specific strategy and procedure numbers */
+	amroutine->amstrategies = STIR_NSTRATEGIES;
+	amroutine->amsupport = STIR_NPROC;
+	amroutine->amoptsprocnum = STIR_OPTIONS_PROC;
+
+	/* STIR doesn't support most index operations */
+	amroutine->amcanorder = false;
+	amroutine->amcanorderbyop = false;
+	amroutine->amcanbackward = false;
+	amroutine->amcanunique = false;
+	amroutine->amcanmulticol = true;
+	amroutine->amoptionalkey = true;
+	amroutine->amsearcharray = false;
+	amroutine->amsearchnulls = false;
+	amroutine->amstorage = false;
+	amroutine->amclusterable = false;
+	amroutine->ampredlocks = false;
+	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
+	amroutine->amcaninclude = true;
+	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL;
+	amroutine->amkeytype = InvalidOid;
+
+	/* Set up function callbacks */
+	amroutine->ambuild = stirbuild;
+	amroutine->ambuildempty = stirbuildempty;
+	amroutine->aminsert = stirinsert;
+	amroutine->aminsertcleanup = NULL;
+	amroutine->ambulkdelete = stirbulkdelete;
+	amroutine->amvacuumcleanup = stirvacuumcleanup;
+	amroutine->amcanreturn = NULL;
+	amroutine->amcostestimate = stircostestimate;
+	amroutine->amoptions = stiroptions;
+	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
+	amroutine->amvalidate = stirvalidate;
+	amroutine->amadjustmembers = NULL;
+	amroutine->ambeginscan = stirbeginscan;
+	amroutine->amrescan = stirrescan;
+	amroutine->amgettuple = NULL;
+	amroutine->amgetbitmap = NULL;
+	amroutine->amendscan = stirendscan;
+	amroutine->ammarkpos = NULL;
+	amroutine->amrestrpos = NULL;
+	amroutine->amestimateparallelscan = NULL;
+	amroutine->aminitparallelscan = NULL;
+	amroutine->amparallelrescan = NULL;
+
+	PG_RETURN_POINTER(amroutine);
+}
+
+/*
+ * Validates operator class for STIR index.
+ *
+ * STIR is not a real index, so validate may be skipped.
+ * But we do it just for consistency.
+ */
+bool
+stirvalidate(Oid opclassoid)
+{
+	bool result = true;
+	HeapTuple classtup;
+	Form_pg_opclass classform;
+	Oid opfamilyoid;
+	HeapTuple familytup;
+	Form_pg_opfamily familyform;
+	char *opfamilyname;
+	CatCList *oprlist;
+	int i;
+
+	/* Fetch opclass information */
+	classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
+	if (!HeapTupleIsValid(classtup))
+		elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
+	classform = (Form_pg_opclass) GETSTRUCT(classtup);
+
+	opfamilyoid = classform->opcfamily;
+
+	/* Fetch opfamily information */
+	familytup = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfamilyoid));
+	if (!HeapTupleIsValid(familytup))
+		elog(ERROR, "cache lookup failed for operator family %u", opfamilyoid);
+	familyform = (Form_pg_opfamily) GETSTRUCT(familytup);
+
+	opfamilyname = NameStr(familyform->opfname);
+
+	/* Fetch all operators and support functions of the opfamily */
+	oprlist = SearchSysCacheList1(AMOPSTRATEGY, ObjectIdGetDatum(opfamilyoid));
+
+	/* Check individual operators */
+	for (i = 0; i < oprlist->n_members; i++)
+	{
+		HeapTuple oprtup = &oprlist->members[i]->tuple;
+		Form_pg_amop oprform = (Form_pg_amop) GETSTRUCT(oprtup);
+
+		/* Check it's allowed strategy for stir */
+		if (oprform->amopstrategy < 1 ||
+			oprform->amopstrategy > STIR_NSTRATEGIES)
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with invalid strategy number %d",
+					        opfamilyname,
+					        format_operator(oprform->amopopr),
+					        oprform->amopstrategy)));
+			result = false;
+		}
+
+		/* stir doesn't support ORDER BY operators */
+		if (oprform->amoppurpose != AMOP_SEARCH ||
+			OidIsValid(oprform->amopsortfamily))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains invalid ORDER BY specification for operator %s",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+
+		/* Check operator signature --- same for all stir strategies */
+		if (!check_amop_signature(oprform->amopopr, BOOLOID,
+		                          oprform->amoplefttype,
+		                          oprform->amoprighttype))
+		{
+			ereport(INFO,
+			        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				        errmsg("stir opfamily %s contains operator %s with wrong signature",
+					        opfamilyname,
+					        format_operator(oprform->amopopr))));
+			result = false;
+		}
+	}
+
+	ReleaseCatCacheList(oprlist);
+	ReleaseSysCache(familytup);
+	ReleaseSysCache(classtup);
+
+	return result;
+}
+
+/*
+ * Initialize meta-page of a STIR index.
+ * The skipInserts flag determines if new inserts will be accepted or skipped.
+ */
+void
+StirFillMetapage(Relation index, Page metaPage, bool skipInserts)
+{
+	StirMetaPageData *metadata;
+
+	StirInitPage(metaPage, STIR_META);
+	metadata = StirPageGetMeta(metaPage);
+	memset(metadata, 0, sizeof(StirMetaPageData));
+	metadata->magicNumber = STIR_MAGIC_NUMBER;
+	metadata->skipInserts = skipInserts;
+	((PageHeader) metaPage)->pd_lower = ((char *) metadata + sizeof(StirMetaPageData)) - (char *) metaPage;
+}
+
+/*
+ * Create and initialize the metapage for a STIR index.
+ * This is called during index creation.
+ */
+void
+StirInitMetapage(Relation index, ForkNumber forknum)
+{
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	/*
+	 * Make a new page; since it is the first page it should be associated with
+	 * block number 0 (STIR_METAPAGE_BLKNO).  No need to hold the extension
+	 * lock because there cannot be concurrent inserters yet.
+	 */
+	metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+	Assert(BufferGetBlockNumber(metaBuffer) == STIR_METAPAGE_BLKNO);
+
+	metaPage = BufferGetPage(metaBuffer);
+	StirFillMetapage(index, metaPage, forknum == INIT_FORKNUM);
+
+	MarkBufferDirty(metaBuffer);
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * Initialize any page of a stir index.
+ */
+void
+StirInitPage(Page page, uint16 flags)
+{
+	StirPageOpaque opaque;
+
+	PageInit(page, BLCKSZ, sizeof(StirPageOpaqueData));
+
+	opaque = StirPageGetOpaque(page);
+	opaque->flags = flags;
+	opaque->stir_page_id = STIR_PAGE_ID;
+}
+
+/*
+ * Add a tuple to a STIR page. Returns false if the tuple doesn't fit.
+ * The tuple is added to the end of the page.
+ */
+static bool
+StirPageAddItem(Page page, StirTuple *tuple)
+{
+	StirTuple *itup;
+	StirPageOpaque opaque;
+	char *ptr;
+
+	/* We shouldn't be pointed to an invalid page */
+	Assert(!PageIsNew(page));
+
+	/* Does the new tuple fit on the page? */
+	if (StirPageGetFreeSpace(page) < sizeof(StirTuple))
+		return false;
+
+	/* Copy a new tuple to the end of the page */
+	opaque = StirPageGetOpaque(page);
+	itup = StirPageGetTuple(page, opaque->maxoff + 1);
+	memcpy(itup, tuple, sizeof(StirTuple));
+
+	/* Adjust maxoff and pd_lower */
+	opaque->maxoff++;
+	ptr = (char *) StirPageGetTuple(page, opaque->maxoff + 1);
+	((PageHeader) page)->pd_lower = ptr - page;
+
+	/* Assert we didn't overrun available space */
+	Assert(((PageHeader) page)->pd_lower <= ((PageHeader) page)->pd_upper);
+	return true;
+}
+
+/*
+ * Insert a new tuple into a STIR index.
+ */
+bool
+stirinsert(Relation index, Datum *values, bool *isnull,
+		  ItemPointer ht_ctid, Relation heapRel,
+		  IndexUniqueCheck checkUnique,
+		  bool indexUnchanged,
+		  struct IndexInfo *indexInfo)
+{
+	StirTuple itup;
+	StirMetaPageData *metaData;
+	Buffer buffer,
+			metaBuffer;
+	Page page;
+	BlockNumber blkNo;
+
+	itup.heapPtr = *ht_ctid;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+
+	for (;;)
+	{
+		LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+		/* Check if inserts are allowed */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+		blkNo = metaData->lastBlkNo;
+		/* Don't hold metabuffer lock while doing insert */
+		LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+
+		if (blkNo > 0)
+		{
+			buffer = ReadBuffer(index, blkNo);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+			page = BufferGetPage(buffer);
+
+			Assert(!PageIsNew(page));
+
+			/* Try to add tuple to the existing page */
+			if (StirPageAddItem(page, &itup))
+			{
+				/* Success!  Apply the change, clean up, and exit */
+				MarkBufferDirty(buffer);
+
+				UnlockReleaseBuffer(buffer);
+				ReleaseBuffer(metaBuffer);
+				return false;
+			}
+
+			UnlockReleaseBuffer(buffer);
+		}
+
+		/* Need to add a new page - get exclusive lock on meta-page */
+		LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+		metaData = StirPageGetMeta(BufferGetPage(metaBuffer));
+
+		/* Re-check after acquiring exclusive lock */
+		if (metaData->skipInserts)
+		{
+			UnlockReleaseBuffer(metaBuffer);
+			return false;
+		}
+
+		/* Check if another backend already extended the index */
+		if (blkNo != metaData->lastBlkNo)
+		{
+			Assert(blkNo < metaData->lastBlkNo);
+			/* Someone else inserted the new page into the index, let's try again */
+			LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
+			continue;
+		}
+		else
+		{
+			/* Must extend the file */
+			buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
+									   EB_LOCK_FIRST);
+			page = BufferGetPage(buffer);
+
+			StirInitPage(page, 0);
+
+			if (!StirPageAddItem(page, &itup))
+			{
+				/* We shouldn't be here since we're inserting to an empty page */
+				elog(ERROR, "could not add new stir tuple to empty page");
+			}
+
+			/* Update meta-page with new last block number */
+			metaData->lastBlkNo = BufferGetBlockNumber(buffer);
+
+			MarkBufferDirty(metaBuffer);
+			MarkBufferDirty(buffer);
+
+			UnlockReleaseBuffer(buffer);
+			UnlockReleaseBuffer(metaBuffer);
+
+			return false;
+		}
+	}
+}
+
+/*
+ * STIR doesn't support scans - these functions all error out
+ */
+IndexScanDesc
+stirbeginscan(Relation r, int nkeys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void
+stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+		  ScanKey orderbys, int norderbys)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+void stirendscan(IndexScanDesc scan)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
+
+/*
+ * Build a STIR index - only allowed for auxiliary indexes.
+ * Just initializes the meta-page without any heap scans.
+ */
+IndexBuildResult *
+stirbuild(Relation heap, Relation index,
+						   struct IndexInfo *indexInfo)
+{
+	IndexBuildResult *result;
+
+	if (!indexInfo->ii_Auxiliary)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("Building STIR indexes is not supported")));
+
+	StirInitMetapage(index, MAIN_FORKNUM);
+
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	result->heap_tuples = 0;
+	result->index_tuples = 0;
+	return result;
+}
+
+void stirbuildempty(Relation index)
+{
+	StirInitMetapage(index, INIT_FORKNUM);
+}
+
+IndexBulkDeleteResult *
+stirbulkdelete(IndexVacuumInfo *info,
+									 IndexBulkDeleteResult *stats,
+									 IndexBulkDeleteCallback callback,
+									 void *callback_state)
+{
+	Relation index = info->index;
+	BlockNumber blkno, npages;
+	Buffer buffer;
+	Page page;
+
+	/*
+	 * For normal VACUUM, mark to skip inserts and warn about an index drop
+	 * needed.  In practice this path is not reachable during CREATE INDEX
+	 * CONCURRENTLY because the table-level locks held by CIC prevent concurrent
+	 * VACUUM from opening the auxiliary index.  It can only be reached if a
+	 * leftover STIR index somehow survives after a failed CIC and a later
+	 * VACUUM encounters it.
+	 */
+	if (!info->validate_index)
+	{
+		StirMarkAsSkipInserts(index);
+
+		ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+		return NULL;
+	}
+
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	/*
+	 * Iterate over the pages. We don't care about concurrently added pages,
+	 * because the index is marked as not-ready for that moment and the index is not
+	 * used for insert.
+	 */
+	npages = RelationGetNumberOfBlocks(index);
+	for (blkno = STIR_HEAD_BLKNO; blkno < npages; blkno++)
+	{
+		StirTuple *itup, *itupEnd;
+
+		vacuum_delay_point(false);
+
+		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
+									RBM_NORMAL, info->strategy);
+
+		LockBuffer(buffer, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buffer);
+
+		if (PageIsNew(page))
+		{
+			UnlockReleaseBuffer(buffer);
+			continue;
+		}
+
+		itup = StirPageGetTuple(page, FirstOffsetNumber);
+		itupEnd = StirPageGetTuple(page, OffsetNumberNext(StirPageGetMaxOffset(page)));
+		while (itup < itupEnd)
+		{
+			/* Do we have to delete this tuple? */
+			if (callback(&itup->heapPtr, callback_state))
+			{
+				ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("we never delete in stir")));
+			}
+
+			itup = StirPageGetNextTuple(itup);
+		}
+
+		UnlockReleaseBuffer(buffer);
+	}
+
+	return stats;
+}
+
+/*
+ * Mark a STIR index to skip future inserts
+ */
+void
+StirMarkAsSkipInserts(Relation index)
+{
+	StirMetaPageData *metaData;
+	Buffer metaBuffer;
+	Page metaPage;
+
+	Assert(!RelationNeedsWAL(index));
+	metaBuffer = ReadBuffer(index, STIR_METAPAGE_BLKNO);
+	LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	metaPage = BufferGetPage(metaBuffer);
+	metaData = StirPageGetMeta(metaPage);
+
+	if (!metaData->skipInserts)
+	{
+		metaData->skipInserts = true;
+		MarkBufferDirty(metaBuffer);
+	}
+	UnlockReleaseBuffer(metaBuffer);
+}
+
+/*
+ * As with stirbulkdelete, this is not reachable during a normal CIC due to
+ * table-level locking.  It serves as a safety net for leftover STIR indexes
+ * from failed concurrent index builds.
+ */
+IndexBulkDeleteResult *
+stirvacuumcleanup(IndexVacuumInfo *info,
+				  IndexBulkDeleteResult *stats)
+{
+	StirMarkAsSkipInserts(info->index);
+	ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("\"%s\" is not implemented, seems like this index needs to be dropped", __func__)));
+	return NULL;
+}
+
+bytea *
+stiroptions(Datum reloptions, bool validate)
+{
+	return NULL;
+}
+
+void
+stircostestimate(PlannerInfo *root, IndexPath *path,
+					 double loop_count, Cost *indexStartupCost,
+					 Cost *indexTotalCost, Selectivity *indexSelectivity,
+					 double *indexCorrelation, double *indexPages)
+{
+	ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("\"%s\" is not implemented", __func__)));
+}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9407c357f27..cc067e58d36 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3432,6 +3432,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.message_level = DEBUG2;
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
+	ivinfo.validate_index = true;
 
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 4aa52a4bd25..d7ea86b2805 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -314,6 +314,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_ParallelWorkers = 0;
 	indexInfo->ii_Am = BTREE_AM_OID;
 	indexInfo->ii_AmCache = NULL;
+	indexInfo->ii_Auxiliary = false;
 	indexInfo->ii_Context = CurrentMemoryContext;
 
 	collationIds[0] = InvalidOid;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 020a5919b84..e82a6926e8c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -731,6 +731,7 @@ do_analyze_rel(Relation onerel, const VacuumParams *params,
 			ivinfo.message_level = elevel;
 			ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
 			ivinfo.strategy = vac_strategy;
+			ivinfo.validate_index = false;
 
 			stats = index_vacuum_cleanup(&ivinfo, NULL);
 
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 979c2be4abd..9db6b17abdc 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1092,6 +1092,7 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
 	ivinfo.estimated_count = pvs->shared->estimated_count;
 	ivinfo.num_heap_tuples = pvs->shared->reltuples;
 	ivinfo.strategy = pvs->bstrategy;
+	ivinfo.validate_index = false;
 
 	/* Update error traceback information */
 	pvs->indname = pstrdup(RelationGetRelationName(indrel));
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 3cd35c5c457..5359dab1176 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -875,6 +875,7 @@ makeIndexInfo(int numattrs, int numkeyattrs, Oid amoid, List *expressions,
 	/* initialize index-build state to default */
 	n->ii_BrokenHotChain = false;
 	n->ii_ParallelWorkers = 0;
+	n->ii_Auxiliary = false;
 
 	/* set up for possible use by index AM */
 	n->ii_Am = amoid;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 68bfe405db3..e4a666b2f72 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -58,6 +58,7 @@ typedef struct IndexVacuumInfo
 	bool		estimated_count;	/* num_heap_tuples is an estimate */
 	int			message_level;	/* ereport level for progress messages */
 	double		num_heap_tuples;	/* tuples remaining in heap */
+	bool		validate_index; /* validating concurrently built index? */
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index e8cb7f7a627..7f3f08a70ac 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -51,8 +51,9 @@ typedef enum relopt_kind
 	RELOPT_KIND_VIEW = (1 << 9),
 	RELOPT_KIND_BRIN = (1 << 10),
 	RELOPT_KIND_PARTITIONED = (1 << 11),
+	RELOPT_KIND_STIR = (1 << 12),
 	/* if you add a new kind, make sure you update "last_default" too */
-	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_STIR,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/access/stir.h b/src/include/access/stir.h
new file mode 100644
index 00000000000..b08cf4d4ef0
--- /dev/null
+++ b/src/include/access/stir.h
@@ -0,0 +1,110 @@
+/*-------------------------------------------------------------------------
+ *
+ * stir.h
+ *	  header file for postgres stir access method implementation.
+ *
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * src/include/access/stir.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef STIR_H
+#define STIR_H
+
+#include "access/amapi.h"
+#include "nodes/pathnodes.h"
+#include "storage/bufpage.h"
+
+/* Support procedures numbers */
+#define STIR_NPROC				0
+
+/* Scan strategies */
+#define STIR_NSTRATEGIES		1
+
+#define STIR_OPTIONS_PROC				0
+
+/* Macros for accessing stir page structures */
+#define StirPageGetOpaque(page) ((StirPageOpaque) PageGetSpecialPointer(page))
+#define StirPageGetMaxOffset(page) (StirPageGetOpaque(page)->maxoff)
+#define StirPageIsMeta(page) \
+	((StirPageGetOpaque(page)->flags & STIR_META) != 0)
+#define StirPageGetTuple(page, offset) \
+	((StirTuple *)(PageGetContents(page) \
+		+ sizeof(StirTuple) * ((offset) - 1)))
+#define StirPageGetNextTuple(tuple) \
+	((StirTuple *)((char *)(tuple) + sizeof(StirTuple)))
+
+
+
+/* Preserved page numbers */
+#define STIR_METAPAGE_BLKNO	(0)
+#define STIR_HEAD_BLKNO		(1) /* first data page */
+
+
+/* Opaque for stir pages */
+typedef struct StirPageOpaqueData
+{
+	OffsetNumber maxoff;		/* number of index tuples on the page */
+	uint16		flags;			/* see bit definitions below */
+	uint16		stir_page_id;	/* for identification of STIR indexes */
+} StirPageOpaqueData;
+
+/* Stir page flags */
+#define STIR_META		(1<<0)
+
+typedef StirPageOpaqueData *StirPageOpaque;
+
+#define STIR_PAGE_ID		0xFF84
+
+/* Metadata of stir index */
+typedef struct StirMetaPageData
+{
+	uint32		magicNumber;
+	BlockNumber	lastBlkNo;
+	bool		skipInserts;	/* should we just exit without any inserts? */
+} StirMetaPageData;
+
+/* Magic number to distinguish stir pages from others */
+#define STIR_MAGIC_NUMBER (0xDBAC0DEF)
+
+#define StirPageGetMeta(page)	((StirMetaPageData *) PageGetContents(page))
+
+typedef struct StirTuple
+{
+	ItemPointerData heapPtr;
+} StirTuple;
+
+#define StirPageGetFreeSpace(page) \
+	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
+		- StirPageGetMaxOffset(page) * (sizeof(StirTuple)) \
+		- MAXALIGN(sizeof(StirPageOpaqueData)))
+
+extern void StirFillMetapage(Relation index, Page metaPage, bool skipInserts);
+extern void StirInitMetapage(Relation index, ForkNumber forknum);
+extern void StirInitPage(Page page, uint16 flags);
+extern void StirMarkAsSkipInserts(Relation index);
+
+/* index access method interface functions */
+extern bool stirvalidate(Oid opclassoid);
+extern bool stirinsert(Relation index, Datum *values, bool *isnull,
+					 ItemPointer ht_ctid, Relation heapRel,
+					 IndexUniqueCheck checkUnique,
+					 bool indexUnchanged,
+					 struct IndexInfo *indexInfo);
+extern IndexScanDesc stirbeginscan(Relation r, int nkeys, int norderbys);
+extern void stirrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
+					 ScanKey orderbys, int norderbys);
+extern void stirendscan(IndexScanDesc scan);
+extern IndexBuildResult *stirbuild(Relation heap, Relation index,
+								 struct IndexInfo *indexInfo);
+extern void stirbuildempty(Relation index);
+extern IndexBulkDeleteResult *stirbulkdelete(IndexVacuumInfo *info,
+										   IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+										   void *callback_state);
+extern IndexBulkDeleteResult *stirvacuumcleanup(IndexVacuumInfo *info,
+											  IndexBulkDeleteResult *stats);
+extern bytea *stiroptions(Datum reloptions, bool validate);
+
+#endif			/* STIR_H */
diff --git a/src/include/catalog/pg_am.dat b/src/include/catalog/pg_am.dat
index 46d361047fe..8bd2c2b46ba 100644
--- a/src/include/catalog/pg_am.dat
+++ b/src/include/catalog/pg_am.dat
@@ -33,5 +33,8 @@
 { oid => '3580', oid_symbol => 'BRIN_AM_OID',
   descr => 'block range index (BRIN) access method',
   amname => 'brin', amhandler => 'brinhandler', amtype => 'i' },
+{ oid => '5555', oid_symbol => 'STIR_AM_OID',
+  descr => 'short term index replacement access method',
+  amname => 'stir', amhandler => 'stirhandler', amtype => 'i' },
 
 ]
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index df170b80840..a3457e749db 100644
--- a/src/include/catalog/pg_opclass.dat
+++ b/src/include/catalog/pg_opclass.dat
@@ -492,4 +492,8 @@
 
 # no brin opclass for the geometric types except box
 
+# allow any types for STIR
+{ opcmethod => 'stir', oid_symbol => 'ANY_STIR_OPS_OID', opcname => 'stir_ops',
+  opcfamily => 'stir/any_ops', opcintype => 'any'},
+
 ]
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 7a027c4810e..6ffc20a061c 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -308,5 +308,7 @@
   opfmethod => 'hash', opfname => 'multirange_ops' },
 { oid => '6158',
   opfmethod => 'gist', opfname => 'multirange_ops' },
+{ oid => '5558',
+  opfmethod => 'stir', opfname => 'any_ops' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fa9ae79082b..8f701faf6dd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -935,6 +935,10 @@
   proname => 'brinhandler', provolatile => 'v',
   prorettype => 'index_am_handler', proargtypes => 'internal',
   prosrc => 'brinhandler' },
+{ oid => '5556', descr => 'short term index replacement access method handler',
+  proname => 'stirhandler', provolatile => 'v',
+  prorettype => 'index_am_handler', proargtypes => 'internal',
+  prosrc => 'stirhandler' },
 { oid => '3952', descr => 'brin: standalone scan new table pages',
   proname => 'brin_summarize_new_values', provolatile => 'v',
   proparallel => 'u', prorettype => 'int4', proargtypes => 'regclass',
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 13359180d25..3eaeed3c141 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -169,8 +169,8 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
- * during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
+ * are used only during index build; they're conventionally zeroed otherwise.
  * ----------------
  */
 typedef struct IndexInfo
@@ -230,7 +230,8 @@ typedef struct IndexInfo
 	bool		ii_WithoutOverlaps;
 	/* # of workers requested (excludes leader) */
 	int			ii_ParallelWorkers;
-
+	/* is auxiliary for concurrent index build? */
+	bool		ii_Auxiliary;
 	/* Oid of index AM */
 	Oid			ii_Am;
 	/* private cache area for index AM */
diff --git a/src/include/utils/index_selfuncs.h b/src/include/utils/index_selfuncs.h
index 74793a1a19d..bf0e30dabe9 100644
--- a/src/include/utils/index_selfuncs.h
+++ b/src/include/utils/index_selfuncs.h
@@ -62,6 +62,14 @@ extern void spgcostestimate(struct PlannerInfo *root,
 							Selectivity *indexSelectivity,
 							double *indexCorrelation,
 							double *indexPages);
+extern void stircostestimate(struct PlannerInfo *root,
+							struct IndexPath *path,
+							double loop_count,
+							Cost *indexStartupCost,
+							Cost *indexTotalCost,
+							Selectivity *indexSelectivity,
+							double *indexCorrelation,
+							double *indexPages);
 extern void gincostestimate(struct PlannerInfo *root,
 							struct IndexPath *path,
 							double loop_count,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index 7ab6113c619..92c033a2010 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -173,7 +173,13 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
  spgist | can_exclude   | t
  spgist | can_include   | t
  spgist | bogus         | 
-(36 rows)
+ stir   | can_order     | f
+ stir   | can_unique    | f
+ stir   | can_multi_col | t
+ stir   | can_exclude   | f
+ stir   | can_include   | t
+ stir   | bogus         | 
+(42 rows)
 
 --
 -- additional checks for pg_index_column_has_property
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index cfdc6b1a17a..cc947194aa7 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2131,9 +2131,10 @@ FROM pg_opclass AS c1
 WHERE NOT EXISTS(SELECT 1 FROM pg_amop AS a1
                  WHERE a1.amopfamily = c1.opcfamily
                    AND binary_coercible(c1.opcintype, a1.amoplefttype));
- opcname | opcfamily 
----------+-----------
-(0 rows)
+ opcname  | opcfamily 
+----------+-----------
+ stir_ops |      5558
+(1 row)
 
 -- Check that each operator listed in pg_amop has an associated opclass,
 -- that is one whose opcintype matches oprleft (possibly by coercion).
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index c8f3932edf0..ecc2c2a6049 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5171,7 +5171,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA *
 List of access methods
@@ -5185,7 +5186,8 @@ List of access methods
  heap   | Table
  heap2  | Table
  spgist | Index
-(8 rows)
+ stir   | Index
+(9 rows)
 
 \dA h*
 List of access methods
@@ -5210,9 +5212,9 @@ List of access methods
 
 \dA: extra argument "bar" ignored
 \dA+
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5221,12 +5223,13 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ *
-                             List of access methods
-  Name  | Type  |       Handler        |              Description               
---------+-------+----------------------+----------------------------------------
+                               List of access methods
+  Name  | Type  |       Handler        |                Description                 
+--------+-------+----------------------+--------------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
  btree  | Index | bthandler            | b-tree index access method
  gin    | Index | ginhandler           | GIN index access method
@@ -5235,7 +5238,8 @@ List of access methods
  heap   | Table | heap_tableam_handler | heap table access method
  heap2  | Table | heap_tableam_handler | 
  spgist | Index | spghandler           | SP-GiST index access method
-(8 rows)
+ stir   | Index | stirhandler          | short term index replacement access method
+(9 rows)
 
 \dA+ h*
                      List of access methods
-- 
2.43.0



  [application/octet-stream] v35-0001-Add-stress-tests-for-concurrent-index-builds.patch (12.6K, 6-v35-0001-Add-stress-tests-for-concurrent-index-builds.patch)
  download | inline diff:
From 3b08777401347224a791b05b4c9c566fe98757f5 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 30 Nov 2024 16:24:20 +0100
Subject: [PATCH v35 1/7] Add stress tests for concurrent index builds

Introduce stress tests for concurrent index operations:
- test concurrent inserts/updates during CREATE/REINDEX INDEX CONCURRENTLY
- cover various index types (btree, gin, gist, brin, hash, spgist)
- test unique and non-unique indexes
- test with expressions and predicates
- test both parallel and non-parallel operations
- test both read-committed and repeatable-read isolation levels

These tests verify the behavior of the following commits.
---
 src/bin/pg_amcheck/meson.build  |   1 +
 src/bin/pg_amcheck/t/006_cic.pl | 293 ++++++++++++++++++++++++++++++++
 2 files changed, 294 insertions(+)
 create mode 100644 src/bin/pg_amcheck/t/006_cic.pl

diff --git a/src/bin/pg_amcheck/meson.build b/src/bin/pg_amcheck/meson.build
index 592cef74ecb..51a62dccb7b 100644
--- a/src/bin/pg_amcheck/meson.build
+++ b/src/bin/pg_amcheck/meson.build
@@ -28,6 +28,7 @@ tests += {
       't/003_check.pl',
       't/004_verify_heapam.pl',
       't/005_opclass_damage.pl',
+      't/006_cic.pl',
     ],
   },
 }
diff --git a/src/bin/pg_amcheck/t/006_cic.pl b/src/bin/pg_amcheck/t/006_cic.pl
new file mode 100644
index 00000000000..dd7a1eff0ef
--- /dev/null
+++ b/src/bin/pg_amcheck/t/006_cic.pl
@@ -0,0 +1,293 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+use constant STRESS_PGBENCH_CLIENTS => 30;
+use constant STRESS_PGBENCH_JOBS => 8;
+use constant STRESS_PGBENCH_TRANSACTIONS => 10000;
+use constant STRESS_MAX_SLEEP_MS => 10;
+
+use constant DEFAULT_PGBENCH_CLIENTS => 15;
+use constant DEFAULT_PGBENCH_JOBS => 4;
+use constant DEFAULT_PGBENCH_TRANSACTIONS => 500;
+use constant DEFAULT_MAX_SLEEP_MS => 1;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my $node;
+my $pg_test_extra = $ENV{PG_TEST_EXTRA} // '';
+my $is_stress = $pg_test_extra =~ /\bstress\b/ ? 1 : 0;
+my $pgbench_clients =
+  $is_stress ? STRESS_PGBENCH_CLIENTS : DEFAULT_PGBENCH_CLIENTS;
+my $pgbench_jobs = $is_stress ? STRESS_PGBENCH_JOBS : DEFAULT_PGBENCH_JOBS;
+my $pgbench_transactions =
+  $is_stress ? STRESS_PGBENCH_TRANSACTIONS : DEFAULT_PGBENCH_TRANSACTIONS;
+my $max_sleep_ms = $is_stress ? STRESS_MAX_SLEEP_MS : DEFAULT_MAX_SLEEP_MS;
+my $pgbench_options = sprintf(
+	'--no-vacuum --client=%d --jobs=%d --exit-on-abort --transactions=%d',
+	$pgbench_clients,
+	$pgbench_jobs,
+	$pgbench_transactions);
+my $no_hot = $is_stress ? int(rand(2)) : 0;
+
+print(
+		sprintf(
+		'settings: PG_TEST_EXTRA=%s stress=%d clients=%d jobs=%d transactions=%d max_sleep_ms=%d no_hot=%d',
+		defined($ENV{PG_TEST_EXTRA})
+		? ($pg_test_extra eq '' ? '(empty)' : $pg_test_extra)
+		: '(undef)',
+		$is_stress,
+		$pgbench_clients,
+		$pgbench_jobs,
+		$pgbench_transactions,
+		$max_sleep_ms,
+		$no_hot));
+print "\n";
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('RC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf', 'maintenance_work_mem = 32MB'); # to avoid OOM
+$node->append_conf('postgresql.conf', 'shared_buffers = 32MB'); # to avoid OOM
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE UNLOGGED TABLE tbl(i int primary key,
+								c1 money default 0, c2 money default 0,
+								c3 money default 0, updated_at timestamp,
+								ia int4[], p point)));
+
+if ($no_hot) { $node->safe_psql('postgres', q(CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);)); }
+
+# create sequence
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+# Create helper functions for predicate tests
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_stable() RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		EXECUTE 'SELECT txid_current()';
+		RETURN true;
+	END; $$;
+));
+
+$node->safe_psql('postgres', q(
+	CREATE FUNCTION predicate_const(integer) RETURNS bool IMMUTABLE
+	LANGUAGE plpgsql AS $$
+	BEGIN
+		RETURN MOD($1, 2) = 0;
+	END; $$;
+));
+
+# Run CIC/RIC in different options concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY',
+	{
+		'concurrent_ops' => sprintf(q(
+			SET debug_parallel_query = off; -- this is because predicate_stable implementation
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set variant random(0, 5)
+				\set parallels random(0, 4)
+				\set use_rr random(0, 9)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :use_rr = 0
+						SET default_transaction_isolation = 'repeatable read';
+					\endif
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_stable();
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE MOD(i, 2) = 0;
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, updated_at) WHERE predicate_const(i);
+					\elif :variant = 4
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(predicate_const(i));
+					\elif :variant = 5
+						CREATE INDEX CONCURRENTLY new_idx ON tbl(i, predicate_const(i), updated_at) WHERE predicate_const(i);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+					RESET default_transaction_isolation;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1000, 100000)
+				BEGIN;
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+				COMMIT;
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for unique index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for unique BTREE',
+	{
+		'concurrent_ops_unique_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\set use_rr random(0, 9)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :use_rr = 0
+						SET default_transaction_isolation = 'repeatable read';
+					\endif
+					CREATE UNIQUE INDEX CONCURRENTLY new_idx ON tbl(i);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT bt_index_check('new_idx', heapallindexed => true, checkunique => true);
+					DROP INDEX CONCURRENTLY new_idx;
+					RESET default_transaction_isolation;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIN with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIN',
+	{
+		'concurrent_ops_gin_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\set use_rr random(0, 9)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :use_rr = 0
+						SET default_transaction_isolation = 'repeatable read';
+					\endif
+					CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIN (ia);
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					SELECT gin_index_check('new_idx');
+					DROP INDEX CONCURRENTLY new_idx;
+					RESET default_transaction_isolation;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+	});
+
+$node->safe_psql('postgres', q(TRUNCATE TABLE tbl;));
+
+# Run CIC/RIC for GIST/BRIN/HASH/SPGIST index concurrently with upserts
+$node->pgbench(
+	$pgbench_options,
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REINDEX/CREATE INDEX CONCURRENTLY for GIST/BRIN/HASH/SPGIST',
+	{
+		'concurrent_ops_other_idx' => sprintf(q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+				SELECT nextval('in_row_rebuild') AS last_value \gset
+				\set parallels random(0, 4)
+				\set use_rr random(0, 9)
+				\if :last_value < 3
+					ALTER TABLE tbl SET (parallel_workers=:parallels);
+					\if :use_rr = 0
+						SET default_transaction_isolation = 'repeatable read';
+					\endif
+					\set variant random(0, 3)
+					\if :variant = 0
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING GIST (p);
+					\elif :variant = 1
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING BRIN (updated_at);
+					\elif :variant = 2
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING HASH (updated_at);
+					\elif :variant = 3
+						CREATE INDEX CONCURRENTLY new_idx ON tbl USING SPGIST (p);
+					\endif
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					REINDEX INDEX CONCURRENTLY new_idx;
+					\set sleep_ms random(0, %d)
+					\sleep :sleep_ms ms
+					DROP INDEX CONCURRENTLY new_idx;
+					RESET default_transaction_isolation;
+				\endif
+				SELECT pg_advisory_unlock(42);
+			\else
+				\set num random(1, power(10, random(1, 5)))
+				INSERT INTO tbl VALUES(floor(random()*:num),0,0,0,now(),ARRAY[floor(random()*100)::int],point(random(),random()))
+					ON CONFLICT(i) DO UPDATE SET updated_at = now(), ia = ARRAY[floor(random()*100)::int], p = point(random(),random());
+				SELECT setval('in_row_rebuild', 1);
+			\endif
+			), $max_sleep_ms, $max_sleep_ms)
+		});
+
+$node->stop;
+done_testing();
-- 
2.43.0



  [application/octet-stream] v35-0006-Optimize-auxiliary-index-handling.patch (3.0K, 7-v35-0006-Optimize-auxiliary-index-handling.patch)
  download | inline diff:
From fe5dc728bf0b00388d244a8d6141b57299d9b327 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 30 Dec 2024 16:37:12 +0100
Subject: [PATCH v35 6/7] Optimize auxiliary index handling

Skip unnecessary computations for auxiliary indices by:
- in the index-insert path, detect auxiliary indexes and bypass Datum value computation
- set indexUnchanged=false for auxiliary indices to avoid redundant checks

These optimizations reduce overhead during concurrent index operations.
---
 src/backend/catalog/index.c         | 9 +++++++++
 src/backend/executor/execIndexing.c | 5 ++++-
 src/include/nodes/execnodes.h       | 6 ++++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9136dfc7c73..4edf68aced2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2940,6 +2940,15 @@ FormIndexDatum(IndexInfo *indexInfo,
 	ListCell   *indexpr_item;
 	int			i;
 
+	/* Auxiliary index does not need any values to be computed */
+	if (unlikely(indexInfo->ii_Auxiliary))
+	{
+		Assert(indexInfo->ii_Am == STIR_AM_OID);
+		memset(values, 0, sizeof(Datum) * indexInfo->ii_NumIndexAttrs);
+		memset(isnull, true, sizeof(bool) * indexInfo->ii_NumIndexAttrs);
+		return;
+	}
+
 	if (indexInfo->ii_Expressions != NIL &&
 		indexInfo->ii_ExpressionsState == NIL)
 	{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index eb383812901..2df77a0606d 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -439,8 +439,11 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		 * There's definitely going to be an index_insert() call for this
 		 * index.  If we're being called as part of an UPDATE statement,
 		 * consider if the 'indexUnchanged' = true hint should be passed.
+		 *
+		 * For auxiliary indexes, always pass false to skip value comparison checks,
+		 * since auxiliary indexes only store TIDs and don't track value changes.
 		 */
-		indexUnchanged = ((flags & EIIT_IS_UPDATE) &&
+		indexUnchanged = ((flags & EIIT_IS_UPDATE) && !indexInfo->ii_Auxiliary &&
 						  index_unchanged_by_update(resultRelInfo,
 													estate,
 													indexInfo,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index af58d4cf4b5..99916e150fd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -169,8 +169,10 @@ typedef struct ExprState
  *		entries for a particular index.  Used for both index_build and
  *		retail creation of index entries.
  *
- * ii_Concurrent, ii_BrokenHotChain, ii_Auxiliary and ii_ParallelWorkers
- * are used only during index build; they're conventionally zeroed otherwise.
+ * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
+ * during index build; they're conventionally zeroed otherwise.  ii_Auxiliary
+ * is also used during retail inserts to skip datum formation for auxiliary
+ * indexes.
  * ----------------
  */
 typedef struct IndexInfo
-- 
2.43.0



  [application/octet-stream] v35-0007-Refresh-snapshot-periodically-during-index-valid.patch (27.1K, 8-v35-0007-Refresh-snapshot-periodically-during-index-valid.patch)
  download | inline diff:
From 5a7eea41aa335b3592f83b6f8fb701e357c47e6b Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Mon, 21 Apr 2025 14:11:53 +0200
Subject: [PATCH v35 7/7] Refresh snapshot periodically during index validation

Enhances validation phase of concurrently built indexes by periodically refreshing snapshots rather than using a single reference snapshot. This addresses issues with xmin propagation during long-running validations.

The validation now takes a fresh snapshot every few pages, allowing the xmin horizon to advance. This restores feature of commit d9d076222f5b, which was reverted in commit e28bb8851969. New STIR-based approach does not depend on single reference snapshot anymore.
---
 src/backend/access/heap/README.HOT         |  4 +-
 src/backend/access/heap/heapam_handler.c   | 77 +++++++++++++++++++++-
 src/backend/access/spgist/spgvacuum.c      | 12 +++-
 src/backend/catalog/index.c                | 73 +++++++++++++++-----
 src/backend/commands/indexcmds.c           | 52 +++------------
 src/backend/utils/misc/guc_parameters.dat  |  9 +++
 src/include/access/tableam.h               | 25 ++++---
 src/include/access/transam.h               | 15 +++++
 src/include/catalog/index.h                |  2 +-
 src/include/miscadmin.h                    |  1 +
 src/test/regress/expected/create_index.out |  3 +
 src/test/regress/sql/create_index.sql      |  4 ++
 12 files changed, 194 insertions(+), 83 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index b1c797517ee..382fe1723a5 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -401,12 +401,12 @@ live tuple.
 We mark the index open for inserts (but still not ready for reads) then
 we again wait for transactions which have the table open.  Then validate
 the index.  This searches for tuples missing from the index in auxiliary
-index, and inserts any missing ones if they are visible to reference snapshot.
+index, and inserts any missing ones if they are visible to a fresh snapshot.
 Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.
 
 Then we wait until every transaction that could have a snapshot older than
-the second reference snapshot is finished.  This ensures that nobody is
+the latest used snapshot is finished.  This ensures that nobody is
 alive any longer who could need to see any tuples that might be missing
 from the index, as well as ensuring that no one can see any inconsistent
 rows in a broken HOT chain (the first condition is stronger than the
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 8cbc4855078..65731224111 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -53,6 +53,9 @@
 /* GUC: percentage of maintenance_work_mem for CIC validation tuplestore */
 int			debug_cic_validate_store_mem_pct = 10;
 
+/* GUC: refresh snapshot every N pages during CIC validation (0 = disable) */
+int			debug_cic_validate_snapshot_pages = 4096;
+
 static void reform_and_rewrite_tuple(HeapTuple tuple,
 									 Relation OldHeap, Relation NewHeap,
 									 Datum *values, bool *isnull, RewriteState rwstate);
@@ -1971,24 +1974,35 @@ heapam_index_validate_scan_read_stream_next(
 	return result;
 }
 
-static void
+static TransactionId
 heapam_index_validate_scan(Relation heapRelation,
 						   Relation indexRelation,
 						   IndexInfo *indexInfo,
-						   Snapshot snapshot,
 						   ValidateIndexState *state,
 						   ValidateIndexState *auxState)
 {
+	TransactionId limitXmin;
+
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Snapshot		snapshot;
 	TupleTableSlot  *slot;
 	EState			*estate;
 	ExprContext		*econtext;
 	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	int64			num_to_check;
+	int64			page_read_counter = 1; /* set to 1 to skip snapshot reset at start */
 	Tuplestorestate *tuples_for_check;
+
+	/*
+	 * Under REPEATABLE READ or SERIALIZABLE (possible via
+	 * default_transaction_isolation), GetLatestSnapshot() returns the
+	 * transaction-level snapshot and xmin stays pinned.  Periodic snapshot
+	 * refresh is pointless in that case, so skip it.
+	 */
+	bool		reset_snapshot = XactIsoLevel <= XACT_READ_COMMITTED;
 	ValidateIndexScanState callback_private_data;
 
 	Buffer buf;
@@ -1998,6 +2012,8 @@ heapam_index_validate_scan(Relation heapRelation,
 	/* Use a percentage of maintenance_work_mem for tuple store. */
 	int		store_work_mem_part = maintenance_work_mem * debug_cic_validate_store_mem_pct / 100;
 
+	PushActiveSnapshot(GetTransactionSnapshot());
+
 	/*
 	 * Encode TIDs as int8 values for the sort, rather than directly sorting
 	 * item pointers.  This can be significantly faster, primarily because TID
@@ -2006,6 +2022,12 @@ heapam_index_validate_scan(Relation heapRelation,
 	 */
 	tuples_for_check = tuplestore_begin_datum(INT8OID, false, false, store_work_mem_part);
 
+	PopActiveSnapshot();
+	InvalidateCatalogSnapshot();
+
+	Assert(!reset_snapshot || !HaveRegisteredOrActiveSnapshot());
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+
 	/*
 	 * sanity checks
 	 */
@@ -2021,6 +2043,29 @@ heapam_index_validate_scan(Relation heapRelation,
 
 	state->tuplesort = auxState->tuplesort = NULL;
 
+	/*
+	 * Now take the first snapshot that will be used to filter candidate
+	 * tuples. We are going to replace it by newer snapshot every so often
+	 * to propagate horizon.
+	 *
+	 * Beware!  There might still be snapshots in use that treat some transaction
+	 * as in-progress that our temporary snapshot treats as committed.
+	 *
+	 * If such a recently-committed transaction deleted tuples in the table,
+	 * we will not include them in the index; yet those transactions which
+	 * see the deleting one as still-in-progress will expect such tuples to
+	 * be there once we mark the index as valid.
+	 *
+	 * We solve this by waiting for all endangered transactions to exit before
+	 * we mark the index as valid, for that reason limitXmin is supported.
+	 *
+	 * We also set ActiveSnapshot to this snap, since functions in indexes may
+	 * need a snapshot.
+	 */
+	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	PushActiveSnapshot(snapshot);
+	limitXmin = snapshot->xmin;
+
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
@@ -2054,6 +2099,7 @@ heapam_index_validate_scan(Relation heapRelation,
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		block_number = BufferGetBlockNumber(buf);
+		page_read_counter++;
 
 		i = 0;
 		while ((off = tuples[i]) != InvalidOffsetNumber)
@@ -2124,6 +2170,21 @@ heapam_index_validate_scan(Relation heapRelation,
 		}
 
 		ReleaseBuffer(buf);
+		if (reset_snapshot &&
+			debug_cic_validate_snapshot_pages > 0 &&
+			page_read_counter % debug_cic_validate_snapshot_pages == 0)
+		{
+			PopActiveSnapshot();
+			UnregisterSnapshot(snapshot);
+			/* to make sure we propagate xmin */
+			InvalidateCatalogSnapshot();
+			Assert(!TransactionIdIsValid(MyProc->xmin));
+
+			snapshot = RegisterSnapshot(GetLatestSnapshot());
+			PushActiveSnapshot(snapshot);
+			/* Advance limitXmin so we wait for all snapshots seen so far */
+			limitXmin = TransactionIdNewer(limitXmin, snapshot->xmin);
+		}
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -2133,11 +2194,23 @@ heapam_index_validate_scan(Relation heapRelation,
 	read_stream_end(read_stream);
 	tuplestore_end(tuples_for_check);
 
+	/*
+	 * Drop the latest snapshot.  We must do this before waiting out other
+	 * snapshot holders, else we will deadlock against other processes also
+	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
+	 * they must wait for.
+	 */
+	PopActiveSnapshot();
+	UnregisterSnapshot(snapshot);
+	InvalidateCatalogSnapshot();
+	Assert(!reset_snapshot || MyProc->xmin == InvalidTransactionId);
 	FreeAccessStrategy(bstrategy);
 
 	/* These may have been pointing to the now-gone estate */
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
+
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index c461f8dc02d..ef192fb99c2 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -191,14 +191,16 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 			 * Add target TID to pending list if the redirection could have
 			 * happened since VACUUM started.  (If xid is invalid, assume it
 			 * must have happened before VACUUM started, since REINDEX
-			 * CONCURRENTLY locks out VACUUM.)
+			 * CONCURRENTLY locks out VACUUM, if myXmin is invalid it is
+			 * validation scan.)
 			 *
 			 * Note: we could make a tighter test by seeing if the xid is
 			 * "running" according to the active snapshot; but snapmgr.c
 			 * doesn't currently export a suitable API, and it's not entirely
 			 * clear that a tighter test is worth the cycles anyway.
 			 */
-			if (TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
+			if (!TransactionIdIsValid(bds->myXmin) ||
+					TransactionIdFollowsOrEquals(dt->xid, bds->myXmin))
 				spgAddPendingTID(bds, &dt->pointer);
 		}
 		else
@@ -808,7 +810,6 @@ spgvacuumscan(spgBulkDeleteState *bds)
 	/* Finish setting up spgBulkDeleteState */
 	initSpGistState(&bds->spgstate, index);
 	bds->pendingList = NULL;
-	bds->myXmin = GetActiveSnapshot()->xmin;
 	bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
 	/*
@@ -959,6 +960,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	bds.stats = stats;
 	bds.callback = callback;
 	bds.callback_state = callback_state;
+	if (info->validate_index)
+		bds.myXmin = InvalidTransactionId;
+	else
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 	spgvacuumscan(&bds);
 
@@ -999,6 +1004,7 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		bds.stats = stats;
 		bds.callback = dummy_callback;
 		bds.callback_state = NULL;
+		bds.myXmin = GetActiveSnapshot()->xmin;
 
 		spgvacuumscan(&bds);
 	}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4edf68aced2..49adcb152cf 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -69,6 +69,7 @@
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
+#include "storage/proc.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -3538,8 +3539,9 @@ IndexCheckExclusion(Relation heapRelation,
  * insert their new tuples into it. At the same moment we clear "indisready" for
  * auxiliary index, since it is no more required to be updated.
  *
- * We then take a new reference snapshot, any tuples that are valid according
- * to this snap, but are not in the index, must be added to the index.
+ * We then take a new snapshot, any tuples that are valid according
+ * to this snap, but are not in the index, must be added to the index. In
+ * order to propagate xmin we reset that snapshot every so often.
  * (Any tuples committed live after the snap will be inserted into the
  * index by their originating transaction.  Any tuples committed dead before
  * the snap need not be indexed, because we will wait out all transactions
@@ -3552,7 +3554,7 @@ IndexCheckExclusion(Relation heapRelation,
  * TIDs of both auxiliary and target indexes, and doing a "merge join" against
  * the TID lists to see which tuples from auxiliary index are missing from the
  * target index.  Thus we will ensure that all tuples valid according to the
- * reference snapshot are in the index. Notice we need to do bulkdelete in the
+ * latest snapshot are in the index. Notice we need to do bulkdelete in the
  * particular order: auxiliary first, target last.
  *
  * Building a unique index this way is tricky: we might try to insert a
@@ -3565,21 +3567,24 @@ IndexCheckExclusion(Relation heapRelation,
  * before it declares a uniqueness error.
  *
  * After completing validate_index(), we wait until all transactions that
- * were alive at the time of the reference snapshot are gone; this is
- * necessary to be sure there are none left with a transaction snapshot
- * older than the reference (and hence possibly able to see tuples we did
- * not index).  Then we mark the index "indisvalid" and commit.  Subsequent
- * transactions will be able to use it for queries.
+ * were alive at the time of the latest snapshot used during validation are
+ * gone; this is necessary to be sure there are none left with a transaction
+ * snapshot older than that (and hence possibly able to see tuples we did
+ * not index).  The snapshot is periodically refreshed during the heap scan
+ * to propagate the xmin horizon, so limitXmin tracks the most recent one.
+ * Then we mark the index "indisvalid" and commit.  Subsequent transactions
+ * will be able to use it for queries.
  *
  * Also, some actions to concurrent drop the auxiliary index are performed.
  */
-void
-validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
+TransactionId
+validate_index(Oid heapId, Oid indexId, Oid auxIndexId)
 {
 	Relation	heapRelation,
 				indexRelation,
 				auxIndexRelation;
 	IndexInfo  *indexInfo;
+	TransactionId limitXmin;
 	IndexVacuumInfo ivinfo, auxivinfo;
 	ValidateIndexState state, auxState;
 	Oid			save_userid;
@@ -3592,6 +3597,16 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	int			main_work_mem_part = (int)((int64) maintenance_work_mem * 8 / 10);
 	int			aux_work_mem_part = maintenance_work_mem / 10;
 
+	/*
+	 * Under REPEATABLE READ or SERIALIZABLE (possible via
+	 * default_transaction_isolation), GetLatestSnapshot() returns the
+	 * transaction-level snapshot and xmin stays pinned.  Periodic snapshot
+	 * refresh is pointless in that case, so skip it.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	bool		reset_snapshot = XactIsoLevel <= XACT_READ_COMMITTED;
+#endif
+
 	{
 		const int	progress_index[] = {
 			PROGRESS_CREATEIDX_PHASE,
@@ -3629,8 +3644,12 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	 * Fetch info needed for index_insert.  (You might think this should be
 	 * passed in from DefineIndex, but its copy is long gone due to having
 	 * been built in a previous transaction.)
+	 *
+	 * We might need snapshot for index expressions or predicates.
 	 */
+	PushActiveSnapshot(GetTransactionSnapshot());
 	indexInfo = BuildIndexInfo(indexRelation);
+	PopActiveSnapshot();
 
 	/* mark build is concurrent just for consistency */
 	indexInfo->ii_Concurrent = true;
@@ -3666,6 +3685,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 										   NULL, TUPLESORT_NONE);
 	auxState.htups = auxState.itups = auxState.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	(void) index_bulk_delete(&auxivinfo, NULL,
 							 validate_index_callback, &auxState);
 	/* If aux index is empty, merge may be skipped */
@@ -3685,7 +3707,13 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		index_close(indexRelation, NoLock);
 		table_close(heapRelation, NoLock);
 
-		return;
+		PushActiveSnapshot(GetTransactionSnapshot());
+		limitXmin = GetActiveSnapshot()->xmin;
+		PopActiveSnapshot();
+		InvalidateCatalogSnapshot();
+
+		Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+		return limitXmin;
 	}
 
 	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
@@ -3694,6 +3722,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 											NULL, TUPLESORT_NONE);
 	state.htups = state.itups = state.tups_inserted = 0;
 
+	/* tuplesort_begin_datum may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	/* ambulkdelete updates progress metrics */
 	(void) index_bulk_delete(&ivinfo, NULL,
 							 validate_index_callback, &state);
@@ -3713,19 +3744,24 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 		pgstat_progress_update_multi_param(3, progress_index, progress_vals);
 	}
 	tuplesort_performsort(state.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+
 	tuplesort_performsort(auxState.tuplesort);
+	/* tuplesort_performsort may require catalog snapshot */
+	InvalidateCatalogSnapshot();
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
 
 	/*
 	 * Now merge both indexes
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
 								 PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXMERGE);
-	table_index_validate_scan(heapRelation,
-							  indexRelation,
-							  indexInfo,
-							  snapshot,
-							  &state,
-							  &auxState);
+	limitXmin = table_index_validate_scan(heapRelation,
+										  indexRelation,
+										  indexInfo,
+										  &state,
+										  &auxState);
 
 	/* Tuple sort closed by table_index_validate_scan */
 	Assert(state.tuplesort == NULL && auxState.tuplesort == NULL);
@@ -3748,6 +3784,9 @@ validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot)
 	index_close(auxIndexRelation, NoLock);
 	index_close(indexRelation, NoLock);
 	table_close(heapRelation, NoLock);
+
+	Assert(!reset_snapshot || !TransactionIdIsValid(MyProc->xmin));
+	return limitXmin;
 }
 
 /*
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 46c4ccc6789..a700068f8a2 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -596,7 +596,6 @@ DefineIndex(ParseState *pstate,
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -1816,32 +1815,11 @@ DefineIndex(ParseState *pstate,
 	/* Tell concurrent index builds to ignore us, if index qualifies */
 	if (safe_index)
 		set_indexsafe_procflags();
-
-	/*
-	 * Now take the "reference snapshot" that will be used by validate_index()
-	 * to filter candidate tuples.  Beware!  There might still be snapshots in
-	 * use that treat some transaction as in-progress that our reference
-	 * snapshot treats as committed.  If such a recently-committed transaction
-	 * deleted tuples in the table, we will not include them in the index; yet
-	 * those transactions which see the deleting one as still-in-progress will
-	 * expect such tuples to be there once we mark the index as valid.
-	 *
-	 * We solve this by waiting for all endangered transactions to exit before
-	 * we mark the index as valid.
-	 *
-	 * We also set ActiveSnapshot to this snap, since functions in indexes may
-	 * need a snapshot.
-	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
-	PushActiveSnapshot(snapshot);
 	/*
 	 * Merge content of auxiliary and target indexes - insert any missing index entries.
 	 */
-	validate_index(tableId, indexRelationId, auxIndexRelationId, snapshot);
-	limitXmin = snapshot->xmin;
+	limitXmin = validate_index(tableId, indexRelationId, auxIndexRelationId);
 
-	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
 	 * are holding back our process's advertised xmin; in particular, if
@@ -1863,8 +1841,8 @@ DefineIndex(ParseState *pstate,
 	/*
 	 * The index is now valid in the sense that it contains all currently
 	 * interesting tuples.  But since it might not contain tuples deleted just
-	 * before the reference snap was taken, we have to wait out any
-	 * transactions that might have older snapshots.
+	 * before the last snapshot during validating was taken, we have to wait
+	 * out any transactions that might have older snapshots.
 	 */
 	INJECTION_POINT("define-index-before-set-valid", NULL);
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
@@ -4433,7 +4411,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
-		Snapshot	snapshot;
 
 		StartTransactionCommand();
 
@@ -4448,13 +4425,6 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		if (newidx->safe)
 			set_indexsafe_procflags();
 
-		/*
-		 * Take the "reference snapshot" that will be used by validate_index()
-		 * to filter candidate tuples.
-		 */
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
-		PushActiveSnapshot(snapshot);
-
 		/*
 		 * Update progress for the index to build, with the correct parent
 		 * table involved.
@@ -4466,16 +4436,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId, snapshot);
-
-		/*
-		 * We can now do away with our active snapshot, we still need to save
-		 * the xmin limit to wait for older snapshots.
-		 */
-		limitXmin = snapshot->xmin;
-
-		PopActiveSnapshot();
-		UnregisterSnapshot(snapshot);
+		limitXmin = validate_index(newidx->tableId, newidx->indexId, newidx->auxIndexId);
 
 		/*
 		 * To ensure no deadlocks, we must commit and start yet another
@@ -4485,10 +4446,13 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		CommitTransactionCommand();
 		StartTransactionCommand();
 
+		/* We should now definitely not be advertising any xmin. */
+		Assert(!TransactionIdIsValid(MyProc->xmin));
+
 		/*
 		 * The index is now valid in the sense that it contains all currently
 		 * interesting tuples.  But since it might not contain tuples deleted
-		 * just before the reference snap was taken, we have to wait out any
+		 * just before the latest snap was taken, we have to wait out any
 		 * transactions that might have older snapshots.
 		 *
 		 * Because we don't take a snapshot or Xid in this transaction,
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3477866d729..6b5b130f2aa 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -640,6 +640,15 @@
   boot_val => 'DEFAULT_ASSERT_ENABLED',
 },
 
+{ name => 'debug_cic_validate_snapshot_pages', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Refresh snapshot every N pages during CIC validation (0 to disable).',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'debug_cic_validate_snapshot_pages',
+  boot_val => '4096',
+  min => '0',
+  max => '1000000',
+},
+
 { name => 'debug_cic_validate_store_mem_pct', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
   short_desc => 'Percentage of maintenance_work_mem used for CIC validation tuplestore.',
   flags => 'GUC_NOT_IN_SAMPLE',
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index da3598663bc..9298f68d18a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -739,12 +739,11 @@ typedef struct TableAmRoutine
 										   TableScanDesc scan);
 
 	/* see table_index_validate_scan for reference about parameters */
-	void		(*index_validate_scan) (Relation table_rel,
-										Relation index_rel,
-										IndexInfo *index_info,
-										Snapshot snapshot,
-										ValidateIndexState *state,
-										ValidateIndexState *aux_state);
+	TransactionId		(*index_validate_scan) (Relation table_rel,
+												Relation index_rel,
+												IndexInfo *index_info,
+												ValidateIndexState *state,
+												ValidateIndexState *aux_state);
 
 
 	/* ------------------------------------------------------------------------
@@ -1911,20 +1910,18 @@ table_index_build_range_scan(Relation table_rel,
  * Note: it is responsibility of that function to close sortstates in
  * both `state` and `auxstate`.
  */
-static inline void
+static inline TransactionId
 table_index_validate_scan(Relation table_rel,
 						  Relation index_rel,
 						  IndexInfo *index_info,
-						  Snapshot snapshot,
 						  ValidateIndexState *state,
 						  ValidateIndexState *auxstate)
 {
-	table_rel->rd_tableam->index_validate_scan(table_rel,
-											   index_rel,
-											   index_info,
-											   snapshot,
-											   state,
-											   auxstate);
+	return table_rel->rd_tableam->index_validate_scan(table_rel,
+													  index_rel,
+													  index_info,
+													  state,
+													  auxstate);
 }
 
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 55a4ab26b34..923aadbab43 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -415,6 +415,21 @@ NormalTransactionIdOlder(TransactionId a, TransactionId b)
 	return b;
 }
 
+/* return the newer of the two IDs */
+static inline TransactionId
+TransactionIdNewer(TransactionId a, TransactionId b)
+{
+	if (!TransactionIdIsValid(a))
+		return b;
+
+	if (!TransactionIdIsValid(b))
+		return a;
+
+	if (TransactionIdFollows(a, b))
+		return a;
+	return b;
+}
+
 /* return the newer of the two IDs */
 static inline FullTransactionId
 FullTransactionIdNewer(FullTransactionId a, FullTransactionId b)
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 3239e5c716f..def7352a859 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -159,7 +159,7 @@ extern void index_build(Relation heapRelation,
 						bool parallel,
 						bool progress);
 
-extern void validate_index(Oid heapId, Oid indexId, Oid auxIndexId, Snapshot snapshot);
+extern TransactionId validate_index(Oid heapId, Oid indexId, Oid auxIndexId);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 8c2b3a9c5e7..2ad3deff9cd 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -272,6 +272,7 @@ extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
 extern PGDLLIMPORT int debug_cic_validate_store_mem_pct;
+extern PGDLLIMPORT int debug_cic_validate_snapshot_pages;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2d6abb15a89..758c5884ff5 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3382,6 +3382,9 @@ DROP INDEX aux_index_ind6;
 --------+---------+-----------+----------+---------
  c1     | integer |           |          | 
 
+SET default_transaction_isolation = 'repeatable read';
+CREATE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+SET default_transaction_isolation = 'read committed';
 DROP TABLE aux_index_tab5;
 -- Check handling of indexes with expressions and predicates.  The
 -- definitions of the rebuilt indexes should match the original
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index fd96d80abbc..65dd58b947d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1400,6 +1400,10 @@ DROP INDEX aux_index_ind6;
 -- Make sure auxiliary index dropped too
 \d aux_index_tab5
 
+SET default_transaction_isolation = 'repeatable read';
+CREATE INDEX CONCURRENTLY aux_index_ind6 ON aux_index_tab5 (c1);
+SET default_transaction_isolation = 'read committed';
+
 DROP TABLE aux_index_tab5;
 
 -- Check handling of indexes with expressions and predicates.  The
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
@ 2026-04-13 01:05  Josh Kupershmidt <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  1 sibling, 0 replies; 10+ messages in thread

From: Josh Kupershmidt @ 2026-04-13 01:05 UTC (permalink / raw)
  To: Mihail Nikalayeu <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; Antonin Houska <[email protected]>; Hannu Krosing <[email protected]>; Sergey Sargsyan <[email protected]>; Álvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Melanie Plageman <[email protected]>

On Tue, Apr 7, 2026 at 7:20 PM Mihail Nikalayeu <[email protected]>
wrote:

> Hello, Josh!
>
> Your review looks a bit LLM-generated, but anyway - thanks for review! :)
> Especially because at least one point seems to be valid.
>
> > We're leaving behind two invalid indexes now that the user has to figure
> > out how to drop in case of an error - that seems like it could be
> confusing for the user.
> > Could we have some better way (error handler, background worker) try to
> perform this cleanup automatically?
> > If not, we should at least tell the user clearly in the error message
> that both
> > invalid indexes are left behind (i.e. "idx" and "idx_ccaux" in the
> example)
>
> Commit 0005 adds automatic dropping of auxiliary indexes when the
> original index is reindexed or dropped. Also, documentation reflects
> the ccaux index (similar to ccnew).
>

Well, we auto-drop the aux index if the user figures out that they should
drop the main index first.


> > Docs are inconsistent or confusing about whether there's one or two
> indexes left behind in case of error
> > - e.g. "command will fail but leave behind *an* invalid index and its
> associated auxiliary index"
> > somewhat implying there is only one invalid index, and somehow the
> auxiliary index is valid?
>
> Auxiliary index is never marked as valid; I'm not sure we need to
> highlight it here. Or do you have an idea how to rephrase?
>

For example in this doc change hunk:

@@ -664,12 +665,19 @@ postgres=# \d tab
  col    | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>

-    The recommended recovery
-    method in such cases is to drop the index and try again to perform
-    <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
-    to rebuild the index with <command>REINDEX INDEX
CONCURRENTLY</command>).
+    The recommended recovery method in such cases is to drop the index with
+    <command>DROP INDEX</command>. The auxiliary index (suffixed with
+    <literal>_ccaux</literal>) will be automatically dropped when the main
+    index is dropped. After dropping the indexes, you can try again to
perform
+    <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is
to
+    rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>,
+    which will also handle cleanup of any invalid auxiliary indexes.)
+    If the only invalid index is one suffixed <literal>_ccaux</literal>,
+    the recommended recovery method is just <literal>DROP INDEX</literal>
+    for that index.
    </para>

The output we're showing the user from psql is two INVALID indexes, and
we're keeping the original doc suggestion on the first line that "The
recommended recovery method in such cases is to drop the index with DROP
INDEX". The next sentence clarifies a bit that there's an auxiliary index
that "will be automatically dropped". But now it's on the user to figure
out which index is which, and drop the right one.


> > Similarly, when the doc mentions e.g. "drop the index" - it's not
> necessarily clear which index
> > we're talking about when there are two invalid indexes left behind that
> the user will see with `\d`
>
> In one commit it says: "method in such cases is to drop these indexes
> and try again to perform".
> After 0005 "The auxiliary index (suffixed with
> <literal>_ccaux</literal>) will be automatically dropped when the main
> index is dropped".
> It seems clear to me, but feel free to provide your variant.
>
> >  * It would be nice to guard against users trying arbitrary CREATE INDEX
> ... USING stir(...) with a clear error
>
> It will fail with "Building STIR indexes is not supported".
>

Sorry, you are right, this is handled with a good error.


>
> > One of the testcases (line 2478 of patch 0004) does `DELETE FROM
> concur_reindex_tab4 WHERE c1 = 1;`
> > but the table `concur_reindex_tab4` looks like it has been dropped a few
> lines above that?
>
> Hm, yep, I'll fix it.
>
> > The StirPageGetFreeSpace macro from patch 0002 reads
> `StirPageGetMaxOffset(page)`
> > which seems like it could cause an unsafe read of opaque->maxoff if used
> on the metapage
>
> But it was never used for the metapage.
>

Yes, but I think it'd be better not to leave a possible foot-gun around for
the next developer. Even just adding an AssertMacro like:

#define StirPageGetMaxOffset(page) (AssertMacro(!StirPageIsMeta(page)),
StirPageGetOpaque(page)->maxoff)

There are some other asserts for some of the trickier bookkeeping that
happens in this patch that I think would help check the code, and make it
easier to understand as well. E.g. adding an assertion check at the end of
StirPageAddItem(), and inside stirbulkdelete() (I tried calling it around
L499 of stir.c, just before 'while (itup < itupEnd)').

/*
 * Validate that maxoff and pd_lower are consistent on a STIR data page.
 *
 * On a freshly initialized empty page, pd_lower is SizeOfPageHeaderData
 * (set by PageInit).  After the first insert, pd_lower is computed from
 * PageGetContents which uses MAXALIGN(SizeOfPageHeaderData).
 */
static inline void
StirPageValidate(Page page)
{
Assert(!StirPageIsMeta(page));
Assert(StirPageGetOpaque(page)->maxoff == 0
  ? ((PageHeader) page)->pd_lower == SizeOfPageHeaderData
  : ((PageHeader) page)->pd_lower ==
    MAXALIGN(SizeOfPageHeaderData) +
    StirPageGetOpaque(page)->maxoff * sizeof(StirTuple));
}


>
> > A comment explains "No predicate evaluation is needed here" , i.e. we
> are skipping predicate
> > evaluation in the validation scan step, assuming that the
> > auxiliary index contains only qualifying TIDs. Is this really
> bulletproof for e.g. partial indexes which may
> > no longer satisfy the predicate at the time of the validation scan due
> to conflicting HOT updates?
>
> Conflicting HOT updates are not possible because the catalog contains
> the new index definition from the start of the process.
> Or do you mean a different scenario?
>

Sorry for the false alarm, I believe you are right - I had to double
check RelationGetIndexAttrBitmap(), but I believe this is safe based on
the hotblockingattrs bitmap.

Overall, this is a nice improvement for CIC/RC that I think should help
particularly on large, busy systems.

Thanks,
Josh


^ permalink  raw  reply  [nested|flat] 10+ messages in thread

end of thread, other threads:[~2026-04-13 01:05 UTC | newest]

Thread overview: 10+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-24 19:48 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements Mihail Nikalayeu <[email protected]>
2026-03-09 00:09 ` Mihail Nikalayeu <[email protected]>
2026-03-23 22:08   ` Mihail Nikalayeu <[email protected]>
2026-03-28 19:17     ` Mihail Nikalayeu <[email protected]>
2026-03-31 22:11       ` Mihail Nikalayeu <[email protected]>
2026-04-06 18:21         ` Mihail Nikalayeu <[email protected]>
2026-04-07 01:42           ` Josh Kupershmidt <[email protected]>
2026-04-07 23:19             ` Mihail Nikalayeu <[email protected]>
2026-04-11 16:56               ` Mihail Nikalayeu <[email protected]>
2026-04-13 01:05               ` Josh Kupershmidt <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox